SQL Server – Query Plan – When Sorts precede “Nested Loop” & “Merge” Joins


Reviewing  a Query Plan and saw a Sort Operator standing between an Index Seek and the Inner Join that joins that table with another table.

Let us find out why.


Query Plan

Here is what the Query Plan looks like:

Nested Loops


Merge Join


The Query

Here is what the Query looks like:

        declare @dateBegin datetime
        declare @dateEnd datetime

        set @dateBegin = '1/1/2015' 
        set @dateEnd   = '1/15/2015'

			, sp.ref_int_id
			, pd.productCD
			, pd.productID
			, pa.payment_date
			, pa.amount revenue

	from [dbo].StudentProduct sp

	join [dbo].ProductDetails pd 
		on  sp.schoolID   = pd.schoolID 
		and sp.productID  = pd.productID

	join [dbo].PaymentAccount pa 
		on  sp.student_int_id = pa.student_int_id 
		and sp.ref_int_id    = pa.ref_int_id

	and pa.payment_code not in

	where pa.payment_date between @dateBegin and @dateEnd


Query Plan – Indexed Used

Table – dbo.PaymentAccount

As we are filtering on dates…







Here are the main points

  1. Seek Predicates
    • Begin and End Dates (@dateBegin and @dateEnd)
  2. ( Additional ) predicates
    • payment_code
      • GIFTCARD


Table – dbo.StudentProduct




IndexSeek-Image (cropped)

Indexed Definition




Table – dbo.StudentProduct



Query Plan – Join – Merge Join





MergeJoin-Image (Cropped)



We are joining on two columns ( student_int_id and ref_int_id).


Facts Check

Nested Loop and Merge Joins require that the source tables be sorted along the lines of the columns being joined on.

In our sample, the index used for accessing the dbo.StudentProduct is PK_StudentProduct.  PK_StudentProduct is the primary key and it is a composite combination of two columns (student_int_id and ref_int_id).

On the other hand, the index used for accessing dbo.PaymentAccount is INDX_DBA_PaymentDate_PaymentCode and it is anchored  on PaymentDate and PaymentCode.

That index was chosen due to us filtering on payment date and paymentCode.

We are joining on student_int_id and ref_int_id, but unfortunately the dbo.PaymentAccount table returns us data based on a different column combination, the data that we will be judged unsorted and unsuitable for our subsequent Nested Loops \ Merge Join operations.

Prior to usage ( by the Join operator), that data will have to be sorted.


What does Microsoft Have to Say?

Here is a worthy write-up:

Join Type Level
Nested Loop Join Link
The nested loops join, also called nested iteration, uses one join input as the outer input table (shown as the top input in the graphical execution plan) and one as the inner (bottom) input table. The outer loop consumes the outer input table row by row. The inner loop, executed for each outer row, searches for matching rows in the inner input table.
In the simplest case, the search scans an entire table or index; this is called a naive nested loops join. If the search exploits an index, it is called an index nested loops join. If the index is built as part of the query plan (and destroyed upon completion of the query), it is called a temporary index nested loops join. All these variants are considered by the query optimizer.
A nested loops join is particularly effective if the outer input is small and the inner input is preindexed and large. In many small transactions, such as those affecting only a small set of rows, index nested loops joins are superior to both merge joins and hash joins. In large queries, however, nested loops joins are often not the optimal choice.
Merge Join Link
The merge join requires both inputs to be sorted on the merge columns, which are defined by the equality (ON) clauses of the join predicate. The query optimizer typically scans an index, if one exists on the proper set of columns, or it places a sort operator below the merge join.
Merge join itself is very fast, but it can be an expensive choice if sort operations are required. However, if the data volume is large and the desired data can be obtained presorted from existing B-tree indexes, merge join is often the fastest available join algorithm.
 Hash Join Link
Hash joins can efficiently process large, unsorted, nonindexed inputs. They are useful for intermediate results in complex queries because:
    Intermediate results are not indexed (unless explicitly saved to disk and then indexed) and often are not suitably sorted for the next operation in the query
    Query optimizers estimate only intermediate result sizes. Because estimates can be very inaccurate for complex queries, algorithms to process intermediate
results not only must be efficient, but also must degrade gracefully if an intermediate result turns out to be much larger than anticipated.





Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s