SQL Server – Table Spool – Causation – Indexes on Non-Persisted Columns (along with an Output Clause)

Background

Table Spools is a fact of Life in SQL Server.

One of the cases that I recently discovered is when one has an index on a non-persisted computed column and one introduces an Output Clause to capture the effects of the change.

 

Create Tables

if DB_ID('Demo') is null
begin

    exec('create database [Demo]');

end
go

USE [Demo]
GO

if OBJECT_ID('accountActivity') is not null
begin

    drop table [accountActivity]

end
go

if OBJECT_ID('accountActivityPersisted') is not null
begin

    drop table [accountActivityPersisted]

end
go

CREATE TABLE [dbo].[accountActivity]
(
	[AccountID] [bigint] NOT NULL , --IDENTITY(1,1) NOT NULL,
	[AccountCreationDate] [datetime] NULL,
	[TransactionDate] [datetime] NULL,
	[ageCalc]  
            AS (datediff(day,[AccountCreationDate],[TransactionDate])),
     CONSTRAINT PK_AccountActivity PRIMARY KEY CLUSTERED 
    (
	    [AccountID] ASC
    )
) 
ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [idx_AgeCalc] 
ON [dbo].[accountActivity] 
(
	[ageCalc] ASC
)
go

CREATE TABLE [dbo].[accountActivityPersisted]
(
	[AccountID] [bigint] NOT NULL , --IDENTITY(1,1) NOT NULL,
	[AccountCreationDate] [datetime] NULL,
	[TransactionDate] [datetime] NULL,
	[ageCalc]  AS 
          (datediff(day,[AccountCreationDate],[TransactionDate])) 
           persisted  ,
    CONSTRAINT PK_AccountActivityPersisted PRIMARY KEY CLUSTERED 
    (
	    [AccountID] ASC
    )
) 
ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [idx_AgeCalc] 
ON [dbo].[accountActivityPersisted] 
(
	[ageCalc] ASC
)
go

</code></pre>
<h2></h2>
<h1>Test SQL</h1>
<pre><code>

use [Demo]
go

set statistics io on
set nocount on
go

begin tran

    DECLARE @Results AS TABLE
    (
      AccountID BIGINT
    )

    declare	
           @AccountID bigint
	 , @AccountCreationDate datetime
    	 , @TransactionDate datetime

    set @AccountID = RAND() * 1E10
    set @AccountCreationDate = '1/1/2001'
    set @TransactionDate = GETUTCDATE()

    Insert into dbo.[accountActivity]
    (
         [AccountID]
       , [AccountCreationDate]
       , [TransactionDate]

   )
   OUTPUT 
      INSERTED.AccountID

  INTO @Results		
   (
       AccountID
   )
   select
              @AccountID
	    , @AccountCreationDate
	    , @TransactionDate

   Insert into dbo.[accountActivityPersisted]
   (
        [AccountID]
      , [AccountCreationDate]
      , [TransactionDate]
   )
   OUTPUT 
      INSERTED.AccountID
   INTO @Results		
   (
       AccountID
   )
   select
             @AccountID
           , @AccountCreationDate
	   , @TransactionDate

rollback tran

Execution Plan

  • On a scale of 100, the cost of maintaining an index when the columns are not persisted is about 59%
  • Whereas for non-persisted columns, it is 41%

Table Spool - Index on Non-Persisted Columns

Statistics I/O

  • The worktable maintenance cost is quite a bit 

Non-Persisted Columns



Table '#744F2D60'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'accountActivity'. Scan count 0, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Persisted Columns


Table '#744F2D60'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'accountActivityPersisted'. Scan count 0, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Screen Shot:
Table Spool - Index on Non-Persisted Columns (Statistics IO)

 

Moral of the Story

Persist your computed columns.

 

References

Microsoft – SQL Server – Table Inserts – Table Spool [Eager Spool] – Is misaligned partitioning really the issue?

This is a follow-up to a couple of other posts:

Table Design

Create Database – demo


  if db_id('Demo') is not null
  begin

     create database [Demo];

  end
  go

Define Partition Scheme and Function

use [Demo]
go

if not exists   
                (

                    select *
                    from   sys.partition_functions
                    where  name = 'pfnYear'

                )
begin

    print 'Creating partition function pfnYear(smallint) ...'

    create partition function pfnYear(smallint)
    as range right for values
    (
          2010   
        , 2011
        , 2012
        , 2013
        , 2014
        , 2015

    )

        print 'Created partition function pfnYear(smallint) '

end

go

if not exists   
                (

                    select *
                    from   sys.partition_schemes
                    where  name = 'partitionSchemeYear'

                )
begin

    print 'Creating partition scheme partitionSchemeYear  ...'

        create partition scheme partitionSchemeYear 
        as partition pfnYear
        ALL TO ([default])

    print 'Created partition scheme partitionSchemeYear'

end

Define Table with misaligned index partition

To satisfy half-way correction, we will do the following:

  • Create a new table dbo.customer
  • The Primary Key will be a Non-clustered Index and it will be named PK_customer and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will not be partitioned
  • An Index on regionName (idx_Region) will not be partitioned


/*

    exec sp_helpindex 'dbo.customer'
*/

if OBJECT_ID('dbo.customer') is not null
begin
    drop table [dbo].[customer] 
end    
go	

CREATE TABLE [dbo].[customer]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] CONSTRAINT defaultCustomerInsertTime DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null constraint defaultCustomerSalesDate DEFAULT (getutcdate()),
	[SalesYear] as cast(isNull(datepart(year,[salesDate]), 1990) as smallint) persisted not null,

		CONSTRAINT [PK_customer] PRIMARY KEY NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_pfnYear_SalesYear__customerID] 
ON [dbo].[customer] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customer] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customer] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]

GO

Insert into Misaligned partition



set statistics time off
set statistics io off

truncate table [dbo].[customer];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

Results of misalignment:

Misaligned Index - Full

Misaligned Index - Full - Statistics IO

Explanation:

  • Expensive Table Spool (Eager Spool) – Taking up 31%
  • Expensive Work Table using up 7 logical reads

Define Table with corrected index partition

To satisfy full correction, we will do the following:

  • Create a new table dbo.customerRevised
  • The Primary Key will be a Non-clustered Index and it will be named PK_customerRevised and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will be partitioned
  • An Index on regionName (idx_Region) will be partitioned


/*
    exec sp_helpindex 'dbo.customerRevised'
*/

if OBJECT_ID('dbo.customerRevised') is not null
begin
    drop table [dbo].[customerRevised] 
end    
go	

CREATE TABLE [dbo].[customerRevised]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] 
             CONSTRAINT defaultcustomerRevisedInsertTime 
                DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null 
             constraint defaultcustomerRevisedSalesDate 
                DEFAULT (getutcdate()),
	[SalesYear] as 
              cast(isNull(datepart(year,[salesDate]), 1990) as smallint) 
                  persisted not null,

		CONSTRAINT [PK_customerRevised] PRIMARY KEY NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_DBA_pf_customerIDChecksumModulo20__customerID] 
ON [dbo].[customerRevised] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customerRevised] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])

GO

CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customerRevised] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])
GO


Misalignment Correction (Insert Statement)


set statistics time off
set statistics io off

truncate table [dbo].[customer];
truncate table [dbo].[customerRevised];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

insert into [dbo].[customerRevised]
(
    [customerID], [salesDate]
)
values
(
    'B1', getutcdate()
)

Misalignment Correction:

Misaligned Index - Correction - Full

Misalignmed Index - Correction - Statistics IO

Explanation:

  • With proper partition correction we now have a much cleaner Plan (At a base of 100% -the original plan costed 57% and the revised plan costed 43%)
  • We have shed the 7 logical reads attributed to the “worktable”

Define Table with half-way corrected index partition

To satisfy half-way correction, we will do the following:

  • Create a new table dbo.customerRevisedHalfway
  • The Primary Key will be a Non-clustered Index and it will be named PK_customerRevisedHalfway and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will be partitioned
  • An Index on regionName (idx_Region) will not be partitioned


/*

    exec sp_helpindex 'dbo.customerRevisedHalfway'
*/

if OBJECT_ID('dbo.customerRevisedHalfway') is not null
begin
    drop table [dbo].[customerRevisedHalfway] 
end    
go	

CREATE TABLE [dbo].[customerRevisedHalfway]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] 
            CONSTRAINT defaultcustomerRevisedHalfwayInsertTime 
                DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null 
            constraint defaultcustomerRevisedHalfwaySalesDate
                DEFAULT (getutcdate()),
	[SalesYear] as 
            cast(isNull(datepart(year,[salesDate]), 1990) as smallint) 
               persisted not null,

		CONSTRAINT [PK_customerRevisedHalfway] PRIMARY KEY
                 NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_DBA_pf_customerIDChecksumModulo20__customerID] 
ON [dbo].[customerRevisedHalfway] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customerRevisedHalfway] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])

GO

--index is pointing @ default file group
CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customerRevisedHalfway] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]
          --ON [partitionSchemeYear]([SalesYear])

GO

Insert into Half way aligned partition


set statistics time off
set statistics io off

truncate table [dbo].[customer];
truncate table [dbo].[customerRevised];
truncate table [dbo].[customerRevisedHalfway];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

insert into [dbo].[customerRevised]
(
    [customerID], [salesDate]
)
values
(
    'B1', getutcdate()
)

insert into [dbo].[customerRevisedHalfway]
(
  [customerID], [salesDate]
)
values
(
  'C1', getutcdate()
)

Misalignment Correction (Halfway):

Misaligned Index - Correction - Halfway

Statistics I/O

Misaligned Index - Correction - Halfway (Statistics IO)

Explanation:

  • The Query cost of misalignment is at 40%, fully corrected alignment is at 30%, and halfway correction is at 30%
  • When partitioning is fully corrected and when only halfway corrected we still shed the same amount of logical reads (7); directly attributable to the “worktable”
  • For the half way corrected partitioned index, you will see two distinct index inserts (the first one for the Clustered index and the secondary one for the non-partitioned aligned index); note that the half way corrected case is pointing at two different file groups (different partitions)
  • Whereas for the fully corrected index, we have a single index operator (all indexes are tied into the same partition)

CodeBase

To share the code, I have posted it to my GitHub Account.

Conclusion

Eager Spool

ScaryDBA in his web post,  Spools in Execution Plan ( http://www.scarydba.com/2009/09/09/spools-in-execution-plans/), did a good job describing this Table Spool thing.

Spool operations are temporary storage of the data for later reuse in a query plan. There are two types of spool operations, eager spool and lazy spool. A spool is basically a temporary table created within the execution of the query that is used when it’s likely that data will be needed again, and again during the execution of the query. This is not an explicit #temp temporary table, but a work table for operations within the processing necessary for a given query’s behavior. A spool is created when the optimizer thinks that it can work better with a semi-permanent sub-set of data rather than have to perform multiple seeks or scans against a table or index or in other places where data re-use is important (more in a bit).

Seemingly, in our original table definition, with more than one misaligned index, we ended up with a situation where post the “Clustered Index” operation, the Query Engine opted to go the route of creating and populating “Table Spools”.

It is not immediately clear at this point why the Query Engine chose this fairly in-efficient path.

Addendum

Addendum (2013-04-27)

The original incident was captured on MS SQL Server – v2008R2.

I have revised per v2012.

Execution Plan

TableSpool__MSSQLServer_v2012

IO Cost

TableSpool__MSSQLServer_v2012_IOStats

So Microsoft’s documentation did a fairly good job on describing what a Table Spool is:

http://msdn.microsoft.com/en-us/library/ms181032(v=sql.105).aspx
The Table Spool operator scans the input and places a copy of each row in a hidden spool table that is stored in the tempdb database and existing only for the lifetime of the query. If the operator is rewound (for example, by a Nested Loops operator) but no rebinding is needed, the spooled data is used instead of rescanning the input.

But, I will allow you the Performance Engineer to look at your queries and understand which ones can potentially follow this path and guard against it when other paths can be a bit less pernicious.

Microsoft – SQL Server – Insert into a table leads to “Table Spool \ Eager Spool”

Background

You never quite know about MS SQL Server Query Engine.

Here we found one of our database systems running really slow.  We tried everything, but nothing lead to Isaac’s well.

The system was just too slow.

Looked at every query and nothing came up.

Finally settled on an insert statement.  Straight record insert, not even an insert into <table-destination> select * from <table-source>.

The query plan of a performant query looked like the query on top.  And, that of the troublesome query looked like the one below.

The non-performant query has a couple of items that I could not place:

  • Table Spool
  • Eager Spool

SQL plan

Table Inserts - SQL Plan - Table Spool - Eager Spool

The Statistic I/O for the non-performance Query Plan is also a bit out-of-sorts.  It had :

  • More I/O (Logical Reads)
  • A worktable read

Statistics I/O

Table Inserts - Causes Statisics IO

So hear I am, goggled until I am sore.

There is a bit of ‘cries’ on the Net that talks about what can cause Table Spool \ Eager Spool.

Microsoft definition for “Eager Spool” is:

Eager Spool Operator
http://msdn.microsoft.com/en-us/library/ms190435.aspx

The Eager Spool operator takes the entire input, storing each row in a hidden temporary object stored in the tempdb database. If the operator is rewound (for example, by a Nested Loops operator) but no rebinding is needed, the spooled data is used instead of rescanning the input. If rebinding is needed, the spooled data is discarded and the spool object is rebuilt by rescanning the (rebound) input.

The Eager Spool operator builds its spool file in an “eager” manner: when the spool’s parent operator asks for the first row, the spool operator consumes all rows from its input operator and stores them in the spool.

Eager Spool is a logical operator.

Remas Rusanu did a noteworthy job – “Getting rid of Table Spool in SQL Server Execution plan” — http://stackoverflow.com/questions/1931100/getting-rid-of-table-spool-in-sql-server-execution-plan :

I certainly understand if you cannot show us the query. But is really hard to guess why the spooling is preffered by the engine whithout knowing any specifics. I recommend you go over Craig Freedman’s blog, he is an engineer in the query optimizer team and has explained a lot of the inner workings of SQL 2005/2008 optimizer. Here are some entries I could quickly find that touch the topic of spooling in one form or another:

 

SQL customer support team also has an interesting blog at http://blogs.msdn.com/psssql/

And ‘sqltips’ (the relational engine’s team blog) has some tips, like Spool operators in query plan…
And, so I have a bit more familiarity with what can trigger Table Spool.

To me it can occur when one is querying data from a Common Table Expression or sorting data, and when the system is caching that data for later re-use.

So here I am thinking this is a simple insert / values sql.

What can possibly trigger need for caching and later consideration or usage.

Checked the following:

  • Triggers
  • Computed columns
  • Complex column defaults

Finally, checked the indexes as the operator that follows the Table Spool / Eager Spool operators is the “Index Insert”.

As we have two indexes, disabled one of them.

And, retested.

Table Inserts - SQL Plan - Table Spool - Eager Spool (Good)

 

Wow – I am loving Isaac’s well.

So what was the problem.  It appeared that Microsoft’s engine triggers “Table Spool” / “Eager Spool” combination when there are two or more indexes that are defined with different partition schemes.

In our case:

First Index : Clustered

    create clustered index idx_pf_seed on
    [dbo].[customer]
    on [partitionScheme]
    ([regionID])

Second Index: NonClustered

create nonclustered index idx_CustomerName
on [dbo].[customer]
(
  [insertTime] asc
)
;

Just a thought as there is so little documentation.

A few days later, I posted an addendum to this entry.  That entry is available @

Microsoft – SQL Server – Table Inserts – Table Spool [Eager Spool] – Explained

https://danieladeniji.wordpress.com/2013/03/22/microsoft-sql-server-table-inserts-table-spool-eager-spool-explained/

References: