SQL Server – Table Spool – Causation – Indexes on Non-Persisted Columns (along with an Output Clause)

Background

Table Spools is a fact of Life in SQL Server.

One of the cases that I recently discovered is when one has an index on a non-persisted computed column and one introduces an Output Clause to capture the effects of the change.

 

Create Tables

if DB_ID('Demo') is null
begin

    exec('create database [Demo]');

end
go

USE [Demo]
GO

if OBJECT_ID('accountActivity') is not null
begin

    drop table [accountActivity]

end
go

if OBJECT_ID('accountActivityPersisted') is not null
begin

    drop table [accountActivityPersisted]

end
go

CREATE TABLE [dbo].[accountActivity]
(
	[AccountID] [bigint] NOT NULL , --IDENTITY(1,1) NOT NULL,
	[AccountCreationDate] [datetime] NULL,
	[TransactionDate] [datetime] NULL,
	[ageCalc]  
            AS (datediff(day,[AccountCreationDate],[TransactionDate])),
     CONSTRAINT PK_AccountActivity PRIMARY KEY CLUSTERED 
    (
	    [AccountID] ASC
    )
) 
ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [idx_AgeCalc] 
ON [dbo].[accountActivity] 
(
	[ageCalc] ASC
)
go

CREATE TABLE [dbo].[accountActivityPersisted]
(
	[AccountID] [bigint] NOT NULL , --IDENTITY(1,1) NOT NULL,
	[AccountCreationDate] [datetime] NULL,
	[TransactionDate] [datetime] NULL,
	[ageCalc]  AS 
          (datediff(day,[AccountCreationDate],[TransactionDate])) 
           persisted  ,
    CONSTRAINT PK_AccountActivityPersisted PRIMARY KEY CLUSTERED 
    (
	    [AccountID] ASC
    )
) 
ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [idx_AgeCalc] 
ON [dbo].[accountActivityPersisted] 
(
	[ageCalc] ASC
)
go

</code></pre>
<h2></h2>
<h1>Test SQL</h1>
<pre><code>

use [Demo]
go

set statistics io on
set nocount on
go

begin tran

    DECLARE @Results AS TABLE
    (
      AccountID BIGINT
    )

    declare	
           @AccountID bigint
	 , @AccountCreationDate datetime
    	 , @TransactionDate datetime

    set @AccountID = RAND() * 1E10
    set @AccountCreationDate = '1/1/2001'
    set @TransactionDate = GETUTCDATE()

    Insert into dbo.[accountActivity]
    (
         [AccountID]
       , [AccountCreationDate]
       , [TransactionDate]

   )
   OUTPUT 
      INSERTED.AccountID

  INTO @Results		
   (
       AccountID
   )
   select
              @AccountID
	    , @AccountCreationDate
	    , @TransactionDate

   Insert into dbo.[accountActivityPersisted]
   (
        [AccountID]
      , [AccountCreationDate]
      , [TransactionDate]
   )
   OUTPUT 
      INSERTED.AccountID
   INTO @Results		
   (
       AccountID
   )
   select
             @AccountID
           , @AccountCreationDate
	   , @TransactionDate

rollback tran

Execution Plan

  • On a scale of 100, the cost of maintaining an index when the columns are not persisted is about 59%
  • Whereas for non-persisted columns, it is 41%

Table Spool - Index on Non-Persisted Columns

Statistics I/O

  • The worktable maintenance cost is quite a bit 

Non-Persisted Columns



Table '#744F2D60'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 2, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'accountActivity'. Scan count 0, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Persisted Columns


Table '#744F2D60'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'accountActivityPersisted'. Scan count 0, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Screen Shot:
Table Spool - Index on Non-Persisted Columns (Statistics IO)

 

Moral of the Story

Persist your computed columns.

 

References

Microsoft – SQL Server – Table Inserts – Table Spool [Eager Spool] – Is misaligned partitioning really the issue?

This is a follow-up to a couple of other posts:

Table Design

Create Database – demo


  if db_id('Demo') is not null
  begin

     create database [Demo];

  end
  go

Define Partition Scheme and Function

use [Demo]
go

if not exists   
                (

                    select *
                    from   sys.partition_functions
                    where  name = 'pfnYear'

                )
begin

    print 'Creating partition function pfnYear(smallint) ...'

    create partition function pfnYear(smallint)
    as range right for values
    (
          2010   
        , 2011
        , 2012
        , 2013
        , 2014
        , 2015

    )

        print 'Created partition function pfnYear(smallint) '

end

go

if not exists   
                (

                    select *
                    from   sys.partition_schemes
                    where  name = 'partitionSchemeYear'

                )
begin

    print 'Creating partition scheme partitionSchemeYear  ...'

        create partition scheme partitionSchemeYear 
        as partition pfnYear
        ALL TO ([default])

    print 'Created partition scheme partitionSchemeYear'

end

Define Table with misaligned index partition

To satisfy half-way correction, we will do the following:

  • Create a new table dbo.customer
  • The Primary Key will be a Non-clustered Index and it will be named PK_customer and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will not be partitioned
  • An Index on regionName (idx_Region) will not be partitioned


/*

    exec sp_helpindex 'dbo.customer'
*/

if OBJECT_ID('dbo.customer') is not null
begin
    drop table [dbo].[customer] 
end    
go	

CREATE TABLE [dbo].[customer]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] CONSTRAINT defaultCustomerInsertTime DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null constraint defaultCustomerSalesDate DEFAULT (getutcdate()),
	[SalesYear] as cast(isNull(datepart(year,[salesDate]), 1990) as smallint) persisted not null,

		CONSTRAINT [PK_customer] PRIMARY KEY NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_pfnYear_SalesYear__customerID] 
ON [dbo].[customer] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customer] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customer] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]

GO

Insert into Misaligned partition



set statistics time off
set statistics io off

truncate table [dbo].[customer];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

Results of misalignment:

Misaligned Index - Full

Misaligned Index - Full - Statistics IO

Explanation:

  • Expensive Table Spool (Eager Spool) – Taking up 31%
  • Expensive Work Table using up 7 logical reads

Define Table with corrected index partition

To satisfy full correction, we will do the following:

  • Create a new table dbo.customerRevised
  • The Primary Key will be a Non-clustered Index and it will be named PK_customerRevised and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will be partitioned
  • An Index on regionName (idx_Region) will be partitioned


/*
    exec sp_helpindex 'dbo.customerRevised'
*/

if OBJECT_ID('dbo.customerRevised') is not null
begin
    drop table [dbo].[customerRevised] 
end    
go	

CREATE TABLE [dbo].[customerRevised]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] 
             CONSTRAINT defaultcustomerRevisedInsertTime 
                DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null 
             constraint defaultcustomerRevisedSalesDate 
                DEFAULT (getutcdate()),
	[SalesYear] as 
              cast(isNull(datepart(year,[salesDate]), 1990) as smallint) 
                  persisted not null,

		CONSTRAINT [PK_customerRevised] PRIMARY KEY NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_DBA_pf_customerIDChecksumModulo20__customerID] 
ON [dbo].[customerRevised] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customerRevised] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])

GO

CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customerRevised] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])
GO


Misalignment Correction (Insert Statement)


set statistics time off
set statistics io off

truncate table [dbo].[customer];
truncate table [dbo].[customerRevised];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

insert into [dbo].[customerRevised]
(
    [customerID], [salesDate]
)
values
(
    'B1', getutcdate()
)

Misalignment Correction:

Misaligned Index - Correction - Full

Misalignmed Index - Correction - Statistics IO

Explanation:

  • With proper partition correction we now have a much cleaner Plan (At a base of 100% -the original plan costed 57% and the revised plan costed 43%)
  • We have shed the 7 logical reads attributed to the “worktable”

Define Table with half-way corrected index partition

To satisfy half-way correction, we will do the following:

  • Create a new table dbo.customerRevisedHalfway
  • The Primary Key will be a Non-clustered Index and it will be named PK_customerRevisedHalfway and it will be partitioned
  • We will create a Clustered Index (idx_DBA_pf_customerIDChecksumModulo20__customerID)
  • An Index on InsertTime (idx_InsertTime) will be partitioned
  • An Index on regionName (idx_Region) will not be partitioned


/*

    exec sp_helpindex 'dbo.customerRevisedHalfway'
*/

if OBJECT_ID('dbo.customerRevisedHalfway') is not null
begin
    drop table [dbo].[customerRevisedHalfway] 
end    
go	

CREATE TABLE [dbo].[customerRevisedHalfway]
(

	[customerID] [varchar](30) NOT NULL,
	[regionName] [varchar](50) NULL,
	[InsertTime] [datetime] 
            CONSTRAINT defaultcustomerRevisedHalfwayInsertTime 
                DEFAULT (getutcdate()) NULL,
	[salesDate] [datetime] not null 
            constraint defaultcustomerRevisedHalfwaySalesDate
                DEFAULT (getutcdate()),
	[SalesYear] as 
            cast(isNull(datepart(year,[salesDate]), 1990) as smallint) 
               persisted not null,

		CONSTRAINT [PK_customerRevisedHalfway] PRIMARY KEY
                 NONCLUSTERED 
		(
			  [customerID] ASC
			, [salesYear]
		)
		WITH (
		              PAD_INDEX  = OFF
		            , STATISTICS_NORECOMPUTE  = OFF
		            , IGNORE_DUP_KEY = OFF
		            , ALLOW_ROW_LOCKS  = ON
		            , ALLOW_PAGE_LOCKS  = ON
		            , FILLFACTOR = 50
            ) 
		ON [partitionSchemeYear]([SalesYear])
) ON [partitionSchemeYear]([SalesYear])

GO

CREATE CLUSTERED INDEX [idx_DBA_pf_customerIDChecksumModulo20__customerID] 
ON [dbo].[customerRevisedHalfway] 
(
	[salesYear] ASC,
	[customerID] ASC
)
WITH 
(
      PAD_INDEX  = OFF
    , STATISTICS_NORECOMPUTE  = OFF
    , SORT_IN_TEMPDB = OFF
    , IGNORE_DUP_KEY = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS  = ON
    , ALLOW_PAGE_LOCKS  = ON
    , FILLFACTOR = 50
) 
ON [partitionSchemeYear]([SalesYear])
GO

CREATE NONCLUSTERED INDEX [idx_InsertTime] 
ON [dbo].[customerRevisedHalfway] 
    (
        [InsertTime] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          --ON [PRIMARY]
          ON [partitionSchemeYear]([SalesYear])

GO

--index is pointing @ default file group
CREATE NONCLUSTERED INDEX [idx_Region] 
ON [dbo].[customerRevisedHalfway] 
    (
        [regionName] ASC
    )
    INCLUDE ( [customerID]) 
    WITH (

              ONLINE = ON
            , ALLOW_ROW_LOCKS  = ON
            , ALLOW_PAGE_LOCKS  = ON
          ) 
          ON [PRIMARY]
          --ON [partitionSchemeYear]([SalesYear])

GO

Insert into Half way aligned partition


set statistics time off
set statistics io off

truncate table [dbo].[customer];
truncate table [dbo].[customerRevised];
truncate table [dbo].[customerRevisedHalfway];
go

set statistics profile on; 
set statistics time off;   
set statistics io on;
go

insert into [dbo].[customer]
(
    [customerID], [salesDate]
)
values
(
   'A1', GETUTCDATE()
)

insert into [dbo].[customerRevised]
(
    [customerID], [salesDate]
)
values
(
    'B1', getutcdate()
)

insert into [dbo].[customerRevisedHalfway]
(
  [customerID], [salesDate]
)
values
(
  'C1', getutcdate()
)

Misalignment Correction (Halfway):

Misaligned Index - Correction - Halfway

Statistics I/O

Misaligned Index - Correction - Halfway (Statistics IO)

Explanation:

  • The Query cost of misalignment is at 40%, fully corrected alignment is at 30%, and halfway correction is at 30%
  • When partitioning is fully corrected and when only halfway corrected we still shed the same amount of logical reads (7); directly attributable to the “worktable”
  • For the half way corrected partitioned index, you will see two distinct index inserts (the first one for the Clustered index and the secondary one for the non-partitioned aligned index); note that the half way corrected case is pointing at two different file groups (different partitions)
  • Whereas for the fully corrected index, we have a single index operator (all indexes are tied into the same partition)

CodeBase

To share the code, I have posted it to my GitHub Account.

Conclusion

Eager Spool

ScaryDBA in his web post,  Spools in Execution Plan ( http://www.scarydba.com/2009/09/09/spools-in-execution-plans/), did a good job describing this Table Spool thing.

Spool operations are temporary storage of the data for later reuse in a query plan. There are two types of spool operations, eager spool and lazy spool. A spool is basically a temporary table created within the execution of the query that is used when it’s likely that data will be needed again, and again during the execution of the query. This is not an explicit #temp temporary table, but a work table for operations within the processing necessary for a given query’s behavior. A spool is created when the optimizer thinks that it can work better with a semi-permanent sub-set of data rather than have to perform multiple seeks or scans against a table or index or in other places where data re-use is important (more in a bit).

Seemingly, in our original table definition, with more than one misaligned index, we ended up with a situation where post the “Clustered Index” operation, the Query Engine opted to go the route of creating and populating “Table Spools”.

It is not immediately clear at this point why the Query Engine chose this fairly in-efficient path.

Addendum

Addendum (2013-04-27)

The original incident was captured on MS SQL Server – v2008R2.

I have revised per v2012.

Execution Plan

TableSpool__MSSQLServer_v2012

IO Cost

TableSpool__MSSQLServer_v2012_IOStats

So Microsoft’s documentation did a fairly good job on describing what a Table Spool is:

http://msdn.microsoft.com/en-us/library/ms181032(v=sql.105).aspx
The Table Spool operator scans the input and places a copy of each row in a hidden spool table that is stored in the tempdb database and existing only for the lifetime of the query. If the operator is rewound (for example, by a Nested Loops operator) but no rebinding is needed, the spooled data is used instead of rescanning the input.

But, I will allow you the Performance Engineer to look at your queries and understand which ones can potentially follow this path and guard against it when other paths can be a bit less pernicious.