Column Store Indexes – Query Comparison – Unpartitioned vs. Partitioned

Objective

We previously created the Column Store tables, and we now have the companion partitioned tables created as well.

It is now time to see whether we benefit by partitioning the tables.

From our original set of tables, we only created companion partitioned tables for the Row Store Clustered Index and the Column Store Clustered Index.

For brevity's sake, we skipped partitioning the Column Store Non-Clustered Index.

 

Query

Scenario

Our sample query will retrieve the top two sales within a date range.

SQL

Sample

Here is what the query looks like when we target the [Fact].[SaleLargeRowStore] table.

BTW, that table is the unpartitioned Row Store table.

It has a clustered index on its primary key, [Sale Key].


declare @dateBegin datetime
declare @dateEnd   datetime


set @dateBegin = '2013-04-15';
set @dateEnd = '2013-05-15';

; with cteYearCustomer --[Fact].[SaleLargeRowStore]
(
      [CustomerKey]
    , [profit]
)
as
(
    select  
              [CustomerKey] = tblFS.[Customer Key]
            , [profit] = max(tblFS.[profit])

    from  [Fact].[SaleLargeRowStore] tblFS

    where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

    group by
             tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
        [profit] desc


Full


dbcc dropcleanbuffers with no_infomsgs
go

use [WideWorldImportersDW]
go

set nocount on
go

set statistics io on
go

declare @dateBegin datetime
declare @dateEnd   datetime


set @dateBegin = '2013-04-15';
set @dateEnd = '2013-05-15';

-- [Fact].[SaleLargeRowStore] 
declare @dateRSCBegin datetime
declare @dateRSCEnd datetime

declare @dateRSCPIndexIsChosenByEngineBegin datetime
declare @dateRSCPIndexIsChosenByEngineEnd datetime

declare @dateRSCPIndexIsClusteredBegin datetime
declare @dateRSCPIndexIsClusteredEnd datetime

declare @dateCSCIndexBegin datetime
declare @dateCSCIndexEnd datetime

declare @dateCSCPIndexBegin datetime
declare @dateCSCPIndexEnd datetime

declare @dateCSNCIndexBegin datetime
declare @dateCSNCIndexEnd datetime


declare @datediffSum bigint 

declare @NUMBER_OF_CHARS_IN_DIVIDER int

set @NUMBER_OF_CHARS_IN_DIVIDER = 180

set @dateRSCBegin = getdate();

print '@@version :- '
		+ cast(@@version as varchar(4000))

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);

print '[Fact].[SaleLargeRowStore]'
print '**************************'

-- [Fact].[SaleLargeRowStore]
; with cteYearCustomer --[Fact].[SaleLargeRowStore]
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeRowStore] tblFS

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateRSCEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);



/*
	 [Fact].[SaleLargeRowStoreClusteredPartition]
*/

print '[Fact].[SaleLargeRowStoreClusteredPartition] -- Clustered Index'
print '***************************************************************'

set @dateRSCPIndexIsClusteredBegin = getdate()

; with cteYearCustomer --[Fact].[SaleLargeRowStoreClusteredPartition] --Clustered Index
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from [Fact].[SaleLargeRowStoreClusteredPartition]  tblFS with ( INDEX = 1)

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateRSCPIndexIsClusteredEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);

exec('dbcc dropcleanbuffers with no_infomsgs')

/*
	 [Fact].[SaleLargeRowStoreClusteredPartition]
*/

print '[Fact].[SaleLargeRowStoreClusteredPartition] -- Chosen by SQL Engine'
print '********************************************************************'

set @dateRSCPIndexIsChosenByEngineBegin = getdate()

; with cteYearCustomer --[Fact].[SaleLargeRowStoreClusteredPartition] -- Chosen by SQL Engine
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from [Fact].[SaleLargeRowStoreClusteredPartition]  tblFS --with ( INDEX = 1)

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateRSCPIndexIsChosenByEngineEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);



/*
	[Fact].[SaleLargeColumnStoreClustered]
*/
print '[Fact].[SaleLargeColumnStoreClustered]'
print '**************************************'

set @dateCSCIndexBegin = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreClustered]
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreClustered] tblFS

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			--  tblY.[year]
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateCSCIndexEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);

/*
	[Fact].[SaleLargeColumnStoreClusteredPartition] 
*/

print '[Fact].[SaleLargeColumnStoreClusteredPartition]'
print '**********************************************'

set @dateCSCPIndexBegin = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreClusteredPartition] 
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreClusteredPartition] tblFS

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateCSCPIndexEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER)

/*
	[Fact].[SaleLargeColumnStoreNonClustered] 
*/

print '[Fact].[SaleLargeColumnStoreNonClustered]'
print '*****************************************'

set @dateCSNCIndexBegin = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreNonClustered]
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreNonClustered] tblFS

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateCSNCIndexEnd = getdate()


print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER)

set @datediffSum =
			        isNull(datediff(millisecond, @dateRSCBegin, @dateRSCEnd), 0)
				  + isNull(datediff(millisecond, @dateRSCPIndexIsChosenByEngineBegin, @dateRSCPIndexIsChosenByEngineEnd), 0)
				  + isNull(datediff(millisecond, @dateRSCPIndexIsClusteredBegin, @dateRSCPIndexIsClusteredEnd), 0)
				  + isNull(datediff(millisecond, @dateCSCIndexBegin, @dateCSCIndexEnd), 0)
				  + isNull(datediff(millisecond, @dateCSCPIndexBegin, @dateCSCPIndexEnd), 0)
				  + isNull(datediff(millisecond, @dateCSNCIndexBegin, @dateCSNCIndexEnd), 0)


select 

		  [sourceID] = [sourceID]
		, [source] = [source]
		, [dateStart] = dateStart
		, [dateEnd] = dateEnd
		, [durationMilliseconds] =  [duration] 
		, [%] = cast([%] as decimal(6, 2))

from   
	(

		select 
				  [sourceID] = 1
				, [source] = '[Fact].[SaleLargeRowStore]'
				, [dateStart] = @dateRSCBegin
				, [dateEnd] = @dateRSCEnd
				, [duration] = datediff(millisecond, @dateRSCBegin, @dateRSCEnd)
				, [%] = (datediff(millisecond, @dateRSCBegin, @dateRSCEnd)) * 100.00 / @datediffSum


		union all

		select 
				  [sourceID] = 2.1
				, [source] = '[Fact].[SaleLargeRowStoreClusteredPartition] - Clustered Index'
				, [dateStart] = @dateRSCPIndexIsClusteredBegin
				, [dateEnd] = @dateRSCPIndexIsClusteredEnd
				, [duration] = datediff(millisecond, @dateRSCPIndexIsClusteredBegin, @dateRSCPIndexIsClusteredEnd)
				, [%] = (datediff(millisecond, @dateRSCPIndexIsClusteredBegin, @dateRSCPIndexIsClusteredEnd)) * 100.00 / @datediffSum

		union all

		select 
				  [sourceID] = 2.2
				, [source] = '[Fact].[SaleLargeRowStoreClusteredPartition] - Chosen by Engine'
				, [dateStart] = @dateRSCPIndexIsChosenByEngineBegin
				, [dateEnd] = @dateRSCPIndexIsChosenByEngineEnd
				, [duration] = datediff(millisecond, @dateRSCPIndexIsChosenByEngineBegin, @dateRSCPIndexIsChosenByEngineEnd)
				, [%] = (datediff(millisecond, @dateRSCPIndexIsChosenByEngineBegin, @dateRSCPIndexIsChosenByEngineEnd)) * 100.00 / @datediffSum

		union all

		select 
				  [sourceID] = 3
				, [source] = '[Fact].[SaleLargeColumnStoreClustered]'
				, [dateStart] = @dateCSCIndexBegin
				, [dateEnd] = @dateCSCIndexEnd
				, [duration] = datediff(millisecond, @dateCSCIndexBegin, @dateCSCIndexEnd)
				, [%] = (datediff(millisecond, @dateCSCIndexBegin, @dateCSCIndexEnd)) * 100.00 / @datediffSum


		union all

		select 
				  [sourceID] = 4
				, [source] = '[Fact].[SaleLargeColumnStoreClusteredPartition]'
				, [dateStart] = @dateCSCPIndexBegin
				, [dateEnd] = @dateCSCPIndexEnd
				, [duration] = datediff(millisecond, @dateCSCPIndexBegin, @dateCSCPIndexEnd)
				, [%] = (datediff(millisecond, @dateCSCPIndexBegin, @dateCSCPIndexEnd)) * 100.00 / @datediffSum

		union all

		select 
				  [sourceID] = 5
				, [source] = '[Fact].[SaleLargeColumnStoreNonClustered]'
				, [dateStart] = @dateCSNCIndexBegin
				, [dateEnd] = @dateCSNCIndexEnd
				, [duration] = datediff(millisecond, @dateCSNCIndexBegin, @dateCSNCIndexEnd)
				, [%] = (datediff(millisecond, @dateCSNCIndexBegin, @dateCSNCIndexEnd)) * 100.00 / @datediffSum
		

	) tblA

order by

	  [sourceID] asc


Query Plan

Query Plan – v2014

Row Store – Unpartitioned Table ( Fact.SaleLargeRowStore )

Image

Explanation
  1. Missing Index
    • Improvement Projection
      • The Query Processor estimates that implementing the following index could improve the query cost by 93.1978%.
    • Index Statement
      • CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [Fact].[SaleLargeRowStore] ([Invoice Date Key]) INCLUDE ([Customer Key],[Profit])
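
Filling the template's placeholder with a concrete name, the suggested index could be created as follows. The name [INDX_InvoiceDateKey_WithIncludedColumns] is ours ( hypothetical ), not the engine's; the key and included columns are exactly the ones suggested above.

```sql
-- Hypothetical index name; key and INCLUDE columns as suggested by the Query Processor.
CREATE NONCLUSTERED INDEX [INDX_InvoiceDateKey_WithIncludedColumns]
ON [Fact].[SaleLargeRowStore]
(
    [Invoice Date Key]
)
INCLUDE
(
      [Customer Key]
    , [Profit]
);
```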

Row Store – Partitioned Table ( Fact.SaleLargeRowStoreClusteredPartition ) – Clustered Index ( Forced )

Image

Explanation
  1. Missing Index
    • Improvement Projection
      • The Query Processor estimates that implementing the following index could improve the query cost by 93.1978%.
    • Index Statement
      • CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [Fact].[SaleLargeRowStoreClusteredPartition] ([Invoice Date Key]) INCLUDE ([Customer Key],[Profit])

Row Store – Partitioned Table ( Fact.SaleLargeRowStoreClusteredPartition ) – Index Chosen By Engine

Image

Column Store – Unpartitioned Table ( Fact.SaleLargeColumnStoreClustered )

Image

  1. Column Index Scan
    • With a Clustered Column Store Index, the index covers every column of the table

Column Store – Clustered – Partitioned Table ( Fact.SaleLargeColumnStoreClusteredPartition )

Image

 

Explanation
  1. Column Index Scan
    • With a Clustered Column Store Index, the index covers every column of the table

 

Column Store – Non-Clustered – Unpartitioned Table ( Fact.SaleLargeColumnStoreNonClustered )

Image

Explanation
  1. Table Scan
    • We do not have a targeted Index

 

Statistics I/O

Image

Statistics I/O – v2014

Statistics I/O – v2016

Tabulated

 

Table: [Fact].[SaleLargeRowStore]

  ‘Worktable’ and ‘Workfile’: all counters 0 in both versions.

  ‘SaleLargeRowStore’
    v2014: Scan count 1, logical reads 344481, physical reads 2, read-ahead reads 344467, lob reads 0.
    v2016: Scan count 5, logical reads 344945, physical reads 3, read-ahead reads 344456, lob reads 0.

Table: [Fact].[SaleLargeRowStoreClusteredPartition] – Clustered Index ( hardcoded to use the clustered index )

  ‘Worktable’ and ‘Workfile’: all counters 0 in both versions.

  ‘SaleLargeRowStoreClusteredPartition’
    v2014: Scan count 13, logical reads 344557, physical reads 12, read-ahead reads 344531, lob reads 0.
    v2016: Scan count 13, logical reads 344611, physical reads 10, read-ahead reads 344547, lob reads 0.

Table: [Fact].[SaleLargeRowStoreClusteredPartition] – Chosen by SQL Engine

  ‘Worktable’ and ‘Workfile’: all counters 0 in both versions.

  ‘SaleLargeRowStoreClusteredPartition’
    v2014: Scan count 13, logical reads 37, physical reads 6, read-ahead reads 26, lob reads 0.
    v2016: Scan count 13, logical reads 37, physical reads 6, read-ahead reads 54, lob reads 0.

Table: [Fact].[SaleLargeColumnStoreClustered]

  ‘Worktable’: all counters 0 in both versions.
  ‘Workfile’: all counters 0 in v2014; not referenced in v2016.

  ‘SaleLargeColumnStoreClustered’
    v2014: Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 1032, lob physical reads 5, lob read-ahead reads 2706.
    v2016: Scan count 2, logical reads 4, physical reads 2, read-ahead reads 8, lob logical reads 268, lob physical reads 5, lob read-ahead reads 497.
    v2016: Segment reads 1, segment skipped 0.

Table: [Fact].[SaleLargeColumnStoreClusteredPartition]

  ‘Worktable’: all counters 0 in both versions.
  ‘Workfile’: all counters 0 in v2014; not referenced in v2016.

  ‘SaleLargeColumnStoreClusteredPartition’
    v2014: Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 78, lob physical reads 1, lob read-ahead reads 90.
    v2016: Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table: [Fact].[SaleLargeColumnStoreNonClustered]

  ‘Worktable’: all counters 0 in both versions.
  ‘Workfile’: all counters 0 in v2014; not referenced in v2016.

  ‘SaleLargeColumnStoreNonClustered’
    v2014: Scan count 1, logical reads 344132, physical reads 0, read-ahead reads 344098, lob reads 0.
    v2016: Scan count 5, logical reads 344005, physical reads 0, read-ahead reads 343533, lob reads 0.

Summary

Here are a couple of findings:

  1. Horizontal ( date range ) partitioning
    • Helpful
      • Especially when queries are date driven
      • Please do some prep work and determine which date columns users often target
    • Not as Helpful
      • In a multi-column index, partitioning is not as helpful when the date column is not the leading column of the index
  2. SQL Server Version 2016/Enterprise Versus 2014/Developer Edition
    • Optimization
      • In v2016, Enterprise Edition
        • Workfile not referenced in Statistics IO
        • Segment Reads and Segments Skipped explicitly Stated
          • Table ‘SaleLargeColumnStoreClustered’. Segment reads 1, segment skipped 0.
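
To confirm which partitions a date range actually touches, the $PARTITION function can be queried directly. A minimal sketch, assuming the partition function behind [pschemeYear] is named [pfnYear] ( the function name is not shown in this post ):

```sql
-- Which partitions do the query's boundary dates fall into?
-- [pfnYear] is an assumed partition function name.
SELECT
      $PARTITION.[pfnYear]('2013-04-15') AS [partitionForDateBegin]
    , $PARTITION.[pfnYear]('2013-05-15') AS [partitionForDateEnd];
```

If both boundary dates land in the same partition, a scan on the partitioned table should only touch that one partition, which is consistent with the drop from 344557 to 37 logical reads seen above.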

Column Store Indexes & Partitioning – Partition Tables

Objective

In a recent post, we created the infrastructure objects ( File Groups, Files, Partition Functions, and Partition Schemes).

In this post, we will create tables that will utilize those objects.

 

Outline

 

Table: Fact.SaleLargeRowStore
  Store: Row Store
  Partition: ( none )
  Index Columns: [Sale Key]

Table: Fact.SaleLargeRowStoreClusteredPartition
  Store: Row Store
  Partition: InvoiceDateKeyYear
  Index Columns:
    Clustered ( primary key ): [Sale Key], [Invoice Date Key]
    Non-Clustered: [Invoice Date Key], with [Customer Key]* and [Profit]* included

Table: Fact.SaleLargeColumnStoreClustered
  Store: Column Store
  Partition: ( none )

Table: Fact.SaleLargeColumnStoreClusteredPartition
  Store: Column Store
  Partition: InvoiceDateKeyYear

Table: Fact.SaleLargeColumnStoreNonClustered
  Store: Column Store
  Partition: ( none )

( * = INCLUDE column )

 

 

Exercise

Tables

[Fact].[SaleLargeRowStoreClusteredPartition]

SQL

Code

USE [WideWorldImportersDW]
GO

/*

    DROP TABLE [Fact].[SaleLargeRowStoreClusteredPartition]

*/

/*

    exec sp_helpindex '[Fact].[SaleLargeRowStoreClusteredPartition]'

*/

if object_id('[Fact].[SaleLargeRowStoreClusteredPartition]') is null
begin

    CREATE TABLE [Fact].[SaleLargeRowStoreClusteredPartition]
    (
        [Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
        [City Key] [int] NOT NULL,
        [Customer Key] [int] NOT NULL,
        [Bill To Customer Key] [int] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        
        [Invoice Date Key] [date] NOT NULL,

        [InvoiceDateKeyYear] [int] NOT NULL,
        
        [Delivery Date Key] [date] NULL,
        
        [DeliveryDateKeyYear] [int] NULL,
        
        [Salesperson Key] [int] NOT NULL,
        [WWI Invoice ID] [int] NOT NULL,
        [Description] [nvarchar](100) NOT NULL,
        [Package] [nvarchar](50) NOT NULL,
        [Quantity] [int] NOT NULL,
        [Unit Price] [decimal](18, 2) NOT NULL,
        [Tax Rate] [decimal](18, 3) NOT NULL,
        [Total Excluding Tax] [decimal](18, 2) NOT NULL,
        [Tax Amount] [decimal](18, 2) NOT NULL,
        [Profit] [decimal](18, 2) NOT NULL,
        [Total Including Tax] [decimal](18, 2) NOT NULL,
        [Total Dry Items] [int] NOT NULL,
        [Total Chiller Items] [int] NOT NULL,
        [Lineage Key] [int] NOT NULL
    

        ) 

        --ON [USERDATA] 
        ON [pschemeYear]
        (
            [Invoice Date Key]
        )

end
go

declare @object sysname
declare @objectID int
declare @constraint sysname
declare @index  sysname

declare @addPartitionedIndex   bit
declare @addUnpartitionedIndex bit

set @object = '[Fact].[SaleLargeRowStoreClusteredPartition]'
set @constraint = 'PK_Fact_SaleLargeRowStoreClusteredPartition'

set @objectID = object_id(@object)

if object_id(@object) is not null
begin

    /*

        alter table [Fact].[SaleLargeRowStoreClusteredPartition]
            drop CONSTRAINT [PK_Fact_SaleLargeRowStoreClusteredPartition] 

    */
    if not exists
        (

            select *
            from  sys.key_constraints tblSKC
            where tblSKC.parent_object_id = @objectID
            and   tblSKC.[name] = @constraint

        )
    begin

        print 'Create Primary Key ' 
                + @object 
                + '.' 
                + quoteName(@constraint) 
                + ' ...'

        alter table [Fact].[SaleLargeRowStoreClusteredPartition]
        ADD CONSTRAINT [PK_Fact_SaleLargeRowStoreClusteredPartition] 
        PRIMARY KEY CLUSTERED 
            (

                [Sale Key] ASC
                /*
                Msg 1908, Level 16, State 1, Line 19
                Column 'Invoice Date Key' is partitioning column of the index 'PK_Fact_SaleLargeRowStoreClusteredPartition'. Partition columns for a unique index must be a subset of the index key.
                Msg 1750, Level 16, State 0, Line 19
                Could not create constraint or index. See previous errors.
                */
            , [Invoice Date Key]

            )

        --ON [USERDATA]
        ON [pschemeYear]
        (
            [Invoice Date Key]
        )
            
        update statistics [Fact].[SaleLargeRowStoreClusteredPartition]
            [PK_Fact_SaleLargeRowStoreClusteredPartition] 
            with FULLSCAN;


        print 'Created Constraint - Primary Key'
                + @object + '.' 
                + quoteName(@constraint)

    end



    /*
        drop index [Fact].[SaleLargeRowStoreClusteredPartition].[INDX_InvoiceDateKey_WithIncludedColumns_Partition]
    */
    set @addPartitionedIndex = 0;
    set @addPartitionedIndex = 1;
    set @index = 'INDX_InvoiceDateKey_WithIncludedColumns_Partition';

    if not exists
        (
            select *
            from  sys.indexes tblSI
            where tblSI.object_id = @objectID
            and   tblSI.[name] = @index
        )
        and (@addPartitionedIndex = 1)
    begin

        print 'Create Index ' + @object + '.' + quoteName(@index) + ' ...'


        CREATE NONCLUSTERED INDEX [INDX_InvoiceDateKey_WithIncludedColumns_Partition]
        ON [Fact].[SaleLargeRowStoreClusteredPartition] 
        (
            [Invoice Date Key]
        )
        INCLUDE 
        (
              [Customer Key]
            , [Profit]
        )
        with
        (
            FILLFACTOR=100
        )
        /*
        ON [USERDATA]
        */
        ON [pschemeYear]
        (
            [Invoice Date Key]
        )


        update statistics [Fact].[SaleLargeRowStoreClusteredPartition]
            [INDX_InvoiceDateKey_WithIncludedColumns_Partition]
            with FULLSCAN;


        print 'Created Index ' + @object + '.' + quoteName(@index)

    end

end


go

 

[Fact].[SaleLargeColumnStoreClusteredPartition]

SQL

Code

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

USE [WideWorldImportersDW]
GO

/*

	DROP TABLE [Fact].[SaleLargeColumnStoreClusteredPartition]

	exec sp_helpindex '[Fact].[SaleLargeColumnStoreClusteredPartition]'

*/
if object_id('[Fact].[SaleLargeColumnStoreClusteredPartition]') is null
begin


	CREATE TABLE [Fact].[SaleLargeColumnStoreClusteredPartition]
	(
		[Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
		[City Key] [int] NOT NULL,
		[Customer Key] [int] NOT NULL,
		[Bill To Customer Key] [int] NOT NULL,
		[Stock Item Key] [int] NOT NULL,
	
		[Invoice Date Key] [date] NOT NULL,
		[InvoiceDateKeyYear] [int] NOT NULL,
			
		[Delivery Date Key] [date] NULL,
		[DeliveryDateKeyYear] [int] NULL,
			
		[Salesperson Key] [int] NOT NULL,
		[WWI Invoice ID] [int] NOT NULL,
		[Description] [nvarchar](100) NOT NULL,
		[Package] [nvarchar](50) NOT NULL,
		[Quantity] [int] NOT NULL,
		[Unit Price] [decimal](18, 2) NOT NULL,
		[Tax Rate] [decimal](18, 3) NOT NULL,
		[Total Excluding Tax] [decimal](18, 2) NOT NULL,
		[Tax Amount] [decimal](18, 2) NOT NULL,
		[Profit] [decimal](18, 2) NOT NULL,
		[Total Including Tax] [decimal](18, 2) NOT NULL,
		[Total Dry Items] [int] NOT NULL,
		[Total Chiller Items] [int] NOT NULL,
		[Lineage Key] [int] NOT NULL
	
			

	) ON [USERDATA]

end


/*
	Object:  Index [INDX_FACT_SaleLargeColumnStoreClusteredPartition] 
*/

declare @table sysname
declare @index sysname

set @table = '[Fact].[SaleLargeColumnStoreClusteredPartition]';
set @index = 'INDX_FACT_SaleLargeColumnStoreClusteredPartition'

if object_id(@table) is not null
begin

	if not exists
	(
		select *
		from   sys.indexes tblSI
		where  tblSI.object_id = object_id(@table) 
		and    tblSI.[name] = @index
	) --and (1=0)
	begin

		print 'Create Index ' 
				+ @table
				+ ' - Index '
				+ @index
		
		CREATE CLUSTERED INDEX [INDX_FACT_SaleLargeColumnStoreClusteredPartition] 
		ON [Fact].[SaleLargeColumnStoreClusteredPartition] 
		(
			[Invoice Date Key]
		)
		WITH 
		(
				DROP_EXISTING = OFF
			--, FILLFACTOR = 100
			--, COMPRESSION_DELAY = 0
		) 
		--ON [USERDATA]
		ON [pschemeYear]
			(
				[Invoice Date Key]
			)
		
	end


	if exists
	(
		select *
		from   sys.indexes tblSI
		where  tblSI.object_id = object_id(@table) 
		and    tblSI.[name] = @index
		and    tblSI.type_desc != 'CLUSTERED COLUMNSTORE'
	)
	begin

		print 'Convert Index ' 
				+ @table
				+ ' - Index '
				+ @index

		CREATE CLUSTERED COLUMNSTORE INDEX [INDX_FACT_SaleLargeColumnStoreClusteredPartition] 
		ON [Fact].[SaleLargeColumnStoreClusteredPartition] 
		WITH 
		(
			DROP_EXISTING = ON
		) 
		--ON [USERDATA]
		ON [pschemeYear]
		(
			[Invoice Date Key]
		)


	end

end
GO


Source Code Control

GitHub

The code for creating the tables is posted on GitHub here.

 

Summary

Next, we will load corresponding data into the newly created partitioned tables, run a few queries, and see the effect of the partitioning.

 

Column Store Indexes & Partitioning – Foundational Work

Background

As data grows and ages, it becomes prudent to look at partitioning as an avenue to reduce the overall footprint of data read from disk into memory.

Column Store, by its very definition, delivers on the promise of vertical partitioning.

Let us review the steps we have to take to achieve range partitioning.

 

Design Goals

Our goal is to divvy up our data into years.

In each table that we target, we will ensure that a date column exists.

Ideally, one of those date columns tracks the occurrence date.
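
The partitioned table definitions in the previous post also carry tracking year columns ( [InvoiceDateKeyYear], [DeliveryDateKeyYear] ). A minimal sketch of how they could be derived at load time; this is assumed load logic, as the actual INSERT script is not shown in this post:

```sql
-- Derive the year tracking columns from their date counterparts
-- ( assumed load logic, sketched against the unpartitioned source table ).
SELECT
      tblFS.[Invoice Date Key]
    , [InvoiceDateKeyYear]  = datepart(year, tblFS.[Invoice Date Key])
    , tblFS.[Delivery Date Key]
    , [DeliveryDateKeyYear] = datepart(year, tblFS.[Delivery Date Key])
FROM  [Fact].[SaleLargeRowStore] tblFS;
```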

 

Overview

Here are the plumbing areas:

  1. Create new Filegroups and files within the new file groups
  2. Create Partitioning Function
  3. Create Partitioning Scheme

 

Steps

Platform

File Groups & Files

Objective

Create a file group for each business year.

Then create a single file in each file group.

SQL


USE [WideWorldImportersDW]   
go

-- Year NULL

if not exists  
(      
	select *      
	from   sys.filegroups tblSFG      
	where  tblSFG.name = 'fg_Partition_Year_Null'  
)  
begin      

	ALTER DATABASE [WideWorldImportersDW] 
		ADD FILEGROUP  [fg_Partition_Year_Null]   
		
end 
go

if not exists  
	(      
		select *      
		from   sys.master_files tblMF      
		where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
		and    tblMF.name = 'fg_Partition_Year_Null__01'  
	)  
begin 

   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
      NAME = [fg_Partition_Year_Null__01]   
      , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_Null_file_001.ndf'  
   )  
   TO FILEGROUP [fg_Partition_Year_Null]  
	
end 


-- Year 2010
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2010'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2010]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2010__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2010__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2010_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2010]  

end 


-- Year 2011
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2011'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2011]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2011__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2011__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2011_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2011]  

end 


-- Year 2012
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2012'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2012]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2012__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2012__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2012_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2012]  

end 


-- Year 2013
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2013'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2013]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2013__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2013__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2013_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2013]  

end 


-- Year 2014
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2014'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2014]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2014__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2014__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2014_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2014]  

end 


-- Year 2015
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2015'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2015]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2015__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2015__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2015_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2015]  

end 


-- Year 2016
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2016'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2016]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2016__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2016__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2016_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2016]  

end 

-- Year 2017
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2017'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2017]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2017__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2017__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2017_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2017]  

end 
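
After running the blocks above, it is worth a quick sanity check. Here is a sketch — assuming the filegroups follow the fg_Partition_Year_% naming convention used above — that lists each yearly partition filegroup alongside its data file(s):

```sql
use [WideWorldImportersDW]
go

-- List each yearly partition filegroup and its data file(s)
select
	  [filegroup]    = tblSFG.[name]
	, [file]         = tblSDF.[name]
	, [physicalName] = tblSDF.[physical_name]

from   sys.filegroups tblSFG

left outer join sys.database_files tblSDF

	on tblSFG.[data_space_id] = tblSDF.[data_space_id]

where  tblSFG.[name] like 'fg_Partition_Year_%'

order by
	  tblSFG.[name]
```

A filegroup row with a null [file] column would flag a filegroup that was added without its corresponding file.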

Partition Function

Objective

  1. Create a new Partition Function
  2. Indicate the column’s data type
    • date
  3. Range Type
    • Left or Right
      • We chose Range Right
  4. Values
    • null
      • Accounts for rows whose date is null
      • When a null boundary value is included, be sure to wrap it in the convert function
      • Otherwise, the statement fails with
        • Msg 7705, Level 16, State 1
        • Could not implicitly convert range values type specified at ordinal 1 to partition function parameter type

SQL

Code

 CREATE PARTITION FUNCTION [pfnYear]
 ( 
    date 
 )     
 AS RANGE RIGHT FOR VALUES    
 (       
	  convert ( date , null ) 
	, '2010-01-01'
	, '2011-01-01'
	, '2012-01-01'
	, '2013-01-01'
	, '2014-01-01'
	, '2015-01-01'
	, '2016-01-01'
	, '2017-01-01'
	, '2018-01-01'
	, '2019-01-01'
	, '2020-01-01'  
) 
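
Because we chose RANGE RIGHT, each boundary value belongs to the partition on its right. A quick sanity check — a sketch using the built-in $PARTITION function against [pfnYear] — makes the boundary behavior concrete:

```sql
use [WideWorldImportersDW]
go

-- With RANGE RIGHT, the boundary value '2013-01-01' maps to the 2013
-- partition, while '2012-12-31' maps to the 2012 partition; a null
-- date maps to the partition anchored by the null boundary value
select
	  [partitionFor_2012_12_31] = $PARTITION.pfnYear('2012-12-31')
	, [partitionFor_2013_01_01] = $PARTITION.pfnYear('2013-01-01')
	, [partitionFor_null]       = $PARTITION.pfnYear(null)
```

Had we chosen RANGE LEFT instead, '2013-01-01' would have landed in the 2012 partition's neighbor on the left.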

Partition Scheme

SQL



CREATE PARTITION SCHEME [pschemeYear]   
AS PARTITION [pfnYear]    
TO  
(       
	  [fg_Partition_Year_Null]
	, [fg_Partition_Year_2010]
	, [fg_Partition_Year_2011]
	, [fg_Partition_Year_2012]
	, [fg_Partition_Year_2013]
	, [fg_Partition_Year_2014]
	, [fg_Partition_Year_2015]
	, [fg_Partition_Year_2016]
	, [fg_Partition_Year_2017]
	, [fg_Partition_Year_2018]
	, [fg_Partition_Year_2019]
	, [fg_Partition_Year_2020]
	, [fg_Partition_Year_Next]  
	
) 
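
Once the scheme exists, we can confirm which filegroup each partition number maps to. A sketch against the partitioning catalog views:

```sql
use [WideWorldImportersDW]
go

-- Map each partition number of [pschemeYear] to its destination filegroup
select
	  [scheme]          = tblPS.[name]
	, [function]        = tblPF.[name]
	, [partitionNumber] = tblDDS.[destination_id]
	, [filegroup]       = tblFG.[name]

from   sys.partition_schemes tblPS

inner join sys.partition_functions tblPF

	on tblPS.[function_id] = tblPF.[function_id]

inner join sys.destination_data_spaces tblDDS

	on tblPS.[data_space_id] = tblDDS.[partition_scheme_id]

inner join sys.filegroups tblFG

	on tblDDS.[data_space_id] = tblFG.[data_space_id]

where  tblPS.[name] = 'pschemeYear'

order by
	  tblDDS.[destination_id]
```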

Summary

To keep this post manageable, we will address putting the underlying Partitioning components to use in a follow-up post

 

Denzil Ribeiro :- SQL Server – Column Store Index – Defrag

Background

Based on comments on numerous blog postings, we wanted to take a look at Denzil Ribeiro’s script for identifying Column Store Indexes ripe for defragging.

BTW, those blog postings are Niko Neugebauer ( Link ) and Greg Low ( Link ).

 

Denzil Ribeiro

Denzil is with Microsoft, and here is his blog post:

  1. SQL 2014 Clustered Columnstore index rebuild and maintenance considerations
    Published On :- 2015-July-8th
    Link

 

Code

Stored Procedure

dbo.sp_DenzilRibeiro_columnStoreIndexDefrag

 


use master
go


if object_id('[dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]') is null
begin
	
	exec('create procedure [dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag] as ')

end
go

alter procedure [dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]
(
	  @debug bit = 0
	, @deletedRowsPercent int = 10
	, @rowgroupQualityNumberofRows int = 500000
	, @scriptOnly bit = 1
)
as

begin

	set nocount on;
	set XACT_ABORT on;

	/*
		The sample scripts are not supported under any Microsoft standard support program or service
		and are intended as a supplement to online documentation. The sample scripts are provided AS IS without warranty
		of any kind either expressed or implied. Microsoft further disclaims all implied warranties including,
		without limitation, any implied warranties of merchantability or of fitness for a particular purpose.
	*/

	/*
		Rebuild index statement is printed at partition level if

		a. RGQualityMeasure is not met for @PercentageRGQualityPassed Rowgroups

		   i) This is an arbitrary number; what we are saying is that if the average is above this number, don't bother rebuilding, as we consider these to be good quality rowgroups

		b. The second constraint is Deleted rows; the default set here is 10% of the partition itself. If the partition is very large or very small, consider adjusting this

		c. In SQL 2014, post index rebuild, the DMV doesn't show why the RG is trimmed to < 1 million rows
			i)  If the dictionary is full ( 16MB ), there is no use in rebuilding this rowgroup, as even after a rebuild it may get trimmed
			ii) If the dictionary is full, only rebuild if deleted rows falls above the threshold

	*/

	if object_id('tempdb..#temp') IS NOT NULL
	begin

		drop table #temp

	end

	--Declare @DeletedRowsPercent Decimal(5,2)

	-- Debug = 1 if you need all rowgroup information regardless
	--Declare @Debug int =0

	--Percent of deleted rows for the partition
	--Set @DeletedRowsPercent = 10

	--RGQuality means we are saying anything over 500K compressed is good row group quality, anything less need to re-evaluate.
	--Declare @RGQuality int = 500000

	-- Percentage of rowgroups that must fall below @RGQuality ( rows/rowgroup perspective ) before a rebuild is considered
	--Declare @PercentageRGQualityPassed smallint = 20
	Declare @PercentageRGQualityPassed smallint --= 20

	-- Maxdop hint optionally added to ensure we don't spread a small number of rows across many threads.
	-- If we do that, we may end up with smaller rowgroups anyway.
	declare @maxdophint			smallint
	declare @effectivedop		smallint

	declare @iNumberofRecords	int
	declare @iRecordumber		int
	declare @command			nvarchar(4000)

	set @PercentageRGQualityPassed = 20


	;WITH CSAnalysis
	( 
		  [object_id]
		, TableName
		, SchemaName
		, index_id
		, indexName
		, partition_number
		, CountRGs
		, TotalRows
		, AvgRowsPerRG
		, CountRGLessThanQualityMeasure
		, RGQualityMeasure
		, PercentageRGLessThanQualityMeasure
		, DeletedRowsPercent
		, NumRowgroupsWithDeletedRows
	)
	AS
	(
		SELECT 
			  rg.object_id
			, object_name(rg.object_id) as TableName
			, SCHEMA_NAME(t.schema_id) AS SchemaName
			, rg.index_id
			, indexName = tblSI.[name]
			, rg.partition_number
			, COUNT(*) as CountRGs
			, SUM(total_rows) as TotalRows
			, AVG(total_rows) as AvgRowsPerRG
			, CountRGLessThanQualityMeasure
				= SUM(
						CASE 
							--WHEN rg.Total_Rows <@RGQuality THEN 1 
							WHEN rg.total_rows < @rowgroupQualityNumberofRows then 1
							ELSE 0 END
					) --as CountRGLessThanQualityMeasure

			--, @RGQuality as RGQualityMeasure
			, RGQualityMeasure
				= @rowgroupQualityNumberofRows

			, PercentageRGLessThanQualityMeasure
				= cast
				(
					(
						SUM
						(
							CASE
								--WHEN rg.Total_Rows <@RGQuality THEN 1.0 ELSE 0
								WHEN rg.Total_Rows < @rowgroupQualityNumberofRows THEN 1.0
								ELSE 0
							END
						) / count(*) * 100
					) as Decimal(5,2)
				) --as PercentageRGLessThanQualityMeasure

			, [DeletedRowsPercent]
				= Sum ( rg.deleted_rows * 1.0 ) / sum ( rg.total_rows * 1.0 ) * 100 --as 'DeletedRowsPercent'

			, [NumRowgroupsWithDeletedRows]
				= sum
					(
						case
							when rg.deleted_rows > 0 then 1
							else 0
						end
					) --as 'NumRowgroupsWithDeletedRows'

		FROM sys.column_store_row_groups rg

		JOIN sys.tables t
			ON rg.object_id = t.object_id

		JOIN sys.indexes tblSI
			ON rg.object_id = tblSI.object_id
			and rg.index_id = tblSI.index_id

		where rg.state = 3

		group by 
			  rg.object_id
			, t.schema_id
			, rg.index_id
			, tblSI.[name]
			, rg.partition_number

	)
	, CSDictionaries 
	(
		  maxdictionarysize		  -- int
		, maxdictionaryentrycount -- int
		, maxpartition_number
		, [object_id]			  -- int
		, [index_id]
		, partition_number		  -- int
	)
	AS
	( 
		select 
				  max(dict.on_disk_size) as maxdictionarysize
				, max(dict.entry_count) as maxdictionaryentrycount
				, max(partition_number) as maxpartition_number
				, part.object_id
				, part.index_id
				, part.partition_number

		from sys.column_store_dictionaries dict

		join sys.partitions part 
			on dict.hobt_id = part.hobt_id

		group by 
			  part.object_id
			, part.index_id
			, part.partition_number
	)
	select 
			  [rowNumber] = rank()
								over
								(
									order by
									   a.SchemaName
									 , a.TableName
									 , a.index_id
									 , a.partition_number
								)
			, a.*
			, b.maxdictionarysize
			, b.maxdictionaryentrycount
			, maxpartition_number
			, [command] = cast(null as nvarchar(4000))
			, [identified] = cast(0	as bit)

	into #temp 

	from CSAnalysis a

	inner join CSDictionaries b
			on  a.object_id = b.object_id 
			and a.index_id = b.index_id
			and a.partition_number = b.partition_number


	-- True if running from the same context that will run the rebuild index.
	select @effectivedop=effective_max_dop 
	from sys.dm_resource_governor_workload_groups
	where group_id in 
			(	
				select group_id 
				from sys.dm_exec_requests 
				where session_id = @@spid
			)


	-- Get the Alter Index statements.
	update a
	set    a.[command]
			= 'ALTER INDEX '
			 + QuoteName(a.IndexName) 
			 + ' ON ' + QUOTENAME(a.SchemaName) 
			 + '.' + QuoteName(a.TableName) 
			 + ' REBUILD ' 
			 +  Case
					when maxpartition_number = 1 THEN ' '
					else ' PARTITION = ' + cast(partition_number as varchar(10))
				End

			+ ' WITH (MAXDOP ='
			+ cast(
					(
						Case

							WHEN (TotalRows*1.0/1048576) < 1.0
								THEN 1

							WHEN (TotalRows*1.0/1048576) < @effectivedop
								THEN FLOOR(TotalRows*1.0/1048576)

							ELSE 0

						END
					) as varchar(10)
				)
			+ ')' --as Command
			+ ' -- comment '
			+ ' deletedRow% is ' + cast(DeletedRowsPercent as varchar(30))
			+ ' rowgroupQualityNumberofRows is ' + cast(AvgRowsPerRG as varchar(30))

		, a.[identified] = 1

	from #temp a

	where	(
				( DeletedRowsPercent >= @DeletedRowsPercent)

			-- Rowgroup Quality trigger, percentage less than rowgroup quality as long as dictionary is not full
			OR 
				( 

					( 
						( 
								--( AvgRowsPerRG < @RGQuality )
								( AvgRowsPerRG < @rowgroupQualityNumberofRows) --and ( TotalRows > @RGQuality) 
							and ( TotalRows > @rowgroupQualityNumberofRows) 
							 
						)
						AND 
						(

							(
								PercentageRGLessThanQualityMeasure 
									>= @PercentageRGQualityPassed
							)

						)
					)

					---- DictionaryNotFull, lower threshold than 16MB.
					AND maxdictionarysize < ( 16*1000*1000)

				) 

		)

	-- Add logic to actually run those statements

	set @iRecordumber = 1
	set @iNumberofRecords = ( select max([rowNumber]) from #temp)

	while (@iRecordumber <= @iNumberofRecords)
	begin

		-- reset, then fetch the command for the current loop iteration
		set @command = null

		select @command = [command]
		from   #temp
		where  [rowNumber] = @iRecordumber
		and    [identified] = 1

		if (@command is not null)
		begin

			print @command

			if (@scriptOnly = 0)
			begin
				exec(@command)
			end

		end

		set @iRecordumber = @iRecordumber + 1

	end

	-- Debug print if needed
	if @debug = 1
	begin

		Select *
		from #temp
		order by
			  TableName
			, index_id
			, partition_number

	end
	else
	begin

		-- Deleted rows trigger
		Select *
		from #temp
		where ( DeletedRowsPercent >= @DeletedRowsPercent)

	end
end
go

exec sys.sp_MS_marksystemobject '[dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]'
go

 

Lab


use [WideWorldImportersDW]
go

declare  @debug bit
declare  @deletedRowsPercent int
declare  @rowgroupQualityNumberofRows int


set @debug  = 1
set @deletedRowsPercent = 10
set @rowgroupQualityNumberofRows = 500000

exec [dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]

   	  @debug = @debug
	, @deletedRowsPercent = @deletedRowsPercent
	, @rowgroupQualityNumberofRows = @rowgroupQualityNumberofRows
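
Note that the Lab run above leaves @scriptOnly at its default of 1, so the qualifying ALTER INDEX … REBUILD statements are only printed. To actually execute them, pass @scriptOnly = 0; a sketch:

```sql
use [WideWorldImportersDW]
go

-- Execute, rather than merely print, the qualifying rebuild statements
exec [dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]
	  @debug = 0
	, @deletedRowsPercent = 10
	, @rowgroupQualityNumberofRows = 500000
	, @scriptOnly = 0
```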


Output

Output #1

Source Code Control

Github

Placed in Github here

 

Summary

Nothing new here.

The whole point is to make the script available on GitHub so that it will be easier to digest and learn from.

 

References

  1. Denzil Ribeiro
    • SQL 2014 Clustered Columnstore index rebuild and maintenance considerations
      Link
  2. Niko Neugebauer
    • Clustered Columnstore Indexes – part 36 (“Maintenance Solutions for Columnstore”)
      Link
  3. Greg Low
    • Rebuild clustered columnstore indexes when they require maintenance
      Link

Sql Server – ColumnStore Indexes – Metadata – Encoding Type & Sizes

Background

Another post on metadata surrounding Column Store Indexes.

Lineage

  1. Sql Server – ColumnStore Indexes – Metadata – Stats & Fragmentation
    Published On :- 2017-Sept-2nd
    Link

 

Encoding

Tabulate

ID | Type | Description | Data Type
1 | VALUE_BASED | non-string/binary with no dictionary (very similar to 4 with some internal variations) | Non String
2 | VALUE_HASH_BASED | non-string/binary column with common values in dictionary | Non String
3 | STRING_HASH_BASED | string/binary column with common values in dictionary | String/Binary
4 | STORE_BY_VALUE_BASED | non-string/binary with no dictionary | Non String/Binary
5 | STRING_STORE_BY_VALUE_BASED | string/binary with no dictionary | String/Binary

 

Explanation

  1. Data types
    • Basically, we have two types String/Binary or non-string/non-binary
  2. Storage
    • Value
    • Hash Value

Metadata

Overview

  1. Column Store – Segment
    • sys.column_store_segments
      • Each column is kept in its own segment.
      • And, encoded on a scheme based on its data type and data richness
  2. Column Store – Row Groups
    • sys.column_store_row_groups
      • Rows are horizontally packaged into row groups
      • Each Row Group contains about a million records

 

Code

Column Segment

sys.column_store_segments

SQL

; with cteEncodingType
(
	  [encodingTypeID]
	, [encodingType]
	, [encodingDescription]
)
as
(

	--Type of encoding used for that segment:
	select 
			  [id] = 1
			, [encodingType] = 'VALUE_BASED'
			, [encodingDescription] = 'non-string/binary with no dictionary (very similar to 4 with some internal variations)'
	
	union
	
	select 
			  2
			, 'VALUE_HASH_BASED'
			, 'non-string/binary column with common values in dictionary'
	
	union

	select 
			  3
			, 'STRING_HASH_BASED'
			, 'string/binary column with common values in dictionary'
	
	union
	
	select 
			  4
			, 'STORE_BY_VALUE_BASED'
			, 'non-string/binary with no dictionary'
	union

	select 
			  5
			, 'STRING_STORE_BY_VALUE_BASED'
			, 'string/binary with no dictionary'

)
select 
		[object]
			=   object_schema_name(tblSP.object_id)
				+ '.'
				+ object_name(tblSP.object_id)

		, tblSP.partition_number

		, [columnID]
			= tblCSS.column_id

		, [segmentID]
			= tblCSS.segment_id

		, [encodingType]
			= cteET.[encodingType]

		, [rowCount]
			= tblCSS.row_count

		, [sizeInBytes]
			= tblCSS.on_disk_size

		, [sizeInKB]
			= tblCSS.on_disk_size / 1024

from   sys.column_store_segments tblCSS

inner join sys.partitions tblSP

	-- segments join to partitions on hobt_id
	on	tblCSS.[hobt_id] = tblSP.[hobt_id]

left outer join cteEncodingType cteET

	on	tblCSS.[encoding_type] = cteET.[encodingTypeID]



 

Image

 

Explanation
  1. Based on Column Type ( data type ) and data variance, one of 5 encoding types is chosen

 

Row Groups

sys.column_store_row_groups

SQL

use [WideWorldImportersDW]
go

with cteRowGroup
(
		  [object]
		, [indexName]
		, [type]
		, [partitionNumber]
		, [rowGroupID]
		, [rowGroupIDMax]
		, [stateDescription]
		, [totalRows]
		, [deletedRows]
		, [sizeInBytes]
		, [sizeInKB]

)
as
(
	select 
		[object]
			=   object_schema_name(tblCSRG.object_id)
				+ '.'
				+ object_name(tblCSRG.object_id)

		, [indexName]
			= tblSI.[name]
		
		, [type]
			= tblSI.[type_desc]
						
		, [partitionNumber]
			= tblCSRG.partition_number

		, [rowGroupID]
			= tblCSRG.row_group_id

		, [rowGroupIDMax]
			= max ( tblCSRG.row_group_id )
				over(
					partition by
							  tblCSRG.object_id
							, tblCSRG.index_id
				  )

		, [stateDescription]
			= tblCSRG.state_description

		, [totalRows]
			= tblCSRG.total_rows

		, [deletedRows]
			= tblCSRG.deleted_rows

		, [sizeInBytes]
			= tblCSRG.size_in_bytes

		, [sizeInKB]
			= tblCSRG.size_in_bytes/1024

from   sys.column_store_row_groups tblCSRG


inner join sys.partitions tblSP

	on	tblCSRG.[object_id] = tblSP.[object_id]
	and tblCSRG.[index_id] = tblSP.[index_id]
	and tblCSRG.[partition_number] = tblSP.[partition_number]

inner join sys.indexes tblSI
	on  tblCSRG.object_id = tblSI.object_id
	and tblCSRG.index_id = tblSI.index_id

)
select 
		  [object]
		, [indexName]
		, [type]
		, [partitionNumber]
		, [rowGroupID]
		, [rowGroupCurrent]
			 = case 
					when ([rowGroupID] = rowGroupIDMax) then 'Yes'
					else 'No'
				end
		, [stateDescription]
		, [totalRows]
		, [deletedRows]
		, [sizeInBytes]
		, [sizeInKB]


from   cteRowGroup cteRG

order by
	  [object]
	, [indexName]
	, [partitionNumber]
	, [rowGroupID]

Image

 

Explanation
  1. Grouping
    • Columns are grouped into Row Groups of about a million records each ( 1024 * 1024 = 1048576 )
    • Once a million records are packaged into a Row Group, it is filed away and a new Row Group is started
    • The current Row Group will be the one with the Max (Row Group ID) for that partition ( Object ID / Index ID / Partition ID )
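
Since a full rowgroup holds 1,048,576 rows, a simple sanity check — sketched below — is to compare a table's actual rowgroup count against the minimum possible count for its row total; a large gap suggests trimmed or undersized rowgroups:

```sql
use [WideWorldImportersDW]
go

-- Minimum possible rowgroup count if every rowgroup were filled
-- to capacity ( 1024 * 1024 = 1048576 rows )
select
	  [object]           = object_schema_name(tblCSRG.object_id)
	                       + '.' + object_name(tblCSRG.object_id)
	, [totalRows]        = sum(tblCSRG.total_rows)
	, [actualRowGroups]  = count(*)
	, [minimumRowGroups] = ceiling(sum(tblCSRG.total_rows) / 1048576.0)

from   sys.column_store_row_groups tblCSRG

group by
	  tblCSRG.object_id
```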

Summary

We can see that depending on our data type and the variance of the data set, we will have access to specific sets of encoding.

Also, Columnar Indexes enjoy the benefits of both vertical partitioning ( column segments ) and horizontal partitioning ( row groups ).

 

References

  1. Microsoft
    • Technet
      • Database Engine Features and Tasks > Database Features > Indexes (Database Engine)
        • Column Store Indexes
          Link
    • Docs / SQL / Relational databases / System catalog views/
      • sys.column_store_row_groups (Transact-SQL)
        Link
      • sys.column_store_segments (Transact-SQL)
        Link
  2. Patrick Keisler
    • Everyday SQL
      • Monitor the Number of Deleted Rows in a Clustered Columnstore Index
        Link
  3. Greg Low
    • Greg Low (The Bit Bucket: IDisposable)
      • Rebuild clustered columnstore indexes when they require maintenance
        Link
  4. SqlTimes
    • Playing with Database Servers
      • What is a hobt-id?
        Link

Wide World Importers DW – fact.Sale – Query – Top N Sales for each year

Background

Let us start issuing queries against the Data Warehouse DB, WideWorldImportersDW, that we are in the process of actualizing.

 

Lineage

  1. World Wide Importers – Using On SQL Server 2014
    Published On :- 2017-August-23rd
    Link
  2. WideWorldImportersDW – Fattening Up – fact.Sale
    Published On :- 2017-August-31st
    Link

 

Sample Query

Scenario

The first query is quite popular, as it returns the “top N customers for each year“.  The column that casts the deciding vote is profit.

 

Choices

Here are the various query patterns that we will try out:

 

Approach | Group By | Implication
Windowing Function | InvoiceDateKeyYear, Customer Key | Grouping by actual column
Correlated Join / Max | datepart(year, tblFS.[Invoice Date Key]), Customer Key | Function / Computed Column
Correlated Join / Max | InvoiceDateKeyYear, Customer Key | Grouping by actual column

Queries

Query – Windowing Function / group by Actual Columns

SQL


set XACT_ABORT on
go

set nocount on
go

set statistics io on;
go

dbcc dropcleanbuffers with no_infomsgs
go

use [WideWorldImportersDW]
go

declare @dateStart1 datetime
declare @dateEnd1 datetime

declare @dateStart2 datetime
declare @dateEnd2 datetime

declare @dateStart3 datetime
declare @dateEnd3 datetime

declare @topN int
declare @divider char(1)

declare @tblYear TABLE
(
	  [id] int not null identity(1,1)
	, [year] int not null

	, primary key
	(
		[year]
	)
)

insert into @tblYear
([year])
select 2012
union
select 2013
union
select 2014
union
select 2015
union
select 2016

set @topN = 2
set @divider = '='

print replicate(@divider, 120)

set @dateStart1 = getdate()

; with cteYearCustomer --[Fact].[SaleLargeRowStore]
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
	, [rankID]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = tblFS.[profit]
			, [rankID] = rank()
							over
								(
									PARTITION by
										  tblFS.[InvoiceDateKeyYear]
										, tblFS.[Customer Key]

									order by
										tblFS.[profit] desc
								)

	from   [Fact].[SaleLargeRowStore] tblFS 

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select distinct top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.[InvoiceDateKeyYear]

			and    tblFS.[rankID] = 1

			order by
					tblFS.[profit] desc

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd1 = getdate()

print replicate(@divider, 120)

set @dateStart2 = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreClustered]
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
	, [rankID]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = tblFS.[profit]
			, [rankID] = rank()
							over
								(
									PARTITION by
										  tblFS.[InvoiceDateKeyYear]
										, tblFS.[Customer Key]

									order by
										tblFS.[profit] desc
								)

	from  [Fact].[SaleLargeColumnStoreClustered] tblFS

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(

		    select distinct top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			and    tblFS.[rankID] = 1

			order by tblFS.[profit] desc

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd2 = getdate()

print replicate(@divider, 120)

set @dateStart3 = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreNonClustered]
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
	, [rankID]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = tblFS.[profit]
			, [rankID] = rank()
							over
								(
									PARTITION by
										  tblFS.[InvoiceDateKeyYear]
										, tblFS.[Customer Key]

									order by
										tblFS.[profit] desc
								)

	from  [Fact].[SaleLargeColumnStoreNonClustered] tblFS

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(

		    select distinct top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			and    tblFS.[rankID] = 1

			order by tblFS.[profit] desc

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd3 = getdate()

print replicate(@divider, 120)

declare @datediffSum  int

set @datediffSum =
					  datediff(second, @dateStart1, @dateEnd1)
					+ datediff(second, @dateStart2, @dateEnd2)
					+ datediff(second, @dateStart3, @dateEnd3)

select 

		  [sourceID] = [sourceID]
		, [source] = [source]
		, [dateStart] = [dateStart]
		, [dateEnd] = [dateEnd]
		, [duration] = [duration]
		, [%] = cast([%] as decimal(6, 2))

from   (

	select
			  [sourceID] = 1
			, [source] = '[Fact].[SaleLargeRowStore]-Index Chosen By Engine'
			, [dateStart] = @dateStart1
			, [dateEnd] = @dateEnd1
			, [duration] = datediff(second, @dateStart1, @dateEnd1)
			, [%] = (datediff(second, @dateStart1, @dateEnd1)) * 100.00 / @datediffSum

	union

	select
			  [sourceID] = 2
			, [source] = '[Fact].[SaleLargeColumnStoreClustered]'
			, [dateStart] = @dateStart2
			, [dateEnd] = @dateEnd2
			, [duration] = datediff(second, @dateStart2, @dateEnd2)
			, [%] = (datediff(second, @dateStart2, @dateEnd2)) * 100.00 / @datediffSum

	union

	select
			  [sourceID] = 3
			, [source] = '[Fact].[SaleLargeColumnStoreNonClustered]'
			, [dateStart] = @dateStart3
			, [dateEnd] = @dateEnd3
			, [duration] = datediff(second, @dateStart3, @dateEnd3)
			, [%] = (datediff(second, @dateStart3, @dateEnd3)) * 100.00 / @datediffSum
	) tblA

Query Plan

Query Plan – Individual Queries Plan
Query Plan – RowStore Clustered Index

Query Plan – RowStore Clustered Index – Clustered Index Scan ( Clustered )

Query Operator – Properties

Explanation

  1. Actual Number of Rows :- 61,140,035 ( 61 million )
  2. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 254.633
    • Estimated Operator Cost :- 268.084
      • We can see that most of the cost is IO
    • Estimated Number of Executions :- 1
      • The controller is a year table variable
      • For a table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12,228,000
      • Actual Number of Rows :- 61,140,000

 

Query Plan – RowStore Clustered Index – Warning – “Operator used tempdb to spill data”
Query Plan – ColumnStore Clustered Index

Query Plan – Column Store Clustered Index – Clustered Index Scan ( Clustered )

Query Operator – Properties

Query Operator – Explanation
  1. Actual Number of Rows :- 61140025 ( 61 million )
  2. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 3.63868
    • Estimated Operator Cost :- 17.0896
      • IO Cost is 25% of CPU Cost
    • Estimated Number of Executions :- 1
      • The controller is a year table variable
      • For a table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12228000
      • Actual Number of Rows :- 61140025

 

 

Query Plan – ColumnStore Non-Clustered Index

Query Plan – Column Store Non-Clustered Index – Index Scan ( NonClustered )

Query Operator – Properties

Explanation

  1. Actual Number of Rows :- 61,140,025 ( 61 million )
  2. Description :- Scan rows from a table.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 254.747
    • Estimated Operator Cost :- 268.198
      • We can see that most of the cost is IO
      • IO is 95% of the total
    • Estimated Number of Executions :- 1
      • The controller is a year table variable
      • For a table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12228000
      • Actual Number of Rows :- 61140025

 

Query Plan – ReadOut
  1. Operator :- Sort
    • On all of approaches the Sort Operator is the most expensive
    • Row Store Index
      • Operator used tempdb to spill data during execution with spill level 1 and 1 spilled thread(s), Sort wrote 476430 pages to and read 476430 pages from tempdb with granted memory 4845520KB and used memory 4845520 KB
      • High granted and used Memory
        • 4845520 KB
          • 4845.52 MB
          • 4.84552 GB
  2. Operator :- Sequence Project
    • Norm of Windowing Functions

 

Statistics I/O

Image

 

Tabulated

 

Table Type Table / Target IO Stats
Row Store – Clustered Index
SaleLargeRowStore Scan count 5, logical reads 1722395, physical reads 2, read-ahead reads 344480, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 158810, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Workfile Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Column Store – Clustered Index
SaleLargeColumnStoreClustered Table ‘SaleLargeColumnStoreClustered’. Scan count 5, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 10349, lob physical reads 0, lob read-ahead reads 0.

Table ‘SaleLargeColumnStoreClustered’. Segment reads 70, segment skipped 0.

Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Column Store – Non Clustered Index
SaleLargeColumnStoreNonClustered Table ‘SaleLargeColumnStoreNonClustered’. Scan count 5, logical reads 1719525, physical reads 0, read-ahead reads 343604, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Explanation

  1. The Row Store Clustered Index & the Non-Clustered Column Store Index ( on a heap ) have very high I/O
  2. On the other hand, the Column Store Clustered Index has much lower I/O

 

Timing

Image

Explanation
  1. Duration
    • Percentile
      • The Column Store Clustered Index is a bit better at 27%
      • While the Row Store Clustered Index & the Column Store Heap are at 35% and 37%, respectively
    • Actual Time – Individual Queries
      • Row Store – Clustered Index
        • 433 seconds or 7.21 minutes
      • Column Store – Clustered Index
        • 330 seconds or 5.5 minutes
      • Column Store – Non-Clustered Index
        • 408 seconds or 6.8 minutes
    • Actual Time – All Queries
      • 19.09 minutes

 

Correlated Join / Max – group By Function [ datepart(year) ]

SQL


dbcc dropcleanbuffers with no_infomsgs
go

set nocount on
go

set statistics io on;
go

use [WideWorldImportersDW]
go

declare @dateStart1 datetime
declare @dateEnd1 datetime

declare @dateStart2 datetime
declare @dateEnd2 datetime

declare @dateStart3 datetime
declare @dateEnd3 datetime

declare @datediffSum  int

declare @topN int

declare @tblYear TABLE
(
	  [id] int not null identity(1,1)
	, [year] int not null

	, primary key
	(
		[year]
	)
)

insert into @tblYear
([year])
select 2012
union
select 2013
union
select 2014
union
select 2015
union
select 2016

set @topN = 2

print replicate('*', 120)

set @dateStart1 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			  datepart(year, tblFS.[Invoice Date Key])
			 -- tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from   [Fact].[SaleLargeRowStore] tblFS --with ( INDEX =1)

	group by
			  datepart(year, tblFS.[Invoice Date Key])
			  --tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.[InvoiceDateKeyYear]

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd1 = getdate()

print replicate('*', 120)

set @dateStart2 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			 datepart(year, tblFS.[Invoice Date Key])
			 -- tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreClustered] tblFS

	group by
			  datepart(year, tblFS.[Invoice Date Key])
			  --tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd2 = getdate()

print replicate('*', 120)

set @dateStart3 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			  datepart(year, tblFS.[Invoice Date Key])
			  --tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreNonClustered] tblFS

	group by
			  datepart(year, tblFS.[Invoice Date Key])
			 -- tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]
)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd3 = getdate()

print replicate('*', 120)

set @datediffSum =
			      datediff(second, @dateStart1, @dateEnd1)
				+ datediff(second, @dateStart2, @dateEnd2)
				+ datediff(second, @dateStart3, @dateEnd3)

select

		  [sourceID] = [sourceID]
		, [table] = [table]
		, [dateStart] = [dateStart]
		, [dateEnd] = [dateEnd]
		, [duration] = [duration]
		, [%] = cast([%] as decimal(6, 2))

from
	(

		select
				  [sourceID] = 1
				, [table] = '[Fact].[SaleLargeRowStore]'
				, [dateStart] = @dateStart1
				, [dateEnd] = @dateEnd1
				, [duration] = datediff(second, @dateStart1, @dateEnd1)
				, [%] = (datediff(second, @dateStart1, @dateEnd1)) * 100.00 / @datediffSum

		union

		select
				  [sourceID] = 2
				, [table] = '[Fact].[SaleLargeColumnStoreClustered]'
				, [dateStart] = @dateStart2
				, [dateEnd] = @dateEnd2
				, [duration] = datediff(second, @dateStart2, @dateEnd2)
				, [%] = (datediff(second, @dateStart2, @dateEnd2)) * 100.00 / @datediffSum

		union

		select
				  [sourceID] = 3
				, [table] = '[Fact].[SaleLargeColumnStoreNonClustered]'
				, [dateStart] = @dateStart3
				, [dateEnd] = @dateEnd3
				, [duration] = datediff(second, @dateStart3, @dateEnd3)
				, [%] = (datediff(second, @dateStart3, @dateEnd3)) * 100.00 / @datediffSum

	) tblA

go

 

Query Plan

Query Plan – Individual Queries Plan
Query Plan – RowStore Clustered Index

Image

Explanation

  1. The Clustered Index Scan is the most expensive operation
  2. The Sort came in at 0%

 

Query Plan – Column Store Clustered Index

Image

Explanation

  1. The Columnstore Index Scan is at 49%
  2. The Compute Scalar is at 3%
    • Processing datepart(year, [Invoice Date Key])
  3. The Filter ( for each year ) is at 16%
  4. And, the Hash Match is at 31%
  5. The Sort is only 0%

 

Operator – Compute Scalar

Image

Defined Values

Explanation

  1. [Expr1003] = Scalar Operator(datepart(year,[WideWorldImportersDW].[Fact].[SaleLargeColumnStoreClustered].[Invoice Date Key] as [tblFS].[Invoice Date Key]))
Operator – Filter

Image – Properties

Image – Properties – Filter By Predicate

Explanation

  1. [Expr1003] = Scalar Operator(datepart(year,[WideWorldImportersDW].[Fact].[SaleLargeColumnStoreClustered].[Invoice Date Key] as [tblFS].[Invoice Date Key]))

 

Query Plan – Column Store Non Clustered Index

Image

Explanation

  1. The Column Index Scan is at 94%
  2. The compute scalar is at 0%
    • Processing datepart(year, [InvoiceDate])
  3. No Filtering
    • It looks like we get just the data rows that we need
  4. The Sort is only 0%

Statistics I/O

Image

Tabulated
Table Type Table / Target IO Stats

Row Store – Clustered Index
SaleLargeRowStore Table ‘SaleLargeRowStore’. Scan count 5, logical reads 1722395, physical reads 2, read-ahead reads 323184, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Workfile Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A7C6FFBD’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Column Store – Clustered Index
SaleLargeColumnStoreClustered Table ‘SaleLargeColumnStoreClustered’. Scan count 5, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 13464, lob physical reads 0, lob read-ahead reads 0.

Table ‘SaleLargeColumnStoreClustered’. Segment reads 70, segment skipped 0.

Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A7C6FFBD’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Column Store – Non Clustered Index
SaleLargeColumnStoreNonClustered Table ‘SaleLargeColumnStoreNonClustered’. Scan count 5, logical reads 1719525, physical reads 0, read-ahead reads 342488, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Explanation

  1. The Row Store Clustered Index & the Heap Column Store Index have very high I/O
    • Both are table scans — one searching through a clustered index, the other through a heap
  2. On the other hand, the Column Store – Clustered Index has much lower I/O
    • Likely due to the higher compression used by Column Store Indexes
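The compression point above can be checked directly against the row-group metadata. Here is a quick sketch using the sys.column_store_row_groups catalog view (SQL Server 2016); the row-group counts and sizes will of course depend on your own data load:

```sql
use [WideWorldImportersDW]
go

-- Inspect the row groups backing the clustered columnstore table.
-- Each row group reports its row count and compressed size, which
-- helps explain the much lower I/O numbers recorded above.
select
          [table]      = object_name(tblRG.[object_id])
        , [rowGroupID] = tblRG.[row_group_id]
        , [state]      = tblRG.[state_description]
        , [totalRows]  = tblRG.[total_rows]
        , [sizeInMB]   = cast(tblRG.[size_in_bytes] / 1048576.0 as decimal(10, 2))

from   sys.column_store_row_groups tblRG

where  tblRG.[object_id] = object_id('[Fact].[SaleLargeColumnStoreClustered]')

order by
        tblRG.[row_group_id]
```

The "Segment reads 70, segment skipped 0" message in the I/O output corresponds to these row groups being scanned.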

 

Timing

Image

Explanation
  1. Duration
    • Percentile
      • The Column Store Clustered Index is a bit better at 30%
      • The Row Store Clustered Index is at 37.78%
      • And, the Column Store Heap is at 32.22%
    • Actual Time – Individual Queries
      • Row Store – Clustered Index
        • 27 seconds
      • Column Store – Clustered Index
        • 29 seconds
      • Column Store – Non Clustered Index
        • 34 seconds
    • Actual Time – All Queries
      • 1.36 minutes
      • or 96 seconds

 

Correlated Join / Max – group By Actual Column – year

SQL


use [WideWorldImportersDW]
go

dbcc dropcleanbuffers with no_infomsgs
go

set nocount on
go

set statistics io on;
go

declare @dateStart1 datetime
declare @dateEnd1 datetime

declare @dateStart2 datetime
declare @dateEnd2 datetime

declare @dateStart3 datetime
declare @dateEnd3 datetime

declare @topN int

declare @tblYear TABLE
(
	  [id] int not null identity(1,1)
	, [year] int not null

	, primary key
	(
		[year]
	)
)

insert into @tblYear
([year])
select 2012
union
select 2013
union
select 2014
union
select 2015
union
select 2016

set @topN = 2

print replicate('*', 120)

print replicate('*', 120)

set @dateStart1 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from   [Fact].[SaleLargeRowStore] tblFS --with ( INDEX =1)

	group by
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.[InvoiceDateKeyYear]

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd1 = getdate()

print replicate('*', 120)

set @dateStart2 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			--  datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreClustered] tblFS

	group by
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]

)

select-- top 5 with ties
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

--from   cteYear
from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd2 = getdate()

print replicate('*', 120)

set @dateStart3 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreNonClustered] tblFS

	group by
			 -- datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]
)

select-- top 5 with ties
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd3 = getdate()

print replicate('*', 120)

declare @datediffSum  int

set @datediffSum =
					  datediff(second, @dateStart1, @dateEnd1)
					+ datediff(second, @dateStart2, @dateEnd2)
					+ datediff(second, @dateStart3, @dateEnd3)

select

		  [sourceID] = [sourceID]
		, [table] = [table]
		, [dateStart] = [dateStart]
		, [dateEnd] = [dateEnd]
		, [duration] = [duration]
		, [%] = cast([%] as decimal(6, 2))

from   (

	select
			  [sourceID] = 1
			, [table] = '[Fact].[SaleLargeRowStore]'
			, [dateStart] = @dateStart1
			, [dateEnd] = @dateEnd1
			, [duration] = datediff(second, @dateStart1, @dateEnd1)
			, [%] = (datediff(second, @dateStart1, @dateEnd1)) * 100.00 / @datediffSum

	union

	select
			  [sourceID] = 2
			, [table] = '[Fact].[SaleLargeColumnStoreClustered]'
			, [dateStart] = @dateStart2
			, [dateEnd] = @dateEnd2
			, [duration] = datediff(second, @dateStart2, @dateEnd2)
			, [%] = (datediff(second, @dateStart2, @dateEnd2)) * 100.00 / @datediffSum

	union

	select
			  [sourceID] = 3
			, [table] = '[Fact].[SaleLargeColumnStoreNonClustered]'
			, [dateStart] = @dateStart3
			, [dateEnd] = @dateEnd3
			, [duration] = datediff(second, @dateStart3, @dateEnd3)
			, [%] = (datediff(second, @dateStart3, @dateEnd3)) * 100.00 / @datediffSum
	) tblA

Query Plan

Query Plan – Individual Queries Plan
Query Plan – RowStore Clustered Index
Image

Explanation
  1. The Clustered Index Scan takes up 94%
  2. Sort is 0%
  3. We are told of the missing Index
Image – Operator – Clustered Index Scan
  1. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  2. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 254.633
    • Estimated Operator Cost :- 268.084
      • We can see most of the cost is IO
  3. Number of Executions
    • Estimated Number of Executions :- 1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
    • Actual Number of Executions :- 5
      • Number of records in the Table variable ( year )
  4. Number of Reads
    • Estimated Number of Rows :-
      • 12228000 ( 12 million )
    • Number of Rows Read :-
      • 61140025 ( 61 million )
        • Number of Records in Table :- 12 million
        • Number of executions :- 5
Image – Operator – Hash Matched – Defined Values

Image – Hash Matched – Defined Values

Explanation
  1. Grouping by InvoiceDateKeyYear

 

Image – Missing Index

Explanation
  1. We are told, actually warned, of a missing index
  2. The Index definition is “[Fact].[SaleLargeRowStore] ([InvoiceDateKeyYear]) INCLUDE ([Customer Key],[Profit])”
  3. And, so we are asked to look into having an index keyed on our grouping column ( InvoiceDateKeyYear )
  4. And, the other columns that we are referencing ( Customer Key and Profit ) included
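If we wanted to act on that warning, the suggested definition translates to something like the script below. Note that the index name is ours, and that we did not actually create this index for the timings in this post:

```sql
use [WideWorldImportersDW]
go

-- Materialize the index the optimizer suggested:
-- keyed on the grouping column, with the referenced columns included.
if not exists
(
    select *
    from   sys.indexes tblSI
    where  tblSI.[object_id] = object_id('[Fact].[SaleLargeRowStore]')
    and    tblSI.[name] = 'INDX_InvoiceDateKeyYear'
)
begin

    create nonclustered index [INDX_InvoiceDateKeyYear]
        on [Fact].[SaleLargeRowStore]
        (
            [InvoiceDateKeyYear]
        )
        include
        (
              [Customer Key]
            , [Profit]
        )

end
go
```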

 

Query Plan – RowStore Clustered Index – Clustered Index Scan ( Clustered )

Query Operator – Properties

Explanation

  1. Actual Number of Rows :- 61140025 ( 61 million )
  2. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 254.633
    • Estimated Operator Cost :- 268.084
      • We can see most of our cost is IO
    • Estimated Number of Executions :- 1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12228000
      • Actual Number of Rows :- 61140025
    • Explanation :-
      • The estimate is per execution
      • With 5 actual executions, 12,228,000 x 5 = 61,140,025 rows are read

 

Query Plan – RowStore Clustered Index – Warning – “Operator used tempdb to spill data”
Query Plan – ColumnStore Clustered Index

Explanation
  1. The Columnstore Index Scan takes up 46%
  2. Missing Index
    • We are told of the missing Index
    • Impact of missing index :- 40%
  3. Hash Match Aggregate Operator
    • Hash Match operator used in calculating max
  4. Filter Operator
    • Filter on Year
  5. Sort is 0%

 

Query Plan – Column Store Clustered Index – Columnstore Index Scan ( Clustered )

Query Operator – Properties

Query Operator – Explanation
  1. Actual Number of Rows :- 61140025 ( 61 million )
  2. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 3.63868
    • Estimated Operator Cost :- 17.0896
      • IO Cost is roughly a quarter of the CPU Cost
    • Estimated Number of Executions :- 1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12228000
      • Actual Number of Rows :- 61140025
    • Explanation :-
      • The estimate is per execution
      • With 5 actual executions, 12,228,000 x 5 = 61,140,025 rows are read

 

 

Query Plan – ColumnStore Non-Clustered Index

Image

Explanation
  1. The Columnstore Index Scan takes up 94%
  2. Hash Match Aggregate Operator
    • takes up 6%
    • Hash Match operator used in calculating max
  3. Sort is 0%
Query Plan – Column Store Non-Clustered Index – Columnstore Index Scan

Query Operator – Properties – Columnstore Index Scan

Explanation

  1. Actual Execution Mode
    • Row
  2. Description
    • Scan a columnstore index, entirely or only a range.
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 3.63868
    • Estimated Operator Cost :- 17.0896
  4. Number of Executions
    • Estimated Number of Executions :-1
      • The controller is a year table variable
      • For Table variable, # of records is assumed to 1
  5. Number of Reads
    • Estimated Number of Rows :- 12228000 ( 12 million )
    • Actual Number of Rows :- 12228000 ( 12 million )
    • Number of Rows Read :- 61140025 ( 61 million )

Statistics I/O

Image

Tabulated
Table Type Table / Target IO Stats

Row Store – Clustered Index
SaleLargeRowStore Table ‘SaleLargeRowStore’. Scan count 5, logical reads 1722395, physical reads 2, read-ahead reads 344480, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Workfile Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A7C6FFBD’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Column Store – Clustered Index
SaleLargeColumnStoreClustered Table ‘SaleLargeColumnStoreClustered’. Scan count 5, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 10349, lob physical reads 0, lob read-ahead reads 0.

Table ‘SaleLargeColumnStoreClustered’. Segment reads 70, segment skipped 0.

Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A7C6FFBD’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Column Store – Non Clustered Index
SaleLargeColumnStoreNonClustered Table ‘SaleLargeColumnStoreNonClustered’. Scan count 5, logical reads 1719525, physical reads 0, read-ahead reads 343604, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Worktable Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table Variable – Year Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Explanation

  1. The Row Store Clustered Index & the Heap Column Store Index have very high I/O
    • Both are table scans — one searching through a clustered index, the other through a heap
  2. On the other hand, the Column Store – Clustered Index has much lower I/O
    • Likely due to the higher compression used by Column Store Indexes

 

Timing

Image

Explanation
  1. Duration
    • Percentile
      • The Column Store Clustered Index is a bit better at 30.77%
      • The Row Store Clustered Index is at 35.38%
      • And, the Column Store Heap is at 33.85%
    • Actual Time – Individual Queries
      • Row Store – Clustered Index
        • 23 seconds
      • Column Store – Clustered Index
        • 20 seconds
      • Column Store – Non Clustered Index
        • 22 seconds
    • Actual Time – All Queries
      • 1.06 minutes
      • or 66 seconds

 

Source Code Control

GitHub

WideWorldImportersDWFactSale/Query/QueryMaxByYear.InvoiceDate/
Link

 

Methodology Group By Filename
Windowing Function Group by InvoiceDateKeyYear, CustomerKey queryMaxByYearWindowsFunction_GroupByActualColumn.sql
Correlated Join / Max Group by Datepart(year, InvoiceDateKey), CustomerKey queryMaxByYearCorrelatedJoin_GroupByFunctionDatePartYear.sql
Correlated Join / Max Group by InvoiceDateKeyYear, CustomerKey queryMaxByYearCorrelatedJoin_GroupByActualColumn.sql

 

References

  1. Windowing Functions
    • Dwain Camps
      • The Performance of the T-SQL Window Functions
        Link

 

Summary

Thankfully, there is a lot here.

We only looked at two ways of identifying the Max Record; and those two approaches are Window Functions and Correlated Max.

The Windowing Function seems to be more expensive.

The Column Store Clustered Index is the most performant.

Likely specific Row Store Indexes would have been better; but we did not take the time to create one.

We have yet to find queries that would make us choose a Non-Clustered Column Store Index over a Clustered Column Store Index.

 

WideWorldImportersDW – Fattening Up – fact.Sale

Background

In an earlier post we spoke about restoring Microsoft’s new sample database, WideWorldImportersDW.

The end game is to have a database to try out Columnar Indexes.

Lineage

It is just one post so far, but let us claim space for additional journaling along the same thoughts:

  1. World Wide Importers – Using On SQL Server 2014
    Published On :- 2017-August-23rd
    Link

 

Metadata – Microsoft

Number of Records

Code


use [WideWorldImportersDW]
go

select
		
          [table]
            =  tblSS.[name]
                + '.'
                + tblSO.[name]
 
        , [indexName]
            = tblSI.[name]
                     
        , [indexType]
          = tblSI.[type_desc]
 
        , [numberofRecords]
            = sum(tblSP.[rows])
 
        , numberofPagesData
            = sum(tblSPS.in_row_data_page_count)
 
        , numberofPagesLOB
            = sum(tblSPS.lob_used_page_count)
 
        , usedPageCount
            = sum(used_page_count)
 
from   sys.schemas tblSS
 
inner join sys.objects tblSO
 
    on tblSS.schema_id = tblSO.schema_id
 
inner join sys.indexes tblSI
 
    on tblSO.object_id = tblSI.object_id
 
 
inner join sys.partitions tblSP
 
    on tblSI.object_id = tblSP.object_id
    and tblSI.index_id = tblSP.index_id
 
 
inner join sys.dm_db_partition_stats tblSPS
 
    on tblSP.object_id = tblSPS.object_id
    and tblSP.index_id = tblSPS.index_id
    and tblSP.partition_id = tblSPS.partition_id
 
 
where tblSI.index_id in (0,1) 

and   tblSS.[name] = 'Fact'
 
group by
          tblSS.[name]
        , tblSO.[name]
        , tblSI.index_id
        , tblSI.[name]
        , tblSI.[type_desc]
 
order by
          tblSS.[name]
        , tblSO.[name]
        , tblSI.index_id



Image

Explanation

Our three biggest Fact tables ( Fact.Movement, Fact.Order, Fact.Sale ) each hold about 2.3 million records.

 

Make Bigger

fact.Sale

We chose fact.Sale out of our largest Fact tables.

The reason being the presence of a Stored Procedure (SP), [Application].[Configuration_PopulateLargeSaleTable].

Here are the other tooling SPs available in the same schema, Application.
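One quick way to list those tooling SPs is via the catalog views; a sketch:

```sql
use [WideWorldImportersDW]
go

-- List the stored procedures that ship in the Application schema.
select
          [procedure] = tblSS.[name] + '.' + tblSP.[name]
        , [created]   = tblSP.[create_date]

from   sys.procedures tblSP

inner join sys.schemas tblSS

    on tblSP.[schema_id] = tblSS.[schema_id]

where  tblSS.[name] = 'Application'

order by
        tblSP.[name]
```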

Objective

We will leave the original sales table, Fact.Sale, as is.

And, create new ones.

 

List of Tables

Here are the tables.

Usage Table
Sales Table – Original Fact.Sale
Sales Table – Row Store – Clustered Index Fact.SaleLargeRowStore
Sales Table – Column Store – Clustered Index Fact.SaleLargeColumnStoreClustered
Sales Table – Column Store – NonClustered Index Fact.SaleLargeColumnStoreNonClustered

Create Tables

Code

Fact.SaleLargeRowStore


SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

USE [WideWorldImportersDW]
GO

/*
    DROP TABLE [Fact].[SaleLargeRowStore]
*/

if object_id('[Fact].[SaleLargeRowStore]') is null
begin

    CREATE TABLE [Fact].[SaleLargeRowStore]
    (
        [Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
        [City Key] [int] NOT NULL,
        [Customer Key] [int] NOT NULL,
        [Bill To Customer Key] [int] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        
        [Invoice Date Key] [date] NOT NULL,

        [InvoiceDateKeyYear] [int] NOT NULL,
        
        [Delivery Date Key] [date] NULL,
        
        [DeliveryDateKeyYear] [int] NULL,
        
        [Salesperson Key] [int] NOT NULL,
        [WWI Invoice ID] [int] NOT NULL,
        [Description] [nvarchar](100) NOT NULL,
        [Package] [nvarchar](50) NOT NULL,
        [Quantity] [int] NOT NULL,
        [Unit Price] [decimal](18, 2) NOT NULL,
        [Tax Rate] [decimal](18, 3) NOT NULL,
        [Total Excluding Tax] [decimal](18, 2) NOT NULL,
        [Tax Amount] [decimal](18, 2) NOT NULL,
        [Profit] [decimal](18, 2) NOT NULL,
        [Total Including Tax] [decimal](18, 2) NOT NULL,
        [Total Dry Items] [int] NOT NULL,
        [Total Chiller Items] [int] NOT NULL,
        [Lineage Key] [int] NOT NULL
    
            , CONSTRAINT [PK_Fact_SaleLargeRowStore] PRIMARY KEY CLUSTERED 
            (
                [Sale Key] ASC
            )
            WITH 
            (
                  PAD_INDEX = OFF
                , STATISTICS_NORECOMPUTE = OFF
                , IGNORE_DUP_KEY = OFF
                , ALLOW_ROW_LOCKS = ON
                , ALLOW_PAGE_LOCKS = ON
            ) ON [USERDATA]

        ) ON [USERDATA]

end
go

Fact.SaleLargeColumnStoreClustered

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

USE [WideWorldImportersDW]
GO

if object_id('[Fact].[SaleLargeColumnStoreClustered]') is null
begin

	CREATE TABLE [Fact].[SaleLargeColumnStoreClustered]
	(
		[Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
		[City Key] [int] NOT NULL,
		[Customer Key] [int] NOT NULL,
		[Bill To Customer Key] [int] NOT NULL,
		[Stock Item Key] [int] NOT NULL,
	
		[Invoice Date Key] [date] NOT NULL,
		[InvoiceDateKeyYear] [int] NOT NULL,
			
		[Delivery Date Key] [date] NULL,
		[DeliveryDateKeyYear] [int] NULL,
			
		[Salesperson Key] [int] NOT NULL,
		[WWI Invoice ID] [int] NOT NULL,
		[Description] [nvarchar](100) NOT NULL,
		[Package] [nvarchar](50) NOT NULL,
		[Quantity] [int] NOT NULL,
		[Unit Price] [decimal](18, 2) NOT NULL,
		[Tax Rate] [decimal](18, 3) NOT NULL,
		[Total Excluding Tax] [decimal](18, 2) NOT NULL,
		[Tax Amount] [decimal](18, 2) NOT NULL,
		[Profit] [decimal](18, 2) NOT NULL,
		[Total Including Tax] [decimal](18, 2) NOT NULL,
		[Total Dry Items] [int] NOT NULL,
		[Total Chiller Items] [int] NOT NULL,
		[Lineage Key] [int] NOT NULL
	
			

		) ON [USERDATA]

		CREATE CLUSTERED COLUMNSTORE INDEX [INDX_FACT_SaleLargeColumnStoreClustered]
			ON [Fact].[SaleLargeColumnStoreClustered]
			ON [USERDATA]
			;

end
go
	
Fact.SaleLargeColumnStoreNonClustered

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO


USE [WideWorldImportersDW]
GO

if object_id('[Fact].[SaleLargeColumnStoreNonClustered]') is null
begin

	CREATE TABLE [Fact].[SaleLargeColumnStoreNonClustered]
	(
		[Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
		[City Key] [int] NOT NULL,
		[Customer Key] [int] NOT NULL,
		[Bill To Customer Key] [int] NOT NULL,
		[Stock Item Key] [int] NOT NULL,

		[Invoice Date Key] [date] NOT NULL,
		[InvoiceDateKeyYear] [int] NOT NULL,

		[Delivery Date Key] [date] NULL,
		[DeliveryDateKeyYear] [int] NULL,
				
		[Salesperson Key] [int] NOT NULL,
		[WWI Invoice ID] [int] NOT NULL,
		[Description] [nvarchar](100) NOT NULL,
		[Package] [nvarchar](50) NOT NULL,
		[Quantity] [int] NOT NULL,
		[Unit Price] [decimal](18, 2) NOT NULL,
		[Tax Rate] [decimal](18, 3) NOT NULL,
		[Total Excluding Tax] [decimal](18, 2) NOT NULL,
		[Tax Amount] [decimal](18, 2) NOT NULL,
		[Profit] [decimal](18, 2) NOT NULL,
		[Total Including Tax] [decimal](18, 2) NOT NULL,
		[Total Dry Items] [int] NOT NULL,
		[Total Chiller Items] [int] NOT NULL,
		[Lineage Key] [int] NOT NULL
	
			

		) ON [USERDATA]

end

declare @tableName sysname
declare @objectID  int

set @tableName = '[Fact].[SaleLargeColumnStoreNonClustered]'
set @objectID = object_id(@tableName)


/*
	Msg 35339, Level 16, State 1, Line 162
	Multiple nonclustered columnstore indexes are not supported.
*/
if @objectID is not null
begin

	if not exists
	(
		select *
		from   sys.indexes tblSI
		where  tblSI.object_id = @objectID
		and    tblSI.[name] = 'INDX_InvoiceDateKeyYear_DeliveryDateKeyYear'

	)
	begin

		CREATE NONCLUSTERED COLUMNSTORE INDEX [INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
			ON [Fact].[SaleLargeColumnStoreNonClustered]
			(
				  [InvoiceDateKeyYear]
				, [DeliveryDateKeyYear]
			)
			ON [USERDATA]
			;  

	end


end
go


Populate Tables

Approach

We take a uniform approach in populating each of our tables.

And, that approach is to:

  1. Decide on our target year
    • In our case always 2012
  2. Does the Index need to be disabled to be able to manage data?
    • If Clustered Row Store or Clustered Column Store, then “No, the index does not need to be disabled”
    • If a Non-Clustered Column Store exists, then “Yes, the index needs to be disabled”
  3. Remove any existing records for our target year
  4. If base data does not exist, add it
    • Remember base data is data from Jan 1st 2013 onward
  5. Add data for Jan 1st 2012 to Dec 31st 2012
    • For each day, add @desiredTargetRecordCount / 365 records
  6. Defragment data
    • If Re-org requested
      • Reorg data
      • Update Statistics
    • If rebuild requested
      • Rebuild data
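With those steps baked into the stored procedures, populating a table reduces to a single call per table. For example, for the Row Store table (the parameter values shown match the procedure's defaults):

```sql
use [WideWorldImportersDW]
go

-- Populate the Row Store table with roughly 12 million rows for 2012,
-- skipping the optional defragmentation pass.
exec [Fact].[usp_SaleLargeRowStorePopulateTable]
          @EstimatedRowsFor2012 = 12000000
        , @defrag               = 0
go
```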

 

Code

Fact.usp_SaleLargeRowStorePopulateTable

use [WideWorldImportersDW]
go

if object_id('[Fact].[usp_SaleLargeRowStorePopulateTable]') is null
begin

    exec('create procedure [Fact].[usp_SaleLargeRowStorePopulateTable] as ')

end
go

alter procedure [Fact].[usp_SaleLargeRowStorePopulateTable]
(
      @EstimatedRowsFor2012 bigint = 12000000
    , @defrag bit = 0
)
as

begin


    DECLARE @OrderCounter bigint = 0;
    DECLARE @NumberOfSalesPerDay bigint
    DECLARE @DateCounter date;
    DECLARE @DateCounterBegin date
    DECLARE @DateCounterEnd date
    DECLARE @StartingSaleKey bigint;
    DECLARE @MaximumSaleKey bigint = (SELECT MAX([Sale Key]) FROM Fact.Sale);
    DECLARE @OutputCounter varchar(20);

    DECLARE @LineageKey int

    DECLARE @iCycle int
    DECLARE @lNumberofRecordsDeleted bigint
    DECLARE @lNumberofRecordsDeletedTotal bigint

    DECLARE @lNumberofRecordsAdded   bigint

    DECLARE @CHAR_TAB  varchar(10)

    DECLARE @tableDestination sysname

    set @CHAR_TAB = char(9);
    set @DateCounter = '20120101';
    set @DateCounterEnd = '20121231'

    set @DateCounterBegin = @DateCounter;

    set @MaximumSaleKey 
            = (
                SELECT MAX([Sale Key]) 
                FROM   Fact.Sale
              );

    set @LineageKey =1

    set @NumberOfSalesPerDay   = @EstimatedRowsFor2012 / 365;


    BEGIN


        SET NOCOUNT ON;
        SET XACT_ABORT ON;

        SET @tableDestination = '[Fact].[SaleLargeRowStore]';
        set @DateCounterEnd = '20121231'

        PRINT 'EstimatedRowsFor2012 Total '
                + cast(@EstimatedRowsFor2012 as varchar(30));

        PRINT 'Targeting ' 
                + CAST(@NumberOfSalesPerDay AS varchar(20)) 
                + ' sales per day.'
                ;

        --added by dadeniji 2017-08-23 7:01 AM
        PRINT 'DateCounter ' + convert(varchar(30), @DateCounter, 100)
        PRINT 'DateCounterMax ' + convert(varchar(30), '20121231', 100)

        IF @NumberOfSalesPerDay > 50000
        BEGIN
            PRINT 'WARNING: Limiting sales to 50000 per day';
            SET @NumberOfSalesPerDay = 50000;
        END;

        set @lNumberofRecordsDeleted = 1
        set @iCycle = 0
        set @lNumberofRecordsDeletedTotal = 0

        while (@lNumberofRecordsDeleted != 0)
        begin

            set @iCycle = @iCycle + 1

            delete top ( 10000)
            from    [Fact].[SaleLargeRowStore]
            where  [Invoice Date Key] <= @DateCounterEnd

            set @lNumberofRecordsDeleted = @@ROWCOUNT

            set @lNumberofRecordsDeletedTotal
                    = @lNumberofRecordsDeletedTotal + @lNumberofRecordsDeleted

            PRINT @CHAR_TAB
                    + cast(@iCycle as varchar(10)) + ') '
                    + 'Pruning records prior to '
                    + convert(varchar(30), @DateCounterEnd, 100)
                    + ' Number of record(s) removed is '
                    + cast(@lNumberofRecordsDeleted as varchar(10))
                    + ' :: Total thus far '
                    + cast(@lNumberofRecordsDeletedTotal as varchar(10))
                    + ' - TS is '
                    + convert(varchar(30), getdate(), 100)

        end

        if not exists
        (
            select *
            from   [Fact].[SaleLargeRowStore]
            where  [Invoice Date Key] > @DateCounterEnd
        )
        begin

    
            PRINT 'Adding base data Fact.Sale - those greater than '  
                    + convert(varchar(30), @DateCounterEnd, 100)

            set identity_insert [Fact].[SaleLargeRowStore] on 

            insert into [Fact].[SaleLargeRowStore]
                (
                      [Sale Key]
                    , [City Key]
                    , [Customer Key]
                    , [Bill To Customer Key]
                    , [Stock Item Key]

                    , [Invoice Date Key]
                    , [InvoiceDateKeyYear]
                                        
                    , [Delivery Date Key]
                    , [DeliveryDateKeyYear]
                    
                    , [Salesperson Key]
                    , [WWI Invoice ID]
                    , [Description]
                    , Package
                    , Quantity
                    , [Unit Price]
                    , [Tax Rate]
                    , [Total Excluding Tax]
                    , [Tax Amount]
                    , Profit
                    , [Total Including Tax]
                    , [Total Dry Items]
                    , [Total Chiller Items]
                    , [Lineage Key]
                )

            select 
                      [Sale Key]
                    , [City Key]
                    , [Customer Key]
                    , [Bill To Customer Key]
                    , [Stock Item Key]

                    , [Invoice Date Key]
                    , [InvoiceDateKeyYear]
                        = datepart(year, [Invoice Date Key])

                    , [Delivery Date Key]
                    , [DeliveryDateKeyYear]
                        = datepart(year, [Delivery Date Key])

                    , [Salesperson Key]
                    , [WWI Invoice ID]
                    , [Description]
                    , Package
                    , Quantity
                    , [Unit Price]
                    , [Tax Rate]
                    , [Total Excluding Tax]
                    , [Tax Amount]
                    , Profit
                    , [Total Including Tax]
                    , [Total Dry Items]
                    , [Total Chiller Items]
                    , [Lineage Key]

            from   [Fact].[Sale]
    
            where  [Invoice Date Key] > @DateCounterEnd

            set @lNumberofRecordsAdded = @@ROWCOUNT


            set identity_insert [Fact].[SaleLargeRowStore] off
                
            PRINT 'Added base data Fact.Sale - ' 
                    + ' those greater than '  
                    + convert(varchar(30), @DateCounterEnd, 100)
                    + ' - Number of record(s) inserted during Initial loading is ' 
                    + cast(@lNumberofRecordsAdded as varchar(10))
                    + '- TS is '
                    + convert(varchar(30), getdate(), 100)

        end

    
        set @iCycle = 0;

        set identity_insert [Fact].[SaleLargeRowStore] off
    
        WHILE @DateCounter < '20121231'
        BEGIN

            SET @OutputCounter = CONVERT(varchar(20), @DateCounter, 112);
            --RAISERROR(@OutputCounter, 0, 1) WITH NOWAIT;
            -- PRINT 'Processing entry for ' + @OutputCounter

            SET @StartingSaleKey = @MaximumSaleKey - @NumberOfSalesPerDay - FLOOR(RAND() * 20000);
            SET @OrderCounter = 0;

            set @iCycle = @iCycle + 1

            INSERT [Fact].[SaleLargeRowStore] WITH (TABLOCK)
                (
                      [City Key]
                    , [Customer Key]
                    , [Bill To Customer Key]
                    , [Stock Item Key]
                    , [Invoice Date Key]
                    , [InvoiceDateKeyYear]
                    , [Delivery Date Key]
                    , [DeliveryDateKeyYear]
                    , [Salesperson Key]
                    , [WWI Invoice ID]
                    , [Description]
                    , Package
                    , Quantity
                    , [Unit Price]
                    , [Tax Rate]
                    , [Total Excluding Tax]
                    , [Tax Amount]
                    , Profit
                    , [Total Including Tax]
                    , [Total Dry Items]
                    , [Total Chiller Items]
                    , [Lineage Key]
                )
            SELECT TOP(@NumberOfSalesPerDay)
                      [City Key]
                    , [Customer Key]
                    , [Bill To Customer Key]
                    , [Stock Item Key]
                    , [Invoice Date Key] = @DateCounter
                    , [InvoiceDateKeyYear] = datepart(year, @DateCounter)
                    , [Delivery Date Key] = DATEADD(day, 1, @DateCounter)
                    , [DeliveryDateKeyYear] = datepart(year, DATEADD(day, 1, @DateCounter))
                    , [Salesperson Key]
                    , [WWI Invoice ID]
                    , [Description]
                    , Package
                    , Quantity
                    , [Unit Price]
                    , [Tax Rate]
                    , [Total Excluding Tax]
                    , [Tax Amount]
                    , Profit = tblFS.[Profit]
                                + ( tblFS.[Profit] * rand(tblFS.[City Key] * tblFS.[Unit Price]) )
                    , [Total Including Tax]
                    , [Total Dry Items]
                    , [Total Chiller Items]
                    , @LineageKey
            FROM   Fact.Sale tblFS
            WHERE  tblFS.[Sale Key] > @StartingSaleKey
        
            and   tblFS.[Invoice Date Key] >='2013-01-01'
        
            ORDER BY 
                    [Sale Key];

            set @lNumberofRecordsAdded = @@ROWCOUNT
        
            PRINT @CHAR_TAB
                    + cast(@iCycle as varchar(10)) 
                    + ' - '
                    + 'Day : '
                    + convert(varchar(30), @DateCounter, 101) 
                    + ' - Number of record(s) added to [Fact].[SaleLargeRowStore] is ' 
                    + ' ' + cast(@lNumberofRecordsAdded as varchar(10))
                    + '- TS is '
                    + convert(varchar(30), getdate(), 100)

            SET @DateCounter = DATEADD(day, 1, @DateCounter);

        END;
    
    
        set identity_insert [Fact].[SaleLargeRowStore] off


        if (@defrag = 0)
        begin


            print @CHAR_TAB
                 + 'Reorg Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)
                 +  '... '

            ALTER INDEX [PK_Fact_SaleLargeRowStore]
                on [Fact].[SaleLargeRowStore] REORGANIZE
                ;

            print @CHAR_TAB
                 + 'Re-orged Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)

            print @CHAR_TAB
                 + 'Update Statistics on Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)
                 + ' ....'

            update statistics [Fact].[SaleLargeRowStore] [PK_Fact_SaleLargeRowStore]
                with FULLSCAN;

            print @CHAR_TAB
                 + 'Updated Statistics on Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)
                 + ' ....'


        end
        else if (@defrag = 1)
        begin
        

            print @CHAR_TAB
                 + 'Rebuilding Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)
                 +  '... '

            ALTER INDEX [PK_Fact_SaleLargeRowStore]
                on [Fact].[SaleLargeRowStore] REBUILD
                ;

            print @CHAR_TAB
                 + 'Rebuilt Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)

        end     

    END

    dbcc shrinkfile('WWI_Log') with no_infomsgs;

END

go


Fact.usp_SaleLargeColumnStoreClusteredPopulateTable

use [WideWorldImportersDW]
go

if object_id('[Fact].[usp_SaleLargeColumnStoreClusteredPopulateTable]') is null
begin

	exec('create procedure [Fact].[usp_SaleLargeColumnStoreClusteredPopulateTable] as ')

end
go

alter procedure [Fact].[usp_SaleLargeColumnStoreClusteredPopulateTable]
(
	  @EstimatedRowsFor2012 bigint = 12000000
	, @defrag bit = 0
)
as

begin


	DECLARE @OrderCounter bigint = 0;
	DECLARE @NumberOfSalesPerDay bigint;
	DECLARE @DateCounter date ;
	DECLARE @DateCounterBegin date
	DECLARE @DateCounterEnd date
	DECLARE @StartingSaleKey bigint;
	DECLARE @MaximumSaleKey bigint;
	DECLARE @OutputCounter varchar(20);

	DECLARE @LineageKey	int

	DECLARE @iCycle int
	DECLARE @lNumberofRecordsDeleted bigint
	DECLARE @lNumberofRecordsDeletedTotal bigint

	DECLARE @lNumberofRecordsAdded	 bigint

	DECLARE @CHAR_TAB  varchar(10)

	DECLARE @tableDestination sysname

	set @CHAR_TAB = char(9);
	set @DateCounter = '20120101';
	set @DateCounterEnd = '20121231'

	set @DateCounterBegin = @DateCounter;

	set @MaximumSaleKey 
			= (
				SELECT MAX([Sale Key]) 
				FROM   Fact.Sale
			  );

	set @LineageKey	=1

	set @NumberOfSalesPerDay   = @EstimatedRowsFor2012 / 365;


	BEGIN


		SET NOCOUNT ON;
		SET XACT_ABORT ON;

		SET @tableDestination = '[Fact].[SaleLargeColumnStoreClustered]';
		set @DateCounterEnd = '20121231'

		PRINT 'EstimatedRowsFor2012 Total '
				+ cast(@EstimatedRowsFor2012 as varchar(30));

		PRINT 'Targeting ' 
				+ CAST(@NumberOfSalesPerDay AS varchar(20)) 
				+ ' sales per day.'
				;

		--added by dadeniji 2017-08-23 7:01 AM
		PRINT 'DateCounter ' + convert(varchar(30), @DateCounter, 100)
		PRINT 'DateCounterMax ' + convert(varchar(30), '20121231', 100)

		IF @NumberOfSalesPerDay > 50000
		BEGIN
			PRINT 'WARNING: Limiting sales to 50000 per day';
			SET @NumberOfSalesPerDay = 50000;
		END;

		set @lNumberofRecordsDeleted = 1
		set @iCycle = 0
		set @lNumberofRecordsDeletedTotal = 0

		print @CHAR_TAB
			+ 'Pruning data '
			+ ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
			+ ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
			+ ' from ' + @tableDestination
			+ convert(varchar(30), getdate(), 100)
			+  '... '

		while (@lNumberofRecordsDeleted != 0)
		begin

			set @iCycle = @iCycle + 1

			delete top ( 10000)
			from   [Fact].[SaleLargeColumnStoreClustered] 
			where  [Invoice Date Key] <= @DateCounterEnd

			set @lNumberofRecordsDeleted = @@ROWCOUNT
			set @lNumberofRecordsDeletedTotal = @lNumberofRecordsDeletedTotal + @lNumberofRecordsDeleted

			PRINT @CHAR_TAB
					+ cast(@iCycle as varchar(10)) + ') '
					+ 'Pruning records prior to ' + convert(varchar(30), @DateCounterEnd, 100)
					+ ' Number of record(s) removed is ' + cast(@lNumberofRecordsDeleted as varchar(10))
					+ ' :: Total thus far ' + cast(@lNumberofRecordsDeletedTotal as varchar(10))

		end

		print @CHAR_TAB
			+ 'Pruned data '
			+ ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
			+ ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
			+ ' from ' + @tableDestination
			+ '. TS is ' + convert(varchar(30), getdate(), 100)

		if not exists
		(
			select *
			from   [Fact].[SaleLargeColumnStoreClustered]
			where  [Invoice Date Key] > @DateCounterEnd
		)
		begin

	
			PRINT 'Adding base data Fact.Sale - those greater than '  
					+ convert(varchar(30), @DateCounterEnd, 100)
					+ '. TS is '
					+ convert(varchar(30), getdate(), 100)

			set identity_insert [Fact].[SaleLargeColumnStoreClustered]  on 

			insert into [Fact].[SaleLargeColumnStoreClustered] 
				(
					  [Sale Key]
					, [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]

					, [Invoice Date Key]
					, [InvoiceDateKeyYear]

					, [Delivery Date Key]
					, [DeliveryDateKeyYear]

					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]
				)

			select 
					  [Sale Key]
					, [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]

					, [Invoice Date Key]
					, [InvoiceDateKeyYear]
						= datepart(year, [Invoice Date Key])

					, [Delivery Date Key]
					, [DeliveryDateKeyYear]
						= datepart(year, [Delivery Date Key])

					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]

			from   [Fact].[Sale]
	
			where  [Invoice Date Key] > @DateCounterEnd

			set @lNumberofRecordsAdded = @@ROWCOUNT


			set identity_insert [Fact].[SaleLargeColumnStoreClustered]  off
				
			PRINT 'Added base data Fact.Sale - ' 
					+ ' those greater than '  
					+ convert(varchar(30), @DateCounterEnd, 100)
					+ ' - Number of record(s) inserted during Initial loading is ' 
					+ cast(@lNumberofRecordsAdded as varchar(10))
					+ '- TS is '
					+ convert(varchar(30), getdate(), 100)

		end

	
		set @iCycle = 0;

		set identity_insert [Fact].[SaleLargeColumnStoreClustered]  off

		print @CHAR_TAB
				 + 'Adding data '
				 + ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				 + ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				 + ' into ' + @tableDestination
				 + ' - TS :- ' + convert(varchar(30), getdate(), 100)
				 +  '... '
	
		WHILE @DateCounter < '20121231'
		BEGIN

			SET @OutputCounter = CONVERT(varchar(20), @DateCounter, 112);
			--RAISERROR(@OutputCounter, 0, 1) WITH NOWAIT;
			-- PRINT 'Processing entry for ' + @OutputCounter

			SET @StartingSaleKey = @MaximumSaleKey - @NumberOfSalesPerDay - FLOOR(RAND() * 20000);
			SET @OrderCounter = 0;

			set @iCycle = @iCycle + 1

			INSERT [Fact].[SaleLargeColumnStoreClustered] WITH (TABLOCK)
				(
					  [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]
					, [Invoice Date Key]
					, [InvoiceDateKeyYear]
					, [Delivery Date Key]
					, [DeliveryDateKeyYear]
					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]
				)
			SELECT TOP(@NumberOfSalesPerDay)
					  [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]
					, [Invoice Date Key] = @DateCounter
					, [InvoiceDateKeyYear] = datepart(year, @DateCounter)
					, [Delivery Date Key] = DATEADD(day, 1, @DateCounter)
					, [DeliveryDateKeyYear] = datepart(year, DATEADD(day, 1, @DateCounter))
					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit = tblFS.[Profit]
								+ ( tblFS.[Profit] * rand(tblFS.[City Key] * tblFS.[Unit Price]) )
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, @LineageKey
			FROM   Fact.Sale tblFS
			WHERE  tblFS.[Sale Key] > @StartingSaleKey
		
			and   tblFS.[Invoice Date Key] >='2013-01-01'
		
			ORDER BY 
					[Sale Key];

			set @lNumberofRecordsAdded = @@ROWCOUNT
		
			PRINT @CHAR_TAB
					+ cast(@iCycle as varchar(10)) 
					+ ' - '
					+ 'Day : '
					+ convert(varchar(30), @DateCounter, 101) 
					+ ' - Number of record(s) added to '
					+ @tableDestination 
					+ ' ' + cast(@lNumberofRecordsAdded as varchar(10))
					+ ' TS :- ' + convert(varchar(30), getdate(), 100)

			SET @DateCounter = DATEADD(day, 1, @DateCounter);

		END;
	

		print @CHAR_TAB
				 + 'Added data '
				 + ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				 + ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				 + ' into ' + @tableDestination
				 + ' TS :- ' + convert(varchar(30), getdate(), 100)
				 +  ''
	
		set identity_insert [Fact].[SaleLargeColumnStoreClustered]  off


		if (@defrag = 0)
		begin


			print @CHAR_TAB
				 + 'Reorg Index '
				 + '[Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)
				 +  '... '

			ALTER INDEX [INDX_FACT_SaleLargeColumnStoreClustered]
				on [Fact].[SaleLargeColumnStoreClustered]  REORGANIZE
				;

			print @CHAR_TAB
				 + 'Re-orged Index '
				 + '[Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)

			print @CHAR_TAB
				 + 'Update Statistics on Index ' 
				 + ' [Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered]  '
				 + convert(varchar(30), getdate(), 100)
				 + ' ....'

			update statistics [Fact].[SaleLargeColumnStoreClustered] 
					[INDX_FACT_SaleLargeColumnStoreClustered] 
				with FULLSCAN;

			print @CHAR_TAB
				 + 'Updated Statistics on Index '
				 + ' [Fact].[SaleLargeColumnStoreClustered] [INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)
				 + ' ....'


		end
		else if (@defrag = 1)
		begin
		

			print @CHAR_TAB
				 + 'Rebuilding Index '
				 + '[Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)
				 +  '... '

			ALTER INDEX [INDX_FACT_SaleLargeColumnStoreClustered]
				on  [Fact].[SaleLargeColumnStoreClustered] REBUILD
				;

			print @CHAR_TAB
				 + 'Rebuilt Index '
				 + ' [Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)
		

		end		
	END

	dbcc shrinkfile('WWI_Log') with no_infomsgs;

END

go

Fact.usp_SaleLargeColumnStoreNonClusteredPopulateTable

use [WideWorldImportersDW]
go

if object_id('[Fact].[usp_SaleLargeColumnStoreNonClusteredPopulateTable]') is null
begin

	exec('create procedure [Fact].[usp_SaleLargeColumnStoreNonClusteredPopulateTable] as ')

end
go

alter procedure [Fact].[usp_SaleLargeColumnStoreNonClusteredPopulateTable]
(
	  @EstimatedRowsFor2012 bigint = 12000000
	, @defrag bit = 0
)
as

begin


	DECLARE @OrderCounter bigint = 0;
	DECLARE @NumberOfSalesPerDay bigint;
	DECLARE @DateCounter date;
	DECLARE @DateCounterBegin date
	DECLARE @DateCounterEnd date
	DECLARE @StartingSaleKey bigint;
	DECLARE @MaximumSaleKey bigint;
	DECLARE @OutputCounter varchar(20);

	DECLARE @LineageKey	int

	DECLARE @iCycle int
	DECLARE @lNumberofRecordsDeleted bigint
	DECLARE @lNumberofRecordsDeletedTotal bigint

	DECLARE @lNumberofRecordsAdded	 bigint
	DECLARE @lNumberofRecordsAddedTotal bigint

	DECLARE @CHAR_TAB  varchar(10)

	DECLARE @tableDestination sysname

	set @CHAR_TAB = char(9);
	set @DateCounter = '20120101';
	set @DateCounterEnd = '20121231'

	set @DateCounterBegin = @DateCounter;

	set @MaximumSaleKey 
			= (
				SELECT MAX([Sale Key]) 
				FROM   Fact.Sale
			  );

	set @LineageKey	=1

	set @NumberOfSalesPerDay   = @EstimatedRowsFor2012 / 365;



	set @lNumberofRecordsAddedTotal = 0

	BEGIN


		SET NOCOUNT ON;
		SET XACT_ABORT ON;

		-- exec sp_helpindex '[Fact].[SaleLargeColumnStoreNonClustered]'
		SET @tableDestination = '[Fact].[SaleLargeColumnStoreNonClustered]';
		set @DateCounterEnd = '20121231'

		PRINT 'EstimatedRowsFor2012 Total '
				+ cast(@EstimatedRowsFor2012 as varchar(30));

		PRINT 'Targeting ' 
				+ CAST(@NumberOfSalesPerDay AS varchar(20)) 
				+ ' sales per day.'
				;

		--added by dadeniji 2017-08-23 7:01 AM
		PRINT 'DateCounter ' + convert(varchar(30), @DateCounter, 100)
		PRINT 'DateCounterMax ' + convert(varchar(30), '20121231', 100)

		IF @NumberOfSalesPerDay > 50000
		BEGIN
			PRINT 'WARNING: Limiting sales to 50000 per day';
			SET @NumberOfSalesPerDay = 50000;
		END;

		set @lNumberofRecordsDeleted = 1
		set @iCycle = 0
		set @lNumberofRecordsDeletedTotal = 0

		/*
			Msg 35330, Level 15, State 1, Procedure usp_SaleLargeColumnStoreNonClusteredPopulateTable, Line 119 [Batch Start Line 2]
			DELETE statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the DELETE statement, and then rebuilding the columnstore index after DELETE has completed.
		*/

		print @CHAR_TAB
			+ 'Disabling Index '
			+ '[Fact].[SaleLargeColumnStoreNonClustered].[INDX_FACT_Customer_Profit] '
			+ convert(varchar(30), getdate(), 100)
			+  '... '

		exec('
				ALTER INDEX ALL
				on   [Fact].[SaleLargeColumnStoreNonClustered] DISABLE
				;
			')


		--ALTER INDEX [INDX_FACT_Customer_Profit]
		ALTER INDEX ALL
				on   [Fact].[SaleLargeColumnStoreNonClustered] DISABLE
				;

		/*
		ALTER INDEX [INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
				on   [Fact].[SaleLargeColumnStoreNonClustered] DISABLE
				;
		*/

		print @CHAR_TAB
			+ 'Disabled Index '
			+ '[Fact].[SaleLargeColumnStoreNonClustered].[INDX_FACT_Customer_Profit] '
			+ convert(varchar(30), getdate(), 100)


		print @CHAR_TAB
			+ 'Pruning data '
			+ ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
			+ ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
			+ ' from ' + @tableDestination
			+ convert(varchar(30), getdate(), 100)
			+  '... '

		while (@lNumberofRecordsDeleted != 0)
		begin

			set @iCycle = @iCycle + 1

			delete top ( 10000)
			from   [Fact].[SaleLargeColumnStoreNonClustered] 
			where  [Invoice Date Key] <= @DateCounterEnd

			set @lNumberofRecordsDeleted = @@ROWCOUNT
			set @lNumberofRecordsDeletedTotal = @lNumberofRecordsDeletedTotal + @lNumberofRecordsDeleted

			PRINT @CHAR_TAB
					+ cast(@iCycle as varchar(10)) + ') '
					+ 'Pruning records prior to ' + convert(varchar(30), @DateCounterEnd, 100)
					+ ' Number of record(s) removed is ' + cast(@lNumberofRecordsDeleted as varchar(10))
					+ ' :: Total thus far ' + cast(@lNumberofRecordsDeletedTotal as varchar(10))

		end

		print @CHAR_TAB
			+ 'Pruned data '
			+ ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
			+ ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
			+ ' from ' + @tableDestination
			+ '. TS is ' + convert(varchar(30), getdate(), 100)

		if not exists
		(
			select *
			from   [Fact].[SaleLargeColumnStoreNonClustered]
			where  [Invoice Date Key] > @DateCounterEnd
		)
		begin

	
			PRINT 'Adding base data Fact.Sale - those greater than '  
					+ convert(varchar(30), @DateCounterEnd, 100)
					+ '. TS is '
					+ convert(varchar(30), getdate(), 100)

			set identity_insert [Fact].[SaleLargeColumnStoreNonClustered]  on 

			insert into [Fact].[SaleLargeColumnStoreNonClustered] 
				(
					  [Sale Key]
					, [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]

					, [Invoice Date Key]
					, [InvoiceDateKeyYear]

					, [Delivery Date Key]
					, [DeliveryDateKeyYear]

					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]
				)

			select 
					  [Sale Key]
					, [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]

					, [Invoice Date Key]
					, [InvoiceDateKeyYear]
						= datepart(year, [Invoice Date Key])

					, [Delivery Date Key]

					, [DeliveryDateKeyYear]
						= datepart(year, [Delivery Date Key])

					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]

			from   [Fact].[Sale]
	
			where  [Invoice Date Key] > @DateCounterEnd

			set @lNumberofRecordsAdded = @@ROWCOUNT


			set identity_insert [Fact].[SaleLargeColumnStoreNonClustered]  off
				
			PRINT 'Added base data Fact.Sale - ' 
					+ ' those greater than '  
					+ convert(varchar(30), @DateCounterEnd, 100)
					+ ' - Number of record(s) inserted during Initial loading is ' 
					+ cast(@lNumberofRecordsAdded as varchar(10))
					+ '- TS is '
					+ convert(varchar(30), getdate(), 100)

		end

	
		set @iCycle = 0;

		set identity_insert [Fact].[SaleLargeColumnStoreNonClustered]  off

		print @CHAR_TAB
				 + 'Adding data '
				 + ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				 + ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				 + ' into ' + @tableDestination
				 + ' - TS :- ' + convert(varchar(30), getdate(), 100)
				 +  '... '
	
		WHILE @DateCounter < '20121231'
		BEGIN

			SET @OutputCounter = CONVERT(varchar(20), @DateCounter, 112);
			--RAISERROR(@OutputCounter, 0, 1) WITH NOWAIT;
			-- PRINT 'Processing entry for ' + @OutputCounter

			SET @StartingSaleKey = @MaximumSaleKey - @NumberOfSalesPerDay - FLOOR(RAND() * 20000);
			SET @OrderCounter = 0;

			set @iCycle = @iCycle + 1

			INSERT [Fact].[SaleLargeColumnStoreNonClustered] WITH (TABLOCK)
				(
					  [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]
					, [Invoice Date Key]
					, [InvoiceDateKeyYear]
					, [Delivery Date Key]
					, [DeliveryDateKeyYear]
					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]
				)
			SELECT TOP(@NumberOfSalesPerDay)
					  [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]
					, [Invoice Date Key] = @DateCounter
					, [InvoiceDateKeyYear] = datepart(year, @DateCounter)
					, [Delivery Date Key] = DATEADD(day, 1, @DateCounter)
					, [DeliveryDateKeyYear] = datepart(year, DATEADD(day, 1, @DateCounter))
					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit = tblFS.[Profit]
								+ ( tblFS.[Profit] * rand(tblFS.[City Key] * tblFS.[Unit Price]) )
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, @LineageKey
			FROM   Fact.Sale tblFS
			WHERE  tblFS.[Sale Key] > @StartingSaleKey
		
			and   tblFS.[Invoice Date Key] >='2013-01-01'
		
			ORDER BY 
					[Sale Key];

			set @lNumberofRecordsAdded = @@ROWCOUNT

			set @lNumberofRecordsAddedTotal = @lNumberofRecordsAddedTotal
												+ @lNumberofRecordsAdded
		
			PRINT @CHAR_TAB
					+ cast(@iCycle as varchar(10)) 
					+ ' - '
					+ 'Day : '
					+ convert(varchar(30), @DateCounter, 101) 
					+ ' - Number of record(s) added to '
					+ @tableDestination 
					+ ' ' + cast(@lNumberofRecordsAdded as varchar(10))
					+ ' TS :- ' + convert(varchar(30), getdate(), 100)

			SET @DateCounter = DATEADD(day, 1, @DateCounter);

		END;
	

		print @CHAR_TAB
				 + 'Added data '
				 + ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				 + ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				 + ' into ' + @tableDestination
				 + ' TS :- ' + convert(varchar(30), getdate(), 100)
				 +  ''
	

		print @CHAR_TAB
				 + cast(@lNumberofRecordsAddedTotal as varchar(10))
				 + ' records added '
				 + ' into ' + @tableDestination
				 + ' TS :- ' + convert(varchar(30), getdate(), 100)
				 +  ''
	
		set identity_insert [Fact].[SaleLargeColumnStoreNonClustered]  off


		if 
			(
				    ( @defrag = 0) 
				and ( 1= 0)
			)	 
		begin


			print @CHAR_TAB
				 + 'Reorganizing indexes on [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)
				 +  '... '

			--ALTER INDEX [INDX_FACT_Customer_Profit]
			--ALTER INDEX [INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
			ALTER INDEX ALL
				on  [Fact].[SaleLargeColumnStoreNonClustered] REORGANIZE
				;

			print @CHAR_TAB
				 + 'Reorganized indexes on [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)

			print @CHAR_TAB
				 + 'Update Statistics on Index ' 
				 + ' [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)
				 + ' ....'

			update statistics [Fact].[SaleLargeColumnStoreNonClustered]
					 --[INDX_FACT_Customer_Profit]
					 --[INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
					 --ALL
				with FULLSCAN;

			print @CHAR_TAB
				 + 'Updated Statistics on Index '
				 + ' [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)
				 + ' ....'


		end
		else if 
				(
					   ( @defrag = 1)
					or ( 1=1)
				)
		begin
		

			print @CHAR_TAB
				 + 'Rebuilding indexes on [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)
				 +  '... '

			--ALTER INDEX [INDX_FACT_Customer_Profit]
			--ALTER INDEX [INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
			ALTER INDEX ALL
				on   [Fact].[SaleLargeColumnStoreNonClustered] REBUILD
				;

			print @CHAR_TAB
				 + 'Rebuilt indexes on [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)
		

		end		
	END

	dbcc shrinkfile('WWI_Log') with no_infomsgs;

END

go


 

Invoke SP to load data

Invoke SP to load data into tables

Fact.usp_SaleLargeRowStorePopulateTable

use [WideWorldImportersDW]
go

/*
	truncate table [Fact].[SaleLargeRowStore]
*/
declare	 @EstimatedRowsFor2012 bigint
declare  @defrag bit 

set @EstimatedRowsFor2012 = 12000000
set  @defrag =1

exec [Fact].[usp_SaleLargeRowStorePopulateTable]
	  @EstimatedRowsFor2012 = @EstimatedRowsFor2012
	, @defrag = @defrag

Fact.SaleLargeColumnStoreClustered

use [WideWorldImportersDW]
go

/*
	truncate table [Fact].[SaleLargeColumnStoreClustered]
*/

declare	 @EstimatedRowsFor2012 bigint
declare  @defrag bit 

set @EstimatedRowsFor2012 = 12000000
set  @defrag =1

exec [Fact].[usp_SaleLargeColumnStoreClusteredPopulateTable]
	  @EstimatedRowsFor2012 = @EstimatedRowsFor2012
	, @defrag = @defrag


Fact.SaleLargeColumnStoreNonClustered


use [WideWorldImportersDW]
go

/*

	truncate table [Fact].[SaleLargeColumnStoreNonClustered]

*/

declare	 @EstimatedRowsFor2012 bigint
declare  @defrag bit

set @EstimatedRowsFor2012 = 12000000
set  @defrag =0


exec [Fact].[usp_SaleLargeColumnStoreNonClusteredPopulateTable]
	  @EstimatedRowsFor2012 = @EstimatedRowsFor2012
	, @defrag = @defrag


Metadata – Fact.Sales*

Number of Records

Code


set nocount on
go
set XACT_ABORT on
go

use [WideWorldImportersDW]
go

; with cteTable
(
      [id]
	, [table]
	, [objectID]
)
as
(
	select 
			  tblA.[id]
			, tblA.[name]
			, [objectID]
				 = object_id(tblA.[name])
	from  
		(

			select 1 as [id], '[Fact].[SaleLargeRowStore]' as [name]
    
			union

			select 2  as [id], '[Fact].[SaleLargeColumnStoreClustered]' as [name]
    
			union

			select 3 as [id], '[Fact].[SaleLargeColumnStoreNonClustered]' as [name]

		) tblA
) 
select
	      cteT.id

        , [table]
            =  tblSS.[name]
                + '.'
                + tblSO.[name]
 
        , [indexName]
            = tblSI.[name]
                     
        , [indexType]
          = tblSI.[type_desc]
 
        , [numberofRecords]
            = sum(tblSP.[rows])
 
        , numberofPagesData
            = sum(tblSPS.in_row_data_page_count)
 
        , numberofPagesLOB
            = sum(tblSPS.lob_used_page_count)
 
        , usedPageCount
            = sum(used_page_count)

        , reservedPageCount
            = sum(tblSPS.reserved_page_count)
 
        , [reservedPageInMB]
            = sum(tblSPS.reserved_page_count) / 128

from   sys.schemas tblSS
 
inner join sys.objects tblSO
 
    on tblSS.schema_id = tblSO.schema_id
 
inner join sys.indexes tblSI
 
    on tblSO.object_id = tblSI.object_id
 
 
inner join sys.partitions tblSP
 
    on tblSI.object_id = tblSP.object_id
    and tblSI.index_id = tblSP.index_id
 
 
inner join sys.dm_db_partition_stats tblSPS
 
    on tblSP.object_id = tblSPS.object_id
    and tblSP.index_id = tblSPS.index_id
    and tblSP.partition_id = tblSPS.partition_id
 
 
inner join cteTable cteT
        on cteT.[objectID] = tblSO.object_id
 
group by
		  cteT.id
        , tblSS.[name]
        , tblSO.[name]
        , tblSI.index_id
        , tblSI.[name]
        , tblSI.[type_desc]
 
order by
	  cteT.id
        , tblSS.[name]
        , tblSO.[name]
        , tblSI.index_id


Output

Explanation

  1. We have over 12 million records in each of our targeted tables (Row Store Clustered / Column Store Clustered / Column Store Non-Clustered)
  2. Row Store Clustered
    • Clustered Index => 2.691 GB
    • Index on Clustered Key & Profit – 360 MB
  3. Column Store Clustered
    • 196 MB
  4. Column Store Non-Clustered
    • Heap => 2.687 GB
    • Column Store Index on “Invoice Date Year” & “Delivery Date Year” – 17 MB
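The Column Store numbers above can be probed a bit further. Assuming SQL Server 2016 or later, the DMV sys.dm_db_column_store_row_group_physical_stats reports per-row-group row counts and compressed sizes; a sketch (targeting our clustered columnstore table) of where the savings come from:

```sql
-- Per row-group physical stats for the Clustered Column Store table.
-- COMPRESSED row groups hold segment data; OPEN/CLOSED row groups are
-- delta store rows not yet compressed.
select
          [state]     = tblRG.[state_desc]
        , [rowGroups] = count(*)
        , [rows]      = sum(tblRG.[total_rows])
        , [sizeInMB]  = sum(tblRG.[size_in_bytes]) / 1048576
from   sys.dm_db_column_store_row_group_physical_stats tblRG
where  tblRG.[object_id] = object_id('[Fact].[SaleLargeColumnStoreClustered]')
group by
        tblRG.[state_desc]
order by
        tblRG.[state_desc];
```

A large share of rows sitting in COMPRESSED row groups is what drives the 196 MB footprint down from the multi-gigabyte row store equivalent.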

 

Source Control

GitHub

Availed in Github here.

 

Summary

Clustered Column Store offers the best savings in terms of storage footprint.

We have about 12 million records, which should be a good starting point for untargeted queries.

Untargeted, that is, queries for which we have not hand-crafted specific indexes.
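As an illustration of "untargeted" (the grouping column below is chosen arbitrarily for the example, not taken from the benchmark above), an ad-hoc aggregate like this has no purpose-built index behind it, so each table's base storage structure has to carry the scan:

```sql
-- An ad-hoc ("untargeted") aggregate: no index was crafted for this grouping.
declare @dateBegin datetime = '2013-04-15';
declare @dateEnd   datetime = '2013-05-15';

select top 5
          tblFS.[Salesperson Key]
        , [totalProfit] = sum(tblFS.[Profit])
from   [Fact].[SaleLargeColumnStoreClustered] tblFS
where  tblFS.[Invoice Date Key] between @dateBegin and @dateEnd
group by
        tblFS.[Salesperson Key]
order by
        [totalProfit] desc;
```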