SSMS – Linked Server – Column Metadata

Background

Had a good meeting this morning as we try to figure out how to better support our Developers.

One of the issues they brought up was an inability to view the data types of Linked Server table columns.

Pictorial

Here is the deepest level of granularity available when we connect to our Linked Server.

Image

ssms.columns.20190114.0425PM.PNG

Explanation

  1. We see the following
    • Server
    • Databases
    • Objects
      • Tables
      • Views

View Columns

Script

Outline

Here are the avenues we can explore to view column metadata :-

  1. sp_columns_ex
  2. Openquery/sys
    • sys.all_columns
  3. Openquery/information_schema
    • information_schema.columns

Choices

sp_columns_ex

Syntax

exec sp_columns_ex
		  @table_server
		, @table_catalog
		, @table_schema
		, @table_name   

Sample

declare @linkedServer   sysname
declare @linkedDatabase sysname
declare @linkedSchema   sysname
declare @linkedTable    sysname

set @linkedServer= 'AWS-JobBuilder'
set @linkedDatabase = 'acs_ap'
set @linkedSchema = 'dbo'
set @linkedTable = 'ap_user'

exec sp_columns_ex
		  @table_server  = @linkedServer
		, @table_catalog = @linkedDatabase
		, @table_schema  = @linkedSchema
		, @table_name    = @linkedTable

Output

sp_tables_ex.2019014.0443pm

openquery/sys.*

Syntax

select top 10 *

from   openquery
        (
              [AWS-JobBuilder]

            , '
                    select
                              [server] = serverproperty(''servername'')
                            , [database] = db_name()
                            , [schema] = tblSS.name
                            , [object] = tblSAO.name
                            , [column] = tblSAC.name
                            , [type]   = tblST.[name]
                            , tblST.max_length
                            , tblST.is_nullable

                    from   sys.schemas tblSS

                    inner join sys.all_objects tblSAO

                            on tblSS.schema_id = tblSAO.schema_id

                    inner join sys.all_columns tblSAC

                        on tblSAO.object_id = tblSAC.object_id

                    inner join sys.types tblST

                        on  tblSAC.system_type_id = tblST.system_type_id
                        and tblSAC.user_type_id = tblST.user_type_id

              '
        )

Output

openquery.sys.all.2019014.0452pm

openquery/information_schema.columns

Syntax
select top 10 *

from   openquery
        (
              [AWS-JobBuilder]

            , '
                select top 100 

                          [server] = serverproperty(''servername'')

                        , [database] = tblSIC.[TABLE_CATALOG]

                        , [schema] = tblSIC.[TABLE_SCHEMA]

                        , [object] = tblSIC.[TABLE_NAME]

                        , [column] = tblSIC.[COLUMN_NAME]

                        , [position] = tblSIC.[ORDINAL_POSITION]

                        , [dataType] = tblSIC.[DATA_TYPE]

                        , [charMaxLength] = tblSIC.[CHARACTER_MAXIMUM_LENGTH]

                        , [charOctetLength] = tblSIC.[CHARACTER_OCTET_LENGTH]

                from   information_schema.columns tblSIC

              '
        )			   	

Output

openquery.openquery.informational_columns.2019014.0515PM.PNG

Summary

Unfortunately, SQL Server Management Studio ( SSMS ) v17.x does not let us view columns on Linked Servers.

To gather column level metadata, one has to write code.

DBeaver – Table Relationships

Background

I just want to make sure that I am properly tracking table relationships in MySQL.

 

Code

INFORMATION_SCHEMA

INFORMATION_SCHEMA.KEY_COLUMN_USAGE

Code

Sample


set @schema := 'sakila';
set @table := 'inventory';

select 

         case

            when tblKCU.TABLE_NAME = @table then 'References'
            when REFERENCED_TABLE_NAME = @table then 'Referenced'

         end as 'isReferencingOrReferenced'
       , tblKCU.CONSTRAINT_NAME as 'constraint'
       , tblKCU.TABLE_NAME as 'table'
       , tblKCU.COLUMN_NAME as 'column'

       , tblKCU.REFERENCED_TABLE_NAME as 'refTable'
       , tblKCU.REFERENCED_COLUMN_NAME as 'refColumn'

       , tblKCU.ORDINAL_POSITION as 'colPos'

from   information_schema.KEY_COLUMN_USAGE tblKCU

where  tblKCU.TABLE_SCHEMA = @schema

/* Foreign Key Relationship */
and    (

             ( tblKCU.TABLE_NAME = @table )
          or ( tblKCU.REFERENCED_TABLE_NAME = @table )

       )

/* Skip Primary Key */
and    (

              ( tblKCU.TABLE_NAME is not null )
          and ( tblKCU.REFERENCED_TABLE_NAME is not null )

       )

order by       

         tblKCU.TABLE_NAME
       , tblKCU.COLUMN_NAME

       , tblKCU.REFERENCED_TABLE_NAME
       , tblKCU.REFERENCED_COLUMN_NAME

       , tblKCU.CONSTRAINT_NAME       

       , tblKCU.ORDINAL_POSITION

;

Output

 

Tools

DBeaver

Outline

Steps to follow to track relationships visually through DBeaver :-

  1. Launch DBeaver
  2. Connect to the MySQL Instance
  3. Navigate to Database > Tables > <Specific Table>
  4. Double-click on the selected table
  5. On the right panel, choose the “ER Diagram” tab
  6. Review the shown tables and their relationships to each other

Image

MySQL – information_schema.statistics

Background

As a quick follow-up to our discussion of metadata on tables, let us discuss indexes.

BTW, the discussion on tables is here.

INFORMATION_SCHEMA.STATISTICS

Code


select

		tblStat.TABLE_NAME as 'table'

	  , tblStat.INDEX_NAME as 'index'

	  , case tblStat.NON_UNIQUE
			   when 0 then 'No'
			   when 1 then 'Yes'
			   else '?'
		end as 'Unique'


	  , GROUP_CONCAT( COLUMN_NAME order by SEQ_IN_INDEX ) as colList


	  , tblStat.CARDINALITY


from  INFORMATION_SCHEMA.STATISTICS tblStat


where  tblStat.TABLE_SCHEMA = 'drupal'


group by

		 tblStat.TABLE_SCHEMA
	   , tblStat.TABLE_NAME
	   , tblStat.INDEX_NAME


order by

		 tblStat.TABLE_SCHEMA
	  ,  tblStat.TABLE_NAME
	  ,  tblStat.INDEX_NAME


LIMIT 25

;


Output

 

Comparison with Other Databases

  1. GROUP_CONCAT
    • The group_concat function is very, very powerful and has few equals on other database platforms ( a minimal example follows below )
    • Function Documentation
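As a quick, hedged illustration ( assuming the sakila sample schema is installed; the table and columns below are only used for this example ), the snippet collapses the films for each actor into a single, ordered, comma-separated list :-

-- Collapse film_id values into one ordered, comma-separated list per actor
select
         tblFA.actor_id
       , group_concat( tblFA.film_id order by tblFA.film_id ) as filmList

from   sakila.film_actor tblFA

group by
         tblFA.actor_id

limit 5
;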

 

SHOW INDEX

To get a bit more information, specifically cardinality ( Number of unique values in each column combination ), we can issue “show index” against the table.

Code

Syntax


show index from [table] from [database];

Sample


show index from search_index from drupal;

Output

References

  1. Reference
    • INFORMATION_SCHEMA Tables
      • INFORMATION_SCHEMA STATISTICS Table
    • Show Index
    • Group By Functions
        Link
    • Group_Concat

MySQL – information_schema.tables

Background

With SQL being a lingua franca for querying data, it should not be a far stretch to note that the various governing bodies have agreed on standards for how to expose the data dictionary.

 

MySQL

For instance, if we want to see which tables are in a database, we can launch our favorite GUI SQL editor or just use MySQL’s eponymously named command-line client, mysql.

 

information_schema.tables

SQL

Sample


select 
       TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
     , TABLE_TYPE, ENGINE
     , TABLE_ROWS, AVG_ROW_LENGTH 

from information_schema.tables 

where TABLE_TYPE not in ("SYSTEM VIEW")  

LIMIT 10


 

Output

Explanation

  1. Database
    • def
  2. Schema
    • audit_repository
  3. Table
  4. Table Type
    • Available Options
      • SYSTEM VIEW
      • BASE TABLE
  5. Engine
    • Available Options
      • Memory – System Tables
      • InnoDB
      • MyISAM
  6. TABLE ROWS
    • Number of records in table
  7. AVG ROW LENGTH
    • Average row length, based on the column data types and the storage actually allocated

 

Comparison with Other Databases

SQL Server

  1. Able to use double-quotes in queries
    • In MySQL we are able to use double-quotes ( " ) to delimit string values; in Transact SQL, string literals use single quotes
  2. Limit N
    • In Transact SQL we use TOP N
    • On the other hand, in MySQL we use LIMIT N
    • A side-by-side sketch follows below
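Here is a small, hedged side-by-side sketch of the same "first 10 base tables" request on each platform ( both dialects expose information_schema.tables with these column names ) :-

-- MySQL :- LIMIT N; double-quotes are accepted around string values by default
select TABLE_SCHEMA, TABLE_NAME
from   information_schema.tables
where  TABLE_TYPE = "BASE TABLE"
LIMIT 10;

-- SQL Server ( Transact SQL ) :- TOP N; string literals use single quotes
select top 10 TABLE_SCHEMA, TABLE_NAME
from   information_schema.tables
where  TABLE_TYPE = 'BASE TABLE';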

 

show tables

Objective

List tables and views that fit a specified name pattern.

SQL

Syntax


show full tables from [database] [ like [pattern] | where Table_type in ( 'BASE TABLE' , 'VIEW' ) ];

Sample


show full tables from drupal like 'role%';

 

Output

 

describe

Objective

Display table’s structure.

SQL

Syntax


describe [table];

Sample


describe drupal.role;

Output

 

References

  1. MySQL
    • Show Tables
      • Show Tables Syntax
        Link
    • Show Commands
      • Extensions to SHOW Statements
        Link

Column Store Indexes & Partitioning – Foundational Work

Background

As data grows and ages, it becomes prudent to look at partitioning as an avenue to reduce the overall footprint of data read from disk into memory.

Column Store, by its very definition, delivers on the promise of Vertical Partitioning.

Let us review the steps we have to take to achieve Range Partitioning.

 

Design Goals

Our goal is to divvy up our data into years.

In each table that we will be targeting, we will ensure that a date column exists.

And, we hope to have one of those date columns track the occurrence date.

 

Overview

Here are the plumbing areas:

  1. Create new Filegroups and files within the new file groups
  2. Create Partitioning Function
  3. Create Partitioning Scheme

 

Steps

Platform

File Groups & Files

Objective

Create File groups for each business year.

And, create a single file in each file group.

SQL


USE [WideWorldImportersDW]   
go

-- Year NULL

if not exists  
(      
	select *      
	from   sys.filegroups tblSFG      
	where  tblSFG.name = 'fg_Partition_Year_Null'  
)  
begin      

	ALTER DATABASE [WideWorldImportersDW] 
		ADD FILEGROUP  [fg_Partition_Year_Null]   
		
end 
go

if not exists  
	(      
		select *      
		from   sys.master_files tblMF      
		where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
		and    tblMF.name = 'fg_Partition_Year_Null__01'  
	)  
begin 

   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
      NAME = [fg_Partition_Year_Null__01]   
      , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_Null_file_001.ndf'  
   )  
   TO FILEGROUP [fg_Partition_Year_Null]  
	
end 


-- Year 2010
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2010'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2010]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2010__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2010__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2010_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2010]  

end 


-- Year 2011
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2011'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2011]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2011__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2011__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2011_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2011]  

end 


-- Year 2012
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2012'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2012]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2012__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2012__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2012_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2012]  

end 


-- Year 2013
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2013'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2013]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2013__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2013__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2013_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2013]  

end 


-- Year 2014
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2014'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2014]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2014__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2014__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2014_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2014]  

end 


-- Year 2015
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2015'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2015]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2015__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2015__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2015_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2015]  

end 


-- Year 2016
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2016'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2016]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2016__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2016__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2016_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2016]  

end 

-- Year 2017
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2017'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2017]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2017__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2017__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2017_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2017]  

end 
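The partition scheme further down also maps filegroups for 2018 through 2020 plus a trailing "Next" filegroup. Those are not shown above; presumably they are created with the very same pattern. A sketch for 2018, under that assumption :-

-- Year 2018 ( repeat the same pattern for 2019, 2020, and fg_Partition_Year_Next )
if not exists
(
    select *
    from   sys.filegroups tblSFG
    where  tblSFG.name = 'fg_Partition_Year_2018'
)
begin

   ALTER DATABASE [WideWorldImportersDW]
      ADD FILEGROUP  [fg_Partition_Year_2018]

end

if not exists
(
   select *
   from   sys.master_files tblMF
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')
   and    tblMF.name = 'fg_Partition_Year_2018__01'
)
begin

   ALTER DATABASE [WideWorldImportersDW]
   ADD FILE
   (
         NAME = [fg_Partition_Year_2018__01]
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2018_file_001.ndf'
   )
   TO FILEGROUP [fg_Partition_Year_2018]

end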

Partition Function

Objective

  1. Create a new Partition Function
  2. Indicate the partitioning column's data type
    • date
  3. Range Type
    • Left or Right
      • We use Right values
  4. Values
    • null
      • Covers rows where the date is null
      • When the null entry is listed, please be sure to use the convert function
      • Else :-
        • Msg 7705, Level 16, State 1
        • Could not implicitly convert range values type specified at ordinal 1 to partition function parameter type

SQL

Code

 CREATE PARTITION FUNCTION [pfnYear]
 ( 
    date 
 )     
 AS RANGE RIGHT FOR VALUES    
 (       
	  convert ( date , null ) 
	, '2010-01-01'
	, '2011-01-01'
	, '2012-01-01'
	, '2013-01-01'
	, '2014-01-01'
	, '2015-01-01'
	, '2016-01-01'
	, '2017-01-01'
	, '2018-01-01'
	, '2019-01-01'
	, '2020-01-01'  
) 

Partition Scheme

SQL



CREATE PARTITION SCHEME [pschemeYear]   
AS PARTITION [pfnYear]    
TO  
(       
	  [fg_Partition_Year_Null]
	, [fg_Partition_Year_2010]
	, [fg_Partition_Year_2011]
	, [fg_Partition_Year_2012]
	, [fg_Partition_Year_2013]
	, [fg_Partition_Year_2014]
	, [fg_Partition_Year_2015]
	, [fg_Partition_Year_2016]
	, [fg_Partition_Year_2017]
	, [fg_Partition_Year_2018]
	, [fg_Partition_Year_2019]
	, [fg_Partition_Year_2020]
	, [fg_Partition_Year_Next]  
	
) 
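Before moving on, a quick sanity check is useful. The query below is a hedged sketch ( not from the original post ) that lists each boundary value of pfnYear and the filegroup that pschemeYear maps the corresponding partition to :-

-- For RANGE RIGHT, boundary N is the lower bound of partition N + 1
select
          [partitionFunction] = pf.name
        , [partitionScheme]   = ps.name
        , [boundaryID]        = prv.boundary_id
        , [boundaryValue]     = prv.value
        , [filegroup]         = fg.name

from   sys.partition_functions pf

inner join sys.partition_schemes ps
        on ps.function_id = pf.function_id

left outer join sys.partition_range_values prv
        on prv.function_id = pf.function_id

left outer join sys.destination_data_spaces dds
        on  dds.partition_scheme_id = ps.data_space_id
        and dds.destination_id = prv.boundary_id + 1

left outer join sys.filegroups fg
        on fg.data_space_id = dds.data_space_id

where  pf.name = 'pfnYear'

order by prv.boundary_id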

Summary

To make it easier to come back and edit our documentation, we will address using the underlying Partitioning components in a secondary post.

 

Wide World Importers DW – fact.Sale – Query – Top N Sales for each year

Background

Let us start issuing queries against the Data Warehouse database, WideWorldImportersDW, that we are in the process of building out.

 

Lineage

  1. World Wide Importers – Using On SQL Server 2014
    Published On :- 2017-August-23rd
    Link
  2. WideWorldImportersDW – Fattening Up – fact.Sale
Published On :- 2017-August-31st
    Link

 

Sample Query

Scenario

The first query is quite popular, as it returns the “top N customers for each year“.  The column that casts the deciding vote is profit.

 

Choices

Here are the various query patterns that we will try out:

 

  1. Windowing Function
    • Group By :- InvoiceDateKeyYear, Customer Key
    • Implication :- Grouping by an actual column
  2. Correlated Join / Max
    • Group By :- datepart(year, tblFS.[Invoice Date Key]), Customer Key
    • Implication :- Grouping by a function / computed column
  3. Correlated Join / Max
    • Group By :- InvoiceDateKeyYear, Customer Key
    • Implication :- Grouping by an actual column

Queries

Query – Windowing Function / group by Actual Columns

SQL


set XACT_ABORT on
go

set nocount on
go

set statistics io on;
go

dbcc dropcleanbuffers with no_infomsgs
go

use [WideWorldImportersDW]
go

declare @dateStart1 datetime
declare @dateEnd1 datetime

declare @dateStart2 datetime
declare @dateEnd2 datetime

declare @dateStart3 datetime
declare @dateEnd3 datetime

declare @topN int
declare @divider char(1)

declare @tblYear TABLE
(
	  [id] int not null identity(1,1)
	, [year] int not null

	, primary key
	(
		[year]
	)
)

insert into @tblYear
([year])
select 2012
union
select 2013
union
select 2014
union
select 2015
union
select 2016

set @topN = 2
set @divider = '='

print replicate(@divider, 120)

set @dateStart1 = getdate()

; with cteYearCustomer --[Fact].[SaleLargeRowStore]
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
	, [rankID]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = tblFS.[profit]
			, [rankID] = rank()
							over
								(
									PARTITION by
										  tblFS.[InvoiceDateKeyYear]
										, tblFS.[Customer Key]

									order by
										tblFS.[profit] desc
								)

	from   [Fact].[SaleLargeRowStore] tblFS 

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select distinct top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.[InvoiceDateKeyYear]

			and    tblFS.[rankID] = 1

			order by
					tblFS.[profit] desc

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd1 = getdate()

print replicate(@divider, 120)

set @dateStart2 = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreClustered]
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
	, [rankID]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = tblFS.[profit]
			, [rankID] = rank()
							over
								(
									PARTITION by
										  tblFS.[InvoiceDateKeyYear]
										, tblFS.[Customer Key]

									order by
										tblFS.[profit] desc
								)

	from  [Fact].[SaleLargeColumnStoreClustered] tblFS

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(

		    select distinct top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			and    tblFS.[rankID] = 1

			order by tblFS.[profit] desc

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd2 = getdate()

print replicate(@divider, 120)

set @dateStart3 = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreNonClustered]
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
	, [rankID]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = tblFS.[profit]
			, [rankID] = rank()
							over
								(
									PARTITION by
										  tblFS.[InvoiceDateKeyYear]
										, tblFS.[Customer Key]

									order by
										tblFS.[profit] desc
								)

	from  [Fact].[SaleLargeColumnStoreNonClustered] tblFS

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(

		    select distinct top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			and    tblFS.[rankID] = 1

			order by tblFS.[profit] desc

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd3 = getdate()

print replicate(@divider, 120)

declare @datediffSum  int

set @datediffSum =
					  datediff(second, @dateStart1, @dateEnd1)
					+ datediff(second, @dateStart2, @dateEnd2)
					+ datediff(second, @dateStart3, @dateEnd3)

select

		  [sourceID]  = [sourceID]
		, [source]    = [source]
		, [dateStart] = [dateStart]
		, [dateEnd]   = [dateEnd]
		, [duration]  = [duration]
		, [%]         = cast([%] as decimal(6, 2))

from   (

	select
			  [sourceID] = 1
			, [source] = '[Fact].[SaleLargeRowStore]-Index Chosen By Engine'
			, [dateStart] = @dateStart1
			, [dateEnd] = @dateEnd1
			, [duration] = datediff(second, @dateStart1, @dateEnd1)
			, [%] = (datediff(second, @dateStart1, @dateEnd1)) * 100.00 / @datediffSum

	union

	select
			  [sourceID] = 2
			, [source] = '[Fact].[SaleLargeColumnStoreClustered]'
			, [dateStart] = @dateStart2
			, [dateEnd] = @dateEnd2
			, [duration] = datediff(second, @dateStart2, @dateEnd2)
			, [%] = (datediff(second, @dateStart2, @dateEnd2)) * 100.00 / @datediffSum

	union

	select
			  [sourceID] = 3
			, [source] = '[Fact].[SaleLargeColumnStoreNonClustered]'
			, [dateStart] = @dateStart3
			, [dateEnd] = @dateEnd3
			, [duration] = datediff(second, @dateStart3, @dateEnd3)
			, [%] = (datediff(second, @dateStart3, @dateEnd3)) * 100.00 / @datediffSum

	) tblA

Query Plan

Query Plan – Individual Queries Plan
Query Plan – RowStore Clustered Index

Query Plan – RowStore Clustered Index – Clustered Index Scan ( Clustered )

Query Operator – Properties

Explanation

  1. Actual Number of Rows :- 61,140,035 ( 61 million )
  2. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 254.633
    • Estimated Operator Cost :- 268.084
      • We can see most of the cost is IO
    • Estimated Number of Executions :-1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12,228,000
      • Actual Number of Rows :- 61,140,000

 

Query Plan – RowStore Clustered Index – Warning – “Operator used tempdb to spill data”
Query Plan – ColumnStore Clustered Index

Query Plan – Column Store Clustered Index – Clustered Index Scan ( Clustered )

Query Operator – Properties

Query Operator – Explanation
  1. Actual Number of Rows :- 61140025 ( 61 million )
  2. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 3.63868
    • Estimated Operator Cost :- 17.0896
      • IO Cost is 25% of CPU Cost
    • Estimated Number of Executions :-1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12228000
      • Actual Number of Rows :- 61140025

 

 

Query Plan – ColumnStore Non-Clustered Index

Query Plan – Column Store NonClustered Index – Columnstore Index Scan ( NonClustered )

Query Operator – Properties

Explanation

  1. Actual Number of Rows :- 61140025 ( 61 million )
  2. Description :- Scan rows from a table.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 254.747
    • Estimated Operator Cost :- 268.198
      • We can see most of the cost is IO
      • IO is 95% of the total
    • Estimated Number of Executions :-1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12228000
      • Actual Number of Rows :- 61140025

 

Query Plan – ReadOut
  1. Operator :- Sort
    • On all of the approaches, the Sort operator is the most expensive
    • Row Store Index
      • Operator used tempdb to spill data during execution with spill level 1 and 1 spilled thread(s), Sort wrote 476430 pages to and read 476430 pages from tempdb with granted memory 4845520KB and used memory 4845520 KB
      • High granted and used Memory
        • 4845520 KB
          • 4845.52 MB
          • 4.84552 GB
  2. Operator :- Sequence Project
    • Typical of Windowing Functions

 

Statistics I/O

Image

 

Tabulated

 

Table Type :- Row Store – Clustered Index
  • SaleLargeRowStore :- Table ‘SaleLargeRowStore’. Scan count 5, logical reads 1722395, physical reads 2, read-ahead reads 344480, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 158810, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Workfile :- Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table Type :- Column Store – Clustered Index
  • SaleLargeColumnStoreClustered :- Table ‘SaleLargeColumnStoreClustered’. Scan count 5, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 10349, lob physical reads 0, lob read-ahead reads 0.
    Table ‘SaleLargeColumnStoreClustered’. Segment reads 70, segment skipped 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table Type :- Column Store – Non Clustered Index
  • SaleLargeColumnStoreNonClustered :- Table ‘SaleLargeColumnStoreNonClustered’. Scan count 5, logical reads 1719525, physical reads 0, read-ahead reads 343604, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Explanation

  1. The Row Store Clustered Index & the Heap Column Store Index have very high I/O
  2. On the other hand, the Column Store – Clustered Index has much lower I/O

 

Timing

Image

Explanation
  1. Duration
    • Percentage of total duration
      • The Column Store Clustered Index is a bit better at 27%
      • While the Row Store Clustered Index & the Column Store Heap are at 35% and 37%
    • Actual Time – Individual Queries
      • Row Store – Clustered Index
        • 433 seconds or 7.21 minutes
      • Column Store – Clustered Index
        • 330 seconds or 5.5 minutes
      • Column Store – Non Clustered Index
        • 408 seconds or 6.8 minutes
    • Actual Time – All Queries
      • 19.09 minutes

 

Correlated Join / Max – group By Function [ datepart(year) ]

SQL


dbcc dropcleanbuffers with no_infomsgs
go

set nocount on
go

set statistics io on;
go

use [WideWorldImportersDW]
go

declare @dateStart1 datetime
declare @dateEnd1 datetime

declare @dateStart2 datetime
declare @dateEnd2 datetime

declare @dateStart3 datetime
declare @dateEnd3 datetime

declare @datediffSum  int

declare @topN int

declare @tblYear TABLE
(
	  [id] int not null identity(1,1)
	, [year] int not null

	, primary key
	(
		[year]
	)
)

insert into @tblYear
([year])
select 2012
union
select 2013
union
select 2014
union
select 2015
union
select 2016

set @topN = 2

print replicate('*', 120)

set @dateStart1 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			  datepart(year, tblFS.[Invoice Date Key])
			 -- tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from   [Fact].[SaleLargeRowStore] tblFS --with ( INDEX =1)

	group by
			  datepart(year, tblFS.[Invoice Date Key])
			  --tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.[InvoiceDateKeyYear]

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd1 = getdate()

print replicate('*', 120)

set @dateStart2 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			 datepart(year, tblFS.[Invoice Date Key])
			 -- tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreClustered] tblFS

	group by
			  datepart(year, tblFS.[Invoice Date Key])
			  --tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd2 = getdate()

print replicate('*', 120)

set @dateStart3 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			  datepart(year, tblFS.[Invoice Date Key])
			  --tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreNonClustered] tblFS

	group by
			  datepart(year, tblFS.[Invoice Date Key])
			 -- tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]
)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd3 = getdate()

print replicate('*', 120)

set @datediffSum =
			      datediff(second, @dateStart1, @dateEnd1)
				+ datediff(second, @dateStart2, @dateEnd2)
				+ datediff(second, @dateStart3, @dateEnd3)

select

		  [sourceID]  = [sourceID]
		, [source]    = [source]
		, [dateStart] = [dateStart]
		, [dateEnd]   = [dateEnd]
		, [duration]  = [duration]
		, [%]         = cast([%] as decimal(6, 2))

from
	(

		select
				  [sourceID] = 1
				, [source] = '[Fact].[SaleLargeRowStore]'
				, [dateStart] = @dateStart1
				, [dateEnd] = @dateEnd1
				, [duration] = datediff(second, @dateStart1, @dateEnd1)
				, [%] = (datediff(second, @dateStart1, @dateEnd1)) * 100.00 / @datediffSum

		union

		select
				  [sourceID] = 2
				, [source] = '[Fact].[SaleLargeColumnStoreClustered]'
				, [dateStart] = @dateStart2
				, [dateEnd] = @dateEnd2
				, [duration] = datediff(second, @dateStart2, @dateEnd2)
				, [%] = (datediff(second, @dateStart2, @dateEnd2)) * 100.00 / @datediffSum

		union

		select
				  [sourceID] = 3
				, [source] = '[Fact].[SaleLargeColumnStoreNonClustered]'
				, [dateStart] = @dateStart3
				, [dateEnd] = @dateEnd3
				, [duration] = datediff(second, @dateStart3, @dateEnd3)
				, [%] = (datediff(second, @dateStart3, @dateEnd3)) * 100.00 / @datediffSum

	) tblA

go

 

Query Plan

Query Plan – Individual Queries Plan
Query Plan – RowStore Clustered Index

Image

Explanation

  1. The Clustered Index Scan is the most expensive operation
  2. The Sort came in at 0

 

Query Plan – Column Store Clustered Index

Image

Explanation

  1. The Column Index Scan is at 49%
  2. The compute scalar is at 3%
    • Processing datepart(year, [InvoiceDate])
  3. The Filter ( for each year ) is at 16%
  4. And, the Hash match is 31%
  5. The Sort is only 0%

 

Operator – Compute Scalar

Image

Defined Values

Explanation

  1. [Expr1003] = Scalar Operator(datepart(year,[WideWorldImportersDW].[Fact].[SaleLargeColumnStoreClustered].[Invoice Date Key] as [tblFS].[Invoice Date Key]))
Operator – Filter

Image – Properties

Image – Properties – Filter By Predicate

Explanation

  1. [Expr1003] = Scalar Operator(datepart(year,[WideWorldImportersDW].[Fact].[SaleLargeColumnStoreClustered].[Invoice Date Key] as [tblFS].[Invoice Date Key]))

 

Query Plan – Column Store Non Clustered Index

Image

Explanation

  1. The Column Index Scan is at 94%
  2. The compute scalar is at 0%
    • Processing datepart(year, [InvoiceDate])
  3. No Filtering
    • It looks like we get just the data rows that we need
  4. The Sort is only 0%

Statistics I/O

Image

Tabulated
Table Type :- Row Store – Clustered Index
  • SaleLargeRowStore :- Table ‘SaleLargeRowStore’. Scan count 5, logical reads 1722395, physical reads 2, read-ahead reads 323184, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Workfile :- Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A7C6FFBD’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table Type :- Column Store – Clustered Index
  • SaleLargeColumnStoreClustered :- Table ‘SaleLargeColumnStoreClustered’. Scan count 5, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 13464, lob physical reads 0, lob read-ahead reads 0.
    Table ‘SaleLargeColumnStoreClustered’. Segment reads 70, segment skipped 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A7C6FFBD’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table Type :- Column Store – Non Clustered Index
  • SaleLargeColumnStoreNonClustered :- Table ‘SaleLargeColumnStoreNonClustered’. Scan count 5, logical reads 1719525, physical reads 0, read-ahead reads 342488, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Explanation

  1. The Row Store Clustered Index & the Heap Column Store Index have very high I/O
    • They are table scans, one is searching through a clustered index and the other through a heap
  2. On the other hand, the Column Store – Clustered Index, has much lower I/O
    • Likely based on the higher compression that is used for Column Store Indexes

 

Timing

Image

Explanation
  1. Duration
    • Percentage of total duration
      • The Column Store Clustered Index is a bit better at 30%
      • The Row Store Clustered Index is at 37.78%
      • And, the Column Store Heap is at 32.22%
    • Actual Time – Individual Queries
      • Row Store – Clustered Index
        • 27 seconds
      • Column Store – Clustered Index
        • 29 seconds
      • Column Store – Non Clustered Index
        • 34 seconds
    • Actual Time – All Queries
      • 1.36 minutes
      • or 96 seconds

 

Correlated Join / Max – group By Actual Column – year

SQL


use [WideWorldImportersDW]
go

dbcc dropcleanbuffers with no_infomsgs
go

set nocount on
go

set statistics io on;
go

declare @dateStart1 datetime
declare @dateEnd1 datetime

declare @dateStart2 datetime
declare @dateEnd2 datetime

declare @dateStart3 datetime
declare @dateEnd3 datetime

declare @topN int

declare @tblYear TABLE
(
	  [id] int not null identity(1,1)
	, [year] int not null

	, primary key
	(
		[year]
	)
)

insert into @tblYear
([year])
select 2012
union
select 2013
union
select 2014
union
select 2015
union
select 2016

set @topN = 2

print replicate('*', 120)

print replicate('*', 120)

set @dateStart1 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from   [Fact].[SaleLargeRowStore] tblFS --with ( INDEX =1)

	group by
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]

)

select
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.[InvoiceDateKeyYear]

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd1 = getdate()

print replicate('*', 120)

set @dateStart2 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			--  datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreClustered] tblFS

	group by
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]

)

select-- top 5 with ties
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

--from   cteYear
from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd2 = getdate()

print replicate('*', 120)

set @dateStart3 = getdate()

; with cteYearCustomer
(
	  [InvoiceDateKeyYear]
	, [CustomerKey]
	, [profit]
)
as
(
	select
			  --datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreNonClustered] tblFS

	group by
			 -- datepart(year, tblFS.[Invoice Date Key])
			  tblFS.[InvoiceDateKeyYear]
			, tblFS.[Customer Key]
)

select-- top 5 with ties
		  cteYC.[InvoiceDateKeyYear]
	    , cteYC.[CustomerKey]
		, cteYC.[profit]

from   @tblYear cteY

cross apply
		(
		    select top (@topN) with ties *

			from   cteYearCustomer tblFS

			where  cteY.[year] = tblFS.InvoiceDateKeyYear

			order by tblFS.[profit]

		) cteYC

order by
		  cteYC.[InvoiceDateKeyYear]
		, cteYC.[profit] desc
		, cteYC.[CustomerKey]

set @dateEnd3 = getdate()

print replicate('*', 120)

declare @datediffSum  int

set @datediffSum =
					  datediff(second, @dateStart1, @dateEnd1)
					+ datediff(second, @dateStart2, @dateEnd2)
					+ datediff(second, @dateStart3, @dateEnd3)

select

		  [sourceID]  = [sourceID]
		, [source]    = [source]
		, [dateStart] = [dateStart]
		, [dateEnd]   = [dateEnd]
		, [duration]  = [duration]
		, [%]         = cast([%] as decimal(6, 2))

from   (

	select
			  [sourceID] = 1
			, [source] = '[Fact].[SaleLargeRowStore]'
			, [dateStart] = @dateStart1
			, [dateEnd] = @dateEnd1
			, [duration] = datediff(second, @dateStart1, @dateEnd1)
			, [%] = (datediff(second, @dateStart1, @dateEnd1)) * 100.00 / @datediffSum

	union

	select
			  [sourceID] = 2
			, [source] = '[Fact].[SaleLargeColumnStoreClustered]'
			, [dateStart] = @dateStart2
			, [dateEnd] = @dateEnd2
			, [duration] = datediff(second, @dateStart2, @dateEnd2)
			, [%] = (datediff(second, @dateStart2, @dateEnd2)) * 100.00 / @datediffSum

	union

	select
			  [sourceID] = 3
			, [source] = '[Fact].[SaleLargeColumnStoreNonClustered]'
			, [dateStart] = @dateStart3
			, [dateEnd] = @dateEnd3
			, [duration] = datediff(second, @dateStart3, @dateEnd3)
			, [%] = (datediff(second, @dateStart3, @dateEnd3)) * 100.00 / @datediffSum

	) tblA

Query Plan

Query Plan – Individual Queries Plan
Query Plan – RowStore Clustered Index
Image

Explanation
  1. The Clustered Index Scan takes up 94%
  2. Sort is 0%
  3. We are told of the missing Index
Image – Operator – Clustered Index Scan
  1. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  2. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 254.633
    • Estimated Operator Cost :- 268.084
      • We can see most of the cost is IO
  3. Number of Executions
    • Estimated Number of Executions :- 1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
    • Actual Number of Executions :- 5
      • Number of records in the Table variable ( year )
  4. Number of Reads
    • Estimated Number of Rows :-
      • 12228000 ( 12 million )
    • Number of Rows Read :-
      • 61140025 ( 61 million )
        • Number of records in table :- 12 million
        • Number of executions :- 5
Image – Operator – Hash Match – Defined Values

Image – Hash Match – Defined Values

Explanation
  1. Grouping by InvoiceDateKeyYear

 

Image – Missing Index

Explanation
  1. We are told, actually warned, of a missing index
  2. The Index definition is “[Fact].[SaleLargeRowStore] ([InvoiceDateKeyYear]) INCLUDE ([Customer Key],[Profit])”
  3. And, so we are asked to look into having an index keyed on our grouping column ( InvoiceDateKeyYear )
  4. And, the other columns that we are referencing ( Customer Key and Profit ) included; a sketch of that index follows below
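Under the assumption that we wanted to follow the advice, a sketch of the suggested index ( the index name here is hypothetical; the definition is taken from the missing-index hint above ) would look like :-

-- Hypothetical name; key and included columns per the missing-index hint
CREATE NONCLUSTERED INDEX [INDX_SaleLargeRowStore_InvoiceDateKeyYear]
    ON [Fact].[SaleLargeRowStore] ( [InvoiceDateKeyYear] )
    INCLUDE ( [Customer Key], [Profit] )
    ON [USERDATA]
;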

 

Query Plan – RowStore Clustered Index – Clustered Index Scan ( Clustered )

Query Operator – Properties

Explanation

  1. Actual Number of Rows :- 61140025 ( 61 million )
  2. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 254.633
    • Estimated Operator Cost :- 268.084
      • We can see most of our cost is IO
    • Estimated Number of Executions :-1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12228000
      • Actual Number of Rows :- 61140025
    • Explanation :-
      • The estimate is per execution
      • With five actual executions ( one per year in the table variable ), the rows read add up to roughly five times the estimate

 

Query Plan – RowStore Clustered Index – Warning – “Operator used tempdb to spill data”
Query Plan – ColumnStore Clustered Index

Explanation
  1. The Clustered Index Scan takes up 46%
  2. Missing Index
    • We are told of the missing Index
    • Impact of missing index :- 40%
  3. Hash Match Aggregate Operator
    • Hash Match operator use in calculating max
  4. Filter Operator
    • Filter on Year
  5. Sort is 0%

 

Query Plan – Column Store Clustered Index – Clustered Index Scan ( Clustered )

Query Operator – Properties

Query Operator – Explanation
  1. Actual Number of Rows :- 61140025 ( 61 million )
  2. Description :- Scanning a clustered index, entirely or only a range.
    • Table Scan
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 3.63868
    • Estimated Operator Cost :- 17.0896
      • IO Cost is 25% of CPU Cost
    • Estimated Number of Executions :-1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
  4. Number of Reads
    • Number :-
      • Estimated Number of Rows :- 12228000
      • Actual Number of Rows :- 61140025
    • Explanation :-
      • The estimate is per execution
      • With five actual executions ( one per year in the table variable ), the rows read add up to roughly five times the estimate

 

 

Query Plan – ColumnStore Non-Clustered Index

Image

Explanation
  1. The Columnstore Index Scan takes up 94%
  2. Hash Match Aggregate Operator
    • takes up 6%
    • Hash Match operator use in calculating max
  3. Sort is 0%
Query Plan – Column Store NonClustered Index – Columnstore Index Scan ( NonClustered )

Query Operator – Properties – Columnstore Index Scan

Explanation

  1. Actual Execution Mode
    • Row
  2. Description
    • Scan a columnstore index, entirely or only a range.
  3. Cost
    • Estimated CPU Cost :- 13.451
    • Estimated IO Cost :- 3.63868
    • Estimated Operator Cost :- 17.0896
  4. Number of Executions
    • Estimated Number of Executions :-1
      • The controller is a year table variable
      • For a Table variable, the # of records is assumed to be 1
  5. Number of Reads
    • Estimated Number of Rows :- 12228000 ( 12 million )
    • Actual Number of Rows :- 12228000 ( 12 million )
    • Number of Rows Read :- 61140025 ( 61 million )

Statistics I/O

Image

Tabulated
Table Type :- Row Store – Clustered Index
  • SaleLargeRowStore :- Table ‘SaleLargeRowStore’. Scan count 5, logical reads 1722395, physical reads 2, read-ahead reads 344480, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Workfile :- Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A7C6FFBD’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table Type :- Column Store – Clustered Index
  • SaleLargeColumnStoreClustered :- Table ‘SaleLargeColumnStoreClustered’. Scan count 5, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 10349, lob physical reads 0, lob read-ahead reads 0.
    Table ‘SaleLargeColumnStoreClustered’. Segment reads 70, segment skipped 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A7C6FFBD’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table Type :- Column Store – Non Clustered Index
  • SaleLargeColumnStoreNonClustered :- Table ‘SaleLargeColumnStoreNonClustered’. Scan count 5, logical reads 1719525, physical reads 0, read-ahead reads 343604, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Worktable :- Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  • Table Variable – Year :- Table ‘#A3F66ED9’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Explanation

  1. The Row Store Clustered Index & the Heap Column Store Index have very high I/O
    • They are table scans, one is searching through a clustered index and the other through a heap
  2. On the other hand, the Column Store – Clustered Index, has much lower I/O
    • Likely based on the higher compression that is used for Column Store Indexes

 

Timing

Image

Explanation
  1. Duration
    • Percentage of total duration
      • The Column Store Clustered Index is a bit better at 30.77%
      • The Row Store Clustered Index is at 35.38%
      • And, the Column Store Heap is at 33.85%
    • Actual Time – Individual Queries
      • Row Store – Clustered Index
        • 23 seconds
      • Column Store – Clustered Index
        • 20 seconds
      • Column Store – Non Clustered Index
        • 22 seconds
    • Actual Time – All Queries
      • 1.06 minutes
      • or 66 seconds

 

Source Code Control

GitHub

WideWorldImportersDWFactSale/Query/QueryMaxByYear.InvoiceDate/
Link

 

  1. Methodology :- Windowing Function
    • Group By :- InvoiceDateKeyYear, CustomerKey
    • Filename :- queryMaxByYearCorrelatedJoin_GroupByActualColumn.sql
  2. Methodology :- Correlated Join / Max
    • Group By :- Datepart(year, InvoiceDateKey), CustomerKey
    • Filename :- queryMaxByYearCorrelatedJoin_GroupByFunctionDatePartYear.sql
  3. Methodology :- Correlated Join / Max
    • Group By :- InvoiceDateKeyYear, CustomerKey
    • Filename :- queryMaxByYearWindowsFunction_GroupByActualColumn.sql

 

References

  1. Windowing Functions
    • Dwain Camps
      • The Performance of the T-SQL Window Functions
        Link

 

Summary

Thankfully, there is a lot here.

We looked at only two ways of identifying the max record: Windowing Functions and a Correlated Max.

The Windowing Function approach seems to be more expensive.

The Column Store Clustered Index is the most performant.

Likely, targeted Row Store indexes would have been better; but we did not take the time to create one.

We have yet to find queries that will make us choose a Non-Clustered Column Store Index over a Clustered Column Store Index.

 

WideWorldImportersDW – Fattening Up – fact.Sale

Background

In an earlier post we spoke about restoring Microsoft’s new sample database, WideWorldImportersDW.

The end game is to have a database to try out Columnar Indexes.

Lineage

It is just one post so far, but let us claim space for additional journaling along the same thoughts:

  1. World Wide Importers – Using On SQL Server 2014
    Published On :- 2017-August-23rd
    Link

 

Metadata – Microsoft

Number of Records

Code


use [WideWorldImportersDW]
go

select
		
          [table]
            =  tblSS.[name]
                + '.'
                + tblSO.[name]
 
        , [indexName]
            = tblSI.[name]
                     
        , [indexType]
          = tblSI.[type_desc]
 
        , [numberofRecords]
            = sum(tblSP.[rows])
 
        , numberofPagesData
            = sum(tblSPS.in_row_data_page_count)
 
        , numberofPagesLOB
            = sum(tblSPS.lob_used_page_count)
 
        , usedPageCount
            = sum(used_page_count)
 
from   sys.schemas tblSS
 
inner join sys.objects tblSO
 
    on tblSS.schema_id = tblSO.schema_id
 
inner join sys.indexes tblSI
 
    on tblSO.object_id = tblSI.object_id
 
 
inner join sys.partitions tblSP
 
    on tblSI.object_id = tblSP.object_id
    and tblSI.index_id = tblSP.index_id
 
 
inner join sys.dm_db_partition_stats tblSPS
 
    on tblSP.object_id = tblSPS.object_id
    and tblSP.index_id = tblSPS.index_id
    and tblSP.partition_id = tblSPS.partition_id
 
 
where tblSI.index_id in (0,1) 

and   tblSS.[name] = 'Fact'
 
group by
          tblSS.[name]
        , tblSO.[name]
        , tblSI.index_id
        , tblSI.[name]
        , tblSI.[type_desc]
 
order by
          tblSS.[name]
        , tblSO.[name]
        , tblSI.index_id



Image

Explanation

Our three biggest Fact tables ( Fact.Movement, Fact.Order, Fact.Sale ) contain about 2.3 million records.

 

Make Bigger

fact.Sale

We chose fact.Sale out of our largest Fact tables.

The reason being the presence of a Stored Procedure (SP), [Application].[Configuration_PopulateLargeSaleTable].

Here are the other tooling SPs available in the same schema, Application; a quick query to list them follows below.
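As a hedged aside ( this query is not part of the original tooling ), the catalog views make it easy to list those stored procedures for yourself :-

-- List the stored procedures that live in the Application schema
use [WideWorldImportersDW]
go

select
          [procedure]  = tblSS.[name] + '.' + tblSP.[name]
        , [createDate] = tblSP.[create_date]

from   sys.procedures tblSP

inner join sys.schemas tblSS

        on tblSP.[schema_id] = tblSS.[schema_id]

where  tblSS.[name] = 'Application'

order by tblSP.[name]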

Objective

We will leave the original sales table, Fact.Sale, as is.

And, create new ones.

 

List of Tables

Here are the tables.

  1. Sales Table – Original :- Fact.Sale
  2. Sales Table – Row Store – Clustered Index :- Fact.SaleLargeRowStore
  3. Sales Table – Column Store – Clustered Index :- Fact.SaleLargeColumnStoreClustered
  4. Sales Table – Column Store – NonClustered Index :- Fact.SaleLargeColumnStoreNonClustered

Create Tables

Code

Fact.SaleLargeRowStore


SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

USE [WideWorldImportersDW]
GO

/*
    DROP TABLE [Fact].[SaleLargeRowStore]
*/

if object_id('[Fact].[SaleLargeRowStore]') is null
begin

    CREATE TABLE [Fact].[SaleLargeRowStore]
    (
        [Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
        [City Key] [int] NOT NULL,
        [Customer Key] [int] NOT NULL,
        [Bill To Customer Key] [int] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        
        [Invoice Date Key] [date] NOT NULL,

        [InvoiceDateKeyYear] [int] NOT NULL,
        
        [Delivery Date Key] [date] NULL,
        
        [DeliveryDateKeyYear] [int] NULL,
        
        [Salesperson Key] [int] NOT NULL,
        [WWI Invoice ID] [int] NOT NULL,
        [Description] [nvarchar](100) NOT NULL,
        [Package] [nvarchar](50) NOT NULL,
        [Quantity] [int] NOT NULL,
        [Unit Price] [decimal](18, 2) NOT NULL,
        [Tax Rate] [decimal](18, 3) NOT NULL,
        [Total Excluding Tax] [decimal](18, 2) NOT NULL,
        [Tax Amount] [decimal](18, 2) NOT NULL,
        [Profit] [decimal](18, 2) NOT NULL,
        [Total Including Tax] [decimal](18, 2) NOT NULL,
        [Total Dry Items] [int] NOT NULL,
        [Total Chiller Items] [int] NOT NULL,
        [Lineage Key] [int] NOT NULL
    
            , CONSTRAINT [PK_Fact_SaleLargeRowStore] PRIMARY KEY CLUSTERED 
            (
                [Sale Key] ASC
            )
            WITH 
            (
                  PAD_INDEX = OFF
                , STATISTICS_NORECOMPUTE = OFF
                , IGNORE_DUP_KEY = OFF
                , ALLOW_ROW_LOCKS = ON
                , ALLOW_PAGE_LOCKS = ON
            ) ON [USERDATA]

        ) ON [USERDATA]

end
go

Fact.SaleLargeColumnStoreClustered

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

USE [WideWorldImportersDW]
GO

if object_id('[Fact].[SaleLargeColumnStoreClustered]') is null
begin

	CREATE TABLE [Fact].[SaleLargeColumnStoreClustered]
	(
		[Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
		[City Key] [int] NOT NULL,
		[Customer Key] [int] NOT NULL,
		[Bill To Customer Key] [int] NOT NULL,
		[Stock Item Key] [int] NOT NULL,
	
		[Invoice Date Key] [date] NOT NULL,
		[InvoiceDateKeyYear] [int] NOT NULL,
			
		[Delivery Date Key] [date] NULL,
		[DeliveryDateKeyYear] [int] NULL,
			
		[Salesperson Key] [int] NOT NULL,
		[WWI Invoice ID] [int] NOT NULL,
		[Description] [nvarchar](100) NOT NULL,
		[Package] [nvarchar](50) NOT NULL,
		[Quantity] [int] NOT NULL,
		[Unit Price] [decimal](18, 2) NOT NULL,
		[Tax Rate] [decimal](18, 3) NOT NULL,
		[Total Excluding Tax] [decimal](18, 2) NOT NULL,
		[Tax Amount] [decimal](18, 2) NOT NULL,
		[Profit] [decimal](18, 2) NOT NULL,
		[Total Including Tax] [decimal](18, 2) NOT NULL,
		[Total Dry Items] [int] NOT NULL,
		[Total Chiller Items] [int] NOT NULL,
		[Lineage Key] [int] NOT NULL
	
			

		) ON [USERDATA]

		CREATE CLUSTERED COLUMNSTORE INDEX [INDX_FACT_SaleLargeColumnStoreClustered]
			ON [Fact].[SaleLargeColumnStoreClustered]
			ON [USERDATA]
			;  

end
go
	
Fact.SaleLargeColumnStoreNonClustered

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO


USE [WideWorldImportersDW]
GO

if object_id('[Fact].[SaleLargeColumnStoreNonClustered]') is null
begin

	CREATE TABLE [Fact].[SaleLargeColumnStoreNonClustered]
	(
		[Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
		[City Key] [int] NOT NULL,
		[Customer Key] [int] NOT NULL,
		[Bill To Customer Key] [int] NOT NULL,
		[Stock Item Key] [int] NOT NULL,

		[Invoice Date Key] [date] NOT NULL,
		[InvoiceDateKeyYear] [int] NOT NULL,

		[Delivery Date Key] [date] NULL,
		[DeliveryDateKeyYear] [int] NULL,
				
		[Salesperson Key] [int] NOT NULL,
		[WWI Invoice ID] [int] NOT NULL,
		[Description] [nvarchar](100) NOT NULL,
		[Package] [nvarchar](50) NOT NULL,
		[Quantity] [int] NOT NULL,
		[Unit Price] [decimal](18, 2) NOT NULL,
		[Tax Rate] [decimal](18, 3) NOT NULL,
		[Total Excluding Tax] [decimal](18, 2) NOT NULL,
		[Tax Amount] [decimal](18, 2) NOT NULL,
		[Profit] [decimal](18, 2) NOT NULL,
		[Total Including Tax] [decimal](18, 2) NOT NULL,
		[Total Dry Items] [int] NOT NULL,
		[Total Chiller Items] [int] NOT NULL,
		[Lineage Key] [int] NOT NULL
	
			

		) ON [USERDATA]

end

declare @tableName sysname
declare @objectID  int

set @tableName = '[Fact].[SaleLargeColumnStoreNonClustered]'
set @objectID = object_id(@tableName)


/*
	Msg 35339, Level 16, State 1, Line 162
	Multiple nonclustered columnstore indexes are not supported.
*/
if @objectID is not null
begin

	if not exists
	(
		select *
		from   sys.indexes tblSI
		where  tblSI.object_id = @objectID
		and    tblSI.[name] = 'INDX_InvoiceDateKeyYear_DeliveryDateKeyYear'

	)
	begin

		CREATE NONCLUSTERED COLUMNSTORE INDEX [INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
			ON [Fact].[SaleLargeColumnStoreNonClustered]
			(
				  [InvoiceDateKeyYear]
				, [DeliveryDateKeyYear]
			)
			ON [USERDATA]
			;  

	end


end
go
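 

With the three tables and their indexes in place, it is worth confirming the storage structure behind each one.  The query below is a small sketch of ours against sys.indexes.

use [WideWorldImportersDW]
go

-- Confirm the storage structure behind each new table
-- ( CLUSTERED, CLUSTERED COLUMNSTORE, HEAP, NONCLUSTERED COLUMNSTORE )
select
          [table]     = schema_name(tblSO.schema_id) + '.' + tblSO.[name]
        , [indexName] = tblSI.[name]
        , [indexType] = tblSI.[type_desc]

from   sys.objects tblSO

inner join sys.indexes tblSI

        on tblSO.[object_id] = tblSI.[object_id]

where  tblSO.[name] in
        (
              'SaleLargeRowStore'
            , 'SaleLargeColumnStoreClustered'
            , 'SaleLargeColumnStoreNonClustered'
        )

order by
          [table]
        , tblSI.index_id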


Populate Tables

Approach

We take a uniform approach in populating each of our tables.

And, that approach is to:

  1. Decide on our target year
    • In our case, always 2012
  2. Do the indexes need to be disabled before we can manage the data?
    • If Clustered Row Store or Clustered Column Store, then “No, the indexes do not need to be disabled”
    • If a Non-Clustered Column Store index exists, then “Yes, the indexes need to be disabled”
  3. Remove any existing records for our target year
  4. If base data does not exist, add it
    • Remember, base data is data dated after Dec 31st 2012, i.e. Jan 1st 2013 onward
  5. Add data for Jan 1st 2012 through Dec 31st 2012
    • For each day, add @EstimatedRowsFor2012 / 365 records
  6. Defragment the data ( the whole pattern is sketched right after this list )
    • If reorg requested
      • Reorganize the index
      • Update Statistics
    • If rebuild requested
      • Rebuild the index
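
A minimal, pseudo-code-ish sketch of that flow ( ours, with an illustrative target table placeholder; the real procedures below carry the full column lists and logging ):

-- Condensed sketch of the populate pattern
declare @EstimatedRowsFor2012 bigint = 12000000
declare @NumberOfSalesPerDay  bigint = @EstimatedRowsFor2012 / 365
declare @DateCounter          date   = '20120101'

-- 2. Only the table with a Non-Clustered Column Store index needs its indexes disabled first
--    alter index all on [Fact].[SaleLargeColumnStoreNonClustered] disable;

-- 3. Remove any existing rows for the target year, in batches
--    delete top (10000) from <targetTable> where [Invoice Date Key] <= '20121231'

-- 4. Seed the base data ( rows dated Jan 1st 2013 onward ) from Fact.Sale, if missing

-- 5. Walk 2012 one day at a time, copying @NumberOfSalesPerDay rows per day
while (@DateCounter < '20121231')
begin
    -- insert top (@NumberOfSalesPerDay) rows into <targetTable>, stamped with @DateCounter
    set @DateCounter = dateadd(day, 1, @DateCounter);
end

-- 6. Reorganize or rebuild the indexes, then update statistics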

 

Code

Fact.usp_SaleLargeRowStorePopulateTable

use [WideWorldImportersDW]
go

if object_id('[Fact].[usp_SaleLargeRowStorePopulateTable]') is null
begin

    exec('create procedure [Fact].[usp_SaleLargeRowStorePopulateTable] as ')

end
go

alter procedure [Fact].[usp_SaleLargeRowStorePopulateTable]
(
      @EstimatedRowsFor2012 bigint = 12000000
    , @defrag bit = 0
)
as

begin


    DECLARE @OrderCounter bigint = 0;
    DECLARE @NumberOfSalesPerDay bigint
    DECLARE @DateCounter date;
    DECLARE @DateCounterBegin date
    DECLARE @DateCounterEnd date
    DECLARE @StartingSaleKey bigint;
    DECLARE @MaximumSaleKey bigint = (SELECT MAX([Sale Key]) FROM Fact.Sale);
    DECLARE @OutputCounter varchar(20);

    DECLARE @LineageKey int

    DECLARE @iCycle int
    DECLARE @lNumberofRecordsDeleted bigint
    DECLARE @lNumberofRecordsDeletedTotal bigint

    DECLARE @lNumberofRecordsAdded   bigint

    DECLARE @CHAR_TAB  varchar(10)

    DECLARE @tableDestination sysname

    set @CHAR_TAB = char(9);
    set @DateCounter = '20120101';
    set @DateCounterEnd = '20121231'

    set @DateCounterBegin = @DateCounter;

    set @MaximumSaleKey 
            = (
                SELECT MAX([Sale Key]) 
                FROM   Fact.Sale
              );

    set @LineageKey =1

    set @NumberOfSalesPerDay   = @EstimatedRowsFor2012 / 365;


    BEGIN


        SET NOCOUNT ON;
        SET XACT_ABORT ON;

        SET @tableDestination = '[Fact].[SaleLargeRowStore]';
        set @DateCounterEnd = '20121231'

        PRINT 'EstimatedRowsFor2012 Total '
                + cast(@EstimatedRowsFor2012 as varchar(30));

        PRINT 'Targeting ' 
                + CAST(@NumberOfSalesPerDay AS varchar(20)) 
                + ' sales per day.'
                ;

        --added by dadeniji 2017-08-23 7:01 AM
        PRINT 'DateCounter ' + convert(varchar(30), @DateCounter, 100)
        PRINT 'DateCounterMax ' + convert(varchar(30), '20121231', 100)

        IF @NumberOfSalesPerDay > 50000
        BEGIN
            PRINT 'WARNING: Limiting sales to 50000 per day';
            SET @NumberOfSalesPerDay = 50000;
        END;

        set @lNumberofRecordsDeleted = 1
        set @iCycle = 0
        set @lNumberofRecordsDeletedTotal = 0

        while (@lNumberofRecordsDeleted != 0)
        begin

            set @iCycle = @iCycle + 1

            delete top ( 10000)
            from    [Fact].[SaleLargeRowStore]
            where  [Invoice Date Key] <= @DateCounterEnd

            set @lNumberofRecordsDeleted = @@ROWCOUNT
            set @lNumberofRecordsDeletedTotal = @lNumberofRecordsDeletedTotal + @lNumberofRecordsDeleted

            PRINT @CHAR_TAB
                    + cast(@iCycle as varchar(10)) + ') '
                    + 'Pruning records prior to ' + convert(varchar(30), @DateCounterEnd, 100)
                    + ' Number of record(s) removed is ' + cast(@lNumberofRecordsDeleted as varchar(10))
                    + ' :: Total thus far ' + cast(@lNumberofRecordsDeletedTotal as varchar(10))
                    + '- TS is ' + convert(varchar(30), getdate(), 100)

        end

        if not exists
        (
            select *
            from   [Fact].[SaleLargeRowStore]
            where  [Invoice Date Key] > @DateCounterEnd
        )
        begin

    
            PRINT 'Adding base data Fact.Sale - those greater than '  
                    + convert(varchar(30), @DateCounterEnd, 100)

            set identity_insert [Fact].[SaleLargeRowStore] on 

            insert into [Fact].[SaleLargeRowStore]
                (
                      [Sale Key]
                    , [City Key]
                    , [Customer Key]
                    , [Bill To Customer Key]
                    , [Stock Item Key]

                    , [Invoice Date Key]
                    , [InvoiceDateKeyYear]
                                        
                    , [Delivery Date Key]
                    , [DeliveryDateKeyYear]
                    
                    , [Salesperson Key]
                    , [WWI Invoice ID]
                    , [Description]
                    , Package
                    , Quantity
                    , [Unit Price]
                    , [Tax Rate]
                    , [Total Excluding Tax]
                    , [Tax Amount]
                    , Profit
                    , [Total Including Tax]
                    , [Total Dry Items]
                    , [Total Chiller Items]
                    , [Lineage Key]
                )

            select 
                      [Sale Key]
                    , [City Key]
                    , [Customer Key]
                    , [Bill To Customer Key]
                    , [Stock Item Key]

                    , [Invoice Date Key]
                    , [InvoiceDateKeyYear]
                        = datepart(year, [Invoice Date Key])

                    , [Delivery Date Key]
                    , [DeliveryDateKeyYear]
                        = datepart(year, [Delivery Date Key])

                    , [Salesperson Key]
                    , [WWI Invoice ID]
                    , [Description]
                    , Package
                    , Quantity
                    , [Unit Price]
                    , [Tax Rate]
                    , [Total Excluding Tax]
                    , [Tax Amount]
                    , Profit
                    , [Total Including Tax]
                    , [Total Dry Items]
                    , [Total Chiller Items]
                    , [Lineage Key]

            from   [Fact].[Sale]
    
            where  [Invoice Date Key] > @DateCounterEnd

            set @lNumberofRecordsAdded = @@ROWCOUNT


            set identity_insert [Fact].[SaleLargeRowStore] off
                
            PRINT 'Added base data Fact.Sale - ' 
                    + ' those greater than '  
                    + convert(varchar(30), @DateCounterEnd, 100)
                    + ' - Number of record(s) inserted during Initial loading is ' 
                    + cast(@lNumberofRecordsAdded as varchar(10))
                    + '- TS is '
                    + convert(varchar(30), getdate(), 100)

        end

    
        set @iCycle = 0;

        set identity_insert [Fact].[SaleLargeRowStore] off
    
        WHILE @DateCounter < '20121231'
        BEGIN

            SET @OutputCounter = CONVERT(varchar(20), @DateCounter, 112);
            --RAISERROR(@OutputCounter, 0, 1) WITH NOWAIT;
            -- PRINT 'Processing entry for ' + @OutputCounter

            SET @StartingSaleKey = @MaximumSaleKey - @NumberOfSalesPerDay - FLOOR(RAND() * 20000);
            SET @OrderCounter = 0;

            set @iCycle = @iCycle + 1

            INSERT [Fact].[SaleLargeRowStore] WITH (TABLOCK)
            (
                  [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key]
                , [Invoice Date Key], [InvoiceDateKeyYear]
                , [Delivery Date Key], [DeliveryDateKeyYear]
                , [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity
                , [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit
                , [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
            )
            SELECT TOP(@NumberOfSalesPerDay)
                  [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key]
                , @DateCounter as [Invoice Date Key]
                , [InvoiceDateKeyYear]  = datepart(year, @DateCounter)
                , [Delivery Date Key]   = DATEADD(day, 1, @DateCounter)
                , [DeliveryDateKeyYear] = datepart(year, DATEADD(day, 1, @DateCounter))
                , [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity
                , [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount]
                , Profit = tblFS.[Profit] + ( tblFS.[Profit] * rand( tblFS.[City Key] * tblFS.[Unit Price] ) )
                , [Total Including Tax], [Total Dry Items], [Total Chiller Items]
                , @LineageKey
            FROM   Fact.Sale tblFS
            WHERE  tblFS.[Sale Key] > @StartingSaleKey
        
            and   tblFS.[Invoice Date Key] >='2013-01-01'
        
            ORDER BY 
                    [Sale Key];

            set @lNumberofRecordsAdded = @@ROWCOUNT
        
            PRINT @CHAR_TAB
                    + cast(@iCycle as varchar(10)) 
                    + ' - '
                    + 'Day : '
                    + convert(varchar(30), @DateCounter, 101) 
                    + ' - Number of record(s) added to [Fact].[SaleLargeRowStore] is ' 
                    + ' ' + cast(@lNumberofRecordsAdded as varchar(10))
                    + '- TS is '
                    + convert(varchar(30), getdate(), 100)

            SET @DateCounter = DATEADD(day, 1, @DateCounter);

        END;
    
    
        set identity_insert [Fact].[SaleLargeRowStore] off


        if (@defrag = 0)
        begin


            print @CHAR_TAB
                 + 'Reorg Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)
                 +  '... '

            ALTER INDEX [PK_Fact_SaleLargeRowStore]
                on [Fact].[SaleLargeRowStore] REORGANIZE
                ;

            print @CHAR_TAB
                 + 'Re-orged Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)

            print @CHAR_TAB
                 + 'Update Statistics on Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)
                 + ' ....'

            update statistics [Fact].[SaleLargeRowStore] [PK_Fact_SaleLargeRowStore]
                with FULLSCAN;

            print @CHAR_TAB
                 + 'Updated Statistics on Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)
                 + ' ....'


        end
        else if (@defrag = 1)
        begin
        

            print @CHAR_TAB
                 + 'Rebuilding Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)
                 +  '... '

            ALTER INDEX [PK_Fact_SaleLargeRowStore]
                on [Fact].[SaleLargeRowStore] REBUILD
                ;

            print @CHAR_TAB
                 + 'Rebuilt Index [Fact].[SaleLargeRowStore].[PK_Fact_SaleLargeRowStore] '
                 + convert(varchar(30), getdate(), 100)

        end     

    END

    dbcc shrinkfile('WWI_Log') with no_infomsgs;

END

go


Fact.usp_SaleLargeColumnStoreClusteredPopulateTable

use [WideWorldImportersDW]
go

if object_id('[Fact].[usp_SaleLargeColumnStoreClusteredPopulateTable]') is null
begin

	exec('create procedure [Fact].[usp_SaleLargeColumnStoreClusteredPopulateTable] as ')

end
go

alter procedure [Fact].[usp_SaleLargeColumnStoreClusteredPopulateTable]
(
	  @EstimatedRowsFor2012 bigint = 12000000
	, @defrag bit = 0
)
as

begin


	DECLARE @OrderCounter bigint = 0;
	DECLARE @NumberOfSalesPerDay bigint;
	DECLARE @DateCounter date ;
	DECLARE @DateCounterBegin date
	DECLARE @DateCounterEnd date
	DECLARE @StartingSaleKey bigint;
	DECLARE @MaximumSaleKey bigint;
	DECLARE @OutputCounter varchar(20);

	DECLARE @LineageKey	int

	DECLARE @iCycle int
	DECLARE @lNumberofRecordsDeleted bigint
	DECLARE @lNumberofRecordsDeletedTotal bigint

	DECLARE @lNumberofRecordsAdded	 bigint

	DECLARE @CHAR_TAB  varchar(10)

	DECLARE @tableDestination sysname

	set @CHAR_TAB = char(9);
	set @DateCounter = '20120101';
	set @DateCounterEnd = '20121231'

	set @DateCounterBegin = @DateCounter;

	set @MaximumSaleKey 
			= (
				SELECT MAX([Sale Key]) 
				FROM   Fact.Sale
			  );

	set @LineageKey	=1

	set @NumberOfSalesPerDay   = @EstimatedRowsFor2012 / 365;


	BEGIN


		SET NOCOUNT ON;
		SET XACT_ABORT ON;

		SET @tableDestination = '[Fact].[SaleLargeColumnStoreClustered]';
		set @DateCounterEnd = '20121231'

		PRINT 'EstimatedRowsFor2012 Total '
				+ cast(@EstimatedRowsFor2012 as varchar(30));

		PRINT 'Targeting ' 
				+ CAST(@NumberOfSalesPerDay AS varchar(20)) 
				+ ' sales per day.'
				;

		--added by dadeniji 2017-08-23 7:01 AM
		PRINT 'DateCounter ' + convert(varchar(30), @DateCounter, 100)
		PRINT 'DateCounterMax ' + convert(varchar(30), '20121231', 100)

		IF @NumberOfSalesPerDay > 50000
		BEGIN
			PRINT 'WARNING: Limiting sales to 50000 per day';
			SET @NumberOfSalesPerDay = 50000;
		END;

		set @lNumberofRecordsDeleted = 1
		set @iCycle = 0
		set @lNumberofRecordsDeletedTotal = 0

		print @CHAR_TAB
			+ 'Pruning data '
			+ ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
			+ ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
			+ ' from ' + @tableDestination
			+ convert(varchar(30), getdate(), 100)
			+  '... '

		while (@lNumberofRecordsDeleted != 0)
		begin

			set @iCycle = @iCycle + 1

			delete top ( 10000)
			from   [Fact].[SaleLargeColumnStoreClustered] 
			where  [Invoice Date Key] <= @DateCounterEnd

			set @lNumberofRecordsDeleted = @@ROWCOUNT
			set @lNumberofRecordsDeletedTotal = @lNumberofRecordsDeletedTotal + @lNumberofRecordsDeleted

			PRINT @CHAR_TAB
					+ cast(@iCycle as varchar(10)) + ') '
					+ 'Pruning records prior to ' + convert(varchar(30), @DateCounterEnd, 100)
					+ ' Number of record(s) removed is ' + cast(@lNumberofRecordsDeleted as varchar(10))
					+ ' :: Total thus far ' + cast(@lNumberofRecordsDeletedTotal as varchar(10))

		end

		print @CHAR_TAB
				+ 'Pruned data '
				+ ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				+ ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				+ ' from ' + @tableDestination
				+ '. TS is ' + convert(varchar(30), getdate(), 100)

		if not exists
		(
			select *
			from   [Fact].[SaleLargeColumnStoreClustered]
			where  [Invoice Date Key] > @DateCounterEnd
		)
		begin

	
			PRINT 'Adding base data Fact.Sale - those greater than '  
					+ convert(varchar(30), @DateCounterEnd, 100)
					+ '. TS is '
					+ convert(varchar(30), getdate(), 100)

			set identity_insert [Fact].[SaleLargeColumnStoreClustered]  on 

			insert into [Fact].[SaleLargeColumnStoreClustered] 
				(
					  [Sale Key]
					, [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]

					, [Invoice Date Key]
					, [InvoiceDateKeyYear]

					, [Delivery Date Key]
					, [DeliveryDateKeyYear]

					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]
				)

			select 
					  [Sale Key]
					, [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]

					, [Invoice Date Key]
					, [InvoiceDateKeyYear]
						= datepart(year, [Invoice Date Key])

					, [Delivery Date Key]
					, [DeliveryDateKeyYear]
						= datepart(year, [Delivery Date Key])

					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]

			from   [Fact].[Sale]
	
			where  [Invoice Date Key] > @DateCounterEnd

			set @lNumberofRecordsAdded = @@ROWCOUNT


			set identity_insert [Fact].[SaleLargeColumnStoreClustered]  off
				
			PRINT 'Added base data Fact.Sale - ' 
					+ ' those greater than '  
					+ convert(varchar(30), @DateCounterEnd, 100)
					+ ' - Number of record(s) inserted during Initial loading is ' 
					+ cast(@lNumberofRecordsAdded as varchar(10))
					+ '- TS is '
					+ convert(varchar(30), getdate(), 100)

		end

	
		set @iCycle = 0;

		set identity_insert [Fact].[SaleLargeColumnStoreClustered]  off

		print @CHAR_TAB
				 + 'Adding data '
				 + ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				 + ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				 + ' into ' + @tableDestination
				 + ' - TS :- ' + convert(varchar(30), getdate(), 100)
				 +  '... '
	
		WHILE @DateCounter < '20121231'
		BEGIN

			SET @OutputCounter = CONVERT(varchar(20), @DateCounter, 112);
			--RAISERROR(@OutputCounter, 0, 1) WITH NOWAIT;
			-- PRINT 'Processing entry for ' + @OutputCounter

			SET @StartingSaleKey = @MaximumSaleKey - @NumberOfSalesPerDay - FLOOR(RAND() * 20000);
			SET @OrderCounter = 0;

			set @iCycle = @iCycle + 1

			INSERT [Fact].[SaleLargeColumnStoreClustered] WITH (TABLOCK)
			(
				  [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key]
				, [Invoice Date Key], [InvoiceDateKeyYear]
				, [Delivery Date Key], [DeliveryDateKeyYear]
				, [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity
				, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit
				, [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
			)
			SELECT TOP(@NumberOfSalesPerDay)
				  [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key]
				, [Invoice Date Key]    = @DateCounter
				, [InvoiceDateKeyYear]  = datepart(year, @DateCounter)
				, [Delivery Date Key]   = DATEADD(day, 1, @DateCounter)
				, [DeliveryDateKeyYear] = datepart(year, DATEADD(day, 1, @DateCounter))
				, [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity
				, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount]
				, Profit = tblFS.[Profit] + ( tblFS.[Profit] * rand( tblFS.[City Key] * tblFS.[Unit Price] ) )
				, [Total Including Tax], [Total Dry Items], [Total Chiller Items]
				, @LineageKey
			FROM   Fact.Sale tblFS
			WHERE  tblFS.[Sale Key] > @StartingSaleKey
		
			and   tblFS.[Invoice Date Key] >='2013-01-01'
		
			ORDER BY 
					[Sale Key];

			set @lNumberofRecordsAdded = @@ROWCOUNT
		
			PRINT @CHAR_TAB
					+ cast(@iCycle as varchar(10)) 
					+ ' - '
					+ 'Day : '
					+ convert(varchar(30), @DateCounter, 101) 
					+ ' - Number of record(s) added to '
					+ @tableDestination 
					+ ' ' + cast(@lNumberofRecordsAdded as varchar(10))
					+ ' TS :- ' + convert(varchar(30), getdate(), 100)

			SET @DateCounter = DATEADD(day, 1, @DateCounter);

		END;
	

		print @CHAR_TAB
				 + 'Added data '
				 + ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				 + ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				 + ' into ' + @tableDestination
				 + ' TS :- ' + convert(varchar(30), getdate(), 100)
				 +  ''
	
		set identity_insert [Fact].[SaleLargeColumnStoreClustered]  off


		if (@defrag = 0)
		begin


			print @CHAR_TAB
				 + 'Reorg Index '
				 + '[Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)
				 +  '... '

			ALTER INDEX [INDX_FACT_SaleLargeColumnStoreClustered]
				on [Fact].[SaleLargeColumnStoreClustered]  REORGANIZE
				;

			print @CHAR_TAB
				 + 'Re-orged Index '
				 + '[Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)

			print @CHAR_TAB
				 + 'Update Statistics on Index ' 
				 + ' [Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered]  '
				 + convert(varchar(30), getdate(), 100)
				 + ' ....'

			update statistics [Fact].[SaleLargeColumnStoreClustered] 
					[INDX_FACT_SaleLargeColumnStoreClustered] 
				with FULLSCAN;

			print @CHAR_TAB
				 + 'Updated Statistics on Index '
				 + ' [Fact].[SaleLargeColumnStoreClustered] [INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)
				 + ' ....'


		end
		else if (@defrag = 1)
		begin
		

			print @CHAR_TAB
				 + 'Rebuilding Index '
				 + '[Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)
				 +  '... '

			ALTER INDEX [INDX_FACT_SaleLargeColumnStoreClustered]
				on  [Fact].[SaleLargeColumnStoreClustered] REBUILD
				;

			print @CHAR_TAB
				 + 'Rebuilt Index '
				 + ' [Fact].[SaleLargeColumnStoreClustered].[INDX_FACT_SaleLargeColumnStoreClustered] '
				 + convert(varchar(30), getdate(), 100)
		

		end		
	END

	dbcc shrinkfile('WWI_Log') with no_infomsgs;

END

go

Fact.usp_SaleLargeColumnStoreNonClusteredPopulateTable

use [WideWorldImportersDW]
go

if object_id('[Fact].[usp_SaleLargeColumnStoreNonClusteredPopulateTable]') is null
begin

	exec('create procedure [Fact].[usp_SaleLargeColumnStoreNonClusteredPopulateTable] as ')

end
go

alter procedure [Fact].[usp_SaleLargeColumnStoreNonClusteredPopulateTable]
(
	  @EstimatedRowsFor2012 bigint = 12000000
	, @defrag bit = 0
)
as

begin


	DECLARE @OrderCounter bigint = 0;
	DECLARE @NumberOfSalesPerDay bigint;
	DECLARE @DateCounter date;
	DECLARE @DateCounterBegin date
	DECLARE @DateCounterEnd date
	DECLARE @StartingSaleKey bigint;
	DECLARE @MaximumSaleKey bigint;
	DECLARE @OutputCounter varchar(20);

	DECLARE @LineageKey	int

	DECLARE @iCycle int
	DECLARE @lNumberofRecordsDeleted bigint
	DECLARE @lNumberofRecordsDeletedTotal bigint

	DECLARE @lNumberofRecordsAdded	 bigint
	DECLARE @lNumberofRecordsAddedTotal bigint

	DECLARE @CHAR_TAB  varchar(10)

	DECLARE @tableDestination sysname

	set @CHAR_TAB = char(9);
	set @DateCounter = '20120101';
	set @DateCounterEnd = '20121231'

	set @DateCounterBegin = @DateCounter;

	set @MaximumSaleKey 
			= (
				SELECT MAX([Sale Key]) 
				FROM   Fact.Sale
			  );

	set @LineageKey	=1

	set @NumberOfSalesPerDay   = @EstimatedRowsFor2012 / 365;



	set @lNumberofRecordsAddedTotal = 0

	BEGIN


		SET NOCOUNT ON;
		SET XACT_ABORT ON;

		-- exec sp_helpindex '[Fact].[SaleLargeColumnStoreNonClustered]'
		SET @tableDestination = '[Fact].[SaleLargeColumnStoreNonClustered]';
		set @DateCounterEnd = '20121231'

		PRINT 'EstimatedRowsFor2012 Total '
				+ cast(@EstimatedRowsFor2012 as varchar(30));

		PRINT 'Targeting ' 
				+ CAST(@NumberOfSalesPerDay AS varchar(20)) 
				+ ' sales per day.'
				;

		--added by dadeniji 2017-08-23 7:01 AM
		PRINT 'DateCounter ' + convert(varchar(30), @DateCounter, 100)
		PRINT 'DateCounterMax ' + convert(varchar(30), '20121231', 100)

		IF @NumberOfSalesPerDay > 50000
		BEGIN
			PRINT 'WARNING: Limiting sales to 50000 per day';
			SET @NumberOfSalesPerDay = 50000;
		END;

		set @lNumberofRecordsDeleted = 1
		set @iCycle = 0
		set @lNumberofRecordsDeletedTotal = 0

		/*
			Msg 35330, Level 15, State 1, Procedure usp_SaleLargeColumnStoreNonClusteredPopulateTable, Line 119 [Batch Start Line 2]
			DELETE statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the DELETE statement, and then rebuilding the columnstore index after DELETE has completed.
		*/

		print @CHAR_TAB
			+ 'Disabling all indexes on '
			+ '[Fact].[SaleLargeColumnStoreNonClustered] '
			+ convert(varchar(30), getdate(), 100)
			+  '... '

		exec('
				ALTER INDEX ALL
				on   [Fact].[SaleLargeColumnStoreNonClustered] DISABLE
				;
			')


		--ALTER INDEX [INDX_FACT_Customer_Profit]
		ALTER INDEX ALL
				on   [Fact].[SaleLargeColumnStoreNonClustered] DISABLE
				;

		/*
		ALTER INDEX [INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
				on   [Fact].[SaleLargeColumnStoreNonClustered] DISABLE
				;
		*/

		print @CHAR_TAB
			+ 'Disabled all indexes on '
			+ '[Fact].[SaleLargeColumnStoreNonClustered] '
			+ convert(varchar(30), getdate(), 100)


		print @CHAR_TAB
			+ 'Pruning data '
			+ ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
			+ ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
			+ ' from ' + @tableDestination
			+ convert(varchar(30), getdate(), 100)
			+  '... '

		while (@lNumberofRecordsDeleted != 0)
		begin

			set @iCycle = @iCycle + 1

			delete top ( 10000)
			from   [Fact].[SaleLargeColumnStoreNonClustered] 
			where  [Invoice Date Key] <= @DateCounterEnd

			set @lNumberofRecordsDeleted = @@ROWCOUNT
			set @lNumberofRecordsDeletedTotal = @lNumberofRecordsDeletedTotal + @lNumberofRecordsDeleted

			PRINT @CHAR_TAB
					+ cast(@iCycle as varchar(10)) + ') '
					+ 'Pruning records prior to ' + convert(varchar(30), @DateCounterEnd, 100)
					+ ' Number of record(s) removed is ' + cast(@lNumberofRecordsDeleted as varchar(10))
					+ ' :: Total thus far ' + cast(@lNumberofRecordsDeletedTotal as varchar(10))

		end

		print @CHAR_TAB
				+ 'Pruned data '
				+ ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				+ ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				+ ' from ' + @tableDestination
				+ '. TS is ' + convert(varchar(30), getdate(), 100)

		if not exists
		(
			select *
			from   [Fact].[SaleLargeColumnStoreNonClustered]
			where  [Invoice Date Key] > @DateCounterEnd
		)
		begin

	
			PRINT 'Adding base data Fact.Sale - those greater than '  
					+ convert(varchar(30), @DateCounterEnd, 100)
					+ '. TS is '
					+ convert(varchar(30), getdate(), 100)

			set identity_insert [Fact].[SaleLargeColumnStoreNonClustered]  on 

			insert into [Fact].[SaleLargeColumnStoreNonClustered] 
				(
					  [Sale Key]
					, [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]

					, [Invoice Date Key]
					, [InvoiceDateKeyYear]

					, [Delivery Date Key]
					, [DeliveryDateKeyYear]

					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]
				)

			select 
					  [Sale Key]
					, [City Key]
					, [Customer Key]
					, [Bill To Customer Key]
					, [Stock Item Key]

					, [Invoice Date Key]
					, [InvoiceDateKeyYear]
						= datepart(year, [Invoice Date Key])

					, [Delivery Date Key]

					, [DeliveryDateKeyYear]
						= datepart(year, [Delivery Date Key])

					, [Salesperson Key]
					, [WWI Invoice ID]
					, [Description]
					, Package
					, Quantity
					, [Unit Price]
					, [Tax Rate]
					, [Total Excluding Tax]
					, [Tax Amount]
					, Profit
					, [Total Including Tax]
					, [Total Dry Items]
					, [Total Chiller Items]
					, [Lineage Key]

			from   [Fact].[Sale]
	
			where  [Invoice Date Key] > @DateCounterEnd

			set @lNumberofRecordsAdded = @@ROWCOUNT


			set identity_insert [Fact].[SaleLargeColumnStoreNonClustered]  off
				
			PRINT 'Added base data Fact.Sale - ' 
					+ ' those greater than '  
					+ convert(varchar(30), @DateCounterEnd, 100)
					+ ' - Number of record(s) inserted during Initial loading is ' 
					+ cast(@lNumberofRecordsAdded as varchar(10))
					+ '- TS is '
					+ convert(varchar(30), getdate(), 100)

		end

	
		set @iCycle = 0;

		set identity_insert [Fact].[SaleLargeColumnStoreNonClustered]  off

		print @CHAR_TAB
				 + 'Adding data '
				 + ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				 + ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				 + ' into ' + @tableDestination
				 + ' - TS :- ' + convert(varchar(30), getdate(), 100)
				 +  '... '
	
		WHILE @DateCounter < '20121231'
		BEGIN

			SET @OutputCounter = CONVERT(varchar(20), @DateCounter, 112);
			--RAISERROR(@OutputCounter, 0, 1) WITH NOWAIT;
			-- PRINT 'Processing entry for ' + @OutputCounter

			SET @StartingSaleKey = @MaximumSaleKey - @NumberOfSalesPerDay - FLOOR(RAND() * 20000);
			SET @OrderCounter = 0;

			set @iCycle = @iCycle + 1

			INSERT [Fact].[SaleLargeColumnStoreNonClustered] WITH (TABLOCK)
			(
				  [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key]
				, [Invoice Date Key], [InvoiceDateKeyYear]
				, [Delivery Date Key], [DeliveryDateKeyYear]
				, [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity
				, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], Profit
				, [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
			)
			SELECT TOP(@NumberOfSalesPerDay)
				  [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key]
				, [Invoice Date Key]    = @DateCounter
				, [InvoiceDateKeyYear]  = datepart(year, @DateCounter)
				, [Delivery Date Key]   = DATEADD(day, 1, @DateCounter)
				, [DeliveryDateKeyYear] = datepart(year, DATEADD(day, 1, @DateCounter))
				, [Salesperson Key], [WWI Invoice ID], [Description], Package, Quantity
				, [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount]
				, Profit = tblFS.[Profit] + ( tblFS.[Profit] * rand( tblFS.[City Key] * tblFS.[Unit Price] ) )
				, [Total Including Tax], [Total Dry Items], [Total Chiller Items]
				, @LineageKey
			FROM   Fact.Sale tblFS
			WHERE  tblFS.[Sale Key] > @StartingSaleKey
		
			and   tblFS.[Invoice Date Key] >='2013-01-01'
		
			ORDER BY 
					[Sale Key];

			set @lNumberofRecordsAdded = @@ROWCOUNT

			set @lNumberofRecordsAddedTotal = @lNumberofRecordsAddedTotal
												+ @lNumberofRecordsAdded
		
			PRINT @CHAR_TAB
					+ cast(@iCycle as varchar(10)) 
					+ ' - '
					+ 'Day : '
					+ convert(varchar(30), @DateCounter, 101) 
					+ ' - Number of record(s) added to '
					+ @tableDestination 
					+ ' ' + cast(@lNumberofRecordsAdded as varchar(10))
					+ ' TS :- ' + convert(varchar(30), getdate(), 100)

			SET @DateCounter = DATEADD(day, 1, @DateCounter);

		END;
	

		print @CHAR_TAB
				 + 'Added data '
				 + ' Begin Date ' + convert ( varchar(30), @DateCounter, 100)
				 + ' End Date '+   convert ( varchar(30), @DateCounterEnd, 100)
				 + ' into ' + @tableDestination
				 + ' TS :- ' + convert(varchar(30), getdate(), 100)
				 +  ''
	

		print @CHAR_TAB
				 + cast(@lNumberofRecordsAddedTotal as varchar(10))
				 + ' records added '
				 + ' into ' + @tableDestination
				 + ' TS :- ' + convert(varchar(30), getdate(), 100)
				 +  ''
	
		set identity_insert [Fact].[SaleLargeColumnStoreNonClustered]  off


		if 
			(
				    ( @defrag = 0) 
				and ( 1= 0)
			)	 
		begin


			print @CHAR_TAB
				 + 'Reorganizing all indexes on [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)
				 +  '... '

			--ALTER INDEX [INDX_FACT_Customer_Profit]
			--ALTER INDEX [INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
			ALTER INDEX ALL
				on  [Fact].[SaleLargeColumnStoreNonClustered] REORGANIZE
				;

			print @CHAR_TAB
				 + 'Reorganized all indexes on [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)

			print @CHAR_TAB
				 + 'Update Statistics on Index ' 
				 + ' [Fact].[SaleLargeColumnStoreNonClustered].[INDX_FACT_SaleLargeColumnStoreClustered]  '
				 + convert(varchar(30), getdate(), 100)
				 + ' ....'

			update statistics [Fact].[SaleLargeColumnStoreNonClustered]
					 --[INDX_FACT_Customer_Profit]
					 --[INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
					 --ALL
				with FULLSCAN;

			print @CHAR_TAB
				 + 'Updated Statistics on Index '
				 +  ' [Fact].[SaleLargeColumnStoreNonClustered].[INDX_FACT_SaleLargeColumnStoreClustered]  '
				 + convert(varchar(30), getdate(), 100)
				 + ' ....'


		end
		else if 
				(
					   ( @defrag = 1)
					or ( 1=1)
				)
		begin
		

			print @CHAR_TAB
				 + 'Rebuilding all indexes on [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)
				 +  '... '

			--ALTER INDEX [INDX_FACT_Customer_Profit]
			--ALTER INDEX [INDX_InvoiceDateKeyYear_DeliveryDateKeyYear]
			ALTER INDEX ALL
				on   [Fact].[SaleLargeColumnStoreNonClustered] REBUILD
				;

			print @CHAR_TAB
				 + 'Rebuilt all indexes on [Fact].[SaleLargeColumnStoreNonClustered] '
				 + convert(varchar(30), getdate(), 100)
		

		end		
	END

	dbcc shrinkfile('WWI_Log') with no_infomsgs;

END

go


 

Invoke SP to load data

Here we invoke each Stored Procedure to load data into its table.

Fact.usp_SaleLargeRowStorePopulateTable

use [WideWorldImportersDW]
go

/*
	truncate table [Fact].[SaleLargeRowStore]
*/
declare	 @EstimatedRowsFor2012 bigint
declare  @defrag bit 

set @EstimatedRowsFor2012 = 12000000
set  @defrag =1

exec [Fact].[usp_SaleLargeRowStorePopulateTable]
	  @EstimatedRowsFor2012 = @EstimatedRowsFor2012
	, @defrag = @defrag

Fact.usp_SaleLargeColumnStoreClusteredPopulateTable

use [WideWorldImportersDW]
go

/*
	truncate table [Fact].[SaleLargeColumnStoreClustered]
*/

declare	 @EstimatedRowsFor2012 bigint
declare  @defrag bit 

set @EstimatedRowsFor2012 = 12000000
set  @defrag =1

exec [Fact].[usp_SaleLargeColumnStoreClusteredPopulateTable]
	  @EstimatedRowsFor2012 = @EstimatedRowsFor2012
	, @defrag = @defrag


Fact.usp_SaleLargeColumnStoreNonClusteredPopulateTable


use [WideWorldImportersDW]
go

/*

	truncate table [Fact].[SaleLargeColumnStoreNonClustered]

*/

declare	 @EstimatedRowsFor2012 bigint
declare  @defrag bit

set @EstimatedRowsFor2012 = 12000000
set  @defrag =0


exec [Fact].[usp_SaleLargeColumnStoreNonClusteredPopulateTable]
	  @EstimatedRowsFor2012 = @EstimatedRowsFor2012
	, @defrag = @defrag


Metadata – Fact.Sale*

Number of Records

Code


set nocount on
go
set XACT_ABORT on
go

use [WideWorldImportersDW]
go

; with cteTable
(
      [id]
	, [table]
	, [objectID]
)
as
(
	select 
			  tblA.[id]
			, tblA.[name]
			, [objectID]
				 = object_id(tblA.[name])
	from  
		(

			select 1 as [id], '[Fact].[SaleLargeRowStore]' as [name]
    
			union

			select 2  as [id], '[Fact].[SaleLargeColumnStoreClustered]' as [name]
    
			union

			select 3 as [id], '[Fact].[SaleLargeColumnStoreNonClustered]' as [name]

		) tblA
) 
select
	      cteT.id

        , [table]
            =  tblSS.[name]
                + '.'
                + tblSO.[name]
 
        , [indexName]
            = tblSI.[name]
                     
        , [indexType]
          = tblSI.[type_desc]
 
        , [numberofRecords]
            = sum(tblSP.[rows])
 
        , numberofPagesData
            = sum(tblSPS.in_row_data_page_count)
 
        , numberofPagesLOB
            = sum(tblSPS.lob_used_page_count)
 
        , usedPageCount
            = sum(used_page_count)

        , reservedPageCount
            = sum(tblSPS.reserved_page_count)
 
        , [reservedPageInMB]
            = sum(tblSPS.reserved_page_count *1)  / 128

from   sys.schemas tblSS
 
inner join sys.objects tblSO
 
    on tblSS.schema_id = tblSO.schema_id
 
inner join sys.indexes tblSI
 
    on tblSO.object_id = tblSI.object_id
 
 
inner join sys.partitions tblSP
 
    on tblSI.object_id = tblSP.object_id
    and tblSI.index_id = tblSP.index_id
 
 
inner join sys.dm_db_partition_stats tblSPS
 
    on tblSP.object_id = tblSPS.object_id
    and tblSP.index_id = tblSPS.index_id
    and tblSP.partition_id = tblSPS.partition_id
 
 
inner join cteTable cteT
        on cteT.[objectID] = tblSO.object_id
 
group by
		  cteT.id
        , tblSS.[name]
        , tblSO.[name]
        , tblSI.index_id
        , tblSI.[name]
        , tblSI.[type_desc]
 
order by
	  cteT.id
        , tblSS.[name]
        , tblSO.[name]
        , tblSI.index_id


Output

Explanation

  1. We have over 12 million records in each of our targeted tables ( Row Store – Clustered / Column Store – Clustered / Column Store – NonClustered )
  2. Row Store – Clustered
    • Clustered Index => 2.691 GB
    • Index on Clustered Key & Profit – 360 MB
  3. Column Store – Clustered
    • 196 MB ( see the row-group sketch right after this list )
  4. Column Store – NonClustered
    • Heap => 2.687 GB
    • Column Store Index on “InvoiceDateKeyYear” & “DeliveryDateKeyYear” – 17 MB
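
The clustered Column Store comes in so small because its data is compressed per row group.  A hedged sketch of ours for peeking at those row groups ( the DMV is available on SQL Server 2016 and above ):

use [WideWorldImportersDW]
go

-- Inspect the columnstore row groups behind the two columnstore tables
select
          [table]    = object_name(tblRG.[object_id])
        , [indexID]  = tblRG.index_id
        , [rowGroup] = tblRG.row_group_id
        , [state]    = tblRG.state_desc
        , [rows]     = tblRG.total_rows
        , [sizeInMB] = tblRG.size_in_bytes / 1048576.0

from   sys.dm_db_column_store_row_group_physical_stats tblRG

where  object_name(tblRG.[object_id]) in
        (
              'SaleLargeColumnStoreClustered'
            , 'SaleLargeColumnStoreNonClustered'
        )

order by
          [table]
        , tblRG.row_group_id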

 

Source Control

GitHub

Available on GitHub here.

 

Summary

Clustered Column Store offers the best savings in terms of storage footprint.

We have about 12 million records in each table, which should hopefully be a good starting point for untargeted queries.

Untargeted, that is, queries that we have not hand-crafted specific indexes for.
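
As an illustration, one such untargeted query might look like the sketch below ( ours ), aggregating on columns that none of the hand-crafted indexes were designed around.

use [WideWorldImportersDW]
go

-- A sample "untargeted" query : aggregate profit by customer and package,
-- columns we did not build dedicated indexes for
select
          [Customer Key]
        , Package
        , [lineCount]   = count_big(*)
        , [totalProfit] = sum(Profit)

from   [Fact].[SaleLargeColumnStoreClustered]

group by
          [Customer Key]
        , Package

order by
          [totalProfit] desc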