SqlServer – ColumnStore Index – AdventureWorksDW

Background

A couple of weeks ago I touched on the fact that I was interested in Columnar Databases.

In SQL Server, the Columnar Engine is integrated into the traditional RDBMS Row Engine.

 

Use Cases

Went online to search out use cases for Columnar Databases.

Dennis Forbes

The best I found is Dennis Forbes work and it is availed here.

 

DataSets

Wanted to find suitable datasets and explored Sample databases courtesy of Microsoft’s AdventureWorks.

AdventureWorks / AdventureWorksDW

The AdventureWorks dataset are packaged as AdventureWorks & AdventureWorksDW.

GitHub Microsoft SQL Server Sample Databases

The main project site for Microsoft’s sample DB is here.

Per SQL Server v2016, Adventure Works appears to be deprecated in favor of Wide World Importers.

The 2014 Version specific files for Adventure Works are available here.

 

Download & Avail

Please download and avail the backup file for AdventureWorksDW.

 

Tables

Tables in a Data Warehouse database can be viewed through the lenses of Dimension or Fact.

Dimension being what is being measured, and Fact been the value measured.

Table Dictionary

Type Table Description
Dimension
DimAccount  Account
DimCustomer  Customer
DimDate  Date
DimEmployee  Employee
DimProduct  Product
Fact
FactInternetSales  Direct Customer Sales
FactResellerSales  Sales  to resellers

 

List of Tables

Here are the tables that we will be using.

A traditional Clustered Index table, a Clustered Column Index, and a Non-Clustered Column Index.

Table Index
FactResellerSales Row Store – Clustered Index
FactResellerSales.ColumnStore Column Store – Clustered Column Index
FactResellerSales.ColumnStore.NC Row Store with Non-Clustered Column Index

 

Table Metadata

Index Column Names
SQL

use [AdventureWorksDW2014]
go

; with cteTable
as
(
	select 'DimAccount' as [table]
	union 
	select 'DimCustomer'
	union 
	select 'DimDate'
	union
	select 'DimEmployee'
	union
	select 'DimProduct'
	union
	select 'FactInternetSales'
	union
	select 'FactResellerSales'
	union
	select 'FactResellerSales.ColumnStore'
	union
	select 'FactResellerSales.ColumnStore.NC'
) 

, cteTableIndexColumn
as
(

	select 
			[table]
				=  tblSS.[name]
					+ '.'
					+ tblSO.[name]

			, [indexID]
				= tblSI.[index_id]
						
			, [indexType]
				= tblSI.[type_desc]
							
			, [index]
				= tblSI.[name]

			, [column]
				= tblSC.[name]

			, [indexColumnKeyOrdinal]
				= tblSIC.key_ordinal

			, [indexColumnID]
				= tblSIC.index_column_id

	from   sys.schemas tblSS

	inner join sys.objects tblSO

		on tblSS.schema_id = tblSO.schema_id

	inner join sys.indexes tblSI

		on tblSO.object_id = tblSI.object_id

	inner join sys.index_columns tblSIC

		on  tblSI.object_id = tblSIC.object_id
		and tblSI.index_id = tblSIC.index_id

	inner join sys.columns tblSC

		on  tblSIC.object_id = tblSC.object_id
		and tblSIC.column_id = tblSC.column_id

	inner join cteTable cteT
		on tblSO.[name] = cteT.[table]

	where  tblSS.[name] = 'dbo'

)

SELECT 
		[table]
		  = cteTIC.[table]

		, cteTIC.[indexID]

		, cteTIC.[index]

		, cteTIC.[indexType]

		, [column]
		  = STUFF
		  (

			(
				
				select ', ' 
						+ [column]
						+ case
							when indexColumnKeyOrdinal =0 then '^'
							else ''
						  end

				from   cteTableIndexColumn cteTIC_Inner
				
				where  cteTIC.[table] = cteTIC_Inner.[table]
				
				and    cteTIC.[indexID] = cteTIC_Inner.[indexID]

				and    cteTIC.[index] = cteTIC_Inner.[index]

				order by
						  cteTIC_Inner.indexColumnKeyOrdinal
						, [cteTIC_Inner].indexColumnID
							
				FOR XML PATH('')

			)
			, 1 ,1 ,''

		 )

from   cteTableIndexColumn cteTIC

group by
		  cteTIC.[table]
		, cteTIC.[indexID]
		, cteTIC.[index]
		, cteTIC.[indexType]

order by
		  cteTIC.[table]
		, cteTIC.[indexID]
		, cteTIC.[index]

 

Output

Number of Records
SQL

; with cteTable
as
(
	select 'DimAccount' as [table]
	union 
	select 'DimCustomer'
	union 
	select 'DimDate'
	union
	select 'DimEmployee'
	union
	select 'DimProduct'
	union
	select 'FactInternetSales'
	union
	select 'FactResellerSales'
	union
	select 'FactResellerSales.ColumnStore'
	union
	select 'FactResellerSales.ColumnStore.NC'

) 
select 
		[table]
			=  tblSS.[name]
				+ '.'
				+ tblSO.[name]

		, [indexName]
			= tblSI.[name]
					
		, [indexType]
		  = tblSI.[type_desc]

		, [numberofRecords]
			= sum(tblSP.[rows])

		, numberofPagesData
			= sum(tblSPS.in_row_data_page_count)

		, numberofPagesLOB
			= sum(tblSPS.lob_used_page_count)

		, usedPageCount
			= sum(used_page_count)

from   sys.schemas tblSS

inner join sys.objects tblSO

	on tblSS.schema_id = tblSO.schema_id

inner join sys.indexes tblSI

	on tblSO.object_id = tblSI.object_id


inner join sys.partitions tblSP

	on tblSI.object_id = tblSP.object_id
	and tblSI.index_id = tblSP.index_id


inner join sys.dm_db_partition_stats tblSPS

	on tblSP.object_id = tblSPS.object_id
	and tblSP.index_id = tblSPS.index_id
	and tblSP.partition_id = tblSPS.partition_id


inner join cteTable cteT
		on tblSO.[name] = cteT.[table]

where  tblSS.[name] = 'dbo'

group by
		  tblSS.[name]
		, tblSO.[name]
		, tblSI.index_id
		, tblSI.[name]
		, tblSI.[type_desc]

order by
		  tblSS.[name]
		, tblSO.[name]
		, tblSI.index_id

Output

Tabulated
Type Table #Records # of Pages # of Pages LOB # of Used Pages
Dimension
DimAccount  99  2  0  4
DimCustomer  36968  978  0  996
DimDate  7304 57  0  59
DimEmployee  296  29  635  666
DimProduct  1212  251  498  751
Fact
FactInternetSales  60398  1236  0  1254
FactResellerSales  60855  1672  0  1690
FactResellerSales.ColumnStore ( Clustered )  60855  0  242  240
FactResellerSales.ColumnStore ( Non Clustered – Heap)  60855  1681  42  1682
FactResellerSales.ColumnStore ( Non Clustered – INDX_COLUMNSTORE_NONCLUSTERED_SalesAmount)  60855  0  28  28
Explanation
  1. Column Store Indexes
    • Stored in LOB Data pages

 

Space Used
SQL

use [AdventureWorksDW2014]
go

exec sp_spaceused '[dbo].[FactResellerSales]'

exec sp_spaceused '[dbo].[FactResellerSales.ColumnStore]'

exec sp_spaceused '[dbo].[FactResellerSales.ColumnStore.NC]'


Output

 

Explanation
  1. Table Sizes
    • Row Stored Clustered :- 13.063 MB
    • Column Stored – Clustered :-  1.875 MB
    • Row Stored  Heap – Non Clustered Column Index :- 13.352 MB
  2. Compare Row Store To Column Store
    • Column Store is 15% of Row Store

Scenario

Let us ask a fairly straightforward question.

Get top 5 Sales from [dbo].[FactResellerSales].

Table

dbo.FactResellerSales

SQL

select top 5
		  --  ProductStandardCost * OrderQuantity
		  [salesAmount]
			= tblFRS.SalesAmount

from   [dbo].[FactResellerSales] tblFRS

order by
		tblFRS.SalesAmount desc

Query Plan

Query Plan – Warning
Textual

The query memory grant detected "ExcessiveGrant", which may impact the reliability.
Grant size: Initial 1024 KB, Final 1024 KB, Used 16 KB.

Image

 

Query Plan – Explanation
  1. [dbo].[FactResellerSales]
    • The [dbo].[FactResellerSales] table has a lone index, PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber
    • The columns  been SalesOrderNumber and SalesOrderLineNumber
  2. We are looking for top Sales Amount ( SalesAmount)
    • We are not going to be helped by the Primary Key Column

 

dbo.[FactResellerSales.ColumnStore]

Objective

We will create a tracking table, [dbo].[FactResellerSales.ColumnStore].

The table will track the [dbo].[FactResellerSales] table.

The only difference between the two tables is that [dbo].[FactResellerSales] will have a Clustered Index and [dbo].[FactResellerSales.ColumnStore] will have a Clustered Column Store Index.

 

Create Table
SQL

DDL is posted on GitHub and here is the Link.

Index - ColumnStore - Clustered
SQL

CREATE CLUSTERED COLUMNSTORE INDEX [INDX_COLUMNSTORE_CLUSTERED]
ON [dbo].[FactResellerSales.ColumnStore]

Explanation
  1. Unlike a Row Store Index:
    • The column list is not explicitly stated
    • All columns are noted as so called “Included Columns
  2. Like traditional Row Store Clustered Index
    • All columns are stored in the Clustered Index
  3. Singularity
    • When a Clustered Column Index exists on table, no other indexes can be defined on that table
Populate Data
SQL

USE [AdventureWorksDW2014]
GO

set nocount on
go

truncate table [dbo].[FactResellerSales.ColumnStore]
GO

insert into [dbo].[FactResellerSales.ColumnStore]
select *
from   [dbo].[FactResellerSales]
go

alter index [INDX_COLUMNSTORE_CLUSTERED]
	on [dbo].[FactResellerSales.ColumnStore] rebuild;
go

update statistics [dbo].[FactResellerSales.ColumnStore] with FULLSCAN;
go

Get Top 5 Records from ColumnStore – Clustered Index
SQL

select top 5 

		  [salesAmount]
			= tblFRS.SalesAmount

from   [dbo].[FactResellerSales.ColumnStore] tblFRS

order by
		tblFRS.SalesAmount desc

Output
Explanation
  1. Columnstore Index Scan ( Clustered )
    • 2%
  2. Sort
    • 98%

 

 

dbo.[FactResellerSales.ColumnStore.NC]

Objective

We will create a new tracking table, [dbo].[FactResellerSales.ColumnStore.NC].

The table will also track the [dbo].[FactResellerSales] table.

Again, the difference between the two tables is that [dbo].[FactResellerSales] will have a Clustered Index and [dbo].[FactResellerSales.ColumnStore] will have a Non Clustered Column Store Index.

 

Create Table
SQL

The SQL for creating the dbo.[FactResellerSales.ColumnStoreNC] is here.


Index – ColumnStore – Non Clustered
SQL

CREATE NONCLUSTERED COLUMNSTORE INDEX [INDX_COLUMNSTORE_NONCLUSTERED_SalesAmount]
ON [dbo].[FactResellerSales.ColumnStore.NC]
(
   [SalesAmount]
)
Populate Data
SQL

USE [AdventureWorksDW2014]
GO

alter index [INDX_COLUMNSTORE_NONCLUSTERED_SalesAmount]
	on [dbo].[FactResellerSales.ColumnStore.NC] disable;

--truncate table [dbo].[FactResellerSales.ColumnStore.NC]

delete from [dbo].[FactResellerSales.ColumnStore.NC]
GO

insert into [dbo].[FactResellerSales.ColumnStore.NC]
select *
from   [dbo].[FactResellerSales]
go

alter index [INDX_COLUMNSTORE_NONCLUSTERED_SalesAmount]
	on [dbo].[FactResellerSales.ColumnStore.NC]	 rebuild;

update statistics [dbo].[FactResellerSales.ColumnStore.NC] with FULLSCAN;

Get Top 5 Records from ColumnStore – NonClustered Index
SQL

select top 5 

	  [salesAmountFormatted]
		= cast(tblFRS.SalesAmount as money)

from   [dbo].[FactResellerSales.ColumnStore.NC] tblFRS

order by
	tblFRS.SalesAmount desc

Output
Explanation
  1. Columnstore Index Scan ( NonClustered )
    • 2%
  2. Sort
    • 98%

 

Compare Queries

Let us compare queries targeting our tables.

Query Plan

Image

Explanation
  1. Original Table
    • Query Cost :- 39%
    • Warning :- The query memory grant detected “ExcessiveGrant”, which may impact the reliability.
      Grant size: Initial 1024 KB, Final 1024 KB, Used 16 KB.
  2. ColumnStore – Clustered
    • Query Cost :- 31%
    • Columnstore Index Scan (Clustered)
  3. ColumnStore – NonClustered
    • Query Cost :- 31%
    • Columnstore Index Scan (NonClustered)

Statistics IO

v-2014
Image – v2014

Explanation – v2014
  1. dbo.FactResellerSales
    • Scan count :- 1
    • Reads
      • Logical reads :- 1682
      • Physical reads :- 3
      • Read Aheads :- 1720
    • LOB Reads
      • Logical reads :- 0
      • Physical reads :- 0
      • Read Aheads :- 0
  2. dbo.[FactResellerSales.ColumnStore]
    • Scan count :- 1
    • Reads
      • Logical reads :- 0
      • Physical reads :- 0
      • Read Aheads :- 0
    • LOB Reads
      • Logical reads :- 38
      • Physical reads :- 1
      • Read Aheads :- 60
  3. dbo.[FactResellerSales.ColumnStore.NC]
    • Scan count :- 1
    • Reads
      • Logical reads :- 0
      • Physical reads :- 0
      • Read Aheads :- 0
    • LOB Reads
      • Logical reads :- 13
      • Physical reads :- 1
      • Read Aheads :- 24
v-2016
Image – v2016

Explain – v2016
  1. Non-Clustered Column Index
    • It is curious that we have both a workfile and worktable reference on Non-Clustered Column Index Search

Time Taken

Image
Image – v2014
Explanation – v2014
  1. The original query took 33%
  2. The query that targeted the Clustered Column Index was at 15%
  3. And, the query that targeted the Non Clustered Column Index was at 52%

 

Image – v2016

Explanation – v2016
  1. The original query took 37%
  2. The query that targeted the Clustered Column Index was at 18%
  3. And, the query that targeted the Non Clustered Column Index was at 43%

 

GitHub

Commited to Github.

The repository is here

Dedication

Paying back Dennis Forbes.

Summary

Admittedly, our dataset is very small.

A Clustered Column Store is much, more smaller than a Row Store table.

This is due to the compression algorithms that lends themselves particularly to a Column Store.

It appears that a Clustered Columnar Index might reduce our query time.

In comparison to a Clustered Column Index, the Nonclustered Column Index is relatively slower.

Additionally in v2016, it has both workfile and worktable references in the Statistics IO.

 

References

  1. SQL Server Dynamic Management Views ( DMVs )
    • sys.dm_db_partition_stats (Transact-SQL)
      Link
    • sys.index_columns (Transact-SQL)
      Link
  2. Thomas LaRock
    • 05 MAY SIZE MATTERS: TABLE ROWS AND DATABASE DATA PAGES
      Link
  3. Martin Schoombee
    • ADVENTUREWORKS DW: GENERATING MORE FACTS
      Link

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s