MySQL – information_schema.statistics

Background

As a quick follow-up to our discussion of metadata on tables, let us discuss indexes.

BTW, the discussion on tables is here.

INFORMATION_SCHEMA.STATISTICS

Code


select
      tblStat.TABLE_NAME as 'table'
    , tblStat.INDEX_NAME as 'index'
      -- NON_UNIQUE = 0 means the index IS unique
    , case tblStat.NON_UNIQUE
          when 0 then 'Yes'
          when 1 then 'No'
          else '?'
      end as 'Unique'
    , group_concat( COLUMN_NAME order by SEQ_IN_INDEX ) as colList
      -- max() keeps ONLY_FULL_GROUP_BY happy; the index cardinality is that of the last key column
    , max( tblStat.CARDINALITY ) as CARDINALITY

from  INFORMATION_SCHEMA.STATISTICS tblStat

where tblStat.TABLE_SCHEMA = 'drupal'

group by
      tblStat.TABLE_SCHEMA
    , tblStat.TABLE_NAME
    , tblStat.INDEX_NAME
    , tblStat.NON_UNIQUE

order by
      tblStat.TABLE_SCHEMA
    , tblStat.TABLE_NAME
    , tblStat.INDEX_NAME

limit 25
;


Output

 

Comparison with Other Databases

  1. GROUP_CONCAT
    • GROUP_CONCAT is very powerful and has few equals on other database platforms
    • Function Documentation
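As a quick runnable illustration: SQLite ships a group_concat() aggregate with semantics close to MySQL's GROUP_CONCAT, so we can sketch the column-list aggregation above without a MySQL server. The demo table and its rows are made up; the subquery supplies the SEQ_IN_INDEX-style ordering, since older SQLite versions do not accept ORDER BY inside the aggregate.

```python
import sqlite3

# Made-up stand-in for INFORMATION_SCHEMA.STATISTICS rows for one index.
conn = sqlite3.connect(":memory:")
conn.execute("create table statistics_demo (index_name text, seq int, column_name text)")
conn.executemany(
    "insert into statistics_demo values (?, ?, ?)",
    [("PRIMARY", 1, "word"), ("PRIMARY", 2, "sid"), ("PRIMARY", 3, "langcode")],
)

# group_concat() collapses the per-column rows into one comma-separated list per index.
row = conn.execute(
    """
    select index_name, group_concat(column_name)
    from ( select * from statistics_demo order by seq )
    group by index_name
    """
).fetchone()
print(row)
```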

 

SHOW INDEX

To get a bit more information, specifically cardinality ( an estimate of the number of unique values in each column combination ), we can issue “show index” against the table.

Code

Syntax


show index from [table] from [database];

Sample


show index from search_index from drupal;

Output

References

  1. Reference
    • INFORMATION_SCHEMA Tables
      • INFORMATION_SCHEMA STATISTICS Table
    • SHOW INDEX
    • GROUP BY Functions
      • GROUP_CONCAT

MySQL – information_schema.tables

Background

With SQL being a lingua franca for querying data, it should not be a far stretch to note that the various governing bodies have agreed on standards for exposing the data dictionary.

 

Mysql

For instance, if we want to see which tables are in a database, we can launch our favorite GUI SQL editor or just use MySQL’s eponymously named command-line client, mysql.

 

information_schema.tables

SQL

Sample


select 
       TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
     , TABLE_TYPE, ENGINE
     , TABLE_ROWS, AVG_ROW_LENGTH 

from information_schema.tables 

where TABLE_TYPE not in ("SYSTEM VIEW")  

LIMIT 10


 

Output

Explanation

  1. Database
    • def
  2. Schema
    • audit_repository
  3. Table
  4. Table Type
    • Available Options
      • SYSTEM VIEW
      • BASE TABLE
  5. Engine
    • Available Options
      • Memory – System Tables
      • InnoDB
      • MyISAM
  6. TABLE ROWS
    • Number of records in table
  7. AVG ROW LENGTH
    • Based on column data type and used size allocation
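For a runnable flavor of the same idea, SQLite exposes its ( much smaller ) catalog through the sqlite_master table instead of information_schema. The table and view names below are made up; this only illustrates the concept of querying a catalog, not MySQL's column set.

```python
import sqlite3

# Build a throwaway database with one table and one view.
conn = sqlite3.connect(":memory:")
conn.execute("create table role (rid integer primary key, name text)")
conn.execute("create view role_v as select name from role")

# sqlite_master plays the role of information_schema.tables:
# one row per catalog object, with its type and name.
rows = conn.execute(
    "select type, name from sqlite_master where type in ('table', 'view') order by name"
).fetchall()
print(rows)
```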

 

Comparison with Other Databases

SQL Server

  1. Double-quoted strings
    • In MySQL ( without the ANSI_QUOTES mode ), we are able to use double-quotes (“) to delimit string literals, as in “SYSTEM VIEW”; in Transact SQL, double-quotes delimit identifiers
  2. LIMIT N
    • In Transact SQL we use TOP N
    • On the other hand, in MySQL we use LIMIT N
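A quick runnable contrast, using SQLite ( whose LIMIT syntax matches MySQL's ); the table and data are made up. The Transact SQL spelling of the same query would be SELECT TOP 2 n FROM t ORDER BY n.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("create table t (n int); insert into t values (1), (2), (3), (4);")

# MySQL-style LIMIT N caps the result set after ORDER BY is applied.
rows = conn.execute("select n from t order by n limit 2").fetchall()
print(rows)  # [(1,), (2,)]
```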

 

show tables

Objective

List tables and views that fit a specified name pattern.

SQL

Syntax


show full tables from [database] like [name] where Table_type in ( 'BASE TABLE', 'VIEW' );

Sample


show full tables from drupal like 'role%';

 

Output

 

describe

Objective

Display table’s structure.

SQL

Syntax


describe [table];

Sample


describe drupal.role;

Output

 

References

  1. MySQL
    • SHOW TABLES
      • SHOW TABLES Syntax
    • SHOW Commands
      • Extensions to SHOW Statements

Column Stores Indexes – Query Comparison – Unpartitioned VS Partitioned

Objective

In earlier posts, we created our Column Store tables, and we have now created tracking partitioned tables as well.

It is now time to see whether we benefit from partitioning the tables.

From our original set of tables, we only created tracking Partition tables for the Row Store Clustered Index and the Column Store Clustered Index.

For brevity's sake, we skipped the Column Store Non-Clustered Index.

 

Query

Scenario

Our sample query will retrieve the top two sales within a date range.

SQL

Sample

Here is what the query looks like when we target the [Fact].[SaleLargeRowStore] table.

BTW, that table is the unpartitioned Row Store table.

It has a Clustered Index on its primary key, [Sale Key].


declare @dateBegin datetime
declare @dateEnd   datetime


set @dateBegin = '2013-04-15';
set @dateEnd = '2013-05-15';

; with cteYearCustomer --[Fact].[SaleLargeRowStore]
(
      [CustomerKey]
    , [profit]
)
as
(
    select  
              [CustomerKey] = tblFS.[Customer Key]
            , [profit] = max(tblFS.[profit])

    from  [Fact].[SaleLargeRowStore] tblFS

    where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

    group by
             tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
        [profit] desc


Full


dbcc dropcleanbuffers with no_infomsgs
go

use [WideWorldImportersDW]
go

set nocount on
go

set statistics io on
go

declare @dateBegin datetime
declare @dateEnd   datetime


set @dateBegin = '2013-04-15';
set @dateEnd = '2013-05-15';

-- [Fact].[SaleLargeRowStore] 
declare @dateRSCBegin datetime
declare @dateRSCEnd datetime

declare @dateRSCPIndexIsChosenByEngineBegin datetime
declare @dateRSCPIndexIsChosenByEngineEnd datetime

declare @dateRSCPIndexIsClusteredBegin datetime
declare @dateRSCPIndexIsClusteredEnd datetime

declare @dateCSCIndexBegin datetime
declare @dateCSCIndexEnd datetime

declare @dateCSCPIndexBegin datetime
declare @dateCSCPIndexEnd datetime

declare @dateCSNCIndexBegin datetime
declare @dateCSNCIndexEnd datetime


declare @datediffSum bigint 

declare @NUMBER_OF_CHARS_IN_DIVIDER int

set @NUMBER_OF_CHARS_IN_DIVIDER = 180

set @dateRSCBegin = getdate();

print '@@version :- '
		+ cast(@@version as varchar(4000))

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);

print '[Fact].[SaleLargeRowStore]'
print '**************************'

-- [Fact].[SaleLargeRowStore]
; with cteYearCustomer --[Fact].[SaleLargeRowStore]
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeRowStore] tblFS

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateRSCEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);



/*
	 [Fact].[SaleLargeRowStoreClusteredPartition]
*/

print '[Fact].[SaleLargeRowStoreClusteredPartition] -- Clustered Index'
print '***************************************************************'

set @dateRSCPIndexIsClusteredBegin = getdate()

; with cteYearCustomer --[Fact].[SaleLargeRowStoreClusteredPartition] --Clustered Index
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from [Fact].[SaleLargeRowStoreClusteredPartition]  tblFS with ( INDEX = 1)

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateRSCPIndexIsClusteredEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);

exec('dbcc dropcleanbuffers with no_infomsgs')

/*
	 [Fact].[SaleLargeRowStoreClusteredPartition]
*/

print '[Fact].[SaleLargeRowStoreClusteredPartition] -- Chosen by SQL Engine'
print '********************************************************************'

set @dateRSCPIndexIsChosenByEngineBegin = getdate()

; with cteYearCustomer --[Fact].[SaleLargeRowStoreClusteredPartition] -- Chosen by SQL Engine
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from [Fact].[SaleLargeRowStoreClusteredPartition]  tblFS --with ( INDEX = 1)

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateRSCPIndexIsChosenByEngineEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);



/*
	[Fact].[SaleLargeColumnStoreClustered]
*/
print '[Fact].[SaleLargeColumnStoreClustered]'
print '**************************************'

set @dateCSCIndexBegin = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreClustered]
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreClustered] tblFS

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			--  tblY.[year]
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateCSCIndexEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER);

/*
	[Fact].[SaleLargeColumnStoreClusteredPartition] 
*/

print '[Fact].[SaleLargeColumnStoreClusteredPartition]'
print '**********************************************'

set @dateCSCPIndexBegin = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreClusteredPartition] 
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreClusteredPartition] tblFS

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateCSCPIndexEnd = getdate()

print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER)

/*
	[Fact].[SaleLargeColumnStoreNonClustered] 
*/

print '[Fact].[SaleLargeColumnStoreNonClustered]'
print '*****************************************'

set @dateCSNCIndexBegin = getdate()

; with cteYearCustomer -- [Fact].[SaleLargeColumnStoreNonClustered]
(
	  [CustomerKey]
	, [profit]
)
as
(
	select  
			  [CustomerKey] = tblFS.[Customer Key]
			, [profit] = max(tblFS.[profit])

	from  [Fact].[SaleLargeColumnStoreNonClustered] tblFS

	where tblFS.[Invoice Date Key] between @dateBegin and @dateEnd

	group by
			 tblFS.[Customer Key]

)

select top 2 *

from    cteYearCustomer cteYC

order by
		[profit] desc

set @dateCSNCIndexEnd = getdate()


print replicate('=', @NUMBER_OF_CHARS_IN_DIVIDER)

set @datediffSum =
			        isNull(datediff(millisecond, @dateRSCBegin, @dateRSCEnd), 0)
				  + isNull(datediff(millisecond, @dateRSCPIndexIsChosenByEngineBegin, @dateRSCPIndexIsChosenByEngineEnd), 0)
				  + isNull(datediff(millisecond, @dateRSCPIndexIsClusteredBegin, @dateRSCPIndexIsClusteredEnd), 0)
				  + isNull(datediff(millisecond, @dateCSCIndexBegin, @dateCSCIndexEnd), 0)
				  + isNull(datediff(millisecond, @dateCSCPIndexBegin, @dateCSCPIndexEnd), 0)
				  + isNull(datediff(millisecond, @dateCSNCIndexBegin, @dateCSNCIndexEnd), 0)


select 

		  [sourceID] = [sourceID]
		, [source] = [source]
		, [dateStart] = dateStart
		, [dateEnd] = dateEnd
		, [durationMilliseconds] =  [duration] 
		, [%] = cast([%] as decimal(6, 2))

from   
	(

		select 
				  [sourceID] = 1
				, [source] = '[Fact].[SaleLargeRowStore]'
				, [dateStart] = @dateRSCBegin
				, [dateEnd] = @dateRSCEnd
				, [duration] = datediff(millisecond, @dateRSCBegin, @dateRSCEnd)
				, [%] = (datediff(millisecond, @dateRSCBegin, @dateRSCEnd)) * 100.00 / @datediffSum


		union all

		select 
				  [sourceID] = 2.1
				, [source] = '[Fact].[SaleLargeRowStoreClusteredPartition] - Clustered Index'
				, [dateStart] = @dateRSCPIndexIsClusteredBegin
				, [dateEnd] = @dateRSCPIndexIsClusteredEnd
				, [duration] = datediff(millisecond, @dateRSCPIndexIsClusteredBegin, @dateRSCPIndexIsClusteredEnd)
				, [%] = (datediff(millisecond, @dateRSCPIndexIsClusteredBegin, @dateRSCPIndexIsClusteredEnd)) * 100.00 / @datediffSum

		union all

		select 
				  [sourceID] = 2.2
				, [source] = '[Fact].[SaleLargeRowStoreClusteredPartition] - Chosen by Engine'
				, [dateStart] = @dateRSCPIndexIsChosenByEngineBegin
				, [dateEnd] = @dateRSCPIndexIsChosenByEngineEnd
				, [duration] = datediff(millisecond, @dateRSCPIndexIsChosenByEngineBegin, @dateRSCPIndexIsChosenByEngineEnd)
				, [%] = (datediff(millisecond, @dateRSCPIndexIsChosenByEngineBegin, @dateRSCPIndexIsChosenByEngineEnd)) * 100.00 / @datediffSum

		union all

		select 
				  [sourceID] = 3
				, [source] = '[Fact].[SaleLargeColumnStoreClustered]'
				, [dateStart] = @dateCSCIndexBegin
				, [dateEnd] = @dateCSCIndexEnd
				, [duration] = datediff(millisecond, @dateCSCIndexBegin, @dateCSCIndexEnd)
				, [%] = (datediff(millisecond, @dateCSCIndexBegin, @dateCSCIndexEnd)) * 100.00 / @datediffSum


		union all

		select 
				  [sourceID] = 4
				, [source] = '[Fact].[SaleLargeColumnStoreClusteredPartition]'
				, [dateStart] = @dateCSCPIndexBegin
				, [dateEnd] = @dateCSCPIndexEnd
				, [duration] = datediff(millisecond, @dateCSCPIndexBegin, @dateCSCPIndexEnd)
				, [%] = (datediff(millisecond, @dateCSCPIndexBegin, @dateCSCPIndexEnd)) * 100.00 / @datediffSum

		union all

		select 
				  [sourceID] = 5
				, [source] = '[Fact].[SaleLargeColumnStoreNonClustered]'
				, [dateStart] = @dateCSNCIndexBegin
				, [dateEnd] = @dateCSNCIndexEnd
				, [duration] = datediff(millisecond, @dateCSNCIndexBegin, @dateCSNCIndexEnd)
				, [%] = (datediff(millisecond, @dateCSNCIndexBegin, @dateCSNCIndexEnd)) * 100.00 / @datediffSum
		

	) tblA

order by

	  [sourceID] asc
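
The tail of the script sums the six datediff( millisecond, … ) values and reports each source's percentage share. That bookkeeping reduces to simple arithmetic; here is a minimal Python sketch, with made-up durations standing in for the measured values ( actual numbers depend on the run ):

```python
# Hypothetical millisecond durations, one per timed query in the script.
durations = {
    "[Fact].[SaleLargeRowStore]": 1200,
    "[Fact].[SaleLargeRowStoreClusteredPartition] - Clustered Index": 1100,
    "[Fact].[SaleLargeRowStoreClusteredPartition] - Chosen by Engine": 40,
    "[Fact].[SaleLargeColumnStoreClustered]": 90,
    "[Fact].[SaleLargeColumnStoreClusteredPartition]": 30,
    "[Fact].[SaleLargeColumnStoreNonClustered]": 1150,
}

# @datediffSum in the script.
total = sum(durations.values())

for source, ms in durations.items():
    pct = round(ms * 100.0 / total, 2)  # mirrors cast([%] as decimal(6, 2))
    print(f"{source}: {ms} ms ({pct}%)")
```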


Query Plan

Query Plan – v2014

Row Store – Unpartitioned Table ( Fact.SaleLargeRowStore )

Image

Explanation
  1. Missing Index
    • Improvement Projection
      • The Query Processor estimates that implementing the following index could improve the query cost by 93.1978%.
    • Index Statement
      • CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [Fact].[SaleLargeRowStore] ([Invoice Date Key]) INCLUDE ([Customer Key],[Profit])

Row Store – Partition Table ( Fact.SaleLargeRowStoreClusteredPartition ) – Clustered Index ( Hardcoded )

Image

Explanation
  1. Missing Index
    • Improvement Projection
      • The Query Processor estimates that implementing the following index could improve the query cost by 93.1978%.
    • Index Statement
      • CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [Fact].[SaleLargeRowStore] ([Invoice Date Key]) INCLUDE ([Customer Key],[Profit])

Row Store – Partition Table ( Fact.SaleLargeRowStoreClusteredPartition ) – Index Chosen By Engine

Image

Column Store – Unpartitioned Table ( Fact.SaleLargeColumnStoreClustered )

Image

  1. Column Index Scan
    • With a Clustered Column Store Index, the Clustering Index is inclusive of all columns

Column Store – Clustered – Partitioned Table ( Fact.SaleLargeColumnStoreClusteredPartition )

Image

 

Explanation
  1. Column Index Scan
    • With a Clustered Column Store Index, the Clustering Index is inclusive of all columns

 

Column Store – Non-Clustered – Non-Partitioned Table ( Fact.SaleLargeColumnStoreNonClustered )

Image

Explanation
  1. Table Scan
    • We do not have a targeted Index

 

Statistics I/O

Image

Statistics – v2014

Statistics I/O – v2016

Tabulated

 

[Fact].[SaleLargeRowStore]
    • v2014 and v2016: Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014 and v2016: Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014: Table ‘SaleLargeRowStore’. Scan count 1, logical reads 344481, physical reads 2, read-ahead reads 344467, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      v2016: Table ‘SaleLargeRowStore’. Scan count 5, logical reads 344945, physical reads 3, read-ahead reads 344456, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

[Fact].[SaleLargeRowStoreClusteredPartition] – Clustered Index ( Hardcoded to use clustered Index )
    • v2014 and v2016: Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014 and v2016: Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014: Table ‘SaleLargeRowStoreClusteredPartition’. Scan count 13, logical reads 344557, physical reads 12, read-ahead reads 344531, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      v2016: Table ‘SaleLargeRowStoreClusteredPartition’. Scan count 13, logical reads 344611, physical reads 10, read-ahead reads 344547, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

[Fact].[SaleLargeRowStoreClusteredPartition] – Chosen by SQL Engine
    • v2014 and v2016: Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014 and v2016: Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014: Table ‘SaleLargeRowStoreClusteredPartition’. Scan count 13, logical reads 37, physical reads 6, read-ahead reads 26, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      v2016: Table ‘SaleLargeRowStoreClusteredPartition’. Scan count 13, logical reads 37, physical reads 6, read-ahead reads 54, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

[Fact].[SaleLargeColumnStoreClustered]
    • v2014 and v2016: Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014: Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      v2016: ( No References to workfile )
    • v2014: Table ‘SaleLargeColumnStoreClustered’. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 1032, lob physical reads 5, lob read-ahead reads 2706.
      v2016: Table ‘SaleLargeColumnStoreClustered’. Scan count 2, logical reads 4, physical reads 2, read-ahead reads 8, lob logical reads 268, lob physical reads 5, lob read-ahead reads 497. Table ‘SaleLargeColumnStoreClustered’. Segment reads 1, segment skipped 0.

[Fact].[SaleLargeColumnStoreClusteredPartition]
    • v2014 and v2016: Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014: Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      v2016: ( No References to workfile )
    • v2014: Table ‘SaleLargeColumnStoreClusteredPartition’. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 78, lob physical reads 1, lob read-ahead reads 90.
      v2016: Table ‘SaleLargeColumnStoreClusteredPartition’. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

[Fact].[SaleLargeColumnStoreNonClustered]
    • v2014 and v2016: Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    • v2014: Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      v2016: ( No References to workfile )
    • v2014: Table ‘SaleLargeColumnStoreNonClustered’. Scan count 1, logical reads 344132, physical reads 0, read-ahead reads 344098, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
      v2016: Table ‘SaleLargeColumnStoreNonClustered’. Scan count 5, logical reads 344005, physical reads 0, read-ahead reads 343533, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Summary

Here are a couple of findings:

  1. Range ( horizontal ) partitioning on a date column
    • Helpful
      • Especially when queries are date driven
      • Please do some prep work and determine which date columns users often target
    • Not as Helpful
      • In a multi-column Index, when the Date Column is not the first column referenced in the Index
  2. SQL Server Version 2016/Enterprise Versus 2014/Developer Edition
    • Optimization
      • In v2016, Enterprise Edition
        • Workfile not referenced in Statistics IO
        • Segment Reads and Segments Skipped explicitly Stated
          • Table ‘SaleLargeColumnStoreClustered’. Segment reads 1, segment skipped 0.

Column Stores Indexes & Partitioning – Partition Tables

Objective

In a recent post, we created the infrastructure objects ( File Groups, Files, Partition Functions, and Partition Schemes).

In this post, we will create tables that will utilize those objects.

 

Outline

 

Table                                          Row or Column Store    Partition              Index Columns
Fact.SaleLargeRowStore                         Row Store                                     [Sale Key]
Fact.SaleLargeRowStoreClusteredPartition       Row Store              InvoiceDateKeyYear     [Sale Key], [Invoice Date Key] ( clustered PK );
                                                                                             [Invoice Date Key], [Customer Key]*, [Profit]*
Fact.SaleLargeColumnStoreClustered             Column Store
Fact.SaleLargeColumnStoreClusteredPartition    Column Store           InvoiceDateKeyYear
Fact.SaleLargeColumnStoreNonClustered          Column Store

( * = included column )

 

 

Exercise

Tables

[Fact].[SaleLargeRowStoreClusteredPartition]

SQL

Code

USE [WideWorldImportersDW]
GO

/*

    DROP TABLE [Fact].[SaleLargeRowStoreClusteredPartition]

*/

/*

    exec sp_helpindex '[Fact].[SaleLargeRowStoreClusteredPartition]'

*/

if object_id('[Fact].[SaleLargeRowStoreClusteredPartition]') is null
begin

    CREATE TABLE [Fact].[SaleLargeRowStoreClusteredPartition]
    (
        [Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
        [City Key] [int] NOT NULL,
        [Customer Key] [int] NOT NULL,
        [Bill To Customer Key] [int] NOT NULL,
        [Stock Item Key] [int] NOT NULL,
        
        [Invoice Date Key] [date] NOT NULL,

        [InvoiceDateKeyYear] [int] NOT NULL,
        
        [Delivery Date Key] [date] NULL,
        
        [DeliveryDateKeyYear] [int] NULL,
        
        [Salesperson Key] [int] NOT NULL,
        [WWI Invoice ID] [int] NOT NULL,
        [Description] [nvarchar](100) NOT NULL,
        [Package] [nvarchar](50) NOT NULL,
        [Quantity] [int] NOT NULL,
        [Unit Price] [decimal](18, 2) NOT NULL,
        [Tax Rate] [decimal](18, 3) NOT NULL,
        [Total Excluding Tax] [decimal](18, 2) NOT NULL,
        [Tax Amount] [decimal](18, 2) NOT NULL,
        [Profit] [decimal](18, 2) NOT NULL,
        [Total Including Tax] [decimal](18, 2) NOT NULL,
        [Total Dry Items] [int] NOT NULL,
        [Total Chiller Items] [int] NOT NULL,
        [Lineage Key] [int] NOT NULL
    

        ) 

        --ON [USERDATA] 
        ON [pschemeYear]
        (
            [Invoice Date Key]
        )

end
go

declare @object sysname
declare @objectID int
declare @constraint sysname
declare @index  sysname

declare @addPartitionedIndex   bit
declare @addUnpartitionedIndex bit

set @object = '[Fact].[SaleLargeRowStoreClusteredPartition]'
set @constraint = 'PK_Fact_SaleLargeRowStoreClusteredPartition'

set @objectID = object_id(@object)

if object_id(@object) is not null
begin

    /*

        alter table [Fact].[SaleLargeRowStoreClusteredPartition]
            drop CONSTRAINT [PK_Fact_SaleLargeRowStoreClusteredPartition] 

    */
    if not exists
        (

            select *
            from  sys.key_constraints tblSKC
            where tblSKC.parent_object_id = @objectID
            and   tblSKC.[name] = @constraint

        )
    begin

        print 'Create Primary Key ' 
                + @object 
                + '.' 
                + quoteName(@constraint) 
                + ' ...'

        alter table [Fact].[SaleLargeRowStoreClusteredPartition]
        ADD CONSTRAINT [PK_Fact_SaleLargeRowStoreClusteredPartition] 
        PRIMARY KEY CLUSTERED 
            (

                [Sale Key] ASC
                /*
                Msg 1908, Level 16, State 1, Line 19
                Column 'Invoice Date Key' is partitioning column of the index 'PK_Fact_SaleLargeRowStoreClusteredPartition'. Partition columns for a unique index must be a subset of the index key.
                Msg 1750, Level 16, State 0, Line 19
                Could not create constraint or index. See previous errors.
                */
            , [Invoice Date Key]

            )

        --ON [USERDATA]
        ON [pschemeYear]
        (
            [Invoice Date Key]
        )
            
        update statistics [Fact].[SaleLargeRowStoreClusteredPartition]
            [PK_Fact_SaleLargeRowStoreClusteredPartition] 
            with FULLSCAN;


        print 'Created Constraint - Primary Key'
                + @object + '.' 
                + quoteName(@constraint)

    end



    /*
        drop index [Fact].[SaleLargeRowStoreClusteredPartition].[INDX_InvoiceDateKey_WithIncludedColumns_UnPartition]
    */
    set @addPartitionedIndex = 0;
    set @addPartitionedIndex = 1;
    set @index = 'INDX_InvoiceDateKey_WithIncludedColumns_Partition';

    if not exists
        (
            select *
            from  sys.indexes tblSI
            where tblSI.object_id = @objectID
            and   tblSI.[name] = @index
        )
        and (@addPartitionedIndex = 1)
    begin

        print 'Create Index ' + @object + '.' + quoteName(@index) + ' ...'


        CREATE NONCLUSTERED INDEX [INDX_InvoiceDateKey_WithIncludedColumns_Partition]
        ON [Fact].[SaleLargeRowStoreClusteredPartition] 
        (
            [Invoice Date Key]
        )
        INCLUDE 
        (
              [Customer Key]
            , [Profit]
        )
        with
        (
            FILLFACTOR=100
        )
        /*
        ON [USERDATA]
        */
        ON [pschemeYear]
        (
            [Invoice Date Key]
        )


        update statistics [Fact].[SaleLargeRowStoreClusteredPartition]
            [INDX_InvoiceDateKey_WithIncludedColumns_Partition]
            with FULLSCAN;


        print 'Created Index ' + @object + '.' + quoteName(@index)

    end

end


go

 

[Fact].[SaleLargeColumnStoreClusteredPartition]

SQL

Code

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

USE [WideWorldImportersDW]
GO

/*

	DROP TABLE [Fact].[SaleLargeColumnStoreClusteredPartition]

	exec sp_helpindex '[Fact].[SaleLargeColumnStoreClusteredPartition]'

*/
if object_id('[Fact].[SaleLargeColumnStoreClusteredPartition]') is null
begin


	CREATE TABLE [Fact].[SaleLargeColumnStoreClusteredPartition]
	(
		[Sale Key] [bigint] IDENTITY(1,1) NOT NULL,
		[City Key] [int] NOT NULL,
		[Customer Key] [int] NOT NULL,
		[Bill To Customer Key] [int] NOT NULL,
		[Stock Item Key] [int] NOT NULL,
	
		[Invoice Date Key] [date] NOT NULL,
		[InvoiceDateKeyYear] [int] NOT NULL,
			
		[Delivery Date Key] [date] NULL,
		[DeliveryDateKeyYear] [int] NULL,
			
		[Salesperson Key] [int] NOT NULL,
		[WWI Invoice ID] [int] NOT NULL,
		[Description] [nvarchar](100) NOT NULL,
		[Package] [nvarchar](50) NOT NULL,
		[Quantity] [int] NOT NULL,
		[Unit Price] [decimal](18, 2) NOT NULL,
		[Tax Rate] [decimal](18, 3) NOT NULL,
		[Total Excluding Tax] [decimal](18, 2) NOT NULL,
		[Tax Amount] [decimal](18, 2) NOT NULL,
		[Profit] [decimal](18, 2) NOT NULL,
		[Total Including Tax] [decimal](18, 2) NOT NULL,
		[Total Dry Items] [int] NOT NULL,
		[Total Chiller Items] [int] NOT NULL,
		[Lineage Key] [int] NOT NULL
	
			

	) ON [USERDATA]

end


/*
	Object:  Index [INDX_FACT_SaleLargeColumnStoreClusteredPartition]
*/

declare @table sysname
declare @index sysname

set @table = '[Fact].[SaleLargeColumnStoreClusteredPartition]';
set @index = 'INDX_FACT_SaleLargeColumnStoreClusteredPartition'

if object_id(@table) is not null
begin

	if not exists
	(
		select *
		from   sys.indexes tblSI
		where  tblSI.object_id = object_id(@table) 
		and    tblSI.[name] = @index
	) --and (1=0)
	begin

		print 'Create Index ' 
				+ @table
				+ ' - Index '
				+ @index
		
		CREATE CLUSTERED INDEX [INDX_FACT_SaleLargeColumnStoreClusteredPartition] 
		ON [Fact].[SaleLargeColumnStoreClusteredPartition] 
		(
			[Invoice Date Key]
		)
		WITH 
		(
				DROP_EXISTING = OFF
			--, FILLFACTOR = 100
			--, COMPRESSION_DELAY = 0
		) 
		--ON [USERDATA]
		ON [pschemeYear]
			(
				[Invoice Date Key]
			)
		
	end


	if exists
	(
		select *
		from   sys.indexes tblSI
		where  tblSI.object_id = object_id(@table) 
		and    tblSI.[name] = @index
		and    tblSI.type_desc != 'CLUSTERED COLUMNSTORE'
	)
	begin

		print 'Create Index ' 
				+ @table
				+ ' - Index '
				+ @index

		CREATE CLUSTERED COLUMNSTORE INDEX [INDX_FACT_SaleLargeColumnStoreClusteredPartition] 
		ON [Fact].[SaleLargeColumnStoreClusteredPartition] 
		WITH 
		(
			DROP_EXISTING = ON
		) 
		--ON [USERDATA]
		ON [pschemeYear]
		(
			[Invoice Date Key]
		)


	end

end
GO


Source Code Control

GitHub

The Code for creating the tables is posted on Github here

 

Summary

We will add corresponding data to our newly created partitioned tables and run a few queries and see the effect of the partitioning.

 

Column Stores Indexes & Partitioning – Foundational Work

Background

As data grows and ages, it becomes prudent to look at partitioning as an avenue to reduce the overall footprint of data read from disk into memory.

Column Store, by its very definition, delivers on the promises of Vertical Partitioning.

Let us review the steps we have to take to achieve Range Partitioning.

 

Design Goals

Our goal is to divvy up our data by year.

We will ensure that each table we target has a date column.

And, ideally, one of those date columns tracks the occurrence date.

 

Overview

Here are the plumbing areas:

  1. Create new Filegroups and files within the new file groups
  2. Create Partitioning Function
  3. Create Partitioning Scheme
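
Conceptually, a partition function maps each value to a partition number by locating it among the boundary values; with SQL Server's RANGE RIGHT semantics, each boundary value belongs to the partition on its right. A minimal sketch of that mapping ( the yearly boundary dates are assumptions, mirroring the per-year filegroups below ):

```python
import bisect
from datetime import date

# Assumed RANGE RIGHT boundary values, one per business year.
boundaries = [date(2010, 1, 1), date(2011, 1, 1), date(2012, 1, 1)]

def partition_number(d: date) -> int:
    """1-based partition number for d, as $PARTITION would report it.

    bisect_right puts a value equal to a boundary after that boundary,
    i.e. into the partition on the boundary's right (RANGE RIGHT).
    """
    return bisect.bisect_right(boundaries, d) + 1

print(partition_number(date(2009, 6, 1)))  # before the first boundary -> 1
print(partition_number(date(2011, 7, 4)))  # between 2011 and 2012 boundaries -> 3
```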

 

Steps

Platform

File Groups & Files

Objective

Create File groups for each business year.

And, create a single file in each file group.

SQL


USE [WideWorldImportersDW]   
go

-- Year NULL

if not exists  
(      
	select *      
	from   sys.filegroups tblSFG      
	where  tblSFG.name = 'fg_Partition_Year_Null'  
)  
begin      

	ALTER DATABASE [WideWorldImportersDW] 
		ADD FILEGROUP  [fg_Partition_Year_Null]   
		
end 
go

if not exists  
	(      
		select *      
		from   sys.master_files tblMF      
		where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
		and    tblMF.name = 'fg_Partition_Year_Null__01'  
	)  
begin 

   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
      NAME = [fg_Partition_Year_Null__01]   
      , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_Null_file_001.ndf'  
   )  
   TO FILEGROUP [fg_Partition_Year_Null]  
	
end 


-- Year 2010
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2010'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2010]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2010__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2010__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2010_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2010]  

end 


-- Year 2011
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2011'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2011]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2011__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2011__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2011_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2011]  

end 


-- Year 2012
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2012'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2012]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2012__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2012__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2012_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2012]  

end 


-- Year 2013
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2013'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2013]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2013__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2013__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2013_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2013]  

end 


-- Year 2014
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2014'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2014]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2014__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2014__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2014_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2014]  

end 


-- Year 2015
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2015'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2015]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2015__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2015__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2015_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2015]  

end 


-- Year 2016
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2016'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2016]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2016__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2016__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2016_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2016]  

end 

-- Year 2017
if not exists  
(      
    select *      
    from   sys.filegroups tblSFG      
    where  tblSFG.name = 'fg_Partition_Year_2017'  
)  
begin

   ALTER DATABASE [WideWorldImportersDW]         
      ADD FILEGROUP  [fg_Partition_Year_2017]   

end 
    
if not exists  
(      
   select *      
   from   sys.master_files tblMF      
   where  tblMF.[database_id] = db_id('WideWorldImportersDW')      
   and    tblMF.name = 'fg_Partition_Year_2017__01'  
)  
begin
   
   ALTER DATABASE [WideWorldImportersDW]  
   ADD FILE   
   (      
         NAME = [fg_Partition_Year_2017__01]   
       , FILENAME = 'Z:\Microsoft\SQLServer\DataFiles\WideWorldImportersDW_fg_Partition_Year_2017_file_001.ndf'  
  )  
  TO FILEGROUP [fg_Partition_Year_2017]  

end 

Partition Function

Objective

  1. Create a new Partition Function
  2. Indicate the partitioning column’s data type
    • date
  3. Range Type
    • Left or Right
      • We use Range Right
  4. Values
    • null
      • Covers rows where the date is null
      • When a null boundary is listed, be sure to wrap it in the convert function
      • Else, the statement fails with
        • Msg 7705, Level 16, State 1
        • Could not implicitly convert range values type specified at ordinal 1 to partition function parameter type

SQL

Code

 CREATE PARTITION FUNCTION [pfnYear]
 ( 
    date 
 )     
 AS RANGE RIGHT FOR VALUES    
 (       
	  convert ( date , null ) 
	, '2010-01-01'
	, '2011-01-01'
	, '2012-01-01'
	, '2013-01-01'
	, '2014-01-01'
	, '2015-01-01'
	, '2016-01-01'
	, '2017-01-01'
	, '2018-01-01'
	, '2019-01-01'
	, '2020-01-01'  
) 
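As a sanity check on the RANGE RIGHT semantics: a boundary value belongs to the partition on its right, and NULL sorts before every date, so NULL rows land in the second partition (partition 1 stays empty). A small sketch, outside of SQL Server, of where a given date lands under the function above:

```python
from datetime import date

# Boundary values of [pfnYear] above: NULL plus January 1st of 2010..2020.
# RANGE RIGHT: a value equal to a boundary goes to the partition on the
# boundary's right; SQL NULL sorts before any date.
YEAR_BOUNDARIES = [date(y, 1, 1) for y in range(2010, 2021)]

def partition_number(d):
    """1-based partition number for a date under pfnYear's RANGE RIGHT rules."""
    if d is None:
        return 2   # NULL equals the first boundary -> partition 2; partition 1 is empty
    n = 2          # dates before 2010-01-01 also fall into partition 2
    for boundary in YEAR_BOUNDARIES:
        if d >= boundary:
            n += 1
    return n
```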

Partition Scheme

SQL



CREATE PARTITION SCHEME [pschemeYear]   
AS PARTITION [pfnYear]    
TO  
(       
	  [fg_Partition_Year_Null]
	, [fg_Partition_Year_2010]
	, [fg_Partition_Year_2011]
	, [fg_Partition_Year_2012]
	, [fg_Partition_Year_2013]
	, [fg_Partition_Year_2014]
	, [fg_Partition_Year_2015]
	, [fg_Partition_Year_2016]
	, [fg_Partition_Year_2017]
	, [fg_Partition_Year_2018]
	, [fg_Partition_Year_2019]
	, [fg_Partition_Year_2020]
	, [fg_Partition_Year_Next]  
	
) 
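A quick arithmetic check on the scheme: RANGE RIGHT with N boundary values yields N + 1 partitions, so the scheme must list exactly one more filegroup than the function has boundaries. A sketch:

```python
# pfnYear lists 12 boundary values (NULL plus 2010..2020),
# so pschemeYear must map 13 filegroups, which it does.
boundaries = ['NULL'] + [f'{y}-01-01' for y in range(2010, 2021)]

filegroups = (
    ['fg_Partition_Year_Null']
    + [f'fg_Partition_Year_{y}' for y in range(2010, 2021)]
    + ['fg_Partition_Year_Next']
)

# One partition per gap between boundaries, plus the open-ended tail.
assert len(filegroups) == len(boundaries) + 1
```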

Summary

To make it easier to come back and edit our documentation, we will address using the underlying partitioning components in a follow-up post.

 

Denzil Ribeiro :- SQL Server – Column Store Index – Defrag

Background

Based on comments on numerous blog postings, I wanted to take a look at Denzil Ribeiro’s script for identifying Column Store Indexes ripe for defragging.

BTW, those blog postings are by Niko Neugebauer ( Link ) and Greg Low ( Link ).

 

Denzil Ribeiro

Denzil is with Microsoft, and here is his blog post:

  1. SQL 2014 Clustered Columnstore index rebuild and maintenance considerations
    Published On :- 2015-July-8th
    Link

 

Code

Stored Procedure

dbo.sp_DenzilRibeiro_columnStoreIndexDefrag

 


use master
go


if object_id('[dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]') is null
begin
	
	exec('create procedure [dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag] as ')

end
go

alter procedure [dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]
(
	  @debug bit = 0
	, @deletedRowsPercent int = 10
	, @rowgroupQualityNumberofRows int = 500000
	, @scriptOnly bit = 1
)
as

begin

	set nocount on;
	set XACT_ABORT on;

	/*
		The sample scripts are not supported under any Microsoft standard support program or service
		and are intended as a supplement to online documentation. The sample scripts are provided AS IS without warranty
		of any kind either expressed or implied. Microsoft further disclaims all implied warranties including,
		without limitation, any implied warranties of merchantability or of fitness for a particular purpose.
	*/

	/*
		Rebuild index statement is printed at partition level if

		a. RGQualityMeasure is not met for @PercentageRGQualityPassed Rowgroups

		   i) This is an arbitrary number; what we are saying is that if the average is above this number, don't bother rebuilding, as we consider these to be good quality rowgroups
		  ii) The second constraint is deleted rows; the default is currently set at 10% of the partition itself. If the partition is very large or very small, consider adjusting this

		b. In SQL 2014, post index rebuild, the DMV doesn't show why the RG is trimmed to < 1 million rows.
			i)  If the dictionary is full ( 16MB ), there is no use in rebuilding this rowgroup, as even after the rebuild it may get trimmed
			ii) If the dictionary is full, only rebuild if deleted rows fall above the threshold

	*/

	if object_id('tempdb..#temp') IS NOT NULL
	begin

		drop table #temp

	end

	--Declare @DeletedRowsPercent Decimal(5,2)

	-- Debug = 1 if you need all rowgroup information regardless
	--Declare @Debug int =0

	--Percent of deleted rows for the partition
	--Set @DeletedRowsPercent = 10

	--RGQuality means we are saying anything over 500K compressed is good row group quality, anything less need to re-evaluate.
	--Declare @RGQuality int = 500000

	-- means 50% of rowgroups are < @RGQUality from the rows/rowgroup perspective
	--Declare @PercentageRGQualityPassed smallint = 20
	Declare @PercentageRGQualityPassed smallint --= 20

	-- Maxdop hint optionally added to ensure we don't spread a small amount of rows across many threads.
	-- If we do that, we may end up with smaller rowgroups anyway.
	declare @maxdophint			smallint
	declare @effectivedop		smallint

	declare @iNumberofRecords	int
	declare @iRecordumber		int
	declare @command			nvarchar(4000)

	set @PercentageRGQualityPassed = 20


	;WITH CSAnalysis
	( 
		  [object_id]
		, TableName
		, SchemaName
		, index_id
		, indexName
		, partition_number
		, CountRGs
		, TotalRows
		, AvgRowsPerRG
		, CountRGLessThanQualityMeasure
		, RGQualityMeasure
		, PercentageRGLessThanQualityMeasure
		, DeletedRowsPercent
		, NumRowgroupsWithDeletedRows
	)
	AS
	(
		SELECT 
			  rg.object_id
			, object_name(rg.object_id) as TableName
			, SCHEMA_NAME(t.schema_id) AS SchemaName
			, rg.index_id
			, indexName = tblSI.[name]
			, rg.partition_number
			, COUNT(*) as CountRGs
			, SUM(total_rows) as TotalRows
			, AVG(total_rows) as AvgRowsPerRG
			, CountRGLessThanQualityMeasure
				= SUM(
						CASE 
							--WHEN rg.Total_Rows <@RGQuality THEN 1 
							WHEN rg.total_rows < @rowgroupQualityNumberofRows then 1
							ELSE 0 END
					) --as CountRGLessThanQualityMeasure

			--, @RGQuality as RGQualityMeasure
			, RGQualityMeasure
				= @rowgroupQualityNumberofRows

			, PercentageRGLessThanQualityMeasure
				= cast
				(
					(
						SUM
							(
								CASE
									--WHEN rg.Total_Rows <@RGQuality THEN 1.0 ELSE 0
									WHEN rg.Total_Rows < @rowgroupQualityNumberofRows THEN 1.0
									ELSE 0
								END
							) / count(*) * 100
					) as Decimal(5,2)
				) --as PercentageRGLessThanQualityMeasure

			, [DeletedRowsPercent]
				= Sum ( rg.deleted_rows * 1.0 ) / sum ( rg.total_rows * 1.0 ) * 100 --as 'DeletedRowsPercent'

			, [NumRowgroupsWithDeletedRows]
				= sum
					(
						case
							when rg.deleted_rows > 0 then 1
							else 0
						end
					) --as 'NumRowgroupsWithDeletedRows'

		FROM sys.column_store_row_groups rg

		JOIN sys.tables t
			ON rg.object_id = t.object_id

		JOIN sys.indexes tblSI
			ON rg.object_id = tblSI.object_id
			and rg.index_id = tblSI.index_id

		where rg.state = 3

		group by 
			  rg.object_id
			, t.schema_id
			, rg.index_id
			, tblSI.[name]
			, rg.partition_number

	)
	, CSDictionaries 
	(
		  maxdictionarysize		  -- int
		, maxdictionaryentrycount -- int
		, maxpartition_number
		, [object_id]			  -- int
		, [index_id]
		, partition_number		  -- int
	)
	AS
	( 
		select 
				  max(dict.on_disk_size) as maxdictionarysize
				, max(dict.entry_count) as maxdictionaryentrycount
				, max(partition_number) as maxpartition_number
				, part.object_id
				, part.index_id
				, part.partition_number

		from sys.column_store_dictionaries dict

		join sys.partitions part 
			on dict.hobt_id = part.hobt_id

		group by 
			  part.object_id
			, part.index_id
			, part.partition_number
	)
	select 
			  [rowNumber] = rank()
								over
								(
									order by
									   a.SchemaName
									 , a.TableName
									 , a.index_id
									 , a.partition_number
								)
			, a.*
			, b.maxdictionarysize
			, b.maxdictionaryentrycount
			, maxpartition_number
			, [command] = cast(null as nvarchar(4000))
			, [identified] = cast(0	as bit)

	into #temp 

	from CSAnalysis a

	inner join CSDictionaries b
			on  a.object_id = b.object_id 
			and a.index_id = b.index_id
			and a.partition_number = b.partition_number


	-- True if running from the same context that will run the index rebuild.
	select @effectivedop=effective_max_dop 
	from sys.dm_resource_governor_workload_groups
	where group_id in 
			(	
				select group_id 
				from sys.dm_exec_requests 
				where session_id = @@spid
			)


	-- Get the Alter Index statements.
	update a
	set    a.[command]
			= 'ALTER INDEX '
			 + QuoteName(a.IndexName) 
			 + ' ON ' + QUOTENAME(a.SchemaName) 
			 + '.' + QuoteName(a.TableName) 
			 + ' REBUILD ' 
			 +  Case
					when maxpartition_number = 1 THEN ' '
					else ' PARTITION = ' + cast(partition_number as varchar(10))
				End

			+ ' WITH (MAXDOP ='
			+ cast(
					(
						Case

							WHEN (TotalRows*1.0/1048576) < 1.0
								THEN 1

							WHEN (TotalRows*1.0/1048576) < @effectivedop
								THEN FLOOR(TotalRows*1.0/1048576)

							ELSE 0

						END
					) as varchar(10)
				)
			+ ')' --as Command
			+ ' -- comment '
			+ ' deletedRow% is ' + cast(DeletedRowsPercent as varchar(30))
			+ ' rowgroupQualityNumberofRows is ' + cast(AvgRowsPerRG as varchar(30))

		 , a.[identified] = 1

	from #temp a

	where
		(
			( DeletedRowsPercent >= @DeletedRowsPercent)

			-- Rowgroup Quality trigger, percentage less than rowgroup quality as long as dictionary is not full
			OR 
				( 

					( 
						( 
								--( AvgRowsPerRG < @RGQuality )
								( AvgRowsPerRG < @rowgroupQualityNumberofRows) --and ( TotalRows > @RGQuality) 
							and ( TotalRows > @rowgroupQualityNumberofRows) 
							 
						)
						AND 
						(

							(
								PercentageRGLessThanQualityMeasure 
									>= @PercentageRGQualityPassed
							)

						)
					)

					---- DictionaryNotFull, lower threshold than 16MB.
					AND maxdictionarysize < ( 16*1000*1000)

				) 

		)

	-- Add logic to actually run those statements

	set @iRecordumber = 1
	set @iNumberofRecords = ( select max([rowNumber]) from #temp)

	while (@iRecordumber <= @iNumberofRecords)
	begin

		set @command = null

		select @command = [command]
		from   #temp
		where  [rowNumber] = @iRecordumber
		and    [identified] = 1

		if (@command is not null)
		begin

			print @command

			if (@scriptOnly = 0)
			begin
				exec(@command)
			end

		end

		set @iRecordumber = @iRecordumber + 1

	end

	-- Debug print if needed
	if @debug = 1
	begin

		Select *
		from   #temp
		order by
			  TableName
			, index_id
			, partition_number

	end
	else
	begin

		-- Deleted rows trigger
		Select *
		from   #temp
		where  ( DeletedRowsPercent >= @DeletedRowsPercent )

	end
end
go

exec sys.sp_MS_marksystemobject '[dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]'
go
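Stripped of the T-SQL plumbing, the rebuild decision above boils down to two triggers: either the partition's deleted-row percentage crosses the threshold, or the average rowgroup is undersized while the dictionary still has room to grow. A sketch of that predicate; the parameter names are ours, and the defaults mirror the stored procedure's:

```python
# Sketch of the rebuild-decision predicate from the stored procedure above.
DICTIONARY_FULL_BYTES = 16 * 1000 * 1000  # lower threshold than the strict 16 MB limit

def needs_rebuild(deleted_rows_pct, avg_rows_per_rg, total_rows,
                  pct_rg_below_quality, max_dictionary_size,
                  deleted_pct_threshold=10, rg_quality_rows=500_000,
                  pct_quality_threshold=20):
    # Trigger 1: too many deleted rows in the partition.
    deleted_trigger = deleted_rows_pct >= deleted_pct_threshold
    # Trigger 2: undersized rowgroups, but only while the dictionary has room,
    # since a full dictionary would trim the rowgroups again after a rebuild.
    quality_trigger = (avg_rows_per_rg < rg_quality_rows
                       and total_rows > rg_quality_rows
                       and pct_rg_below_quality >= pct_quality_threshold
                       and max_dictionary_size < DICTIONARY_FULL_BYTES)
    return deleted_trigger or quality_trigger
```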

 

Lab


use [WideWorldImportersDW]
go

declare  @debug bit
declare  @deletedRowsPercent int
declare  @rowgroupQualityNumberofRows int


set @debug  = 1
set @deletedRowsPercent = 10
set @rowgroupQualityNumberofRows = 500000

exec [dbo].[sp_DenzilRibeiro_columnStoreIndexDefrag]

   	  @debug = @debug
	, @deletedRowsPercent = @deletedRowsPercent
	, @rowgroupQualityNumberofRows = @rowgroupQualityNumberofRows


Output

Output #1

Source Code Control

Github

Placed in Github here

 

Summary

Nothing new here.

The whole point is to make the script available on GitHub so that it is easier to digest and learn from.

 

References

  1. Denzil Ribeiro
    • SQL 2014 Clustered Columnstore index rebuild and maintenance considerations
      Link
  2. Niko Neugebauer
    • Clustered Columnstore Indexes – part 36 (“Maintenance Solutions for Columnstore”)
      Link
  3. Greg Low
    • Rebuild clustered columnstore indexes when they require maintenance
      Link

Sql Server – ColumnStore Indexes – Metadata – Encoding Type & Sizes

Background

Another post on metadata surrounding Column Store Indexes.

Lineage

  1. Sql Server – ColumnStore Indexes – Metadata – Stats & Fragmentation
    Published On :- 2017-Sept-2nd
    Link

 

Encoding

Tabulate

ID | Type                        | Description                                                                            | Data Type
1  | VALUE_BASED                 | non-string/binary with no dictionary (very similar to 4 with some internal variations) | Non String
2  | VALUE_HASH_BASED            | non-string/binary column with common values in dictionary                              | Non String
3  | STRING_HASH_BASED           | string/binary column with common values in dictionary                                  | String/Binary
4  | STORE_BY_VALUE_BASED        | non-string/binary with no dictionary                                                   | Non String/Binary
5  | STRING_STORE_BY_VALUE_BASED | string/binary with no dictionary                                                       | String/Binary

 

Explanation

  1. Data types
    • Basically, we have two kinds: string/binary and non-string/non-binary
  2. Storage
    • Value
    • Hash Value
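The encoding table above can be mirrored as a small lookup, handy when reading the encoding_type column of sys.column_store_segments directly (the helper name here is ours):

```python
# encoding_type values as documented for sys.column_store_segments.
ENCODING_TYPES = {
    1: ('VALUE_BASED', 'non-string/binary with no dictionary'),
    2: ('VALUE_HASH_BASED', 'non-string/binary column with common values in dictionary'),
    3: ('STRING_HASH_BASED', 'string/binary column with common values in dictionary'),
    4: ('STORE_BY_VALUE_BASED', 'non-string/binary with no dictionary'),
    5: ('STRING_STORE_BY_VALUE_BASED', 'string/binary with no dictionary'),
}

def encoding_name(encoding_type):
    """Human-readable name for an encoding_type id, '?' when unknown."""
    name, _description = ENCODING_TYPES.get(encoding_type, ('?', 'unknown'))
    return name
```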

Metadata

Overview

  1. Column Store – Segment
    • sys.column_store_segments
      • Each column is kept in its own segment, per row group.
      • And, encoded using a scheme based on its data type and data richness
  2. Column Store – Row Groups
    • sys.column_store_row_groups
      • Rows are horizontally packaged into row groups
      • Each Row Group contains about a million records

 

Code

Column Segment

sys.column_store_segments

SQL

; with cteEncodingType
(
	  [encodingTypeID]
	, [encodingType]
	, [encodingDescription]
)
as
(

	--Type of encoding used for that segment:
	select 
			  [id] = 1
			, [encodingType] = 'VALUE_BASED'
			, [encodingDescription] = 'non-string/binary with no dictionary (very similar to 4 with some internal variations)'
	
	union
	
	select 
			  2
			, 'VALUE_HASH_BASED'
			, 'non-string/binary column with common values in dictionary'
	
	union

	select 
			  3
			, 'STRING_HASH_BASED'
			, 'string/binary column with common values in dictionary'
	
	union
	
	select 
			  4
			, 'STORE_BY_VALUE_BASED'
			, 'non-string/binary with no dictionary'
	union

	select 
			  5
			, 'STRING_STORE_BY_VALUE_BASED'
			, 'string/binary with no dictionary'

)
select 
		[object]
			=   object_schema_name(tblSP.object_id)
				+ '.'
				+ object_name(tblSP.object_id)

		, tblSP.partition_number

		, [columnID]
			= tblCSS.column_id

		, [segmentID]
			= tblCSS.segment_id

		, [encodingType]
			= cteET.[encodingType]

		, [encodingDescription]
			= cteET.[encodingDescription]

		, [rowCount]
			= tblCSS.row_count

		, [sizeInBytes]
			= tblCSS.on_disk_size

		, [sizeInKB]
			= tblCSS.on_disk_size/1024

from   sys.column_store_segments tblCSS

inner join sys.partitions tblSP

	on	tblCSS.[hobt_id] = tblSP.[hobt_id]

inner join cteEncodingType cteET

	on	tblCSS.[encoding_type] = cteET.[encodingTypeID]



 

Image

 

Explanation
  1. Based on Column Type ( data type ) and data variance, one of 5 encoding types is chosen

 

Column Store – Row Groups

sys.column_store_row_groups

SQL

use [WideWorldImportersDW]
go

with cteRowGroup
(
		  [object]
		, [indexName]
		, [type]
		, [partitionNumber]
		, [rowGroupID]
		, [rowGroupIDMax]
		, [stateDescription]
		, [totalRows]
		, [deletedRows]
		, [sizeInBytes]
		, [sizeInKB]

)
as
(
	select 
		[object]
			=   object_schema_name(tblCSRG.object_id)
				+ '.'
				+ object_name(tblCSRG.object_id)

		, [indexName]
			= tblSI.[name]
		
		, [type]
			= tblSI.[type_desc]
						
		, [partitionNumber]
			= tblCSRG.partition_number

		, [rowGroupID]
			= tblCSRG.row_group_id

		, [rowGroupIDMax]
			= max ( tblCSRG.row_group_id )
				over(
					partition by
							  tblCSRG.object_id
							, tblCSRG.index_id
				  )

		, [stateDescription]
			= tblCSRG.state_description

		, [totalRows]
			= tblCSRG.total_rows

		, [deletedRows]
			= tblCSRG.deleted_rows

		, [sizeInBytes]
			= tblCSRG.size_in_bytes

		, [sizeInKB]
			= tblCSRG.size_in_bytes/1024

from   sys.column_store_row_groups tblCSRG


inner join sys.partitions tblSP

	on	tblCSRG.[object_id] = tblSP.[object_id]
	and tblCSRG.[index_id] = tblSP.[index_id]
	and tblCSRG.[partition_number] = tblSP.[partition_number]

inner join sys.indexes tblSI
	on  tblCSRG.object_id = tblSI.object_id
	and tblCSRG.index_id = tblSI.index_id

)
select 
		  [object]
		, [indexName]
		, [type]
		, [partitionNumber]
		, [rowGroupID]
		, [rowGroupCurrent]
			 = case 
					when ([rowGroupID] = rowGroupIDMax) then 'Yes'
					else 'No'
				end
		, [stateDescription]
		, [totalRows]
		, [deletedRows]
		, [sizeInBytes]
		, [sizeInKB]


from   cteRowGroup cteRG

order by
	  [object]
	, [indexName]
	, [partitionNumber]
	, [rowGroupID]

Image

 

Explanation
  1. Grouping
    • Rows are grouped into Row Groups of about a million records each ( 1024 * 1024 = 1048576 )
    • Once about a million records have been packed into a Row Group, it is closed and a new Row Group is started
    • The current Row Group is the one with the Max (Row Group ID) for that partition ( Object ID / Index ID / Partition Number )
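The packing rule above is easy to model: a sketch of how many closed rowgroups a given row count yields, and how many rows sit in the trailing (current) one:

```python
ROWS_PER_GROUP = 1024 * 1024  # 1,048,576 rows per full rowgroup

def rowgroup_layout(total_rows):
    """(number of full rowgroups, rows in the trailing open rowgroup)."""
    full, trailing = divmod(total_rows, ROWS_PER_GROUP)
    return full, trailing
```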

Summary

We can see that, depending on our data type and the variance of the data set, specific encoding schemes are chosen.

Also, Columnar Indexes enjoy the benefits of both vertical (column segment) and horizontal (row group) partitioning.

 

References

  1. Microsoft
    • Technet
      • Database Engine Features and Tasks > Database Features > Indexes (Database Engine)
        • Column Store Indexes
          Link
    • Docs / SQL / Relational databases / System catalog views/
      • sys.column_store_row_groups (Transact-SQL)
        Link
      • sys.column_store_segments (Transact-SQL)
        Link
  2. Patrick Keisler
    • Everyday SQL
      • Monitor the Number of Deleted Rows in a Clustered Columnstore Index
        Link
  3. Greg Low
    • Greg Low (The Bit Bucket: IDisposable)
      • Rebuild clustered columnstore indexes when they require maintenance
        Link
  4. SqlTimes
    • Playing with Database Servers
      • What is a hobt-id?
        Link