Sql Server – ColumnStore Indexes – Metadata – Encoding Type & Sizes

Background

Another post on metadata surrounding Column Store Indexes.

Lineage

  1. Sql Server – ColumnStore Indexes – Metadata – Stats & Fragmentation
    Published On :- 2017-Sept-2nd
    Link

 

Encoding

Tabulate

ID Type Description Data Type
1 VALUE_BASED non-string/binary with no dictionary (very similar to 4 with some internal variations) Non String
2 VALUE_HASH_BASED non-string/binary column with common values in dictionary Non String
3 STRING_HASH_BASED string/binary column with common values in dictionary String/Binary
4 STORE_BY_VALUE_BASED non-string/binary with no dictionary Non String/Binary
5 STRING_STORE_BY_VALUE_BASED string/binary with no dictionary String/Binary

 

Explanation

  1. Data types
    • Basically, we have two types String/Binary or non-string/non-binary
  2. Storage
    • Value
    • Hash Value

Metadata

Overview

  1. Column Store – Segment
    • sys.column_store_segments
      • Each Column is kept in each own segment.
      • And, encoded on scheme based on its data types and data richness
  2. Column Store – Row Groups
    • sys.column_store_row_groups
      • Column Indexes are vertically packaged into row groups
      • Each Row Group contains about a million records

 

Code

Column Segment

sys.column_store_segments

SQL

; with cteEncodingType
(
	  [encodingTypeID]
	, [encodingType]
	, [encodingDescription]
)
as
(

	--Type of encoding used for that segment:
	select 
			  [id] = 1
			, [encodingType] = 'VALUE_BASED'
			, [encodingDescription] = 'non-string/binary with no dictionary (very similar to 4 with some internal variations)'
	
	union
	
	select 
			  2
			, 'VALUE_HASH_BASED'
			, 'non-string/binary column with common values in dictionary'
	
	union

	select 
			  3
			, 'STRING_HASH_BASED'
			, 'string/binary column with common values in dictionary'
	
	union
	
	select 
			  4
			, 'STORE_BY_VALUE_BASED'
			, 'non-string/binary with no dictionary'
	union

	select 
			  5
			, 'STRING_STORE_BY_VALUE_BASED'
			, 'string/binary with no dictionary'

)
select 
		[object]
			=   object_schema_name(tblSP.object_id)
				+ '.'
				+ object_name(tblSP.object_id)

		, tblCSRG.partition_number

		, [totalRows]
			= tblCSRG.total_rows

		, [deletedRows]
			= tblCSRG.deleted_rows

		, [sizeInBytes]
			= tblCSRG.size_in_bytes

		, [sizeInKB]
			= tblCSRG.size_in_bytes/1024

from   sys.column_store_row_groups tblCSRG

inner join sys.partitions tblSP

	on	tblCSRG.[object_id] = tblSP.[object_id]
	and tblCSRG.[partition_number] = tblSP.[partition_number]



 

Image

 

Explanation
  1. Based on Column Type ( data type ) and data variance, one of 5 encoding types is chosen

 

Column Segment

sys.column_store_segments

SQL

use [WideWorldImportersDW]
go

with cteRowGroup
(
		  [object]
		, [indexName]
		, [type]
		, [partitionNumber]
		, [rowGroupID]
		, [rowGroupIDMax]
		, [stateDescription]
		, [totalRows]
		, [deletedRows]
		, [sizeInBytes]
		, [sizeInKB]

)
as
(
	select 
		[object]
			=   object_schema_name(tblCSRG.object_id)
				+ '.'
				+ object_name(tblCSRG.object_id)

		, [indexName]
			= tblSI.[name]
		
		, [type]
			= tblSI.[type_desc]
						
		, [partitionNumber]
			= tblCSRG.partition_number

		, [rowGroupID]
			= tblCSRG.row_group_id

		, [rowGroupIDMax]
			= max ( tblCSRG.row_group_id )
				over(
					partition by
							  tblCSRG.object_id
							, tblCSRG.index_id
				  )

		, [stateDescription]
			= tblCSRG.state_description

		, [totalRows]
			= tblCSRG.total_rows

		, [deletedRows]
			= tblCSRG.deleted_rows

		, [sizeInBytes]
			= tblCSRG.size_in_bytes

		, [sizeInKB]
			= tblCSRG.size_in_bytes/1024

from   sys.column_store_row_groups tblCSRG


inner join sys.partitions tblSP

	on	tblCSRG.[object_id] = tblSP.[object_id]
	and tblCSRG.[index_id] = tblSP.[index_id]
	and tblCSRG.[partition_number] = tblSP.[partition_number]

inner join sys.indexes tblSI
	on  tblCSRG.object_id = tblSI.object_id
	and tblCSRG.index_id = tblSI.index_id

)
select 
		  [object]
		, [indexName]
		, [type]
		, [partitionNumber]
		, [rowGroupID]
		, [rowGroupCurrent]
			 = case 
					when ([rowGroupID] = rowGroupIDMax) then 'Yes'
					else 'No'
				end
		, [stateDescription]
		, [totalRows]
		, [deletedRows]
		, [sizeInBytes]
		, [sizeInKB]


from   cteRowGroup cteRG

order by
	  [object]
	, [indexName]
	, [partitionNumber]
	, [rowGroupID]

Image

 

Explanation
  1. Grouping
    • Column Indexes are grouped into Row Groups of about a million records each ( 1024 * 1024 = 1048576)
    • As a  million records are packaged into a Row Group it is filed away and a new Row Group started
    • The current Group will be the Max (Row Group ID) for that partition ( Object ID / Index ID / Partition ID )

Summary

We can see that depending on our data type and the variance of the data set, we will have access to specific sets of encoding.

Also, Columnar Indexes enjoy both the benefits of Column and Vertical partitioning.

 

References

  1. Microsoft
    • Technet
      • Database Engine Features and Tasks > Database Features > Indexes (Database Engine)
        • Column Store Indexes
          Link
    • Docs / SQL / Relational databases / System catalog views/
      • sys.column_store_row_groups (Transact-SQL)
        Link
      • sys.column_store_segments (Transact-SQL)
        Link
  2. Patrick Keisler
    • Everyday SQL
      • Monitor the Number of Deleted Rows in a Clustered Columnstore Index
        Link
  3. Greg Low
    • Greg Low (The Bit Bucket: IDisposable)
      • Rebuild clustered columnstore indexes when they require maintenance
        Link
  4. SqlTimes
    • Playing with Database Servers
      • What is a hobt-id?
        Link

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s