SQL Server :- sys.dm_db_stats_histogram

Background

As always pressed to grasp something that my very little mind has not been able to understand.

And, that is how to read Statistics Histogram.

SQL

Thankfully Microsoft is always furnishing us with new pathways towards understanding.

sys.dm_db_stats_histogram

Today that way for me is via sys.dm_db_stats_histogram.

The DMV was introduced in v2016 SP1.

Syntax

It takes the ObjectID and Statistics ID and assumes that you are in the contextual database.


select *

from  sys.dm_db_stats_histogram
(
       @objectID
     , @statisticsID
)

Returned Record Set

We will make heavy use of the word range and define it as the preceding row’s range_high_key and our current row’s range_high_key.

Column Meaning Datatype
step_number Step Number from 1 to Maximum of 200 smallint
range_high_key Highest for specific step sql_variant ( Depends on data-type of first stats column)
range_rows Number of records in range real
equal_rows Number of rows that exactly match range_high_key real
distinct_range_rows Number of distinct records in range bigint
average_range_rows Number of records in range / Number of distinct records in range real

Usage


use [DBLab]
go

declare @table sysname
declare @objectID int
declare @index sysname
declare @indexID smallint

set @table = '[dbo].[sales]'
set @index = 'INDX_SalesDate'

set @objectID = object_id(@table)

select

      @indexID = index_id

from  sys.indexes tblSI

where  tblSI.object_id = @objectID

and    tblSI.[name] = @index

DBCC SHOW_STATISTICS
(
       @table
     , @index
)
with histogram

;

select
         [object]
         =
             quotename(tblSS.[name])
           + '.'
           + quotename(tblSO.[name])

       , [stat]
          = tblSS2.[name]

       , tblSH.step_number
       , tblSH.range_high_key
       , tblSH.range_rows
       , tblSH.equal_rows
       , tblSH.distinct_range_rows
       , tblSH.average_range_rows

from   sys.dm_db_stats_histogram
       (
           @objectID
         , @indexID
       ) tblSH

inner join sys.objects tblSO
   on tblSH.object_id = tblSO.object_id

inner join sys.schemas tblSS
   on tblSO.schema_id = tblSS.schema_id

inner join sys.stats tblSS2
   on  tblSH.object_id = tblSS2.object_id
   and tblSH.stats_id = tblSS2.stats_id

 

Output

sysDOTdm_db_stats_histogram_20180530_0642PM

Workshop

Let us pick up on the second row and see if we can understand what each column means.

Image

sysDOTdm_db_stats_histogram_20180530_0647PM

Explanation

  1. Step Number
    • Definition
      • There are at most 200 steps in a SQL Server Statistics Histogram
    • Step Number :- 1
      • Low Bar
    • Step Number :- 200 or last row
      • High Bar
  2. Range High Key
    • Definition
      • This is the maximum value of the current row’s range
    • Step Number :- 2
      • 1900-01-01 16:17:00.000
  3. Range Rows
    • Definition
      • This is the number of rows between the previous row’s max value and the current row’s max value
    • Step Number :- 2
    • 1016
  4. Equal Rows
    • Definition
      • This is the number of rows that exactly match this row
    • Step Number :- 2
    • 5
  5.  Distinct Range Rows
    • Definition
      • This is the number of rows that have distinct values for the current range
    • Step Number :- 2
    • 625
  6. Average Range Rows
    • Definition
      • Number of rows / number of rows that have distinct values for the current range
    • Step Number :- 2
    • 1.6256

Collaborative Query

We will instruct a full update statistics scan against our test table and issue a query against the first column of our statistics.

We will then compare the result set against what is listed in our histogram.

Query


set nocount on;
go

declare @dateRangeBegin datetime
declare @dateRangeEnd   datetime

set @dateRangeBegin = '1900-01-01 00:00:00.000'
set @dateRangeEnd = '1900-01-01 16:17:00.000'

select
         [numberofRows]
            = count(*)

       , [numberofEqualRows]
        = sum(
                case
                    when clock_time = @dateRangeEnd then 1
                    else 0
                    end
             )

       , [numberofDistinctRows]
            = count(distinct [clock_time])

       , [averageDistinctRows]
            = cast
                (
                    (
                          count(*) *1.00
                        / NULLIF
			(
			    count(distinct clock_time) * 1.00
			   , 0
			)
                    )
                    as decimal(20, 6)
                )

from   [dbo].[sales] tblI

where  tblI.clock_time
            between @dateRangeBegin and @dateRangeEnd

Output

sysDOTdm_db_stats_histogram_20180530_0704PM

Explanation

  1. Number of Rows
    select count(*) against table
    
  2. Number of Equal Rows
    sum
        (
           case
              when clock_time = @dateRangeEnd then 1
              else 0
             end
        )
    
  3. Number of Distinct Rows
    count(distinct [clock_time])
    
  4. Average Range Rows
    cast
    (
    	(
    		  count(*) *1.00
    		/ count(distinct clock_time) * 1.00
    	)
    	as decimal(20, 6)
    )
    

References

  1. Docs > SQL >Relational databases > System dynamic management views
    • sys.dm_db_stats_histogram (Transact-SQL)
      Link

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s