DBeaver

Background

Here I am having used SQL Server Management Studio ( SSMS ) for so long, I am satisfied.

It’s GUI is versely superior to SQL 2000 Query Analyzer and it exposes just about every SQL Functionality that I use on daily basis.

 

Stumped

Spent hours trying to capture a query’s grid display unto Excel, but Excel column size was getting in the way.

And, so googled for ways to display columns as rows.

 

DBeaver

Googled and found DBeaver.

The nice thing about it is that I can export queries result set into XML or HTML.

 

Download

DBeaver is available here.

 

Export Query

Screen Shot

Data Transfer Target Type and Format

Image

Data Transfer – Extraction Settings

Image

Data Transfer – Settings – See export settings

Image

 

Data Transfer – Output – Configure export output parameters

Image

Data Transfer – Confirm – Check results

Image

 

Sample Files

HTML

XML

 

Other Functionalities

Database Diagram

Database Modeling is also easy and nice.

 

Table Structure

Properties

References

DDL

 

Data

 

Database Supported

DBeaver supports both SQL and NoSQL Databases.

For SQL databases, it relies on JDBC.

Driver Manager

Transact SQL – Identify In-Memory Objects

Introduction

Let us identify In Memory Objects in our SQL Server Instance.

 

User Tables – sys.tables

Overview

In the sample code below, we identify In Memory Objects by checking the is_memory_optimized column in the sys.tables DMV.

Code



select
		 [table]
			= quotename(tblSS.[name])
				+ '.'
				+ quotename(tblST.[name])

from   sys.tables tblST

inner join sys.schemas tblSS

		on tblST.[schema_id] = tblSS.[schema_id]

where  tblST.is_memory_optimized = 1

order by
		[table] asc


Output

identifyinmemorytables-systables

User Tables & Indexes – Based on FileGroups

Overview

In the sample code below, we identify In Memory Objects by seeking out objects that are located on In-Memory Filegroups.

Code



select 

		  [object]
			= tblSS.[name]
				+ '.'
				+ tblSO.[name]

		, [objectType]
			= tblSO.[type_desc]


		, [indexName]
			= tblSI.[name]


		, [isPrimaryKey]
			= case
				when tblSI.[is_primary_key] = 1 then 'Yes'
				else 'No'
			  end

		, [indexType]
			= tblSI.[type_desc]
	
		, [fileGroup]
			= tblSFG.[name]

		, [fileGroupType]
			= tblSFG.[type_desc]

from   sys.objects tblSO

INNER JOIN sys.schemas tblSS

	ON tblSO.schema_id = tblSS.schema_id

inner join sys.indexes tblSI

	on tblSO.object_id = tblSI.object_id

inner join sys.partitions tblSP

	on  tblSI.object_id = tblSP.object_id
	and tblSI.index_id = tblSP.index_id

		
INNER JOIN sys.allocation_units tblSAU

	on tblSAU.container_id = tblSP.hobt_id
		 
INNER JOIN sys.filegroups tblSFG
 
	ON tblSFG.data_space_id = tblSAU.data_space_id 

where tblSFG.[type] = 'FX'



Output

identifyinmemoryobjects

 

 

User Tables

DML Stats

Overview

Review Statistics of DML Operations against In-Memory Tables.

Code


SELECT
	  [object]
		= quoteName(tblS.[name])
		   + '.'
		   + quoteName(tblO.[name])

        , tblXTPOS.*

FROM sys.dm_db_xtp_object_stats tblXTPOS

INNER JOIN sys.objects tblO

	ON tblXTPOS.object_id = tblO.object_id

INNER JOIN sys.schemas tblS

	ON tblO.schema_id = tblS.schema_id


 

Output

dmlstats

 

Index Stats

Overview

Review Statistics of DML Operations against In-Memory Tables.

Code





SELECT
        [object]
        = quoteName(tblS.[name])
           + '.'
           + quoteName(tblO.[name])
 
		, [index]
			= tblSI.[name]

		, [indexType]
			= tblSI.[type_desc]

		, [isPrimaryKey]
			= case
				when ( tblSI.[is_primary_key] = 1 ) then 'Y'
				else 'N'
			  end

        , [scansStarted]
			= tblXTPIS.scans_started	

		, [scansRetried]
			= tblXTPIS.scans_retries
			
		, [rowsReturned]
			= tblXTPIS.rows_returned

		, [rowsTouched]
			= tblXTPIS.rows_touched	

		/*

		--- Expiring -----
		, tblXTPIS.rows_expiring
		, tblXTPIS.rows_expired
		, tblXTPIS.rows_expired_removed	
		--- Expiring -----

		-- Phantom ----
		, tblXTPIS.phantom_scans_started	
		, tblXTPIS.phantom_scans_retries	
		, tblXTPIS.phantom_rows_touched	
		, tblXTPIS.phantom_expiring_rows_encountered	
		--, phantom_expired_rows_encountered	
		, tblXTPIS.phantom_expired_removed_rows_encountered	
		, tblXTPIS.phantom_expired_rows_removed
		-- Phantom ----

		*/

FROM sys.dm_db_xtp_index_stats tblXTPIS
 
INNER JOIN sys.objects tblO
 
    ON tblXTPIS.object_id = tblO.object_id

INNER JOIN sys.indexes tblSI
 
    ON  tblXTPIS.object_id = tblSI.object_id 
    AND tblXTPIS.index_id = tblSI.index_id 

INNER JOIN sys.schemas tblS
 
    ON tblO.schema_id = tblS.schema_id

order by
		  tblS.[name]
		, tblO.[name]
		, tblSI.[name]

Output

imtabledmlindexstats

Explanation:

  1. In-Memory tables does not support Clustered Indexes as all data is stored in memory
    • If Clustered,then memory will have to be continuously shuffled to ensure proper sequence
  2. It makes sense to review ScanStarted
    • Doing so is informative in terms of which indexes are actually being used
    • And, to gauge popularity
  3. The RowsReturned Column
    • Reviewing the RowsReturned column is also important to follow trend in terms of access method

Memory Usage Stats

Memory Usage Stats – Object

Overview

Review Memory allocated and in-use by In-Memory Tables.

Code


SELECT

	[object]
		= quotename(tblSS.name)
			+'.'
			+ quotename(tblSO.name)

    , [allocatedMB]
		= SUM(tblXTPMC.[allocated_bytes]) / (1024* 1024)

	, [usedMB]
		= SUM(tblXTPMC.[used_bytes]) / (1024 * 1024)

FROM  sys.objects tblSO

INNER JOIN sys.schemas tblSS
 
	ON tblSO.schema_id = tblSS.schema_id 

INNER JOIN sys.indexes tblSI
 
	ON tblSO.object_id = tblSI.object_id 

INNER JOIN sys.dm_db_xtp_memory_consumers tblXTPMC 

	ON  tblSI.object_id = tblXTPMC.object_id
	AND tblSI.index_id  = tblXTPMC.index_id

GROUP BY 
		  quotename(tblSS.name)
		, tblSO.schema_id
		, tblSO.object_id
		, tblSO.[name]
		
ORDER BY 
		  quotename(tblSS.name)
		, tblSO.[name]
;


Output

memoryallocatedandinusedbyobject

Memory Usage Stats – Index

Overview

Review Memory allocated and in-use by In-Memory Table Indexes.

Code


SELECT

	[object]
		= quotename(tblSS.name)
			+'.'
			+ quotename(tblSO.name)

	, [index]
		= tblSI.name

	, [indexType]
		= tblSI.[type_desc]

    , [isPrimaryKey]
        = case
            when tblSI.[is_primary_key] = 1 then 'Yes'
            else 'No'
            end

    , [allocatedMB]
		= SUM(tblXTPMC.[allocated_bytes]) / (1024* 1024)

	, [usedMB]
		= SUM(tblXTPMC.[used_bytes]) / (1024 * 1024)

FROM  sys.objects tblSO

INNER JOIN sys.schemas tblSS
 
	ON tblSO.schema_id = tblSS.schema_id 

INNER JOIN sys.indexes tblSI
 
	ON tblSO.object_id = tblSI.object_id 

INNER JOIN sys.dm_db_xtp_memory_consumers tblXTPMC 

	ON  tblSI.object_id = tblXTPMC.object_id
	AND tblSI.index_id  = tblXTPMC.index_id

GROUP BY 
		  quotename(tblSS.name)
		, tblSO.schema_id
		, tblSO.object_id
		, tblSO.[name]
		, tblSI.name
		, tblSI.[type_desc]
		, tblSI.[is_primary_key]
		
ORDER BY 
		  quotename(tblSS.name)
		, tblSO.[name]
		, tblSI.name


Output

memoryallocatedandinusedbyobjectandindex

Programmable Objects – sys.modules

Overview

In the sample code below, we identify In Memory Objects by checking the is_memory_optimized column in the sys.tables DMV.

Code



/*

	sys.sql_modules

		execute_as_principal_id

			Value of -2 indicates that the batch submitted does not depend on implicit name resolution and can be shared among different users. 
			This is the preferred method. Any other value represents the user ID of the user submitting the query in the database.

*/

select 
		  [object]
			= quoteName(tblSS.[name])
				+ '.'
				+ quoteName(object_name(tblSSM.[object_id]))

		, [type]
			= tblSO.type_desc

		, [createDate]
			= tblSO.[create_date]

		, [isSchemaBound]
			= case
					when tblSSM.is_schema_bound = 1 then 'Y'
					else 'N'
			  end	

		, [principal]
			= case
			 
					when ( tblSSM.execute_as_principal_id is null ) 
						then tblSS.[name]
			  
					when  ( tblSSM.execute_as_principal_id = -2) 
						then '--Shared--'

					else user_name(tblSSM.execute_as_principal_id)

						  
			  end			 
		      
		, [definition]
			= tblSSM.[definition]

from  sys.objects tblSO

inner join sys.schemas tblSS

		on tblSO.[schema_id] = tblSS.[schema_id]

inner join sys.sql_modules tblSSM

		on tblSO.[object_id] = tblSSM.[object_id]

where tblSSM.[uses_native_compilation] = 1

Output

identifynativecompiledobjects

Loaded Modules – sys.dm_os_loaded_modules

Overview

In the sample code below, we identify In Memory Objects by reviewing loaded OS Modules.

The relevant DMVs are sys.dm_os_loaded_modules and sys.dm_os_virtual_address_dump.

 

Code



; with cteOSLM
(
	  [modulename]
	, [description]
    , [base_address]
	, [filenameFull]
	, [filename]
	, [filenameMassaged]
)
as
(

	SELECT
			  [name]

			, [description]

			, [base_address]

			, [filenameFull] = tblOSLM.[name]

			, [filename]
				= reverse(left(reverse(tblOSLM.[name]),
                    charindex('\',reverse(tblOSLM.[name]), 1) - 1))

			, [filenameMassaged]
				= replace(
							replace(
									 reverse(left(reverse(tblOSLM.[name]),
										charindex('\',reverse(tblOSLM.[name]), 1) - 1))
										, '_'
									, '.'
									)
							, '.dll'
							, ''
						)				

	
	FROM   sys.dm_os_loaded_modules tblOSLM 

	WHERE  tblOSLM.[description] = 'XTP Native DLL'

	
)

, cteOSLMObject
(
	  [modulename]
	, [description]
    , [base_address]
	, [filenameFull]
	, [filename]
	, [filenameMassaged]

	, [objectID] 
	, [databaseID] 
	, [objectType]

)
as
(

	SELECT 

		  	  [modulename]
			, [description]
			, [base_address]
			, [filenameFull]
			, [filename]
			, [filenameMassaged]


			, [objectID] 
				= PARSENAME([filenameMassaged], 1)

			, [databaseID] 
				= PARSENAME([filenameMassaged], 2)
			
			, [objectType]
				= case PARSENAME([filenameMassaged], 3)
						when 't' then 'Table'
						when 'p' then 'Procedure'
						when 'f' then 'Function'
						else PARSENAME([filenameMassaged], 3)
				  end 

	from   cteOSLM

)
, cteVirtualAddress
(
	  [region_allocation_base_address]
	, [regionSizeInBytes]
)
as
(
	select 
			  tblOSVAD.[region_allocation_base_address]
			, [regionSizeInBytes]
				= sum(tblOSVAD.region_size_in_bytes)
	
	from   sys.dm_os_virtual_address_dump tblOSVAD
	
	group by
	 
			tblOSVAD.[region_allocation_base_address]

)	 
SELECT 

		  tblOSLM.[description]

		, tblOSLM.[modulename]

		, tblOSLM.[filename]


		, [database]
			= case
				when tblOSLM.[databaseID] = 32767 then 'Resource DB'
				else db_name(tblOSLM.[databaseID])
			   end

		, [objectName]
			= quoteName
				(
				    object_schema_name
					(
					  tblOSLM.objectID
					, tblOSLM.databaseID
					)
				)
				+ '.'
				+ quoteName
				(
					object_name
					(
					   tblOSLM.objectID
					 , tblOSLM.databaseID
					)
				)

		, tblOSLM.[objectType]

		, [sizeInKB]
			= (tblOSVAD.[regionSizeInBytes])
				/ ( 1024 )

FROM   cteOSLMObject tblOSLM 

INNER JOIN cteVirtualAddress tblOSVAD

		on tblOSLM.[base_address] = tblOSVAD.[region_allocation_base_address]

order by

			  [database]
			, [objectName]


Output

osloadedmodules

 

Explanation

  1. We can see that we can the SQL Server Engine generates Database Object specific modules (dlls)
  2. The modules are aptly named once one know where to look
  3. And, they are very small in size

 

Summary

In summary, to identify In Memory Objects we have a couple of tracks we can take.

Those pathways includes:

  1. Checking Table – sys.tables
    • Column :- is_memory_optimized
  2. Checking objects that are sitting on Memory Optimized Filegroup by relying on the sys.filegroups.

We reviewed user usage by digging into the sys.dm_db_xtp_object_stats & sys.dm_db_xtp_index_stats.

We also touched on memory consumed by In Memory Objects and Indexes by exploring the sys.dm_db_xtp_memory_consumers view.

To identify natively compiled objects we narrow in on the uses_native_compilation column of the sys.sql_modules view

To dig a bit deeper and identify OS Modules that are produced once In-Memory tables are compiled, we look into sys.dm_os_loaded_modules and to size up them up we check the region_size_in_bytes column in the sys.dm_os_virtual_address_dump table.

 

Source Code Repository

GitHub

Blogs are not where to keep code, and so for easier and more polish consumption shipped out to GitHub.

Here is the Repository.

 

Dedicated

Again, the good thing about blogging is that one publicly acknowledges the source of Information.

Here is mine:

  1. SqlHints.com, Basavaraj Biradar
    • Working with In-Memory OLTP (a.k.a. Hekaton) enabled Databases, Memory Optimized Tables and Natively Compiled Stored Procedures and it’s Internals with extensive list of Examples – Sql Server 2014
      Link
  2.  Help: SQL Server, Balmukund
    • A-Z of In-Memory OLTP : Behind the scenes
      Link

 

References

Microsoft

    1. Dynamic Management Views and Objects
      • Transact-SQL Reference (Database Engine) > System Views (Transact-SQL) > Dynamic Management Views and Functions (Transact-SQL)
        Memory-Optimized Table Dynamic Management Views (Transact-SQL)
        Link

        • sys.dm_db_xtp_memory_consumers (Transact-SQL)
          Link
        • sys.dm_db_xtp_index_stats ( Transact SQL )
          Link
        • sys.dm_db_xtp_object_stats ( Transact-SQL )
          Link
        • sys.memory_optimized_tables_internal_attributes ( Transact-SQL )
          Link
    2. Developer Network
      • Database Features > In-Memory OLTP (In-Memory Optimization)  > Memory-Optimized Tables
        Native Compilation of Tables and Stored Procedures
        Link

MySQL :- Secondary Indexes and the Clustering Keys – Day 2

Preface

In our last post, we spoke about how one might not need to add the Clustering columns when defining a Secondary Index.

At that point, we touched on the fact that just reviewing the Query Plan via “Visual Explain” might not fully reveal whether the Index used is “covering” or whether the Clustered Index is also accessed.

 

Metadata

Indexes

Index – Key name & Columns

List the index names and the corresponding columns

Code


SHOW INDEX FROM dblab.errorLog;

Output

ShowIndex-20160620-1114AM

Explanation

  1. Clustered Index
    • PRIMARY
      • Column Names :- id
      • Cardinality :- 111209
  2. Non Clustered Indexes
    • INDX_DBA_RECORDCREATED
      • Column Names :- record_created
      • Cardinality :- 184

 

Index – Get Index Size

Get Index Size

Code


set @database := 'dblab';
set @table := 'errorlog';
set @convertToMB := 1024 * 1000;

select 
           database_name
         , table_name
         , index_name
         , stat_name
         , @@innodb_page_size as innodb_page_sizeInBytes
         , stat_value*@@innodb_page_size as IndexSizeInBytes
         , (stat_value*@@innodb_page_size) / (@convertToMB) as IndexSizeMB

from mysql.innodb_index_stats 

where stat_name = 'size'

and   database_name = @database

and   table_name = @table

;

Output

IndexStats-20160620-0111PM

Explanation

  1. Clustered Index
    • PRIMARY
      • Size :- 28.2240 MB
  2. Non Clustered Indexes
    • INDX_DBA_RECORDCREATED
      • Size :- 2.5760
  3. Note
    • Note that our table is INNODB, and as such
      • The default page size is 16384 bytes
      • Clustered ( PRIMARY )
        • The data is saved within the Clustered Index

 

Instrumentation

 

Visual Explain

Index Covered

IndexCovered

Index Not Covered

IndexNotCovered

 

Explanation

In the screenshots above, the Visual Explain:

  1. Shows the same exact singular operator for a covered and non-covering Index Scan
  2. The cost is different
    • Covered Index :- 1.41
    • Non Covering Index :- 2.41

 

 

Tabular Explain

Covering Index Exists

Guide

In the Query below, we are filtering on record_created and fetching same column ( record_created ) , and the Clustering Column ( id)

SQL


set @currentTime := Now();
set @currentDate := curdate();
	 
explain select  
            tblEL.record_created
          , tblEL.id
from    dblab.ErrorLog tblEL
where   tblEL.record_created 
         between @currentDate and @currentTime
;

 

Output

Explain-Tabular-20160620-1111AM

 

 

Non-Covering Index Exists

Guide

In the Query below, we are filtering on record_created and fetching same column ( record_created ) , the Clustering Column ( id), and an additional column ( userid) which is not part of the Indexed Columns, nor part of the Clustering keys.

SQL


set @currentTime := Now();
set @currentDate := curdate();
	 
explain select  
            tblEL.record_created
          , tblEL.id
          , tblEL.userid
from    dblab.ErrorLog tblEL
where   tblEL.record_created 
         between @currentDate and @currentTime
;

Output

Explain-Tabular-20160620-1107AM

 

Explanation

  1. The columns of the Index, INDX_DBA_RECORDCREATED, that we are using is record_created
  2. Here is what Explain returns as a Tabulated Output
  3.  Columns
    • Extra
      • In the first Query, we are covering and the Extra column indicates so by stating ‘Using where; Using index’
      • In the second Query, we are not covering as the Extra column reads ‘Using index condition

Summary

With the Explain command, both the graphical and the tabulated output have merits.

The Graphical has Costing information; while the tabulated furthers our understanding of whether the Index employed fully covers our need or whether additional work needs to be done upon processing the Index.

MySQL :- Secondary Indexes and the Clustering Keys

Preface

RDMS Database tables can either be stored as a Heap or Clustered. When stored as a Heap, data is appended as they come in.  When Clustered, data is sequenced based on the Clustering Columns.

Secondary Indexes

For Clustered tables, the Clustering data is written as an additional data on each record.  On the other hand for Heaps, the RID is recorded.

We will soon see that this is an important decision when selecting indexing columns for both Clustered and Secondary indexes.

 

Database

Let us consider the implication of storing the Clustering columns for Secondary Indexes.

MySQL

We will use the same table we used for our last post.

Here is what the table looks like.

Table Columns

TableColumns

Table Indexes

And, here are the Indexes.

TableIndexes

Explanation

We have two indexes:

  1. PRIMARY
    • Columns
      • id
  2. INDX_DBA_RecordCreated
    • Columns
      • record_created

 

Query

Let us issue a query against the table and issue explain to determine if an index is employed and any additional operators included.

SQL

SQL – Fetch on Secondary Index Columns and Clustered Index Columns

SQL Code


set @currentTime := Now();
set @currentDate := curdate();
	 
select  tblEL.record_created, id
from    dblab.ErrorLog tblEL
where   tblEL.record_created 
         between @currentDate and @currentTime
;

 

Output

Visual Explain

VisualQueryPlan-20160620-0351AM

SQL – Fetch on Secondary Index Columns, Clustered Index Column, and an additional column

SQL Code


set @currentTime := Now();
set @currentDate := curdate();
	 
select  
            tblEL.record_created
          , tblEL.id
          , tblEL.userid
from    dblab.ErrorLog tblEL
where   tblEL.record_created 
         between @currentDate and @currentTime
;

 

Output

Visual Explain

VisualQueryPlan

Index Range Scan

VisualQueryPlan_Clipped

Explanation

  1. Secondary Index and Clustering Columns
    • Fully satisfied with index on Secondary Index Columns
      • Operation
        • Index Range Scan
          • Index Name – INDX_DBA_RecordCreated
      • Secondary Index columns consulted for where clause
      • Secondary Index / Clustering Key consulted for projected clustering columns
      • Query Cost :- 1.41
  2. Secondary Index, Clustering Columns, and additional column(s)
    • Though, not all columns that need to be presented are available from the Index, the Index is still very useful for filtering, and it is used
    • Query Cost :- 2.41

 

Summary

Secondary Indexes do not need to include the Clustering Columns, as those columns are automatically hard-wired in.

When all filtering and projected columns are referenced in the Secondary Indexes, they can offer fast and sole access to the needed result.

On the other hand, when a query references other columns besides the columns that make up the Secondary Index and the Clustering Columns, it does not appear that the Access Path is fully realized through the Explain Guide.

But, if we pay close attention to Query Costs we are able to furtherance our comparison.

Technical: Microsoft – SQL Server – Analysis Services – MDX Query – Error – hierarchy already appears in the Axis1 axis

Technical: Microsoft – SQL Server – Analysis Services – MDX Query – Error – hierarchy already appears in the Axis1 axis

Introduction

Getting up to speed with SQL Server Analysis Service Cube Browsing and Reporting. And, finding that I need to get comfortable with MDX.

MDX is a query language for querying OLAPS.

Data Source View

DataSourceView


Fact Table Browser

What we are trying to do, that is filter on specific Country IDs and Names, is easy to do when using a query or reporting tool. Before Filtering Browser-Before-Filtering

Post Filtering Browser-After-Filtering

Query

But, when we resort to MDX and try to do same, we find out that we have to work a bit harder. In the next two examples, we attempt to filter by using the where clause.

Error Message :- The MDX function CURRENTMEMBER failed because current coordinate is empty.

In this scenario, we pass in a non-existence Country ID (47)


select

	{
		[Measures].[GDP Amount]

	} on COLUMNS

	, {

		[Country].[Country ID]

	  } on ROWS

from  [DBLAB]

where (

		{

			 [Country].[Country ID].&[47]

		}

	  )

Here is our error message:

Executing the query ...
The MDX function CURRENTMEMBER failed because current coordinate is empty.
Execution complete

Error Message – The Geography hierarchy already appears in the Axis1 axis.

In this scenario, we pass in an existing Country ID (4)


select

	{
		[Measures].[GDP Amount]

	} on COLUMNS

	, {

		[Country].[Country ID]

	  } on ROWS

from  [DBLAB]

where (

		{

			 [Country].[Country ID].&[4]

		}

     )

 

Error Message:

The Country ID hierarchy already appears in the Axis1 axis.

Resolution

There are a couple of ways to address our little problem.

Our proposed solution involves introducing our filtering at the Row sub-section.

To do so we can employ the exists or the Filter expressions.

Fix Problem by using exists keyword



select

	{
		[Measures].[GDP Amount]

	} on COLUMNS

	, EXISTS
		(
			  [Country].[Country ID].Members
			, {
			         [Country].[Country ID].&[3]			
			       , [Country].[Country ID].&[4]
                           }

	       ) on ROWS

from  [DBLAB]

Fix Problem by using “filter” keyword


select

	{
	    [Measures].[GDP Amount]

	} on COLUMNS

	, FILTER
		(
	 	    [Country].[Country ID].Members
			, (
				    ([Country].[Country ID].Member_value = 2)
				 or ([Country].[Country ID].Member_value = 3)
			  )	 

	    ) on ROWS

from  [DBLAB]

References

References – MDX Query

References – Restricting the Query with Query and Slicer Axes

References – Blogs

 

References – MDX Query – Q/A

Technical: Database – Dimension Modeling – Dimension – Time (Populating DimTime – SQL Server Implementation)

Technical: Database – Dimension Modeling – Dimension – Time (Populating DimTime – SQL Server Implementation)

Introduction

Hopefully, this is the first of many posts that tracks my foray into learning “Online analytical processing” (OLAP).

In this post, we will cover the Time Dimension.

Inspiration

As I read broadly about Data Mart, Data Warehouse, and Dimension Modeling the person that speaks most to me is Ralph Kimball.

The company he founded over shares on the Internet and leaves one with little reason not to read a bit.

Here is a beautifully written piece about the Time Dimension.

Time for Time

http://www.kimballgroup.com/1997/07/10/its-time-for-time/

Schema – Design

We are going to have one Dimension table called as dimTime.  And, two lookup lookup tables that we will use to populate our lone dimension table.

Schema – Design – Calendar – Weekday

dbo.lookupCalendarWeekday

Schema – Design – Calendar – Month

dbo.lookupCalendarMonth

Schema – Design – Calendar – Time

From Microsoft AdventureWorksDW database.

dbo.DimTime

Schema – SQL

Schema – SQL – Calendar – Weekday



set noexec off;
set nocount on;

use [DBLabDW]
go

if object_id('dbo.lookupCalendarWeekday') is not null
begin

	set noexec on;

end
go

create table dbo.lookupCalendarWeekday
(

      [id] tinyint not null

    , [dateAdded] datetime not null
		constraint [defaultlookupCalendarWeekdayDateAdded] default getdate()

    , [addedBy]   nvarchar(100) not null
		constraint [defaultlookupCalendarWeekdayAddedBy] default SYSTEM_USER

    , [dateModified] datetime null
    , [modifiedBy]   nvarchar(100) null

    , [weekday] tinyint not null
    , [sortOrder] tinyint not null 
		constraint [defaultlookupCalendarWeekdaySortOrder] default 0

    , [weekDayInEnglish] nvarchar(100) null
    , [weekDayInFrench] nvarchar(100) null
    , [weekDayInSpanish] nvarchar(100) null

)

ALTER TABLE dbo.lookupCalendarWeekday
    ADD CONSTRAINT [AK_lookupCalendarWeekday_weekday] UNIQUE NONCLUSTERED 
    (
	[weekday] ASC
    )

set noexec off;
go

Quick Points

Here are a couple of quick points:

  • We created a unique  constraint AK_lookupCalendarWeekday_weekday on the dbo.lookupCalendarWeekday table
  • Having this constraint will equip us with the uniqueness requirement that is needed for us to create a foreign key against this table (when we create the Time Dimension table)

Schema – SQL – Calendar – Month



set noexec off;
set nocount on;

use [DBLabDW]
go

if object_id('dbo.lookupCalendarMonth') is not null
begin

	set noexec on;

end
go

create table dbo.lookupCalendarMonth
(

      [id] tinyint not null

    , [dateAdded] datetime not null
		constraint [defaultlookupCalendarMonthDateAdded] default getdate()

    , [addedBy]   nvarchar(100) not null
		constraint [defaultlookupCalendarMonthAddedBy] default SYSTEM_USER

    , [dateModified] datetime null
    , [modifiedBy]   nvarchar(100) null

    , [sortOrder] tinyint not null 
		constraint [defaultlookupCalendarMonthSortOrder] default 0

    , [monthInEnglish] nvarchar(100) null
    , [monthInFrench] nvarchar(100) null
    , [monthInSpanish] nvarchar(100) null

)

ALTER TABLE [dbo].[lookupCalendarMonth]
	add constraint PK_LookupCalendarMonth
	    primary key
	(
		[id]
	)

set noexec off;
go

Schema – SQL – Calendar – dimTime

For the sake of consistency, knowing we are unlikely to do better, and since we do not currently have any need for customization, we will stick with the dimTime structural definition from Microsoft AdventureWorksDW database.



SET NOEXEC OFF
go

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

use [DBLabDW]
go

if object_id('dbo.DimTime') is not null
begin

	set noexec on

end
go

CREATE TABLE [dbo].[DimTime]
(
	[TimeKey] [int] IDENTITY(1,1) NOT NULL,
	[FullDateAlternateKey] [datetime] NULL,
	[DayNumberOfWeek] [tinyint] NULL,
	[EnglishDayNameOfWeek] [nvarchar](10) NULL,
	[SpanishDayNameOfWeek] [nvarchar](10) NULL,
	[FrenchDayNameOfWeek] [nvarchar](10) NULL,
	[DayNumberOfMonth] [tinyint] NULL,
	[DayNumberOfYear] [smallint] NULL,
	[WeekNumberOfYear] [tinyint] NULL,
	[EnglishMonthName] [nvarchar](10) NULL,
	[SpanishMonthName] [nvarchar](10) NULL,
	[FrenchMonthName] [nvarchar](10) NULL,
	[MonthNumberOfYear] [tinyint] NULL,
	[CalendarQuarter] [tinyint] NULL,
	[CalendarYear] [char](4) NULL,
	[CalendarSemester] [tinyint] NULL,
	[FiscalQuarter] [tinyint] NULL,
	[FiscalYear] [char](4) NULL,
	[FiscalSemester] [tinyint] NULL,
    CONSTRAINT [PK_DimTime_TimeKey] PRIMARY KEY CLUSTERED 
    ( 
	[TimeKey] ASC
    )

WITH (
          PAD_INDEX = OFF
        , STATISTICS_NORECOMPUTE = OFF
        , IGNORE_DUP_KEY = OFF
        , ALLOW_ROW_LOCKS = ON
        , ALLOW_PAGE_LOCKS = ON
     ) 
     ON [PRIMARY],
 CONSTRAINT [AK_DimTime_FullDateAlternateKey] UNIQUE NONCLUSTERED 
(
	[FullDateAlternateKey] ASC
)
WITH (
           PAD_INDEX = OFF
         , STATISTICS_NORECOMPUTE = OFF
         , IGNORE_DUP_KEY = OFF
         , ALLOW_ROW_LOCKS = ON
         , ALLOW_PAGE_LOCKS = ON
     ) 
     ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[DimTime]
	add constraint FK_DimTime_DayNumberOfWeek
	    foreign key
	(
		[DayNumberOfWeek]
	)
	references dbo.lookupCalendarWeekday
	(
		[weekday]
	)
GO

ALTER TABLE [dbo].[DimTime]
	add constraint FK_DimTime_MonthNumberOfYear
	    foreign key
	(
		[MonthNumberOfYear]
	)
	references dbo.lookupCalendarMonth
	(
		[id]
	)
GO

SET NOEXEC OFF
go

SET ANSI_PADDING ON
GO

Quick Points

Here are a couple of quick points:

  • We are creating a foreign key FK_DimTime_DayNumberOfWeek on the dbo.DimTime table
  • The foreign key joins the resident DayNumberOfWeek column to the weekday table using the unique column (weekday)

 

Populate Date Table

SQL – Populate – Calendar – Weekday

There are a couple of important points:



use [DBLabDW]
go

/*
	dbo.lookupCalendarWeekday Population
*/

/*
	French Calendar Vocabulary
	http://french.about.com/od/vocabulary/a/calendar.htm

*/

/*
	Spanish - My Calendar
	http://www.bbc.co.uk/schools/primarylanguages/spanish/my_calendar/

*/

BEGIN TRAN;

	/*
		http://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx
	*/
	-- Define the CTE expression name and column list.
	WITH cteWeekday
	(
		  [id]
                , [weekday]
		, [weekDayInEnglish]
		, [weekDayInFrench]
		, [weekDayInSpanish]
		, [sortOrder]
	)
	AS
	-- Define the CTE query.
	(
		SELECT 1, 1, 'Sunday', 'dimanche', 'domingo', 1
		UNION
		SELECT 2, 2, 'Monday', 'lundi', 'lunes', 2
		UNION
		SELECT 3, 3, 'Tuesday', 'mardi', 'martes', 3
		UNION
		SELECT 4, 4, 'Wednesday', 'mercredi', 'miércoles', 4
		UNION
		SELECT 5, 5, 'Thursday', 'jeudi', 'jueves', 5
		UNION
		SELECT 6, 6, 'Fridday', 'vendredi', 'viernes', 6
		UNION
		SELECT 7, 7, 'Saturday', 'samedi', 'sábado', 7
	)

	/*
		Merge Statement Definition
		http://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx
	*/
	merge dbo.lookupCalendarWeekday as tblTarget
	using cteWeekday as tblSource

	ON (tblTarget.[id] = tblSource.[id]) 

		WHEN NOT MATCHED BY TARGET
			then Insert 
			(
				  [id]
                                , [weekday]
				, [weekDayInEnglish]
				, [weekDayInFrench]
				, [weekDayInSpanish]
			)
			values
			(
				  tblSource.[id]
                                , tblSource.weekday
				, tblSource.[weekDayInEnglish]
				, tblSource.[weekDayInFrench]
				, tblSource.[weekDayInSpanish]
			)

		WHEN MATCHED and 
			(

			    (
				isNull(tblTarget.[weekday], -1) 
				!= isNull(tblSource.[weekday], -1)
			    )

                         or
			    (
				isNull(tblTarget.[sortOrder], -1) 
				!= isNull(tblSource.[sortOrder], -1)
			    )

			  or
			   (
				isNull(tblTarget.[weekDayInEnglish], '') 
				!= isNull(tblSource.[weekDayInEnglish], '')
			  )

			  or
			  (
				isNull(tblTarget.[weekDayInFrench], '') 
				!= isNull(tblSource.[weekDayInFrench], '')
			  )

			  or
			 (
				isNull(tblTarget.[weekDayInSpanish], '') 
					!= isNull(tblSource.[weekDayInSpanish], '')
			 )

		)

		then Update set
                           [weekday] = tblSource.[weekday]
			 , [sortOrder] = tblSource.[sortOrder]
			 , [weekDayInEnglish] = tblSource.[weekDayInEnglish]
			 , [weekDayInFrench] = tblSource.[weekDayInFrench]
			 , [weekDayInSpanish] = tblSource.[weekDayInSpanish]
			 , [dateModified] = getdate()
			 , [modifiedBy] = SYSTEM_USER
	;

COMMIT TRAN;

SQL – Populate – Calendar – Month



use [DBLabDW]
go

/*
	dbo.lookupCalendarMonth Population
*/

/*
	French Calendar Vocabulary
	http://french.about.com/od/vocabulary/a/calendar.htm

*/

/*
	Spanish - My Calendar
	http://www.bbc.co.uk/schools/primarylanguages/spanish/my_calendar/

*/

/*
	delete from dbo.lookupCalendarMonth;
*/
BEGIN TRAN;

	/*
		http://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx
	*/
	-- Define the CTE expression name and column list.
	WITH ctemonth
	(
		  [id]
		, [monthInEnglish]
		, [monthInFrench]
		, [monthInSpanish]
		, [sortOrder]
	)
	AS
	-- Define the CTE query.
	(
		SELECT 1, 'January', 'janvier', 'enero', 1
		UNION
		SELECT 2, 'February', 'février', 'febrero', 2
		UNION
		SELECT 3, 'March', 'mars', 'marzo', 3
		UNION
		SELECT 4, 'April', 'avril', 'abril', 4
		UNION
		SELECT 5, 'May', 'mai', 'mayo', 5    
		UNION
		SELECT 6, 'June', 'juin', 'junio', 6
		UNION
		SELECT 7, 'July', 'juillet', 'julio', 7
		UNION
		SELECT 8, 'August', 'août', 'agosto', 8
		UNION
		SELECT 9, 'September', 'septembre', 'septiembre', 9
		UNION
		SELECT 10, 'October', 'octobre', 'octubre', 10
		UNION
		SELECT 11, 'November', 'novermbre', 'noviembre', 11
		UNION
		SELECT 12, 'December', 'décembre', 'diciembre', 12
	)

	/*
		Merge Statement Definition
		http://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx
	*/
	merge dbo.lookupCalendarMonth as tblTarget
	using ctemonth as tblSource

	ON (tblTarget.[id] = tblSource.[id]) 

		WHEN NOT MATCHED BY TARGET
			then Insert 
			(
				  [id]
				, [monthInEnglish]
				, [monthInFrench]
				, [monthInSpanish]
			)
			values
			(
				  tblSource.[id]
				, tblSource.[monthInEnglish]
				, tblSource.[monthInFrench]
				, tblSource.[monthInSpanish]
			)

		WHEN MATCHED and 
			(

				(
					isNull(tblTarget.[sortOrder], -1) 
				      != isNull(tblSource.[sortOrder], -1)
				)

				or
				(
				      isNull(tblTarget.[monthInEnglish], '') 
				     != isNull(tblSource.[monthInEnglish], '')
				)

				or
				(
					isNull(tblTarget.[monthInFrench], '') 
					!= isNull(tblSource.[monthInFrench], '')
				)

				or
				(
				     isNull(tblTarget.[monthInSpanish], '') 
				      != isNull(tblSource.[monthInSpanish], '')
				)

			)

			then Update set
				   [sortOrder] = tblSource.[sortOrder]
				 , [monthInEnglish] = tblSource.[monthInEnglish]
				 , [monthInFrench] = tblSource.[monthInFrench]
				 , [monthInSpanish] = tblSource.[monthInSpanish]
				 , [dateModified] = getdate()
				 , [modifiedBy] = SYSTEM_USER
	;

	select * 
        from dbo.lookupCalendarMonth

COMMIT TRAN;

SQL – Populate – DimTime

Pasted below is the sample code for populating the DimTime table.

As always, I left out the hardest part; which is how to calculate Fiscal Dates when it does not match the Calendar Dates.

Hopefully, will steal some cost later on and use that.

Here is the code:



set nocount on
go

/*
	How to join 2 tables without an ON clause
	http://stackoverflow.com/questions/15508142/how-to-join-2-tables-without-an-on-clause

*/
use [DBLabDW]
go

declare @dateBegin		datetime
declare @dateEnd		datetime
declare @dateCurrent	datetime

set @dateBegin = '1/1/2014'
set @dateEnd = '12/31/2014'

set @dateCurrent = @dateBegin

begin tran

   while (@dateCurrent <= @dateEnd)
   begin

	insert into dbo.DimTime
	(
             [FullDateAlternateKey]
	   , [DayNumberOfWeek]

	   , [EnglishDayNameOfWeek]
	   , [SpanishDayNameOfWeek]
	   , [FrenchDayNameOfWeek]
	   , [DayNumberOfMonth]
	   , [DayNumberOfYear]
	   , [WeekNumberOfYear]

	   , [EnglishMonthName]
	   , [SpanishMonthName]
	   , [FrenchMonthName]

	   , [MonthNumberOfYear]

	    , [CalendarQuarter]
	    , [CalendarYear]
	    , [CalendarSemester]

	    , [FiscalQuarter]
	    , [FiscalYear]
	    , [FiscalSemester]

	)
	select
		  @dateCurrent as [FullDateAlternateKey]
		, datepart(weekday, @dateCurrent) --[DayNumberOfWeek]
		, tblCalendarWeekday.[weekDayInEnglish]
		, tblCalendarWeekday.[weekDayInSpanish]
		, tblCalendarWeekday.[weekDayInFrench]
		, datepart(day, @dateCurrent) --[DayNumberOfMonth]
		, datepart(dayofYear, @dateCurrent) --[DayNumberOfMonth]
		, datepart(week, @dateCurrent) --[DayNumberOfMonth]
		, tblCalendarMonth.[monthInEnglish]
		, tblCalendarMonth.[monthInSpanish]
		, tblCalendarMonth.[monthInFrench]
		, datepart(month, @dateCurrent) -- [MonthNumberOfYear]

		, case 

		     when datepart(month, @dateCurrent) between 1 and 3 then 1
		     when datepart(month, @dateCurrent) between 4 and 6 then 2
		     when datepart(month, @dateCurrent) between 7 and 9 then 3
		     when datepart(month, @dateCurrent) between 10 and 12 then 4
		     else -1

		  end as [CalendarQuarter]

		, datepart(year, @dateCurrent)
		     as [CalendarYear]

		, case 
			when datepart(month, @dateCurrent) between 1 and 6 then 1
			when datepart(month, @dateCurrent) between 7 and 12 then 2
			else -1

		  end as [CalendarSemester]

		, case 

			when datepart(month, @dateCurrent) between 1 and 3 then 1
			when datepart(month, @dateCurrent) between 4 and 6 then 2
			when datepart(month, @dateCurrent) between 7 and 9 then 3
		        when datepart(month, @dateCurrent) between 10 and 12 then 4
			else -1

		  end	as [FiscalQuarter]

		, datepart(year, @dateCurrent)
			 as [FiscalYear]

		, case 

			when datepart(month, @dateCurrent) between 1 and 6 then 1
			when datepart(month, @dateCurrent) between 7 and 12 then 2
		        else -1
		  end as [FiscalSemester]

		from  [dbo].[lookupCalendarWeekday] tblCalendarWeekday

			cross join [dbo].[lookupCalendarMonth] tblCalendarMonth

		where tblCalendarWeekday.[weekday] = datepart(weekday, @dateCurrent)

		and   tblCalendarMonth.[id] = datepart(month, @dateCurrent)

		set @dateCurrent = dateadd(day, 1, @dateCurrent)

	end

	select * 
	from   dbo.DimTime

commit tran

go

Quick Points

Here are a couple of quick points:

  • As always, there is a very spirited debate that ran out into the open (Internet) surrounding the differences between datepart\week and datepart\iso_week and the role of locale\languages and the inferred DATE_FIRST setting

Conclusion

The Time Dimension is foundational to any discussion about Data Warehousing modeling.

In SQL Server Analysis Services (SSAS), one can either explicitly create it or have SSAS create it.

Best Practices suggests that one should explicitly create and populate it.

If you follow Best Practice (BP), please keep in mind the countries and locales that your system will be used.

And, whether you will have systems for each locale or whether you will use a centralized system.

Listening

Listening to my favorite story teller:

Kenny Chesney – The Boys of Fall

http://www.youtube.com/watch?v=AlXDo5WhQXI

References

References – Dimension Modeling

References – Calendar – Internationalization

References – Transact SQL

References – Transact SQL / Example

References – Transact SQL / DateFirst and DatePart 

References – Transact SQL / Compare ISO against ISO_WEEK

Transact SQL – Constraint – Primary Key

There are a couple of approaches one can use to get the primary key for a table.

Here are some of those ways:

  • sp_pkeys
  • sp_primarykeys
  • INFORMATION_SCHEMA.TABLE_CONSTRAINTS

sp_pkeys

Syntax:

exec sp_pkeys
@table_name = [table-name]
, @table_owner = [schema-name]
go

Sample:

exec sp_pkeys
@table_name = 'DimEmployee'
, @table_owner = 'dbo'
go

Output:

sp_pkeys

sp_primarykeys

Though sp_primarykeys was added to gain insight into remote data sources, you can use it it to query your local data source, as well.

Btw, to get foreign key data on remote data sources, please refer to sp_foreignkeys (Transact-SQL) – http://technet.microsoft.com/en-us/library/ms187337.aspx.

Syntax:

exec sp_primarykeys
table_server = [SQLInstanceName]
, table_catalog = [DatabaseName]
, @table_name = [tableName]
, @table_owner = [schemaName]
go

Sample:


use [AdventureWorksDW2008R2]
go

declare @serverName sysname
declare @databaseName sysname

--set server name to current SQL Instance
set @serverName = cast(SERVERPROPERTY('servername') as sysname)

--set database name to current database
set @databaseName = DB_NAME()

exec sp_primarykeys
@table_server = @servername
, @table_catalog = @databaseName
, @table_name = 'DimEmployee'
, @table_schema = 'dbo'
go

Output:

You might likely get a subtle error:


Msg 7411, Level 16, State 1, Procedure sp_primarykeys, Line 10
Server server-name is not configured for DATA ACCESS.

And, to correct try:

  • Enabling Data Access on the Data Source
  • The Data Source in this case is any registered OLE-DB Provider and so you can query the local or a remote data source

declare @serverName sysname
declare @isDataAccessEnabled bit

--set servername to local server
set @serverName = cast(SERVERPROPERTY('servername') as sysname)

--get data access enabled flag
select
@isDataAccessEnabled = tblServer.is_data_access_enabled
from sys.servers tblServer
where tblServer.name = @serverName

print '@isDataAccessEnabled :' + cast(@isDataAccessEnabled as sysname)

--if data access to sql server instance is disabled, please enable
if (@isDataAccessEnabled = 0)
begin

print 'Data Access on ' + @serverName + ' is currently disabled'

print 'Allowing Data access ' + @serverName + ' ...'

exec sp_serveroption @serverName , 'data access', 'true'

print 'Data Access on ' + @serverName + ' is now enabled'

end

 

INFORMATION_SCHEMA.TABLE_CONSTRAINTS

Syntax:


select *
from INFORMATION_SCHEMA.TABLE_CONSTRAINTS tblConstraint
where tblConstraint.CONSTRAINT_TYPE = 'PRIMARY KEY'
and     tblConstraint.CONSTRAINT_CATALOG = [database-Name]
and     tblConstraint.TABLE_SCHEMA = [schema-name]
and     tblConstraint.TABLE_NAME = [table-name]
go

Sample:


select *
from INFORMATION_SCHEMA.TABLE_CONSTRAINTS tblConstraint
where tblConstraint.CONSTRAINT_TYPE = 'PRIMARY KEY'
and     tblConstraint.CONSTRAINT_CATALOG = @databaseName
and     tblConstraint.TABLE_SCHEMA = 'dbo'
and     tblConstraint.TABLE_NAME = 'DimEmployee'
go

Output:

informationSchemaTable

References