Microsoft – SQL Server – Function – Newsequentialid

 

Background:

Here is a post that was started well over a year ago.  I was dealing with a terabyte database, and the most expensive insert was inserting records into a table that has a uniqueidentifier clustered primary key.

What is newsequentialid():

Quite a few web searches suggested that we consider using NewSequentialID.

NewSequentialID (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms189786.aspx

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.
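To see that ordering behavior, here is a small sketch (the scratch table name is made up for illustration); on a given machine, rows inserted later should sort after rows inserted earlier when ordered by the GUID column:

```sql
-- Scratch table: [uid] gets a sequential GUID, [counter] records insert order
create table dbo.tmpSequentialDemo
(
      [uid]     uniqueidentifier not null
                    default newsequentialid()
    , [counter] int not null identity(1,1)
);

insert into dbo.tmpSequentialDemo default values;
insert into dbo.tmpSequentialDemo default values;
insert into dbo.tmpSequentialDemo default values;

-- Ordering by [uid] should match the insert order recorded in [counter]
select [uid], [counter]
from dbo.tmpSequentialDemo
order by [uid];

drop table dbo.tmpSequentialDemo;
```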

 

Implementation:

NewID

There are a couple of ways to use the NewID function.

We can use it as a default constraint and bind it directly to the column.  Thus we are employing it declaratively.

We can also use it in our code by explicitly invoking it, capturing the result into a variable, and using that variable in our DML statement.

Using NewID() as default constraint

Here is NewID used declaratively:

    CREATE TABLE myTable 
    (
       ColumnRandom uniqueidentifier
           DEFAULT NEWID()
    );

 

Using result of NewID() in Insert Statements

Using it programmatically:

    CREATE TABLE myTableB 
    (
       ColumnRandomB uniqueidentifier
    );

    declare @UIRandom uniqueIdentifier;
    set @UIRandom = newid();
    insert into myTableB (ColumnRandomB) values (@UIRandom);

 

NewSequentialID

NEWSEQUENTIALID() can only be used with DEFAULT constraints on table columns of type uniqueidentifier.

For example:

    CREATE TABLE myTable 
    (
       ColumnSequential uniqueidentifier
           DEFAULT NEWSEQUENTIALID()
    );
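Note that, unlike NewID, it cannot be invoked as a stand-alone expression; attempting to do so fails with an error along these lines (the message text below is paraphrased, not an exact transcript):

```sql
-- This fails: newsequentialid() is not callable outside a DEFAULT expression
select newsequentialid();

-- Msg 302: The newsequentialid() built-in function can only be used in a
-- DEFAULT expression for a column of type 'uniqueidentifier' in a
-- CREATE TABLE or ALTER TABLE statement.
```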

 

Lab Work – Create Tables

Let us create two very simple tables that have uniqueidentifier columns.  The uniqueidentifier columns serve as clustered primary keys.

 

UniqueIdentifier has a randomly generated default value

set noexec off
go

use [DBLab]
go

/*
  drop table dbo.datatypeUniqueIdentifierRandomTraditional
*/

if object_id(
             'dbo.datatypeUniqueIdentifierRandomTraditional'
           ) is not null
begin
    set noexec on
end
go

create table 
    [dbo].[datatypeUniqueIdentifierRandomTraditional]
(
   [uid] uniqueIdentifier not null

    constraint 
        constraintDatatypeUniqueIdentifierRandomTraditional
      default NewID()

  , [counter] bigint not null

    constraint constraintDatatypeUniqueIdentifierRandomTraditional__Counter
    default (1)


, constraint PK_datatypeUniqueIdentifierRandomTraditional 
    primary key
    ([uid])

)
go

set noexec off
go

 

UniqueIdentifier has an increasing default value

 

set noexec off
go

use [DBLab]
go

/*
   drop table dbo.datatypeUniqueIdentifierSequentialTraditional
*/

if object_id(
             'dbo.datatypeUniqueIdentifierSequentialTraditional'
           ) is not null
begin
    set noexec on
end
go

create table 
    [dbo].[datatypeUniqueIdentifierSequentialTraditional]
(
   [uid] uniqueIdentifier not null

    constraint 
        constraintDatatypeUniqueIdentifierSequentialTraditional
      default NewSequentialID()

  , [counter] bigint not null

    constraint constraintDatatypeUniqueIdentifierSequentialTraditional__Counter
    default (1)


, constraint PK_datatypeUniqueIdentifierSequentialTraditional 
    primary key
    ([uid])

)
go

set noexec off
go

 

Lab Work – Populate Tables

Here we are using ostress to populate our tables.

The basic code block is:

  • Remove existing data
  • Create ostress temporary folder, if it does not exist
  • Remove ostress temporary folder
  • Create two hundred threads and repeat payload twenty thousand times on each thread
  • The payload is a simple insert statement into our table

 

Payload – Random UniqueIdentifiers

sqlcmd -S (local) -d DBLab -E -Q " truncate table dbo.datatypeUniqueIdentifierRandomTraditional;" 
if not exist e:\tmp\ostress\ md e:\tmp\ostress\
del e:\tmp\ostress\* /Q
"E:\Program Files\Microsoft Corporation\RMLUtils\ostress.exe" -q -oe:\tmp\ostress -S(local) -dDBLab -E -Q"insert into dbo.datatypeUniqueIdentifierRandomTraditional default values;" -n200 -r20000

 

Output: randomInserts

Completed inserts of 4 million records in 15 minutes and 40 seconds.

 

Payload – Sequential UniqueIdentifiers

sqlcmd -S (local) -d DBLab -E -Q " truncate table dbo.datatypeUniqueIdentifierSequentialTraditional;" 
if not exist e:\tmp\ostress\ md e:\tmp\ostress\
del e:\tmp\ostress\* /Q
"E:\Program Files\Microsoft Corporation\RMLUtils\ostress.exe" -q -oe:\tmp\ostress -S(local) -dDBLab -E -Q"insert into dbo.datatypeUniqueIdentifierSequentialTraditional default values;" -n200 -r20000

 

Output:

UniqueIdentifierSequentialInsertsBenchmark

Completed inserts of 4 million records in 10 minutes, 20 seconds.

 

sequentialInserts

Completed inserts of 4 million records in 11 minutes, 30 seconds.

 

 

Number of Data pages

Here is a sample query to determine the number of data pages occupied by objects:

SELECT 
          Schema_name(tblObject.schema_id) as [schema]
    , OBJECT_NAME(tblIndex.OBJECT_ID) AS TableName
    , tblIndex.name AS IndexName
    , tblIndexStat.index_type_desc AS IndexType 
    , tblIndexColumn.column_id as IndexColumnID
    , tblColumnType.name as columnDataType
    , tblIndexStat.page_count as [pageCount]
    , tblPartition.rows as NumberofRecords

FROM sys.dm_db_index_physical_stats
     (DB_ID(), NULL, NULL, NULL, NULL) tblIndexStat

    INNER JOIN sys.objects tblObject

        ON tblIndexStat.object_id = tblObject.object_id

    INNER JOIN sys.indexes tblIndex
        ON tblIndex.object_id = tblIndexStat.object_id 
        AND tblIndex.index_id = tblIndexStat.index_id 

    inner join sys.index_columns tblIndexColumn
         on tblIndex.object_id = tblIndexColumn.object_id

    inner join sys.columns tblColumn
       on  tblIndexColumn.object_id = tblColumn.object_id
       and tblIndexColumn.column_id = tblColumn.column_id

    inner join  sys.types tblColumnType

       on tblColumn.system_type_id = 
                 tblColumnType.system_type_id

        
        INNER JOIN sys.partitions tblPartition  
      ON  tblPartition.object_id = tblIndex.object_id 
      AND tblPartition.index_id = tblIndex.index_id   

--column Data type
where tblColumnType.name in ('uniqueidentifier')

--index first column
and   tblIndexColumn.column_id in (1)

ORDER BY tblIndexStat.page_count DESC

 

Output:

pageCount

 

Explanation:

  • Because of page splits, the table that uses NewID uses more data pages than the table that uses NewSequentialID; about 30% more pages for a table sized at 4 million records
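A related way to quantify the page-split effect is to look at how full the pages actually are.  The avg_page_space_used_in_percent column is only populated when the DMV is run in 'SAMPLED' or 'DETAILED' mode (the default 'LIMITED' mode leaves it null):

```sql
select object_name(tblIndexStat.object_id) as TableName
     , tblIndexStat.avg_page_space_used_in_percent
     , tblIndexStat.page_count
from sys.dm_db_index_physical_stats
       (db_id(), null, null, null, 'SAMPLED') tblIndexStat
where object_name(tblIndexStat.object_id) like 'datatypeUni%';
```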

 

Index Fragmentation

Here is a sample query to detect fragmentation:

SELECT 
          Schema_name(tblObject.schema_id) as [schema]
    , OBJECT_NAME(tblIndex.OBJECT_ID) AS TableName
    , tblIndex.name AS IndexName
    , tblIndexStat.index_type_desc AS IndexType 
    , tblIndexStat.avg_fragmentation_in_percent 
    , tblPartition.rows as NumberofRecords

FROM sys.dm_db_index_physical_stats
       (DB_ID(), NULL, NULL, NULL, NULL) tblIndexStat

    INNER JOIN sys.objects tblObject
        ON tblIndexStat.object_id = tblObject.object_id

    INNER JOIN sys.indexes tblIndex
        ON tblIndex.object_id = tblIndexStat.object_id 
        AND tblIndex.index_id = tblIndexStat.index_id 

    INNER JOIN sys.partitions tblPartition  
        ON  tblPartition.object_id = tblIndex.object_id 
        AND tblPartition.index_id = tblIndex.index_id   

WHERE  OBJECT_NAME(tblIndex.OBJECT_ID) like 'datatypeUni%'

ORDER BY tblIndexStat.avg_fragmentation_in_percent DESC

Output:

fragmentation

 

Explanation:

  • The table that relies on NewID to populate its uniqueidentifier column is 99% fragmented, while the corresponding table that uses NewSequentialID is only 7% fragmented
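If one inherits a randomly keyed table that is already badly fragmented, the index can of course be repaired after the fact with a rebuild (using the primary key name created earlier in our lab):

```sql
alter index PK_datatypeUniqueIdentifierRandomTraditional
    on dbo.datatypeUniqueIdentifierRandomTraditional
    rebuild;
```

Of course, on a live system this only treats the symptom; new random inserts will begin fragmenting the index again immediately.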

 

Change UniqueIdentifier column default constraint from NewID to NewSequentialID

 

Here is a script that will change the default constraints on existing objects from using NEWID to NewSequentialID.

 

use [master]
go

if object_id(
             'dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier'
            ) is null
begin

    exec('create procedure dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier as select 1/0 as [shell] ')

end
go

alter procedure 
    dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier
(
    @scriptOnly bit = 1
  , @debug bit = 0
)
as
begin

    set nocount on;

    declare @id        int
    declare @idAsString     sysname
    declare @schemaName    sysname
    declare @objectName    sysname
    declare @columnName    sysname

    declare @constraintName        sysname
    declare @constraintNameAssigned sysname
    declare @constraintDefinition   sysname

    declare @CRLF sysname
    declare @TAB  sysname

    declare @CONSTRAINT_DEFINITION_NEWID     sysname
    declare @CONSTRAINT_DEFINITION_NEWID_QUOTED    sysname
    declare @CONSTRAINT_DEFINITION_NEWSEQUENTIALID sysname


    declare @SQLLine  nvarchar(4000)
    declare @SQL    nvarchar(4000)
    declare @log      nvarchar(4000)

    declare @NEWID sysname
    declare @TRANSACTION_NAME   sysname

    declare @iNumberofEntries   int


    declare @tblConstraint TABLE
    (
          [id] int not null identity(1,1)
        , [schemaName] sysname not null
        , [objectName] sysname not null
        , [columnName] sysname not null
        , [constraintName] sysname null
        , [constraintDefinition] sysname null

    )


    set @constraintName = null
    set @constraintDefinition = null

    set @TAB = char(9)
    set @CRLF = char(13) + char(10)

    set @CONSTRAINT_DEFINITION_NEWID = 'newid()'
    set @CONSTRAINT_DEFINITION_NEWID_QUOTED = '(newid())'
    set @CONSTRAINT_DEFINITION_NEWSEQUENTIALID = 
               'NewSequentialID()'
    set @TRANSACTION_NAME =
               'trnDefaultConstraintModification'

    insert into @tblConstraint
    (
          [schemaName]
        , [objectName]
        , [columnName]
        , [constraintName] 
        , [constraintDefinition]

    )
    select 
           schema_name(tblObject.schema_id) as [schemaName]
         , tblObject.name as objectName 
         , tblColumn.name as columnName
         , tblDefaultConstraint.name as constraintName
         , tblDefaultConstraint.[definition]

    from   sys.objects tblObject

            INNER JOIN sys.indexes tblIndex

          ON tblObject.object_id = tblIndex.object_id 

        inner join sys.index_columns tblIndexColumn

         on tblIndex.object_id = 
                        tblIndexColumn.object_id

        inner join sys.columns tblColumn

          on  tblIndexColumn.object_id 
                        = tblColumn.object_id
          and tblIndexColumn.column_id 
                        = tblColumn.column_id


        inner join  sys.types tblColumnType

          on tblColumn.system_type_id
                      = tblColumnType.system_type_id


        left outer join sys.default_constraints 
                     tblDefaultConstraint

          on  tblColumn.object_id =
tblDefaultConstraint.parent_object_id 
          and tblColumn.column_id = 
                      tblDefaultConstraint.parent_column_id  

    -- user table
    where  tblObject.[type] = 'U'

    --first index column
    and   tblIndexColumn.column_id in (1)

    --column Data type
    and    tblColumnType.name in ('uniqueidentifier')

    set @iNumberofEntries = ( select count(*) 
                                  from @tblConstraint
                                )


    if (@debug = 1)
    begin

        select * 
        from @tblConstraint

    end
    
    set @id = 1

    while (@id <= @iNumberofEntries)
    begin

        set @idAsString = cast(@id as sysname)

        select 
             @schemaName = [schemaName]
           , @objectName = [objectName]
           , @columnName = [columnName]
           , @constraintName = [constraintName]
           , @constraintDefinition = 
                        constraintDefinition
        from  @tblConstraint
        where [id] = @id

        if (@debug = 1)
        begin


            set @log = 'Processing '
                    + @idAsString


            print @log

        end

        if
          (
               (@constraintName is null)
            or (@constraintDefinition is null)
            or (ltrim(rtrim(@constraintDefinition)) = 
                         @CONSTRAINT_DEFINITION_NEWID)
            or (ltrim(rtrim(@constraintDefinition)) = 
                         @CONSTRAINT_DEFINITION_NEWID_QUOTED)
            )
        begin

           set @SQL = ''
           set @SQLLine = ''

           set @SQL = @SQL + 'begin tran ' 
                       + @TRANSACTION_NAME + @CRLF

           if (@constraintName is null)
           begin
                
             set @constraintNameAssigned = 
                           'constraintDefault'
              + @objectName 
              + '__' 
              + @columnName

           end
           else
           begin

            set @SQLLine = 
                  @TAB
                        + 'alter table '
                        + quoteName(@schemaName)
                + '.'
                + quoteName(@objectName)
                + ' drop constraint '
                + quoteName(@constraintName)
                            + @CRLF

                set @sql = @sql + @SQLLine

            end

            set @SQLLine =
                  @TAB
                + 'alter table '
                + ' '
                + isNull(quoteName(@schemaName), '')
                + '.'
                + isNull(quoteName(@objectName), '')
                + ' '
                + ' add constraint '
                + ' '
                + '[' + isNull(@constraintName
                             , @constraintNameAssigned)
                      + ']'
                + ' default '
                + ' '
                + @CONSTRAINT_DEFINITION_NEWSEQUENTIALID
                + ' '
                + ' for '
                + ' '
                + quoteName(@columnName)
                + @CRLF

            set @SQL = @SQL + @SQLLine

            set @SQL = @SQL + 'commit tran '
                            + isNull(@TRANSACTION_NAME, '')
                            + @CRLF

            set @log = @TAB + @TAB
                     + 'SQL: '
                     + @idAsString + ' ' + @CRLF
                     + ' '
                     + isNull(@SQL, ' --blank -- ')

            print @log

            if (@scriptOnly = 0)
            begin

                print @TAB + 'Executing '
                exec (@SQL)

            end

            set @SQL = ''

        end -- if need to modify

        set @id = @id + 1

    end -- while


end
go


EXEC sys.sp_MS_marksystemobject 
     'dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier'
go


/*

   use [DBLab]
   go

   exec dbo.sp_ChangeDefaultConstraintOnUniqueIdentifier
         @debug = 1
       , @scriptOnly = 0
   go


*/

Benefits

  • Newsequentialid (History/Benefits and Implementation)
    http://blogs.msdn.com/b/sqlprogrammability/archive/2006/03/23/559061.aspx

    In general, we made significant improvements in SQL Server scalability during Yukon. One of the areas of improvement is replication scalability.  While doing merge replication testing we found out that scaling was severely affected by high number of I/O operations. The cause of the problem was that new rows were inserted in random disk pages.  Guid generating function (newid) was returning non-sequential guids which resulted in random B-tree lookups.  After some investigation we figured out that we could use the new OS function UuidCreateSequential with some byte scrambling to convince the rest of SQL engine that guids are produced in sequential order.
  • Hopefully, sequential reads will be more optimal due to the overall smaller number of pages needed by NewSequentialID compared to NewID

 

Downsides

  • To ensure uniqueness, the system still serializes GUID generation behind system mutexes, and so one will experience a slight bottleneck during “NewSequentialID” generation and dispensing
  • Because NewSequentialID values are ever increasing, all inserts write to the same data pages at the logical end of our tables / file groups, creating hot pages in that location.  Consider careful partitioning to spread the data across a few physical LUNs.  Keep in mind that this relates to the actual data writes; log writes are always sequential and are localized to the same physical file
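One coarse way to watch for that last-page (hot page) contention during a heavy insert run is to check page-latch waits.  The numbers below are server-wide and cumulative since the last restart, so treat them as a rough signal only:

```sql
-- Page-latch waits are the classic symptom of "everyone inserting
-- into the same last page" contention
select wait_type
     , waiting_tasks_count
     , wait_time_ms
from sys.dm_os_wait_stats
where wait_type in ('PAGELATCH_EX', 'PAGELATCH_SH')
order by wait_time_ms desc;
```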

 

Best Explanation:

One good thing about writing is that one is forced to read quite a bit.  And that is true for many reasons.  Those reasons include:

  • Is the writing new and refreshing?
  • Is your hypothesis true?
  • Is it relevant?
  • Have others covered the same topic, and which ones are worth sharing?
  • One might discover new keywords.  And, these days with Google, you end up searching on those keywords and might very well discover new, novel, and fascinating areas

 

Along those lines, I hope you will take the time to read Dan Guzman’s post:

Why Random Keys are Bad
http://www.dbdelta.com/improving-uniqueidentifier-performance/

I think it’s important for one to understand why random keys have such a negative impact on performance against large tables. DBAs often cite fragmentation and page splits as the primary causes of poor performance with random keys. Although it is true random inserts do cause fragmentation and splits, the primary cause of bad performance with random keys is poor temporal reference locality ( http://en.wikipedia.org/wiki/Locality_of_reference ), which I’ll detail shortly. Note that there were no real page splits in these insert performance tests because the nearly 8K row size allowed only one row per page. Although significant extent fragmentation occurred, this didn’t impact these single-row requests; extent fragmentation is mostly an issue with sequential scans against spinning media. So neither splits nor fragmentation explain the poor performance of the random inserts.

Temporal reference locality basically means that once data is used (e.g. inserted or touched in any way), it is likely to be used again in the near future. This is why SQL Server uses a LRU-2 algorithm to manage the buffer cache; data most recently touched will remain in memory while older, less often referenced data are aged out. The impact of random key values on temporal locality (i.e. buffer efficiency) is huge. Consider that inserts are basically rewrites of existing pages. When a new row is inserted into a table, SQL Server first reads the page where the row belongs (by key value if the table has a clustered index) and then either adds the row to the existing data page or allocates a new one if there’s not enough space available in the existing page for the new row. With a random key value, the new key value is unlikely to be adjacent to the last one inserted (which is probably still in memory) so the needed page often must be read from storage.
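To observe that buffer-efficiency difference on the lab tables, one rough sketch is to count the pages each table currently holds in the buffer pool.  This is a common join pattern against sys.dm_os_buffer_descriptors; be aware that it can be slow on instances with large buffer pools:

```sql
select object_name(tblPartition.object_id) as TableName
     , count(*) as cachedPages

from sys.dm_os_buffer_descriptors tblBuffer

    inner join sys.allocation_units tblAllocationUnit
        on tblBuffer.allocation_unit_id =
               tblAllocationUnit.allocation_unit_id

    inner join sys.partitions tblPartition
        on tblAllocationUnit.container_id = tblPartition.hobt_id

-- in-row (1) and row-overflow (3) allocation units map
-- container_id to the partition's hobt_id
where tblAllocationUnit.[type] in (1, 3)
and   tblBuffer.database_id = db_id()
and   object_name(tblPartition.object_id) like 'datatypeUni%'

group by tblPartition.object_id;
```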

 

Summary

In summary, insert timelines are a bit shorter for NewSequentialID compared to newid().  The system will also have less need for de-fragmentation.

And, as Dan Guzman pointed out, in time one might very well experience less I/O.  This might be less obvious, as one will have to watch system I/O metrics over an extended period of time and compare the competing designs.

References

Microsoft – Reference – UniqueIdentifier

  • Microsoft – NewSequentialID
    Link

 

Microsoft – History & Benefits

  • NewSequentialID – History/Benefits
    Link

 

Application Development – Microsoft – .Net

  • How to generate Sequentials for GUID SQL Server .Net
    Link
  • Unraveling the mysteries of NewSequentialID
    Link

 

Application Development – ORM Tools – NHibernate – UniqueIdentifier

  • NHibernate – GUID – CombGenerator
    Link
  • Mass Transit – New ID
    Link
  • MassTransit /src/MassTransit/NewId/NewId.cs
    Link
  • Mass Transit – New ID
    Link

 

Comparison

  • Performance Comparison – Identity x – newid() – NewSequentialID()
    Link

Costs of UniqueIdentifiers / GUIDS

  • The cost of GUIDs as primary keys (by Jimmy Nilsson)
    Link

 

Optimization

  • Improving UniqueIdentifier Performance
    Link

 

Software Development

  • Locality of Reference
    Link

 

Microsoft – SQL Server – Default Constraints

  • Plamen Ratchev – List all default constraints in a db, and their columns
    Link

 

Microsoft – Sql Server – Stats – Fragmentation

  • Check SQL Server a specified database index fragmentation percentage (SQL)
    Link

 

Microsoft – Sql Server – Index – Fill Factors

  • Optimize Fill Factors for SQL Server
    Link
