Technical: Microsoft – SQL Server – Data type GUID – Function – Newsequentialid
Here is a post that was started well over a year ago. I was dealing with a terabyte database and the most expensive insert was inserting records into a table that has a uniqueIdentifier clustered primary key.
What is newsequentialid():
Quite a bit of web searches suggested that we consider using NewSequentialID.
NewSequentialID (Transact ID)
Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.
There are a couple of ways to use the NewID function.
We can use it as a default constraint and bind it directly to the column. Thus we are employing declaratively.
We can also use it in our code-line by explicitly invoking it, capture the results into a variable, and use that variable in our DML statement.
Using NewID() as default constraint
Here is use using NewID declaratively:
CREATE TABLE myTable ( ColumnRadom uniqueidentifier DEFAULT NEWID() );
Using result of NewID() in Insert Statements
Using it programatically:
CREATE TABLE myTableB ( ColumnRadomB uniqueidentifier );declare @UIRandom uniqueIdentifier set @UIRandom = newid() insert into myTableB(columnRandomB) values (@UIRandom);
NEWSEQUENTIALID() can only be used with DEFAULT constraints on table columns of type uniqueidentifier.
CREATE TABLE myTable ( ColumnSequential uniqueidentifier DEFAULT NEWSEQUENTIALID() );
Lab Work – Create Tables
Let us create two very simple tables that have uniqueIdentifiers columns. The UniqueIdentifer columns are in-use as clustered index primary key.
UniqueIdentifier has radommly generated default value
set noexec off go use [DBLab] go /* drop table dbo.datatypeUniqueIdentifierRandomTraditional */ if object_id( 'dbo.datatypeUniqueIdentifierRandomTraditional' ) is not null begin set noexec on end go create table [dbo].[datatypeUniqueIdentifierRandomTraditional] ( [uid] uniqueIdentifier not null constraint constraintDatatypeUniqueIdentifierRandomTraditional default NewID() , [counter] bigint not null constraint constraintDatatypeUniqueIdentifierRandomTraditional__Counter default (1) , constraint PK_datatypeUniqueIdentifierRandomTraditional primary key ([uid]) ) go set noexec off go
UniqueIdentifier has an increasing default value
Lab Work – Populate Tables
Here we are using ostress to populate our tables.
The basic code block is:
- Remove existing data
- Create ostress temporary folder, if it does not exist
- Remove ostress temporary folder
- Create two hundred threads and repeat payload twenty thousand times on each thread
- The payload is a simple insert statement into our table
Payload – Random UniqueIdentifiers
Completed inserts of 4 million records in 15 minutes and 40 seconds.
Payload – Sequential UniqueIdentifiers
Completed inserts of 4 million records in 10 minutes, 20 seconds.
Completed inserts of 4 million records in 11 minutes, 30 seconds.
Number of Data pages
Here is a sample query to determine the number of data pages occupied by objects:
- Because of page splits, the table that uses NEW_ID uses more data pages than the table that uses NewSequentialID; about 30% more pages for a table sized @ 4 million records
Here is a sample query to detect fragmentation:
- The table that relies on NEW_ID to populate its uniqueIdentifier column is 99% fragmented; while the corresponding table that uses NewSequentialID is only 7% fragmented
Change UniqueIdentifier column default constraint from newid to Newsequentialid
Here is a script that will change the default constraints on existing objects from using NEWID to NewSequentialID.
- Newsequentialid (History/Benefits and Implementation)
In general, we made significant improvements in SQL Server scalability during Yukon. One of the areas of improvement is replication scalability. While doing merge replication testing we found out that scaling was severely affected by high number of I/O operations.The cause of the problem was that new rows were inserted in random disk pages. Guid generating function (newid) was returning non-sequential guids which resulted in random B-tree lookups. After some investigation we figured out that we could use the new OS function UuidCreateSequential with some byte scrambling to convince the rest of SQL engine that guids are produced in sequential order.
- Hopefully, sequential reads will be more optimal due to overall smaller number of pages needed by NewSequentialID compared to GUID
- To ensure data integrity keep in mind that the system still uses system mutexes and so one will experience a slight bottleneck during “NewSequentialID” generation and dispensing
- NewSequentials ID has they are ever increasing and write to the same data pages at the bottom of our tables \ file groups create hot pages in that location. Consider careful partitioning to spread data across few physical LUNs. Keep in mind that this relates to the actual data writes, as log writes are always s those can are localized on the same physical file
One good thing about writing is one is forced to read quite a bit. And, that is true for many reasons. Reasons involve..
- Is the writing new and refreshing
- Is your hypothesis true
- Is it relevant
- Have others covered same topic and which ones are worth sharing
- One might discover new keywords. And, these days with Google, you end up searching on those keywords and might very well discover new, novel and fascinating areas
Along those lines, I hope you will take the time to read Dan Guzman’s post:
Why Random Keys are Bad
I think it’s important for one to understand why random keys have such a negative impact on performance against large tables. DBAs often cite fragmentation and page splits as the primary causes of poor performance with random keys. Although it is true random inserts do cause fragmentation and splits, the primary cause of bad performance with random keys is poor temporal reference locality ( http://en.wikipedia.org/wiki/Locality_of_reference ), which I’ll detail shortly. Note that there were no real page splits in these insert performance tests because the nearly 8K row size allowed only one row per page. Although significant extent fragmentation occurred, this didn’t impact these single-row requests; extent fragmentation is mostly an issue with sequential scans against spinning media. So neither splits nor fragmentation explain the poor performance of the random inserts.
Temporal reference locality basically means that once data is used (e.g. inserted or touched in any way), it is likely to be used again in the near future. This is why SQL Server uses a LRU-2 algorithm to manage the buffer cache; data most recently touched will remain in memory while older, less often referenced data are aged out. The impact of random key values on temporal locality (i.e. buffer efficiency) is huge. Consider that inserts are basically rewrites of existing pages. When a new row is inserted into a table, SQL Server first reads the page where the row belongs (by key value if the table has a clustered index) and then either adds the row to the existing data page or allocates a new one if there’s not enough space available in the existing page for the new row. With a random key value, the new key value is unlikely to be adjacent to the last one inserted (which is probably still in memory) so the needed page often must be read from storage.
In summary, inserts timelines are a bit less for uniqueIdentifiers compares to newid(). The system will also have less need for de-fragmentation.
And, as Dan Guzman pointed out, in time one might very well experience less I/O. This might be less obvious as one will have to watch System I/O metrics for extending period of time and compare the competing designs.
Microsoft – Reference – UniqueIdentifier
- Microsoft – NewSequentialID
Microsoft – History & Benefits
- NewSequentialID – History/Benefits
Application Development – Microsoft – .Net
- How to generate Sequentials for GUID SQL Server .Net
- Unraveling the mysteries of NewSequentialID http://www.jorriss.net/blog/archive/2008/04/24/unraveling-the-mysteries-of-newsequentialid
Application Development – ORM Tools – NHibernate – UniqueIdentifier
- NHibernate – GUID – CombGenerator
- Mass Transit – New ID
- Guid vs int
- Performance Comparison – Identity x – newid() – NewSequentialID()
Costs of UniqueIdentifiers / GUIDS
- The cost of GUIDs as primary keys (by Jimmy Nilsson)
- Improving UniqueIdentifier Performance
- Locality of Reference
Microsoft – SQL Server – Default Constraints
- Plamen Ratchev – List all default constraints in a db, and their columns
Microsoft – Sql Server – Stats – Fragmentation
- Check SQL Server a specified database index fragmentation percentage (SQL)
Microsoft – Sql Server – Index – Fill Factors
- Optimize Fill Factors for SQL Server