SQL Server – BCP – Memory Constraints

Background

As a follow-up to a recent post titled “SQLServer – BCP Optimization via Running Multiple Instances in Parallel” ( https://danieladeniji.wordpress.com/2015/07/22/sqlserver-bcp-optimization-via-running-multiple-instances-in-parallel/ ), let us see what happens on the DB Server as one pushes it a bit.

Monitoring

Activity Monitor

Waits

ResourceWaits-Memory

Explanation:

  • We can see that we are really waiting on Memory

Processes

ProcessWaits

Explanation:

  • A more granular view that shows the same thing; it indicates that our individual BCP requests are suspended as they wait for their memory requests to be fulfilled.  A quick DMV query along the same lines is sketched below.
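As a minimal sketch (not part of the original screenshots), here is one way to see the same suspended-on-memory picture straight from the DMVs; the columns come from sys.dm_exec_requests and sys.dm_exec_sessions:

SELECT
          req.session_id
        , ses.login_name
        , req.status          -- 'suspended' while the memory grant is pending
        , req.command         -- shows up as BULK INSERT for bcp sessions
        , req.wait_type
        , req.wait_time       -- milliseconds spent on the current wait
        , req.last_wait_type

FROM sys.dm_exec_requests req

        INNER JOIN sys.dm_exec_sessions ses

            ON req.session_id = ses.session_id

WHERE ses.is_user_process = 1
ORDER BY req.wait_time desc;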

WhoIsActive

Running Adam Machanic’s sp_WhoIsActive shows…

WhoIsActive

Explanation:

  1. No Blocking Sessions
  2. Relative to writes, a lot of reads on the active BCP sessions
  3. Other BCP Sessions are waiting on RESOURCE_SEMAPHORE (a semaphore-level view is sketched right after this list)
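As a minimal sketch, the RESOURCE_SEMAPHORE queue itself can be inspected via sys.dm_exec_query_resource_semaphores:

SELECT
          resource_semaphore_id
        , pool_id
        , target_memory_kb
        , available_memory_kb
        , granted_memory_kb
        , grantee_count       -- requests currently holding a memory grant
        , waiter_count        -- requests queued behind RESOURCE_SEMAPHORE
FROM sys.dm_exec_query_resource_semaphores;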

Dynamic Management Views

sys.dm_exec_query_memory_grants

Let us do a little bit more work.

Not that much more, as it is Friday, and so we will just align memory requests to the corresponding session’s query.

Query:

SELECT
          [sqlText] = dmvESQT.[text]
        , dmvDMEQMG.session_id
        , dmvDMEQMG.request_time
        , dmvDMEQMG.grant_time
        , [waitInMinutes] =
             case
                when dmvDMEQMG.grant_time is null then
                   datediff(minute, dmvDMEQMG.request_time, getdate())
                else null
             end
        , dmvDMEQMG.requested_memory_kb
        , requested_memory_mb = (dmvDMEQMG.requested_memory_kb / 1024)
        , dmvDMEQMG.granted_memory_kb
        , dmvDMEQMG.required_memory_kb
        , dmvDMEQMG.used_memory_kb
        , dmvDMEQMG.max_used_memory_kb
        , dmvDMEQMG.query_cost
        , dmvDMEQMG.resource_semaphore_id
        , dmvS.login_name

FROM sys.dm_exec_query_memory_grants dmvDMEQMG

    LEFT OUTER JOIN sys.dm_exec_sessions dmvS

       ON dmvDMEQMG.session_id = dmvS.session_id

    outer apply sys.dm_exec_sql_text(dmvDMEQMG.sql_handle) dmvESQT

Output:

requestedMemory_20150729_0931AM

Explanation:

  • Memory Requested
    • We can see that, depending on the targeted table and the size of the data feed, the requested memory and the query cost vary a bit
    • We can also see that the requested memory, in our case, is about 885 MB; a little below 1 GB
    • Once we take into account that this request is for each BCP session, we need a few GBs of unused memory to simultaneously service our concurrent requests (a quick tally is sketched below)
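As a minimal sketch, the outstanding memory grant requests can be tallied up and compared against the instance’s configured memory:

SELECT
          [requests]       = count(*)
        , [requested_mb]   = sum(requested_memory_kb) / 1024
        , [granted_mb]     = sum(isnull(granted_memory_kb, 0)) / 1024
        , [outstanding_mb] = sum(requested_memory_kb - isnull(granted_memory_kb, 0)) / 1024
FROM sys.dm_exec_query_memory_grants;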

sys.dm_tran_locks

Let us see what types of locks are being acquired.

Query:


set transaction isolation level read uncommitted;

SELECT

           lck.resource_type
         , lck.request_mode
         , lck.request_status
         , dmvS.login_name
         , lck.request_session_id
         , par.object_id
         --, object_schema_name(par.object_id) as schhemaName
         --, object_name(par.object_id) as objectName
         , [schema] = [schema].name
         , objectName = obj.name
         , COUNT(*) number_locks

FROM sys.dm_tran_locks lck with (nolock)

        INNER JOIN sys.partitions par  with (nolock)

           ON lck.resource_associated_entity_id = par.hobt_id

        INNER JOIN sys.objects obj  with (nolock)

            ON par.object_id = obj.object_id

        INNER JOIN sys.schemas [schema] with (nolock)

            ON obj.schema_id = [schema].schema_id

        INNER JOIN sys.dm_exec_sessions dmvS  with (nolock)

            ON lck.request_session_id = dmvS.session_id

where  [schema].[name] not in ('sys')

GROUP BY
              lck.resource_type
            , lck.request_mode
            , lck.request_status
            , dmvS.login_name
            , lck.request_session_id
            , par.object_id
            , [schema].name
            , obj.name 

Output:

locks_page

Explanation:

  • It looks like even though we requested Table Locks, we are getting more granular locks; in this case, PAGE locks.  If table-level locks are actually wanted, one option is sketched below.
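As a hedged sketch, assuming a hypothetical target table named dbo.tblStaffingDateTime: bcp’s -h "TABLOCK" hint is the usual route, but the “table lock on bulk load” table option can also make bulk loads default to a table-level bulk update (BU) lock rather than page locks.

exec sp_tableoption
       @TableNamePattern = N'dbo.tblStaffingDateTime'   -- hypothetical table name
     , @OptionName       = 'table lock on bulk load'
     , @OptionValue      = 'ON';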

Summary

Again, I will suggest that you have yourself a nice LAB to tinker with things.

Your DBA might not grant your request for “VIEW SERVER STATE“.

But, nevertheless find a way around and earn your “Come up“.

Dedicated

Dedicating Sarah McLachlan’s Shipwreck to the beautiful warriors, Jillian Johnson and Mayci Breaux, who “laid down” viewing the movie Train Wreck in New Orleans last night.

SQL Server – BulkCopy (BCP) – Identity Columns

Background

I am copying data from Production to my personal Development box, but my SQL Join Queries are not yielding any data.

Code

Original Code


SETLOCAL

 set DBTargetServer=.\SQLEXPRESS_V2014
 set DBTargetDB=Dev
 set BATCH_SIZE=30000

 sqlcmd -e -b -Q"truncate table dbo.tblStaffingDateTime" -S%DBTargetServer% -d%DBTargetDB%

 timer.exe

   bcp %DBTargetDB%.dbo.tblStaffingDateTime in %TEMP%\dbo.tblStaffingDateTime.data -T -n -b%BATCH_SIZE% -S%DBTargetServer%

 timer.exe /S

ENDLOCAL 

 

Revised Code

Traced the problem back to the need to preserve our Identity values.

When issuing bcp, we use -E to insist that our supplied identity values are used, rather than having the system generate its own.  A T-SQL analogue is sketched just after the revised script below.


SETLOCAL

 set DBTargetServer=.\SQLEXPRESS_V2014
 set DBTargetDB=Dev
 set BATCH_SIZE=30000

 sqlcmd -e -b -Q"truncate table dbo.tblStaffingDateTime" -S%DBTargetServer% -d%DBTargetDB%

 timer.exe

   rem -E Keep Identity Values ON
   bcp %DBTargetDB%.dbo.tblStaffingDateTime in %TEMP%\dbo.tblStaffingDateTime.data -T -n -E -b%BATCH_SIZE% -S%DBTargetServer%

 timer.exe /S

ENDLOCAL 
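As a minimal sketch of the equivalent behavior from within T-SQL (the file path below is made up): BULK INSERT’s KEEPIDENTITY hint plays the same role as bcp’s -E switch.

bulk insert dbo.tblStaffingDateTime
from 'C:\Temp\dbo.tblStaffingDateTime.data'   -- hypothetical path; the bcp script above used %TEMP%
with
(
      datafiletype = 'native'   -- matches the -n (native) format used with bcp
    , keepidentity              -- keep the identity values supplied in the file
    , batchsize = 30000
);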

Metrics

Using the Gammadyne Timer utility ( http://www.gammadyne.com/cmdline.htm#timer ), we compared how long it takes to insert 150 thousand records.

 -E Skipped :- Does not Preserve Identity Values
keepIdentityOffCleaned

 -E Added :- Preserve Identity Values

keepIdentityOnCleaned

 

In conclusion, when we preserve our Identity column, our throughput is about 15% less.

Dedicated

Will like to go another way with this.  But, got to claim my own.  And, this morning it is Duke Blue Devils,  the National Collegiate Basketball Champion 2014.

But, even then it is April.  And, yet still considered going another road, Dave Loggins’ Pieces of April.

But, will rest with Touch-Of-Pennsylvania ( https://www.youtube.com/watch?v=9_sQv2hfVxg )

Microsoft – SQL Server – Bulking Copying data using XML Template File

Background

Preparing to perform a bit of load testing against MS SQL Server.  And, so we downloaded Spawner, an MS Windows desktop application that allows one to generate test data.

We were able to generate about 1 million records.

 

Generate test data

 

Spawner

Here is what the Spawner desktop looks like:

Spawner

 

 

Create Table

Create Table

use [DBLabInMemory]
go

set noexec off
go


/*
 drop table [dbo].[personTraditionalClustered]
*/
if object_id('[dbo].[personTraditionalClustered]') is not null
begin
 set noexec on
end
go



create table [dbo].[personTraditionalClustered]
(

 [firstname] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [lastname] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [emailAddress] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [phoneNumber] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [addressStreet] [varchar](100) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [addressCity] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [addressState] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [addressPostalCode] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [dateofBirth] datetime null


 , [uniqueID] uniqueIdentifier not null
     constraint defaultPersonTraditionalUniqueID 
        default newid() 



 , constraint PK_PersonTraditional
     primary key ([uniqueID])


 , index idx_Firstname nonclustered
     ([firstname])

 , index idx_Lastname nonclustered
     ([lastname])

 , index idx_AddressCity nonclustered
     ([addressCity])


 , index idx_AddressState nonclustered
     ([addressState])

 , index idx_AddressPostalCode nonclustered
     ([addressPostalCode])

 , index idx_EmailAddress nonclustered
     ([emailAddress])

 , index idx_DateofBirth nonclustered
     ([dateofBirth])
 

)
go

set noexec off
go

Recommendation

As a recommendation, I suggest that one follow this strategy:

  • Create table
  • Insert test data into the table
  • Create template file
  • Customize template file (separators, column ordering, number of columns)
  • Export a few trial records
  • Experiment with importing data

 

Create Table

Create the database table; in our case, we created the table earlier in this post.

Insert test data

Inserting test data is useful, as we can better determine whether we have the correct data types, column lengths, and nullability.  A minimal hand-keyed insert is sketched below.
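As a minimal sketch (the values are made up), a single hand-written row is often enough to confirm the column lengths and NOT NULL constraints before bulk loading:

insert into [dbo].[personTraditionalClustered]
(
      [firstname], [lastname], [emailAddress], [phoneNumber]
    , [addressStreet], [addressCity], [addressState], [addressPostalCode]
    , [dateofBirth]
)
values
(
      'John', 'Doe', 'john.doe@example.com', '415-555-0100'
    , '1 Main Street', 'San Francisco', 'CA', '94105'
    , '1970-01-01'
);
-- [uniqueID] is omitted; it is populated by its newid() default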

 

Create XML Template file

Syntax

bcp <table> format nul -x -c -f <format-file> -S <server-name> -d <database-name> -T -q 

 

Sample

bcp dbo.personTraditionalClustered format nul -x -c 
   -f targetFileTemplate.xml -S (local) -d DBLabInMemory -T -q 

This is sufficient when the incoming data file has all the columns that are in the destination table.  If not all of the columns are present, or if they are not in the same order, I suggest one use an editor and adjust the created format file.

The error message that we are trying to avoid is stated below:

When we have more columns in SQL Server than in the datafile

Unexpected EOF encountered in BCP data-file

 

Customize format file

Here is what our format file looks like:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="6" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="7" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="8" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="9" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="24"/>
 <FIELD ID="10" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="37"/>
 </RECORD>
 <ROW>
 <COLUMN SOURCE="1" NAME="firstname" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="2" NAME="lastname" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="3" NAME="emailAddress" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="4" NAME="phoneNumber" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="5" NAME="addressStreet" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="6" NAME="addressCity" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="7" NAME="addressState" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="8" NAME="addressPostalCode" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="9" NAME="dateofBirth" xsi:type="SQLDATETIME"/>
 <COLUMN SOURCE="10" NAME="uniqueID" xsi:type="SQLUNIQUEID"/>
 </ROW>
</BCPFORMAT>


 

There are a couple of changes we will be making to our template file:

  • Because our input data file does not have the uniqueID column in it, we will remove the corresponding lines from the created format file.
  • We will also change our terminator from \t (tab) to ||

Here is what our new format file looks like:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="5" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="100" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="6" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="7" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="8" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="9" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="24"/>
 </RECORD>
 <ROW>
 <COLUMN SOURCE="1" NAME="firstname" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="2" NAME="lastname" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="3" NAME="emailAddress" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="4" NAME="phoneNumber" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="5" NAME="addressStreet" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="6" NAME="addressCity" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="7" NAME="addressState" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="8" NAME="addressPostalCode" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="9" NAME="dateofBirth" xsi:type="SQLDATETIME"/>
 </ROW>
</BCPFORMAT>


 

Export a few trial records

Let us export out a few records

bcp "select top 10 [firstname],[lastname],[emailAddress],[phoneNumber],[addressStreet],[addressCity],[addressState],[addressPostalCode],[dateofBirth] from [dbo].[personTraditionalClustered] with (nolock)" queryout extractedData_v4.txt -c -S (local) -d DBLabInMemory -T

 

In the example above there are a couple of things one should note:

  • We are only taking out some data; just the first 10 records
  • We have also restricted the columns to the ones we have in our input file

 

Insert test data

Let us perform some trial runs

sqlcmd -S (local) -d DBLabInMemory -E -Q "truncate table [dbo].[personTraditionalClustered]"

bcp [dbo].[personTraditionalClustered] in datafile_v4.txt -f fileTemplate_v4.xml -S (local) -d DBLabInMemory -e errfile.log -T -F 2 -L 4

 

  • In our sample data file, we have added a header line, and so we need -F 2 to indicate that raw data starts at line 2
  • We have also added -L 4 to indicate that the import should end at line 4
  • And, so we are in essence only importing 3 lines (a quick row count to confirm is sketched below)
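As a minimal sketch, a quick count confirms that only the three expected rows made it in:

select [rowsLoaded] = count(*)
from [dbo].[personTraditionalClustered] with (nolock);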

 

Actual Run

A couple of changes we will be making in the actual run:

  • Added -b 50000 so that rows are committed in batches of 50,000

 

sqlcmd -S (local) -d DBLabInMemory -E -Q "truncate table [dbo].[personTraditionalClustered]"

bcp [dbo].[personTraditionalClustered] in datafile_v4.txt -f fileTemplate_v4.xml -S (local) -d DBLabInMemory -e errfile.log -T -F 2 -b 50000

 

Output:

Starting copy...
50000 rows sent to SQL Server. Total sent: 50000
50000 rows sent to SQL Server. Total sent: 100000
50000 rows sent to SQL Server. Total sent: 150000
50000 rows sent to SQL Server. Total sent: 200000
50000 rows sent to SQL Server. Total sent: 250000
50000 rows sent to SQL Server. Total sent: 300000
50000 rows sent to SQL Server. Total sent: 350000
50000 rows sent to SQL Server. Total sent: 400000
50000 rows sent to SQL Server. Total sent: 450000
50000 rows sent to SQL Server. Total sent: 500000
50000 rows sent to SQL Server. Total sent: 550000
50000 rows sent to SQL Server. Total sent: 600000
50000 rows sent to SQL Server. Total sent: 650000
50000 rows sent to SQL Server. Total sent: 700000
50000 rows sent to SQL Server. Total sent: 750000
50000 rows sent to SQL Server. Total sent: 800000
50000 rows sent to SQL Server. Total sent: 850000
50000 rows sent to SQL Server. Total sent: 900000
50000 rows sent to SQL Server. Total sent: 950000
50000 rows sent to SQL Server. Total sent: 1000000
50000 rows sent to SQL Server. Total sent: 1050000
50000 rows sent to SQL Server. Total sent: 1100000
50000 rows sent to SQL Server. Total sent: 1150000

1160001 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 160759 Average : (7215.78 rows per sec.)

 

Version Control

Version Control – Github

Files posted to github @ https://github.com/DanielAdeniji/MSSSQLBCPFormatFileXML

 

Had a little hiccup, as our data file exceeded GitHub’s 100 MB file size limit.



[dadeniji@adelia XML]$ git push origin master
Counting objects: 10, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 52.08 MiB | 692 KiB/s, done.
Total 10 (delta 1), reused 0 (delta 0)
remote: error: GH001: Large files detected.
remote: error: Trace: b9a986aaae560141bf94f2213cdb5c18
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File datagenSimple.txt is 120.97 MB; this exceeds GitHub's file size limit of 100 MB
To git@github.com:DanielAdeniji/MSSSQLBCPFormatFileXML.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'git@github.com:DanielAdeniji/MSSSQLBCPFormatFileXML.git'


 

To address this, we tarred the file and removed the original file from our local git repository.

Tarred file

tar -zcvf datagenSimple.tar.gz datagenSimple.txt

Remove file

git rm --cached datagenSimple.txt

 

Added file (datagenSimple.txt) to .gitignore

vi .gitignore

 

Added file (datagenSimple.tar.gz) to local git

git add datagenSimple.tar.gz

 

Commit file

git commit

 

Output:

[dadeniji@adelia XML]$ git commit -a -m 'First commit'
[master ca1d17a] First commit
2 files changed, 0 insertions(+), 1160002 deletions(-)
create mode 100644 datagenSimple.tar.gz
delete mode 100644 datagenSimple.txt

 

Push to remote repository

git push origin master

 

But, it was still trying to add the removed file

 

Remove file from history

Used David Underhill’s git-remove-history.sh script to permanently remove the oversized file from history.  BTW, the shell script is available @ http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/

 

sh /tmp/git-remove-history.sh datagenSimple.txt

Output:

[dadeniji@adelia XML]$ sh /tmp/git-remove-history.sh datagenSimple.txt
Rewrite 32420099006159bf2413542475cdc556167503d7 (1/2)rm 'datagenSimple.txt'
Rewrite ca1d17a66e273b7ceca218ff2e24392913b36d22 (2/2)
Ref 'refs/heads/master' was rewritten
Counting objects: 16, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (16/16), done.
Writing objects: 100% (16/16), done.
Total 16 (delta 5), reused 0 (delta 0)

 

Push to git

Syntax:


[dadeniji@adelia XML]$ git push origin master

Output:


[dadeniji@adelia XML]$ git push origin master
Counting objects: 12, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 52.04 MiB | 691 KiB/s, done.
Total 12 (delta 2), reused 11 (delta 2)
remote: warning: GH001: Large files detected.
remote: warning: See http://git.io/iEPt8g for more information.
remote: warning: File datagenSimple.tar.gz is 52.02 MB; 
this is larger than GitHub's recommended maximum file size of 50 MB
To git@github.com:DanielAdeniji/MSSSQLBCPFormatFileXML.git
* [new branch] master -> master

 

 

Technical: Microsoft – SQL Server – Data I/O – Computed Columns


Introduction

For the last couple of weeks or so, I have struggled with a little SQL Script that transfers data between two SQL Server Instances.

Background

The Transact SQL is simple enough.  It uses Microsoft SQL Server’s highly regarded Linked Server (heterogeneous database) feature to copy data from a legacy system to a new one.

Original Query – Linked Server

Here is the original query that uses Linked Server



        declare @logdateDestMax datetime
        declare @logdateDestMaxDefault datetime

        set @logdateDestMaxDefault = '1/1/1900'

        set @logdateDestMax = ( 
                                 select max(createDate) 
                                 from dbo.employee 
                              )

        if ( @logdateDestMax is null)
        begin
              set @logdateDestMax = @logdateDestMaxDefault
        end
	insert into dbo.employee	
        (
	   [username]
	  ,[createDate]
	  ,[createdBy]
	)
	select 
	   [username]
	  ,[createDate]
	  ,[createdBy]
        from [DBLABJ_NC_10_ODBC].[dbHR].dbo.employee tblSrc
	where [logdate] > @logdateDestMax

Rewrite  Query – OpenQuery

As I know that 4 part name queries can sometimes be problematic and slow, I thought maybe I should rewrite the query to use openquery.

Here is the rewritten query that uses Linked Server \ OpenQuery


        declare @logdateDestMax datetime
        declare @logdateDestMaxDefault datetime

        set @logdateDestMaxDefault = '1/1/1900'

        set @logdateDestMax = ( 
                                 select max(createDate) 
                                 from dbo.employee 
                              )

        if ( @logdateDestMax is null)
        begin
              set @logdateDestMax = @logdateDestMaxDefault
        end
	insert into dbo.employee	
        (
	   [username]
	  ,[createDate]
	  ,[createdBy]
	)
	select 
	   [username]
	  ,[createDate]
	  ,[createdBy]
        from openquery(
                           [DBLABJ_NC_10_ODBC]
                         , 'select * from [dbHR].dbo.employee'
                      ) tblSrc
	where [logdate] > @logdateDestMax

OpenQuery – Poor Auditioning

The example above is a very poor staging of the benefits of openquery.

Usually, we want a query that can be completely serviced on the remote server.  The query above still brings all the data over to the querying server.

Once all the data is brought over, the resident SQL instance then issues the comparison.

There are things we can do to better ‘position’ the Linked Server.  Our alternate choices include:

  • Creating a Stored Procedure on the Linked Server.  The SP will accept parameters that will help ‘our case’.  A pass-through sketch along those lines is shown right after this list.
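As a hedged sketch (assuming the linked server allows RPC / pass-through commands, and using [createDate] as the filter column purely for illustration), the watermark can be shipped to the remote server so that only the qualifying rows cross the wire:

        declare @logdateDestMax datetime

        set @logdateDestMax = (
                                 select max(createDate)
                                 from dbo.employee
                              )

        if ( @logdateDestMax is null )
        begin
              set @logdateDestMax = '1/1/1900'
        end

        insert into dbo.employee
        (
           [username]
          ,[createDate]
          ,[createdBy]
        )
        exec
        (
              'select [username], [createDate], [createdBy]
               from   [dbHR].dbo.employee
               where  [createDate] > ?'     -- the ? placeholder is bound to @logdateDestMax
            , @logdateDestMax
        ) at [DBLABJ_NC_10_ODBC];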

Compare 4 part name and OpenQuery

Compare Execution Plans

Here is a quick comparison of both plans.

ExecutionPlan

One can quickly see that the plan that uses the 4 part name is costed at 53% of the batch, while the one that uses openquery is at 36%.

Compare IO Stats

With “set statistics io on”, we can see that IO stats for the first query:

Table 'employee'. Scan count 1, logical reads 1369, physical reads 0, read-ahead 
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

We do not have corresponding IO stats for the query that uses openquery, as that query is completely ‘materialized’ on the Linked Server, and seemingly profiling stats are not returned for externally run queries.  One would have to look on the remote side itself; a sketch is shown below.
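As a hedged sketch, run on the linked (remote) server rather than the calling one, the pass-through query’s accumulated costs can be dug out of the plan cache (the LIKE filter below is just an assumed marker for the remote table):

select top 10
          st.[text]
        , qs.execution_count
        , qs.total_logical_reads
        , qs.total_worker_time     -- CPU, in microseconds
        , qs.total_elapsed_time

from sys.dm_exec_query_stats qs

        cross apply sys.dm_exec_sql_text(qs.sql_handle) st

where st.[text] like '%employee%'
order by qs.last_execution_time desc;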

Remote Scan Stats

If we focus on the remote Scan Stats, we are able to dig a bit deeper into our estimated and actual costs:

Linked Server Cost

RemoteQuery-Slow-4PartName

Open Query Cost

RemoteQuery-Slow-OpenQuery

Here is the break-down

Item | 4 Part Name | Open Query
---- | ----------- | ----------
Actual Number of Rows | 5 | 49821
Actual Rebinds | 1 | 0
Actual Rewinds | 0 | 0
Estimated CPU Cost | 4.9921 | 3.3633
Estimated I/O Cost | 0 | 0
Estimated Number of Executions | 1 | 1
Estimated Number of Rows | 14936.3 | 10000
Estimated Operator Cost | 4.9921 (100%) | 3.36333 (100%)
Estimated Rebinds | 0 | 0
Estimated Rewinds | 0 | 0
Estimated Row Size | 448 Bytes | 448 Bytes
Estimated SubTree Cost | 4.9921 | 3.36333
Logical Operation | Remote Query | Remote Scan
Node ID | 1 | 1
Number of Executions | 1 | 1
Output List | [DBLAB].[dbo].[UpdateLog].col1, [DBLAB].[dbo].[UpdateLog].col2, [DBLAB].[dbo].[UpdateLog].logdate, [DBLAB].[dbo].[UpdateLog].col3 | SQLNCLI11
Parallel | False | False
Physical Operation | Remote Query | Remote Scan
Remote Object | SELECT "Tbl1001"."col1" "Col1003", "Tbl1001"."col2" "Col1004", "Tbl1001"."logdate" "Col1005", "Tbl1001"."col3" "Col1006" FROM "DBLAB"."dbo"."UpdateLog" "Tbl1001" WHERE "Tbl1001"."logdate" > ? | select * from [DBLAB].dbo.UpdateLog tblSrc
Remote Source | Named instance name only (e.g., SQLExpress) | Full Linked Server name

Explanation

  • “Actual Number of Rows” – OpenQuery is bringing back all the records and comparing them against our data filter.  It seems the 4 part name is only indicating the number of records that will be returned by the select statement
  • For the 4 part name, we have rebinds.  For the best explanation of what rebinds are, please “check out” Scary DBA’s blog posting – Rebind and Rewind in Execution Plans ( http://www.scarydba.com/2011/06/15/rebind-and-rewind-in-execution-plans/ )
  • Estimated CPU Cost – For the four part name it is a bit higher
  • Estimated Number of Rows – In the case of “4 Part Name”, the Engine can consult with the Linked Server and ask it for the number of records in the table; whereas, for “Open Query”, the query would have to be run.  And, so we get a placeholder count of 10,000
  • Estimated Operator Cost – Same as Estimated CPU Cost; it seems most of our cost is attributed to CPU, with very little I/O cost
  • Estimated SubTree Cost – All query costs are eaten up by this operation
  • Logical Operation – “Remote Query” (Linked Server) vs “Remote Scan” (openquery)
  • Output List :- OpenQuery lists our SQL Server provider (SQLNCLI11)
  • Parallel :- Our record count is relatively low, and so no parallelism
  • Physical Operation – Same as Logical Operation: “Remote Query” (Linked Server) vs “Remote Scan” (openquery)
  • In this case our Linked Server is on the same box as our calling Server.  For the 4 part name, our Remote Source entry is just the instance name itself (SQLExpress), not the full instance name.  And, for openquery it is the full Linked Server name

Summary

  • Linked Server offers a bit more reliable instrumentation
  • I/O Costs for the remote scan that is employed when we use OpenQuery are hidden

Bcp

Intro

We decided to try bcp, as the Linked Server “selects” were taking about 1.5 hours for a measly 15K (15,000) records.

Now that we are out of MS SQL Server, we will use Gammadyne’s Timer.exe ( http://www.gammadyne.com/cmdline.htm#timer) to time the performance of our bcp sessions.

As we are only profiling for performance, we will tweak our code by making a couple of adjustments.

  • Add -L <N> (here, -L 1000) :- Only copy the first 1000 records

Bcp Performance – Table

Get data from the entire table.

bcp  "dbo.employee" out employee.txt -c -S hrDB -d hr -T -L 1000

 Image

bcpTable

  • So we are only getting 11.63 rows per sec.
  • And, it has taken us 90 seconds to get 1000 records

Bcp Performance – QueryOut

Get data for all columns.

bcp  "select * from dbo.employee" queryout employee.txt -c -S hrDB -d hr -T -L 1000

 Image

bcpQueryOut

  • So we are only getting 11.56 rows per sec.
  • And, it has taken us 86.8 seconds to get 1000 records

So our numbers are still nothing to jump around about.

Looked at the following areas on the Linked Server:

  • Wait Stats

Bcp Performance – QueryOut – Filtered ColumnList

Reviewed our column list  and noticed that one of the columns we are bringing back is a computed column.

We removed that column by explicitly listing the columns that we want:

bcp  "select FIRSTNAME, LASTNAME, USERNAME, PASSWORD from dbo.employee" queryout 
employee.txt -c -S hrDB -d hr -T -L 1000

 Image

bcpQueryOutFilteredColumnList

  • So we are now getting 62,500 rows per sec.
  • And, it has taken us 62 milliseconds to get 1000 records

Computed Columns

So our biggest drag was the “computed column”.  We do not even need it, as it will be regenerated on the other side.

Find Computed Columns

Find Computed Columns – MS SQL Server 2000


select 
		  tblUser.name as schemaName
		, tblObject.name as objectName
		, tblColumn.name as columnName

from   syscolumns tblColumn

	  inner join sysobjects tblObject

	 	on tblColumn.id = tblObject.id

	 inner join sysusers tblUser

		on tblObject.uid = tblUser.uid

where  tblObject.[type] = 'U' 
and    tblColumn.iscomputed = 1

Find Computed Columns – MS SQL Server 2005 and above



select 
		  tblSchema.name as schemaName
		, tblObject.name as objectName
		, tblColumn.name as columnName
		, tblColumnComputed.definition
		, tblColumnComputed.is_computed as [isComputed]		
		, tblColumnComputed.is_persisted as isPersisted		

from   sys.columns tblColumn

	  inner join sys.objects tblObject

	     on tblColumn.object_id = tblObject.object_id

	  inner join sys.schemas tblSchema

	     on tblObject.schema_id = tblSchema.schema_id

	  inner join sys.computed_columns tblColumnComputed

	     on  tblColumn.object_id = tblColumnComputed.object_id
	     and tblColumn.column_id = tblColumnComputed.column_id

where  tblObject.type = 'U'
and    tblColumn.is_computed = 1


Image

ListComputedColumns_v2005

Conclusion

So again keep an eye on computed columns.

They were introduced in MS SQL Server v2000, but at that time they could not be persisted.

From v2005 onward, they can be persisted.

Persisted means their values are computed upon initial creation or subsequent updates.  And, thereafter the saved value is read, and thus “readers” are not forced to pay the price of re-calculation.  A small sketch is shown below.
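As a minimal sketch, with a hypothetical table: the fullName column below is computed and stored at write time, so subsequent reads do not recompute it.

create table dbo.personComputedDemo          -- hypothetical demo table
(
      firstname  varchar(40) not null
    , lastname   varchar(40) not null
    , fullName   as (lastname + ', ' + firstname) persisted
);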

But, even then make sure that you only include them when you need them.  Be especially careful when you use * (select * from <schema-name>.<object-name>); or when you reference the entire table.