SQL Server – BulkCopy (BCP) – Which data file?

Background

Using BCP, we are churning through quite a few files.

Unfortunately, the DOS batch file that I quickly stitched together missed an important piece of functionality.

And, that functionality is to log the current file being processed.
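Here is a minimal sketch of the missing piece; the log file name is hypothetical, and the environment variables are the ones used by the invokeBCP.cmd scripts shown later in this series:

	rem Hypothetical log file for tracking progress
	set "LOGFILE=%TEMP%\bcpProgress.log"

	rem Log the current data file before handing it to bcp
	echo %DATE% %TIME% - processing dbo.tblUser.data >> "%LOGFILE%"
	bcp %DBTargetDB%.dbo.[tblUser] in %TEMP_SESSION%\dbo.tblUser.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%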

Diagnostic

Microsoft

Resource Monitor

I am a big fan of Microsoft’s Resource Monitor.

Let us use it.

Preparation

We remotely connected to the source computer and launched Task Manager; from Task Manager, we accessed Resource Monitor.

Resource Monitor – Tab – CPU

Image

ResourceMonitor_Tab_CPU_ProcessesAndAssociatedHandles_20180706_0850AM.png

Explanation
  1. When we filter on the bcp.exe process, we see our data file as one of the files listed on the “Associated Handles” pane (a command-line alternative is sketched below)
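
For a point-in-time snapshot outside the Resource Monitor UI, Sysinternals’ Handle utility offers a command-line alternative; a sketch, assuming handle.exe has been downloaded and is on the PATH:

	rem List open handles for processes whose names begin with "bcp"
	handle.exe -p bcp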

 

Resource Monitor – Tab – Memory

Image

ResourceMonitor_Tab_Memory_20180706_0852AM.png

Explanation
  1. The BCP process is using about 14 MB

 

Resource Monitor – Tab – Disk

Image

ResourceMonitor_Tab_Disk_20180706_0907AM.png

Explanation
  1. sqlservr.exe is mentioned
  2. But, not our bcp.exe process

 

Summary

To get reliable insight into which files are being accessed, please consider Microsoft’s own Resource Monitor; specifically, the CPU tab.

 

 

SQL Server – BCP – Memory Constraints

Background

As a follow-up to a recent post titled “SQLServer – BCP Optimization via Running Multiple Instances in Parallel” ( https://danieladeniji.wordpress.com/2015/07/22/sqlserver-bcp-optimization-via-running-multiple-instances-in-parallel/ ), let us see what happens on the DB Server as one pushes it a bit.

Monitoring

Activity Monitor

Waits

ResourceWaits-Memory

Explanation:

  • We can see that we are really waiting on Memory

Processes

ProcessWaits

Explanation:

  • A more granular view that shows the same thing; it indicates that our individual BCP requests are suspended while they wait for their memory requests to be fulfilled (see the query sketched below)
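
The same granular view is available via T-SQL; a minimal sketch against sys.dm_exec_requests:

SELECT
          dmvER.session_id
        , dmvER.status
        , dmvER.command
        , dmvER.wait_type
        , dmvER.wait_time

FROM   sys.dm_exec_requests dmvER

where  dmvER.status = 'suspended'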

WhoIsActive

Running Adam Machanic’s sp_WhoIsActive shows…

WhoIsActive

Explanation:

  1. No Blocking Sessions
  2. Relative to writes, a lot of reads on the active BCP sessions
  3. Other BCP Sessions are waiting on RESOURCE_SEMAPHORE (listed directly in the sketch below)
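
The RESOURCE_SEMAPHORE waiters can be listed directly, as well; a small sketch against sys.dm_os_waiting_tasks:

SELECT
          dmvWT.session_id
        , dmvWT.wait_type
        , dmvWT.wait_duration_ms

FROM   sys.dm_os_waiting_tasks dmvWT

where  dmvWT.wait_type = N'RESOURCE_SEMAPHORE'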

Dynamic Management Views

sys.dm_exec_query_memory_grants

Let us do a little bit more work.

Not that much more, as it is Friday; we will just align memory requests to the corresponding session’s query.

Query:

SELECT
          [sqlText] = dmvESQT.[text]
        , dmvDMEQMG.session_id
        , dmvDMEQMG.request_time
        , dmvDMEQMG.grant_time
        , [waitInMinutes] =
             case
                when dmvDMEQMG.grant_time is null then
                   datediff(minute, dmvDMEQMG.request_time, getdate())
                else null
             end
        , dmvDMEQMG.requested_memory_kb
        , requested_memory_mb = (dmvDMEQMG.requested_memory_kb / 1024)
        , dmvDMEQMG.granted_memory_kb
        , dmvDMEQMG.required_memory_kb
        , dmvDMEQMG.used_memory_kb
        , dmvDMEQMG.max_used_memory_kb
        , dmvDMEQMG.query_cost
        , dmvDMEQMG.resource_semaphore_id
        , dmvS.login_name

FROM sys.dm_exec_query_memory_grants dmvDMEQMG

    LEFT OUTER JOIN sys.dm_exec_sessions dmvS

       ON dmvDMEQMG.session_id = dmvS.session_id

    outer apply sys.dm_exec_sql_text(dmvDMEQMG.sql_handle) dmvESQT

Output:

requestedMemory_20150729_0931AM

Explanation:

  • Memory Requested
    • We can see that depending on the targeted table and size of the data feed, the requested memory and the query cost varies a bit
    • We can also see that the requested memory, in our case, is about 885 MB; a little below 1 GB
    • Once we take into account that this request is made for each BCP session, we need a few GBs of unused memory to simultaneously service our concurrent requests (server-wide totals can be checked as sketched below)
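
To see how much workspace memory the server can hand out in total, we can also glance at sys.dm_exec_query_resource_semaphores; a quick sketch:

SELECT
          dmvQRS.resource_semaphore_id
        , dmvQRS.target_memory_kb
        , dmvQRS.available_memory_kb
        , dmvQRS.granted_memory_kb
        , dmvQRS.grantee_count
        , dmvQRS.waiter_count

FROM   sys.dm_exec_query_resource_semaphores dmvQRS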

sys.dm_tran_locks

Let us see what types of locks are being acquired.

Query:


set transaction isolation level read uncommitted;

SELECT

           lck.resource_type
         , lck.request_mode
         , lck.request_status
         , dmvS.login_name
         , lck.request_session_id
         , par.object_id
         --, object_schema_name(par.object_id) as schemaName
         --, object_name(par.object_id) as objectName
         , [schema] = [schema].name
         , objectName = obj.name
         , COUNT(*) number_locks

FROM sys.dm_tran_locks lck with (nolock)

        INNER JOIN sys.partitions par  with (nolock)

           ON lck.resource_associated_entity_id = par.hobt_id

        INNER JOIN sys.objects obj  with (nolock)

            ON par.object_id = obj.object_id

        INNER JOIN sys.schemas [schema] with (nolock)

            ON obj.schema_id = [schema].schema_id

        INNER JOIN sys.dm_exec_sessions dmvS  with (nolock)

            ON lck.request_session_id = dmvS.session_id

where  [schema].[name] not in ('sys')

GROUP BY
              lck.resource_type
            , lck.request_mode
            , lck.request_status
            , dmvS.login_name
            , lck.request_session_id
            , par.object_id
            , [schema].name
            , obj.name 

Output:

locks_page

Explanation:

  • It looks like, though we requested table locks, we are getting more granular locks; in this case, PAGE locks (the table’s escalation setting can be confirmed as sketched below)
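
The table-level escalation setting itself can be confirmed quickly; a sketch (substitute your own table name):

SELECT
          tblSO.name
        , tblSO.lock_escalation_desc

FROM   sys.tables tblSO

where  tblSO.name = N'table-name'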

Summary

Again, I will suggest that you get yourself a nice LAB to tinker with things.

Your DBA might not grant your request for “VIEW SERVER STATE“.

But, nevertheless, find a way around and earn your “Come up“.

Dedicated

Dedicating Sarah McLachlan’s Shipwreck to the beautiful warriors, Jillian Johnson and Mayci Breaux, who “laid down” viewing the movie Trainwreck in Lafayette, Louisiana, last night.

SQLServer – BCP Optimization via Running Multiple Instances in Parallel

Background

Here is another in our series “Optimizing SQL Server BCP”.

 

Code

Original Code

 

invokeBCP.cmd

The original code looks like this:


	bcp %DBTargetDB%.dbo.[tblUser] in %TEMP_SESSION%\dbo.tblUser.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	bcp %DBTargetDB%.dbo.[tblInternalUser] in %TEMP_SESSION%\dbo.tblInternalUser.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	bcp %DBTargetDB%.dbo.[tblUserDefaultView] in %TEMP_SESSION%\dbo.tblUserDefaultView.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	bcp %DBTargetDB%.dbo.[tblUserBusinessUnitAccess] in %TEMP_SESSION%\dbo.tblUserBusinessUnitAccess.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	bcp %DBTargetDB%.dbo.[tblUserLocationCountryAdmin] in %TEMP_SESSION%\dbo.tblUserLocationCountryAdmin.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	bcp %DBTargetDB%.dbo.[tblLocationBusinessUnit] in %TEMP_SESSION%\dbo.tblLocationBusinessUnit.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%

Revised Code

Here is the revised code:

 

waitOnChildTasks.cmd

@echo off
setlocal

	@REM Reset Error Level
	set ERRORLEVEL=
	set ERRORLEVEL_LOCAL=

	set SLEEP_TIME_IN_SECONDS=5
	set "EXE=bcp.exe"

	:WAITLOOP

		@REM Using TaskList get list of matching processes
		@REM Send list to FIND.exe
		@REM Find returns 0 if match found, else 1
		tasklist /FI "IMAGENAME eq %EXE%" 2>NUL | find /I /N "%EXE%">NUL

		@REM Save ERRORLEVEL into ERRORLEVEL_LOCAL
		set "ERRORLEVEL_LOCAL=%ERRORLEVEL%"

		@REM If ERRORLEVEL_LOCAL is 0, then goto RUNNING BLOCK
		if %ERRORLEVEL_LOCAL% equ 0 goto :RUNNING

		@REM Else, assume not running
		goto :NOTRUNNING

	:RUNNING

		echo "Running  - %EXE% @ Check performed @ %time%"

		@Rem sleep for SLEEP_TIME_IN_SECONDS seconds
		Echo "Sleeping for %SLEEP_TIME_IN_SECONDS% (secs)"
		sleep %SLEEP_TIME_IN_SECONDS%

		goto WAITLOOP

	:NOTRUNNING

		echo "Completed Check for determining whether %EXE% is running!"

endlocal

invokeBCP.cmd

	echo "Bulk coppying data from datafiles " @ %DATE% %TIME% " ...."

	start bcp %DBTargetDB%.dbo.[tblUser] in %TEMP_SESSION%\dbo.tblUser.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	start bcp %DBTargetDB%.dbo.[tblInternalUser] in %TEMP_SESSION%\dbo.tblInternalUser.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	start bcp %DBTargetDB%.dbo.[tblUserDefaultView] in %TEMP_SESSION%\dbo.tblUserDefaultView.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	start bcp %DBTargetDB%.dbo.[tblUserBusinessUnitAccess] in %TEMP_SESSION%\dbo.tblUserBusinessUnitAccess.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	start bcp %DBTargetDB%.dbo.[tblUserLocationCountryAdmin] in %TEMP_SESSION%\dbo.tblUserLocationCountryAdmin.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%
	start bcp %DBTargetDB%.dbo.[tblLocationBusinessUnit] in %TEMP_SESSION%\dbo.tblLocationBusinessUnit.data -T -E -n -b%BATCH_SIZE% -S%DBTargetServer%

	Rem wait on all child processes; without "call", control would not return to this script
	call waitOnChildTasks.cmd

	echo "Bulked copied data from datafiles " @ %DATE% %TIME% 

Environment

SLEEP.EXE

Depending on your version of MS Windows (ours is MS Windows 2008/R2), you might get an error stating that:

‘sleep’ is not recognized as an internal or external command, operable program or batch file.

If so, please download the “Windows Server 2003 Resource Kit Tools” from http://www.microsoft.com/en-us/download/details.aspx?id=17657.

And, perform a deep extract to get yourself a working copy of sleep.exe.
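
Alternatively, on MS Windows Vista / Server 2008 and later, the built-in timeout command is a workable substitute for sleep.exe; a sketch:

	rem Wait 5 seconds; /nobreak ignores stray key presses
	timeout /t 5 /nobreak >nul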

 

SQLServer – BulkCopy – Optimization – Reviewing Impact of Block Size via the Transaction Log Monitoring – Take 2

Background

As a follow-up to SQLServer – BulkCopy – Optimization – Take 1 ( https://danieladeniji.wordpress.com/2015/06/05/sqlserver-bulkcopy-optimization-take-1/ ), let us see how much churn occurs in our Transaction Log as we vary our Bulk Copy Program ( BCP ) block sizes.

Assumption

Let us make the following assumptions

  • Database Recovery Model
    • Recovery Model – Simple or Bulk Logged
    • Recovery Model – If the recovery model is Full, all changes, both row allocations and data changes, are fully logged (we confirm the current setting right after this list)
  • We can truncate the destination table
  • We can lock the destination table
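
Before measuring anything, we can confirm the current recovery model; a quick sketch against our lab database:

SELECT
          name
        , recovery_model_desc

FROM   sys.databases

where  name = N'DBLAB'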

Diagnostic Tool

There are a few barometers we can use to read Transaction Log utilization:

  1. DBCC SQLPERF
  2. sys.dm_db_log_space_usage
  3. fn_dblog
    • Undocumented SQL Server function
    • Takes two parameters; the beginning and ending LSN

DBCC SQLPERF and sys.dm_db_log_space_usage offer database-level information.  Whereas fn_dblog works at the detail level.
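
The first two barometers are one-liners; a quick sketch of each:

-- Percentage of log space used, per database
DBCC SQLPERF(LOGSPACE);

-- The same information for the current database, via a DMV
SELECT
          total_log_size_in_bytes
        , used_log_space_in_bytes
        , used_log_space_in_percent

FROM   sys.dm_db_log_space_usage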

Table Definition

Table - DBLab.NumbersSource

set noexec off;
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO

if schema_id('DBLab') is null
begin

    exec('create schema [DBLab] ')

end;

go

if object_id('[DBLab].[NumbersSource]') is not null
begin

    set noexec on;

end
go

CREATE TABLE [DBLab].[NumbersSource]
(
	[Number] [int] NOT NULL,

     CONSTRAINT [DBLab.PK_NumbersSource] PRIMARY KEY CLUSTERED
        (
	        [Number] ASC
        )

) ON [PRIMARY]

GO

set noexec off;
go


Table - DBLab.Numbers


set noexec off;
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO

if schema_id('DBLab') is null
begin

    exec('create schema [DBLab] ')

end;

go

if object_id('[DBLab].[Numbers]') is not null
begin

    set noexec on;

end
go

CREATE TABLE [DBLab].[Numbers]
(
	[Number] [int] NOT NULL,

     CONSTRAINT [DBLab.PK_Numbers] PRIMARY KEY CLUSTERED
        (
	        [Number] ASC
        )

) ON [PRIMARY]

GO

set noexec off;
go

Populate Data


SET NOCOUNT ON
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
set noexec off
go

if schema_id('dblab') is null
begin

    exec('create schema [dblab] authorization [dbo]')

end
go

set noexec off
go

if exists
(
	select 1
	from   [dblab].[NumbersSource]
)
begin

	set noexec on

end
go
--Load the NumbersSource table with a generated sequence of numbers
WITH t1 AS
(
        SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
        ,t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)
        ,t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b, t2 t2c)
        ,tblVarNumbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3
)
INSERT INTO [dblab].[NumbersSource] WITH (TABLOCKX)
([Number])
SELECT number
FROM   tblVarNumbers
GO

set noexec off
go

LAB

To make things more palatable, let us use a small dataset; a thousand records or so.

Data Extract


SETLOCAL

	set "DBSourceServer=.\SQLEXPRESS_V2014"
	set "TEMP_SESSION=%TEMP%"

	echo TEMP_SESSION is %TEMP_SESSION%	

	rem If folder does not exist, then create folder
	if not exist %TEMP_SESSION% mkdir %TEMP_SESSION%

	Rem Remove existing files
	if exist  %TEMP_SESSION%\*.data del %TEMP_SESSION%\*.data /Q

	bcp "select top 1000 * from DBLAB.dblab.NumbersSource" queryout %TEMP_SESSION%\dbo.NumbersSmall.data -T -n -S%DBSourceServer%

ENDLOCAL

Clean up Transaction Log

Let us clean up our Transaction Log by doing the following:

  1. Set Recovery Mode to “Bulk Logged”
  2. Perform a full database backup to the NUL Device
  3. Perform a transaction log backup to the NUL Device

     alter database [DBLAB] set recovery bulk_logged;
   
     backup database [DBLAB] to disk = 'NUL';

     print '--------------------------------------------------------'
     backup log [DBLAB] to disk = 'NUL';
     print '--------------------------------------------------------'

    go

Import in blocks

Let us import records in blocks of 100.


SETLOCAL

   set "DBSourceServer=.\SQLEXPRESS_V2014"
   set "TEMP_SESSION=%TEMP%"
   set "BLOCK_SIZE=100"

   echo TEMP_SESSION is %TEMP_SESSION%

   rem If folder does not exist, then create folder
   if not exist %TEMP_SESSION% mkdir %TEMP_SESSION%

   sqlcmd -Q"truncate table DBLAB.DBLab.Numbers" -S%DBSourceServer%

   bcp DBLAB.DBLab.Numbers in %TEMP_SESSION%\dbo.NumbersSmall.data -T -n -S%DBSourceServer% -b%BLOCK_SIZE% -h "TABLOCK, ORDER (Number ASC)"

  ENDLOCAL

Command Shell Output

bcpDataInBatchesYes [Cmhshell]

SQL Transaction Log – Aggregate Contents

Here is the code we will use to aggregate our Transaction Log


declare @spid int

--set @spid = 60

select
          tblFNDBLog.AllocUnitName
        , tblFNDBLog.Operation
        , [NumberofRecords] = count(*)
        , [Log Record Fixed Length] = sum(tblFNDBLog.[Log Record Fixed Length])
        , [Log Record Length] = sum(tblFNDBLog.[Log Record Length])
from   fn_dblog(null, null) tblFNDBLog

--where  spid = @spid

where tblFNDBLog.AllocUnitName is not null

and   tblFNDBLog.AllocUnitName not in ('Unknown Alloc Unit')

and   tblFNDBLog.AllocUnitName not like 'sys.%'

group by
          tblFNDBLog.AllocUnitName
        , tblFNDBLog.Operation

SQL Transaction Log – Aggregate Contents – 100 records per block

bcpDataInBatches1E2

Import in blocks of 1 million

Let us go back and clean up the Transaction Log:


   alter database [DBLAB] set recovery bulk_logged;

   backup database [DBLAB] to disk = 'NUL';  

   print '--------------------------------------------------------'
   backup log [DBLAB] to disk = 'NUL'; 
   print '--------------------------------------------------------'

   go

Code – 1 million records

Let us import a million records at a time.


SETLOCAL

set "DBSourceServer=.\SQLEXPRESS_V2014"
set "TEMP_SESSION=%TEMP%"
set "BLOCK_SIZE=1000000"

echo TEMP_SESSION is %TEMP_SESSION%

rem If folder does not exist, then create folder
if not exist %TEMP_SESSION% mkdir %TEMP_SESSION%

sqlcmd -Q"truncate table DBLAB.DBLab.Numbers" -S%DBSourceServer%

bcp DBLAB.DBLab.Numbers in %TEMP_SESSION%\dbo.NumbersSmall.data -T -n -S%DBSourceServer% -b%BLOCK_SIZE% -h "TABLOCK, ORDER (Number ASC)"

ENDLOCAL

Command Shell – Output

bcpDataInBatchesNo [Cmhshell]

Aggregate Transaction Log – 1 million records

bcpDataInBatches1Million

SQL Profiler

SQL Server Profiler is another popular tool for modeling system performance.

Batch Size of 100

Here is what we capture for Batch Sizes of 100.

bulkInsertInBatches

Batch Size of 1 Million

Here is what we capture for Batch Sizes of 1 Million.

bulkInsertInBatchesNo

Quick Comparison of SQL Profiler Result

  • It is obvious that there is a lot more dialog when we have a batch size of 100
  • It is not so obvious why our writes are so few.  It is likely that the actual BCP writes are not reflected in the captured profiler readings

fn_dblog

More on fn_dblog.

There are quite a few things one should consider when thinking of using fn_dblog.  These include:

  1. fn_dblog accesses and reads the database log file
    • It is very expensive, as it is reading an actual physical file
    • The log file does not have indexes, and so reading it can be taxing on the system
  2. fn_dblog accepts two arguments
    • The starting LSN
    • The ending LSN
  3. Log files are database specific
    • And, so one needs to change to the specific database
  4. fn_dblog exposes a lot of detailed information

Code

Here is a sample script for accessing a couple of columns.


--use [database-name];
declare @iNumberofRecordsMax int

set @iNumberofRecordsMax = 50;

; with cteAllocationUnit
as
(
    select
              tblSAU.data_space_id
            , tblSAU.type_desc

            , tblSAU.allocation_unit_id

            , tblSDF.physical_name

    from   sys.allocation_units tblSAU

            inner join sys.database_files tblSDF

                on tblSAU.data_space_id = tblSDF.data_space_id
)

, ctePartitionObject
as
(
    select
              tblSP.partition_id
            , tblSP.partition_number
            , tblSP.data_compression_desc

            , tblSP.object_id
            , tblSO.schema_id
            , [schemaName] = schema_name(tblSO.schema_id)
            , objectName = tblSO.name

    from   sys.partitions tblSP

            inner join sys.objects tblSO

                on tblSP.object_id = tblSO.object_id
)
select top (@iNumberofRecordsMax)
          tblFNDBLog.spid
        , tblFNDBLog.AllocUnitName
        , tblFNDBLog.Operation
        , [Log Record Fixed Length] = (tblFNDBLog.[Log Record Fixed Length])
        , [Log Record Length] = (tblFNDBLog.[Log Record Length])
        , tblFNDBLog.Context

        , cteAU.physical_name

        , tblFNDBLog.PartitionId
        , ctePO.object_id
        , ctePO.partition_number
        , ctePO.data_compression_desc

        , [schemaName] = ctePO.schemaName
        , [tableName] = ctePO.objectName

        , tblFNDBLog.[Article ID]

from   fn_dblog(null, null) tblFNDBLog

        left outer join cteAllocationUnit cteAU

            on tblFNDBLog.AllocUnitId = cteAU.allocation_unit_id

        left outer join ctePartitionObject ctePO

            on tblFNDBLog.PartitionId = ctePO.[partition_id]

where tblFNDBLog.AllocUnitName is not null

and   tblFNDBLog.AllocUnitName not in ('Unknown Alloc Unit')

and   tblFNDBLog.AllocUnitName not like 'sys.%'

Summary

Conclusively, if we use a high enough block size, we do not incur as much Transaction Log usage as we otherwise would.

Specifically, we shed the LOP_INSERT_ROWS Operation.

The reason is subtle, but it is because each time bcp starts working on a new block, it revisits the underlying table and inspects whether it is empty.

When the block size is smaller than the number of records, every block after the first finds a non-empty table, fails the emptiness test, and so falls back to fully logged inserts.

With a very high block size, the test occurs only for the first and only block; the table is empty, and the entire load qualifies for minimal logging.

References

Will love to say “Life is Hard”, but really?

…What can be so hard, when you Google things and are led to pasture by Sunil Agarwal and Remus Rusanu.

  1. Bulk Import Optimizations (Minimal Logging) – Sunil Agarwal
    http://blogs.msdn.com/b/sqlserverstorageengine/archive/2008/02/05/bulk-logging-optimizations-minimal-logging.aspx
  2. How to read and Interpret the SQL Server Log – Remus Rusanu – ( I write database engines by day – I use databases by night ).
    http://rusanu.com/2014/03/10/how-to-read-and-interpret-the-sql-server-log/
  3. SSIS and Minimally Logged Inserts – Jakub K
    http://jakubka.blogspot.com/2014/06/ssis-and-minimally-logged-inserts.html

SQLServer – BulkCopy – Optimization – Take 1

Background

We are temporarily using BulkCopy to copy data from a production Operational DB to a development Reporting DB.

Here are some of the areas we are looking at to speed up the insertion of data into the destination DB.

Assumption

To measure the fastest response, let us assume that the Reporting DB will not be serving actual requests during the refresh cycle.

In making that assumption, we can be quite aggressive about the choices we make.

That is, we can make choices and assume that users will not overly complain about empty reports while the data is being emptied and reloaded.

Areas

  1. Use minimal logging
  2. Review DB Engine processes when inserting data into destination DB
    • Review Database Engine locks requested
    • Review Database Constraints checked
    • Review Triggers processed
  3. Import Batch sizes
  4. Sorted data

Implementation Steps

Minimal Database Logging

As we are only serving DB reporting requests from our destination SQL instance, we do not risk data loss.

If we are scared of losing data, we can take regular full database backups and, more frequently, transaction log backups.

There are three recovery models: FULL, SIMPLE, and BULK_LOGGED.

  • Our Operational database is configured for FULL RECOVERY
  • For our Reporting database, we have a choice of Simple or “Bulk_Logged”
    • Simple :– Bulk operations are minimally logged, and the log is truncated at each checkpoint
    • Bulk_Logged :- Bulk operations are minimally logged, while the log backup chain is preserved; a balance between Performance and Recoverability

Database Engine Locks

The DB Engine expends quite a lot of energy and resources to request and manage database locks.  To temporarily alleviate this cycle, we will request a table lock when pruning old data and when importing new data.

Pruning old data

If other tables do not reference our targeted table, we are able to truncate the targeted table.

Truncate Table


   truncate table [schema].[table]

On the other hand, if other tables reference our targeted table, we have to use the delete statement.

Here is a sample code where we do the following:

  • Prune in batches; doing so allows us to stabilize our resource (Memory, I/O, Log) requirements
  • Use TABLOCK to ensure that we request and take out a single lock rather than many individual record or page locks;
    Individual record and page locks are escalated to a table lock once certain thresholds are reached

Prune Table


   set nocount on;

   declare @iNumberofRecordsInCycle int

   declare @iNumberofRecordsPruned int

   set @iNumberofRecordsInCycle = 50000

   set @iNumberofRecordsPruned = -1

   while (@iNumberofRecordsPruned != 0)

   begin

       delete top (@iNumberofRecordsInCycle) tblD

       from [schema-name].[table-name] tblD with (TABLOCK)

       set @iNumberofRecordsPruned = @@rowcount

   end

Importing Data

When bringing in data, we use the -h option and pass in TABLOCK as part of our argument list.


    bcp %DBTargetDB%.dbo.tblSales in %TEMP%\dbo.tblSales.data -E  -T -n  -h"TABLOCK, ORDER (SALE_ID ASC)"  -b%BATCH_SIZE% -S%DBTargetServer% 

Database Constraints

As we trust our originating system, there is little need to revalidate the data we are bringing in.

We will temporarily suspend constraint checking by issuing “ALTER TABLE [table-name] NOCHECK CONSTRAINT [constraint-name]”.

Disable Constraint Checking:


    sqlcmd -e -b -Q"EXEC sp_MSforeachtable @command1='ALTER TABLE ? NOCHECK CONSTRAINT ALL' " -S%DBTargetServer% -d%DBTargetDB% 

Resume Constraint Checking:


sqlcmd -e -b -Q"EXEC sp_MSforeachtable @command1='PRINT ''?''; ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL' " -S%DBTargetServer% -d%DBTargetDB%

The resume “Constraint checking” code is quite interesting, for the following reasons:

  • There are two CHECKs, “CHECK CHECK”
    • The first CHECK (the WITH CHECK clause) ensures that existing data meets the criteria
    • The second CHECK (CHECK CONSTRAINT) re-enables the constraints
      • As all relevant existing data is checked for conformity, it will likely take a while

 

If there is nonconforming data, we will see errors such as:


Msg 547, Level 16, State 0, Line 1
The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "FK_tblTranslate_tblToken".
The conflict occurred in database "HRDB", table "dbo.tblToken", column 'pkTokenID'.

Table Triggers

When implemented, triggers automatically fire during DML operations.

One really needs to have a good grasp of the code logic embedded within the triggers and determine which ones are needed on the destination system.

To disable all triggers, here is what we did:


     sqlcmd -e -b -Q"exec sp_msforeachtable 'ALTER TABLE ? DISABLE TRIGGER all'" -S%DBTargetServer% -d%DBTargetDB% 

Once we had copied the data over to our destination, we re-enabled them.


    sqlcmd -e -b -Q"exec sp_msforeachtable 'ALTER TABLE ? ENABLE TRIGGER all'" -S%DBTargetServer% -d%DBTargetDB% 

Bulk Insert Batch Size

Microsoft’s documentation almost emphatically states that when importing data, one should use a batch size that covers the entire incoming dataset.

I have enough bones to pick with Microsoft without lawyering this one.

And so, unless your Database Server, Client, or Reporting DB concurrent processes demonstrably show otherwise, I will say use a big batch size.


  set DBTargetServer=SQLServerReport
  set DBTargetDB=AMEA
  

rem trying out 1 million records
set BATCH_SIZE=1000000

bcp %DBTargetDB%.dbo.[tblDevice] in %TEMP%\dbo.tblDevice.data  -T -E -n  -h"TABLOCK, ORDER (pkDeviceID ASC)"  -b%BATCH_SIZE% -S%DBTargetServer%    

Sorted Data

Before this review, I would not have thought that Sorting would matter.

But, according to Microsoft, it helps to sort the data beforehand and to tell the BCP application that the incoming data is pre-sorted.

The BCP app thus skips the built-in sorting steps.

In the example below, using the -h option, we combine two hints; the aforementioned TABLOCK explicit locking request and the ORDER hint.


set DBTargetServer=SQLServerReport
set DBTargetDB=AMEA

rem trying out 1 million records
set BATCH_SIZE=1000000

bcp %DBTargetDB%.dbo.[tblDevice] in %TEMP%\dbo.tblDevice.data  -T -E -n  -h "TABLOCK, ORDER (pkDeviceID ASC)"  -b%BATCH_SIZE% -S%DBTargetServer%    

Quick Details:

  1. In our sample, we referenced our clustering key (pkDeviceID)
  2. If your table is a heap, that is, it has no clustered index, I will suggest that you
    1. Earnestly review your physical entity model and consider adding a clustered index
    2. I am doubtful that it will be helpful, but consider changing your extract to use “queryout” and, as part of your query, “add an order by [col1], [col2]”

More

As always, your intractable problem will likely be hardware.

You need plentiful I/O, Memory, and Network bandwidth.

Hardware

Network

  • Often database servers are hidden deep in the castle
  • To gain access to them, one has to traverse various Network Firewalls and proxies
  • In multi-homed host set-ups, ensure that traffic has been configured to use the pre-arranged network card
  • In essence, use trace route and network monitoring tools to trace and audit traffic routes

I/O

  • I/O is all about Network Storage
  • Are you using Fibre Channel or Ethernet cards & switches?
  • Just as in the network analysis above, is your I/O path sufficiently segregated?

Memory

  • One can never have too much
  • Having lots of memory lets the system, and oneself, get away with things one otherwise would not get away with

Other areas to consider are:

  1. Database table partitioning
  2. Storage Layout
    • Are your flat files local or on the network?
    • If local, do they share the same physical drive as your database files?
  3. Database File Growth
    • Are your database files pre-allocated?
    • Have you properly sized your auto-growth increments? (a quick check follows)
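
A quick check on file pre-allocation and growth settings for the current database; a sketch:

SELECT
          name
        , physical_name
        , sizeInMB = (size * 8) / 1024 -- size is reported in 8 KB pages
        , growth
        , is_percent_growth

FROM   sys.database_files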

Performance Profiling

Review Database Resource Locks

Query:

SELECT
          t1.resource_type
     --   , t1.resource_database_id
        , databaseName = db_name(t1.resource_database_id)
     --   , t1.resource_associated_entity_id
        , objectName =
            CASE
                WHEN resource_type = 'OBJECT' THEN object_name(resource_associated_entity_id)
                WHEN resource_type IN ('DATABASE', 'FILE', 'METADATA') THEN 'N/A'
                WHEN resource_type IN ('KEY', 'PAGE', 'RID')  THEN
                                      (
                                        SELECT object_name(tblSP_Inner.object_id)
                                        FROM   sys.partitions tblSP_Inner
                                        WHERE  tblSP_Inner.hobt_id = t1.resource_associated_entity_id
                                      )
               ELSE 'Undefined'
            END

        , t1.request_mode

        , CASE t1.REQUEST_MODE
            WHEN 'S'        THEN 'Shared'
            WHEN 'U'        THEN 'Update'
            WHEN 'X'        THEN 'Exclusive'
            WHEN 'IS'       THEN 'Intent Shared'
            WHEN 'IU'       THEN 'Intent Update'
            WHEN 'IX'       THEN 'Intent Exclusive'
            WHEN 'SIU'      THEN 'Shared Intent Update'
            WHEN 'SIX'      THEN 'Shared Intent Exclusive'
            WHEN 'UIX'      THEN 'Update Intent Exclusive'
            WHEN 'BU'       THEN 'Bulk Update'
            WHEN 'RangeS_S' THEN 'Shared Range S'
            WHEN 'RangeS_U' THEN 'Shared Range U'
            WHEN 'RangeI_N' THEN 'Insert Range'
            WHEN 'RangeI_S' THEN 'Insert Range S'
            WHEN 'RangeI_U' THEN 'Insert Range U'
            WHEN 'RangeI_X' THEN 'Insert Range X'
            WHEN 'RangeX_S' THEN 'Exclusive range S'
            WHEN 'RangeX_U' THEN 'Exclusive range U'
            WHEN 'RangeX_X' THEN 'Exclusive range X'
            WHEN 'SCH-M'    THEN 'Schema-Modification'
            WHEN 'SCH-S'    THEN 'Schema-Stability'
            ELSE NULL
        END AS REQUEST_LOCK_MODE

        , t1.request_session_id
     --   t2.blocking_session_id

    FROM sys.dm_tran_locks as t1

            LEFT OUTER JOIN sys.dm_os_waiting_tasks as t2

                   ON t1.lock_owner_address = t2.resource_address

   where  t1.resource_database_id = db_id()

Output:

Here is what happens when the table is locked.

DatabaseLocks

Constraints – Foreign Key – Review

Here we look for untrusted foreign key constraints.


SELECT
        objectName = object_name(tblFK.parent_object_id)
      , objectReferenced = object_name(tblFK.referenced_object_id)
      , [name] = tblFK.name
      , isDisabled = tblFK.is_disabled
      , isNotTrusted = tblFK.is_not_trusted
      , isNotForReplication = tblFK.is_not_for_replication

FROM sys.foreign_keys tblFK
where
     (
        (tblFK.is_not_trusted =1)
     ) 

Network Connections

Script

Review Network Connections


SELECT
      c.session_id, c.net_transport, c.encrypt_option
    , c.auth_scheme, s.host_name, s.program_name
    , s.client_interface_name
    , r.command

FROM sys.dm_exec_connections AS c

        JOIN sys.dm_exec_sessions AS s

            ON c.session_id = s.session_id

        LEFT OUTER JOIN sys.dm_exec_requests r

            ON c.session_id = r.session_id
go

 

Output:

SQLCMD

Here is what our connection looks like when we use SQLCMD.exe to remove existing rows:

sqlcmd

 

Bulk Insert

Here is what our connection looks like when we use BCP.exe to import new rows:

BULKINSERT

 

 

Explanation:

  • Our transport layer is Shared Memory; as we are running bcp from the same host as our DB Server, we are able to use the fastest available communication protocol
  • Unfortunately, no encryption
  • The authentication scheme is NTLM, not as good as Kerberos
  • SQLCMD is using OLE DB; whereas BCP is using ODBC

 

Follow Up Tasks

  1. Index Maintenance – Defragmentation
    • Clustered Indexes – It is unlikely that your clustered index will be fragmented; especially if your data file is sorted based on your clustered index; the default mode
    • Nonclustered Indexes – Not so fast with your nonclustered indexes; personally, I use Ola’s wonderful gift, SQL Server Index and Statistics Maintenance, which is available @ https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html
  2. Index Statistics
    • Please update your statistics as well (a one-liner follows)
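
For the statistics refresh, a one-liner suffices; a sketch using the built-in procedure:

-- Update out-of-date statistics on all tables in the current database
EXEC sp_updatestats;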

 

Summary

Monitoring your SQL Instance, System, Network, and Storage will yield new insights as to where your bottlenecks are.

Listening

Like Ron, whom I met yesterday on my Power Lunch Walk, said to me: it is the little things we miss…

That’s Why I’m Here
Composers: Mark Alan Springer, Shaye Smith
Producers: Buddy Cannon, Norro Wilson
Song By: Kenny Chesney
http://www.vevo.com/watch/kenny-chesney/Thats-Why-Im-Here/USBVA0100016

SQL Server – BulkCopy (Bcp) – Error – Unexpected EOF encountered in BCP data-file

Background

We are having problems copying data using BulkCopy.

Error Message

bcp dbo.tblThirdPartyRooms in C:\Users\daniel\AppDdbo.tblThirdPartyRooms_v1.data -E -T -c -b20000 -S.\SQLEXPRESS_V2014 

Starting copy...
SQLState = HY000, NativeError = 0
Error = [Microsoft][SQL Native Client]Unexpected EOF encountered in BCP data-file

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 15

What is the problem?

Let us make sure that, structurally, the tables are the same.

To do so, we can try using tablediff.

TableDiff

 

Syntax

Item             Explanation                                                                                       Sample
Binary           Application Name                                                                                  C:\Program Files\Microsoft SQL Server\120\COM\tablediff.exe
-sourceServer    Source Server                                                                                     SeattleDB
-sourcedatabase  Source Database                                                                                   Hotel
-sourceschema    Source Schema                                                                                     dbo
-sourcetable     Source Table                                                                                      tblThirdPartyRoomMapping
-f               Name of SQL file that will bring the destination database object in line with the source object   %temp%\alignSQL.sql
-o               BCP output file                                                                                   %temp%\bcpOutput.txt
-q               Quick Row Count; (a) do not list individual record differences, (b) quicker

 

 

Sample

 

SETLOCAL

	set TABLEDIFF_BIN="C:\Program Files\Microsoft SQL Server\120\COM\tablediff.exe"
	set fixSQLFile=%TEMP%\alignSQL.sql
	set BCPOutputFile=%TEMP%\BCPOutput.txt

	rem Source
	set ss=SeattleDB
	set sd=Hotel
	set ss2=dbo
	set st=tblThirdPartyRoomMapping

	rem Destination
	set ds=.\SQLEXPRESS_V2014
	set dd=Hotel
	set ds2=dbo
	set dt=tblThirdPartyRoomMapping

	rem Misc Options
	set miscOptions=-f %fixSQLFile% -o %BCPOutputFile% -q 

        rem reset ERRORLEVEL FLAG
        verify >nul
	%TABLEDIFF_BIN% -sourceServer %ss% -sourcedatabase %sd% -sourceschema %ss2% -sourcetable %st% ^
     	-destinationserver %ds% -destinationdatabase %dd% -destinationschema %ds2% -destinationtable %dt% ^
		%miscOptions% 

       rem display errorlevel
       echo ErrorLevel is %ErrorLevel%
ENDLOCAL

Output:

Here is our run output …

TableDiffOutput
Error Level:
ErrorLevel

 

And, here is what we discovered upon inspecting our BCPOutput.txt file.

BCPOutput

Quick Explanation:

  • We were able to connect to both servers
  • As we expected, we have data in the Source Database, but not in the Destination database
  • We were hoping that we would find schema differences, but none showed up

 

Compare BCP Out to BCP Input Command

We went back and reviewed our BCP commands.

BCP Output


bcp Hotel.dbo.[tblThirdPartyRoomMapping] out %TEMP%\dbo.tblThirdPartyRoomMapping_v1.data  -t"||" -T -c -S%DBSourceServer%

 

BCP Input


bcp %DBTargetDB%.dbo.tblThirdPartyRoomMapping in %TEMP%\dbo.tblThirdPartyRoomMapping_v1.data -E -T -c -b%BATCH_SIZE% -S%DBTargetServer%

Upon a more careful comparison of the BCP export and import statements, we noticed that we have a column delimiter in the export, but we missed it on the import.

BCP Input (Corrected)


rem please notice the -t"||" -- adding the delimiter
bcp %DBTargetDB%.dbo.tblThirdPartyRoomMapping in %TEMP%\dbo.tblThirdPartyRoomMapping_v1.data  -t"||" -E -T -c -b%BATCH_SIZE% -S%DBTargetServer%

Summary

Once again, this is just a stupid error on my part.

I wrote it up as the initial hits that came up via Google were a bit muddled.

I suppose no one else makes mistakes like this.

SQL Server – BulkCopy (BCP) – Identity Columns

Background

I am copying data from Production to my personal Development box, but my SQL Join Queries are not yielding any data.

Code

Original Code


SETLOCAL

 set DBTargetServer=.\SQLEXPRESS_V2014
 set DBTargetDB=Dev
 set BATCH_SIZE=30000

 sqlcmd -e -b -Q"truncate table dbo.tblStaffingDateTime" -S%DBTargetServer% -d%DBTargetDB%

 timer.exe

   bcp %DBTargetDB%.dbo.tblStaffingDateTime in %TEMP%\dbo.tblStaffingDateTime.data -T -n -b%BATCH_SIZE% -S%DBTargetServer%

 timer.exe /S

ENDLOCAL 

 

Revised Code

We traced the problem back to ensuring that we preserve our Identity values.

When issuing bcp, we use -E to insist that our supplied identity values are used, rather than having the system generate its own.


SETLOCAL

 set DBTargetServer=.\SQLEXPRESS_V2014
 set DBTargetDB=Dev
 set BATCH_SIZE=30000

 sqlcmd -e -b -Q"truncate table dbo.tblStaffingDateTime" -S%DBTargetServer% -d%DBTargetDB%

 timer.exe

   rem -E Keep Identity Values ON
   bcp %DBTargetDB%.dbo.tblStaffingDateTime in %TEMP%\dbo.tblStaffingDateTime.data -T -n -E -b%BATCH_SIZE% -S%DBTargetServer%

 timer.exe /S

ENDLOCAL 

Metrics

Using the Gammadyne Timer utility ( http://www.gammadyne.com/cmdline.htm#timer ), we compared how long it takes to insert 150 thousand records.

 -E Skipped :- Does not Preserve Identity Values
keepIdentityOffCleaned

 -E Added :- Preserve Identity Values

keepIdentityOnCleaned

 

Conclusively, when we preserve our Identity column, our throughput is about 15% less.

Dedicated

Would like to go another way with this.  But, got to claim my own.  And, this morning, it is the Duke Blue Devils, the National Collegiate Basketball Champions of 2015.

But, even then, it is April.  And, yet, still considered going another road, Dave Loggins’ Pieces of April.

But, will rest with Touch-Of-Pennsylvania ( https://www.youtube.com/watch?v=9_sQv2hfVxg )