SQL Server – BulkCopy ( BCP ) – Which data file?

Background

Using BCP, we are churning though quite a bit of files.

Unfortunately, the DOS batch file that I quickly stitched together missed an importantly functionality.

And, that functionality is to log the current file being processed.

Diagnostic

Microsoft

Resource Monitor

I am a big fan of Microsoft’s Resource Monitor.

Let us use it.

Preparation

We remote connect to the source computer and launched task Manager; from Task Manager accessed Resource Monitor.

Resource Monitor – Tab – Memory

Image

ResourceMonitor_Tab_CPU_ProcessesAndAssociatedHandles_20180706_0850AM.png

Explanation
  1. When we filter on the bcp.exe process, we see our data file as one of the files mentioned in the “Associated Handles” tab

 

Resource Monitor – Tab – Memory

Image

ResourceMonitor_Tab_Memory_20180706_0852AM.png

Explanation
  1. The BCP process is using about 14 MB

 

Resource Monitor – Tab – Disk

Image

ResourceMonitor_Tab_Disk_20180706_0907AM.png

Explanation
  1. sqlserver,exe is mentioned
  2. But, not our bcp.exe file

 

Summary

To get a reliable insight into which files are being accessed, please consider Microsoft’s own Resource Monitor; specifically the CPU tab.

 

 

SQL Server – BCP – BCP Hints

Background

Wanted to evaluate BCP Optimization Hints.

BCP HINTS

The ones we are going to talk about are:

Syntax

  1. TABLOCK
  2. ROWS_PER_BATCH
  3. Sorted

Sample

  1. TABLOCK
  2. ROWS_PER_BATCH
    • 1 million ( 1000000 )
  3. Sorted
    • ORDER (work_email ASC)

BCP Statement

Sample


bcp "dbo.hrdbPerson" in data\hrdbPerson.txt -T -n -h "TABLOCK, ROWS_PER_BATCH=1000000, ORDER (work_email ASC)" -S . -d hrdb

Explanation

  1. Table :-
    • dbo.hrdbPerson
  2. File :-
    • data\hrdbPerson.text
  3. Authentication
    • Active Directory ( -T)
  4. Format
    • -n ( Native Format )
  5. Hint
    • TABLOCK
    • ROWS_PER_BATCH=1000000
    • ORDER (work_email ASC)
      • Same as clustered index
  6. Sql Instance
    • -s
      • . ( local SQL Server Instance [Default ] )
  7. Database
    • -d
      • hrdb

SQL Server Profiler

Data Out

Trace Definition

Image

Explanation

  1. Events
    • Security Audit
      • Audit Login
      • Audit Logout
    • Stored Procedures
      • RPC:Completed
    • TSQL
      • SQL:BatchCompleted
      • SQL:BatchStarting
  2. Columns
    • Event Class
    • TextData
    • CPU
    • Reads
    • Writes
    • Duration
    • SPID
    • RowCounts

Trace Events

Image

Explanation

  1. Duration
    • Duration is in milliseconds
      • 6376 milliseconds
      • 6 seconds
  2. CPU
    • 1703
  3.  Reads
    • 26542
  4. Writes
    • 0
  5.  RowCounts
    • 317015 records

Data In

Trace Definition

Image

Explanation

  1. Events
    • SQL:BatchCompleted
  2. Columns
    • Event Class
    • TextData
    • CPU
    • Reads
    • Writes
    • Duration
    • SPID
    • RowCounts

Trace Events

Image

Tabulation

  1. EventClass
    • SQL:BatchCompleted
  2. TextData
    • insert bulk dbo.telecom([loc] char(2) collate SQL_Latin1_General_CP1_CI_AS,[employeeid] char(9) collate SQL_Latin1_General_CP1_CI_AS,[ssn] char(9) collate
  3. CPU
    • 20516
  4. Reads
    • 1185987
  5. Writes
    • 56870
  6. Duration
    • 21390
  7. SPID
    • 72
  8. RowCounts
    • 317015

Explanation

  1. Duration
    • Duration is in milliseconds
      • 21390 milliseconds
      • 21 seconds
  2. RowCounts
    • 317015 records

Summary

Unfortunately, there is not a lot of sample out there on how to use BCP Hints.

Reference

  1. Microsoft
    • Database Engine > Technical Reference > Command Prompt Utility Reference (Database Engine)
      • BCP Utility

SQLServer – BulkCopy – Optimization – Take 1

Background

Temporarily using Bulkcopy to copy data from production Operational DB to a development Reporting DB.

Here are some of the areas we are looking at to speed up the insertion of data into the destination DB.

Assumption

To measure the fastest response let us assume that the Reporting DB will not be serving actual requests during the refresh cycle.

In making that assumption, we can be quite aggressive about the choices we make.

That is, we can make choices and assume that users will not overly complain about empty reports while the data is being emptied and reloaded.

Areas

  1. Use minimal logging
  2. Review DB Engine processes when inserting data into destination DB
    • Review Database Engine locks requested
    • Review Database Constraints checked
    • Review Triggers processed
  3. Import Batch sizes
  4. Sorted data

Implement Steps

Minimal Database Logging

As we are only serving DB reporting requests from our destination SQL instance, we do not risk data loss.

If we are scared of losing data, we will take regular full database backups and more incessantly transaction log backup.

There are three recovery modes : FULL, SIMPLE, and “Bulk_Logged”.

  • Our Operational database is configured for FULL RECOVERY
  • Our Reporting database, we have a choice of Simple or “Bulk_Logged”
    • Simple :– Minimal Logging kept on all operations
    • Bulk_Logged :- System automatically calibrates itself and strives for balance between Performance and Recoverability

Database Engine Locks

The DB Engine exhausts quite a lot of energy and resources to request and manage database locks.  To temporary alleviate this cycle we will request a table lock when pruning old data and when importing new data.

Pruning old data

If other tables do not reference our targeted table, we are able to truncate the targeted table.

Truncate Table


   truncate table [schema].[table]

On the other hand, if other tables reference our targeted table, we have to use the delete statement.

Here is a sample code where we do the following:

  • Prune in batches; doing so allows us stabilize our resource (Memory, I/O, Log) requirements
  • Use TABLOCK to ensure that we request and take out a single lock rather than multiple individual records or page locks;
    Individual records are escalated to page locks once certain limits are reached

Prune Table


   set nocount on;

   declare @iNumberofRecordsInCycle int

   declare @iNumberofRecordsPruned int

   set @iNumberofRecordsInCycle = 50000

   set @iNumberofRecordsPruned = -1

   while (@iNumberofRecordsPruned != 0)

  begin

       delete tblD top (@iNumberofRecordsInCycle)

       from [schema-name].[table-name] tblD   WITH (TABLOCK)

      set @iNumberofRecordsPruned = @@rowcount

  end

Importing Data

When bringing in data, we use the –h operator and pass in TABLOCK as part of our argument list.


    bcp %DBTargetDB%.dbo.tblSales in %TEMP%\dbo.tblSales.data -E  -T -n  -h"TABLOCK, ORDER (SALE_ID ASC)"  -b%BATCH_SIZE% -S%DBTargetServer% 

Database Constraints

As we trust our originating system, there is little need to revalidate the data we are bringing in.

We will temporarily suspend constraints checking by issuing “ALTER TABLE [table-name] NOCHECK CONSTRAINT [constraint-name]”.

Disable Constraint Checking:


    sqlcmd -e -b -Q"EXEC sp_MSforeachtable @command1='ALTER TABLE ? NOCHECK CONSTRAINT ALL' " -S%DBTargetServer% -d%DBTargetDB% 

Resume Constraint Checking:


sqlcmd -e -b -Q"EXEC sp_MSforeachtable @command1='PRINT ''?''; ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL' " -S%DBTargetServer% -d%DBTargetDB%

The resume “Constraint checking” code is quite interesting; for the following reasons:

  • There are two CHECKS “CHECK CHECK”
    • The first CHECK enables constraints
    • The second CHECK ensures that existing data meets the criteria
      • As all relevant existing data is checked for conformity it will likely take a while

 

If there are unaligned data, we will see errors such as :


Msg 547, Level 16, State 0, Line 1
The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "FK_tblTranslate_tblToken".
The conflict occurred in database "HRDB", table "dbo.tblToken", column 'pkTokenID'.

Table Triggers

When implemented, triggers automatically fire during DML operations.

One really needs to have a good grasp of the code logic embedded within the trigger and determine which ones are needed on our destination system.

To disable all triggers, here is what we did:


     sqlcmd -e -b -Q"exec sp_msforeachtable 'ALTER TABLE ? DISABLE TRIGGER all'" -S%DBTargetServer% -d%DBTargetDB% 

Once we have copied that data over to our destination, we re-enabled them.


    sqlcmd -e -b -Q"exec sp_msforeachtable 'ALTER TABLE ? ENABLE TRIGGER all'" -S%DBTargetServer% -d%DBTargetDB% 

Bulk Insert Batch Size

Microsoft’s documentation almost emphatically states that when importing data, one should use a batch size that covers the entire incoming dataset.

I have enough bones to pick with Microsoft, than to lawyer this one.

And, so unless your Database Server, Client, Reporting DB concurrent processes demonstrable state and show otherwise, I will say use  a  big batch size.


  set DBTargetServer=SQLServerReport
  set DBTargeDB=AMEA
  

rem trying out 1 million records
set BATCH_SIZE=1000000

bcp %DBTargetDB%.dbo.[tblDevice] in %TEMP%\dbo.tblDevice.data  -T -E -n  -h"TABLOCK, ORDER (pkDeviceID ASC)"  -b%BATCH_SIZE% -S%DBTargetServer%    

Sorted Data

Before this review, I would not have thought that Sorting would matter.

But, according to Microsoft, it helps to sort the data before hand, and guide the BCP application that the incoming data is pre-sorted.

The BCP app thus skips the in-built sorting steps.

In the example below, using the –h operand we combine two hints; the erstwhile TABLOCK explicit locking request and the ORDER hint.


set DBTargetServer=SQLServerReport
set DBTargeDB=AMEA

rem trying out 1 million records
set BATCH_SIZE=1000000

bcp %DBTargetDB%.dbo.[tblDevice] in %TEMP%\dbo.tblDevice.data  -T -E -n  -h "TABLOCK, ORDER (pkDeviceID ASC)"  -b%BATCH_SIZE% -S%DBTargetServer%    

Quick Details:

  1. In our sample, we referenced our Clustering key (pkDeviceID)
  2. If your table is a heap, that is no clustered index, I will suggest that you
    1. Earnestly review your entity physical model and consider adding a clustered index
    2. I am doubtful that it will be helpful, but consider changing your extract to use “query out” and as part of your query “add an order by [col1], [col2]”

More

As always, your immutable problem will likely be hardware.

You need plentiful I/O, Memory, and Network bandwidth.

Hardware

Network

  • Often database servers are hidden deep in the castle
  • To gain access to them one has to transverse through various Network Firewalls and proxies
  • In multi-homed host set-ups, ensure that traffic has been configured to use the pre-arranged network card
  • In essence, use trace route and network monitoring tools to trace and audit traffic routes

I/O

  • I/O is all about Network Storage
  • Are you using Fiber Channel or Ethernet card & switches?
  • Just an in Network Analysis, is your I/O path sufficiently segregated?

Memory

  • Can never have too much
  • Having lots of memory lets the system and one get away with things one will otherwise not get away with

Other areas to consider are:

  1. Database table partitioning
  2. Storage Layout
    • Are your flat files local or network
    • If local, do they share same physical drive as your database files
  3. Database File Growth
    • Are your database files pre-allotted
    • Have you properly sized your auto-growth sizes

Performance Profiling

Review Database Resource Locks

Query:

SELECT
          t1.resource_type
     --   , t1.resource_database_id
        , databaseName = db_name(t1.resource_database_id)
     --   , t1.resource_associated_entity_id
        , objectName =
            CASE
                WHEN resource_type = 'OBJECT' THEN object_name(resource_associated_entity_id)
                WHEN resource_type IN ('DATABASE', 'FILE', 'METADATA') THEN 'N/A'
                WHEN resource_type IN ('KEY', 'PAGE', 'RID')  THEN
                                      (
                                        SELECT object_name(tblSP_Inner.object_id)
                                        FROM   sys.partitions tblSP_Inner
                                        WHERE  tblSP_Inner.hobt_id = t1.resource_associated_entity_id
                                      )
               ELSE 'Undefined'
            END

        , t1.request_mode

        , CASE t1.REQUEST_MODE
            WHEN 'S'        THEN 'Shared'
            WHEN 'U'        THEN 'Update'
            WHEN 'X'        THEN 'Exclusive'
            WHEN 'IS'       THEN 'Intent Shared'
            WHEN 'IU'       THEN 'Intent Update'
            WHEN 'IX'       THEN 'Intent Exclusive'
            WHEN 'SIU'      THEN 'Shared Intent Update'
            WHEN 'SIX'      THEN 'Shared Intent Exclusive'
            WHEN 'UIX'      THEN 'Update Intent Exclusive'
            WHEN 'BU'       THEN 'Bulk Update'
            WHEN 'RangeS_S' THEN 'Shared Range S'
            WHEN 'RangeS_U' THEN 'Shared Range U'
            WHEN 'RangeI_N' THEN 'Insert Range'
            WHEN 'RangeI_S' THEN 'Insert Range S'
            WHEN 'RangeI_U' THEN 'Insert Range U'
            WHEN 'RangeI_X' THEN 'Insert Range X'
            WHEN 'RangeX_S' THEN 'Exclusive range S'
            WHEN 'RangeX_U' THEN 'Exclusive range U'
            WHEN 'RangeX_X' THEN 'Exclusive range X'
            WHEN 'SCH-M'    THEN 'Schema-Modification'
            WHEN 'SCH-S'    THEN 'Schema-Stability'
            ELSE NULL
        END AS REQUEST_LOCK_MODE

        , t1.request_session_id
     --   t2.blocking_session_id

    FROM sys.dm_tran_locks as t1

            LEFT OUTER JOIN sys.dm_os_waiting_tasks as t2

                   ON t1.lock_owner_address = t2.resource_address

   where  t1.resource_database_id = db_id()

Output:

Here is what happens when the table is locked.

DatabaseLocks

Constraints – Foreign Key – Review

Here we look for Un-trusted foreign key constraints.


SELECT
        objectName = object_name(tblFK.parent_object_id)
      , objectReferenced = object_name(tblFK.referenced_object_id)
      , [name] = tblFK.name
      , isDisabled = tblFK.is_disabled
      , isNotTrusted = tblFK.is_not_trusted
      , isNotForReplication = tblFK.is_not_for_replication

FROM sys.foreign_keys tblFK
where
     (
        (tblFK.is_not_trusted =1)
     ) 

Network Connections

Script

Review Network Connections


SELECT
      c.session_id, c.net_transport, c.encrypt_option
    , c.auth_scheme, s.host_name, s.program_name
    , s.client_interface_name
    , r.command

FROM sys.dm_exec_connections AS c

        JOIN sys.dm_exec_sessions AS s

            ON c.session_id = s.session_id

        LEFT OUTER JOIN sys.dm_exec_requests r

            ON c.session_id = r.session_id
go

 

Output:

SQLCMD

Here is what our connection looks like when we use SQLCMD.exe to remove existing rows:

sqlcmd

 

Bulk Insert

Here is what our connection looks like when we use BCP.exe to import new rows:

BULKINSERT

 

 

Explanation:

  • Our transport layer is Shared memory; as we are running bcp from same host as our DB Server, we are able to use the fastest available communication protocol
  • Unfortunately, no encryption
  • Authentication Scheme is NTLM, not as good as kerberos
  • SQLCMD is using OLE/DB; whereas BCP is using ODBC

 

Follow Up Tasks

  1. Index Maintenance – Defragmentation
    • Clustered Indexes – It is unlikely that your Clustered Index will be fragmented; especially if your data file is sorted based on your Clustered Index; the default mode
    • Unclustered Indexes – Not so fast with your Unclustered Indexes; personally I use Olla’s wonderful gift, SQL Server Index and Statistics Maintenance, which is available @ https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html
  2. Index Statistics
    • Please Update your Statistics, as well

 

Summary

Monitoring your SQL Instance, System, Network, and Storage will yield new insights as to where your bottlenecks are.

Listening

Like Ron, who I met yesterday, in my Power Lunch Walk, said to me it is the little things we miss…

That’s Why I’m Here
Composers:Mark Alan Springer, Shaye Smith
Producers:Buddy Cannon, Norro Wilson
Song By:Kenny Chesney
http://www.vevo.com/watch/kenny-chesney/Thats-Why-Im-Here/USBVA0100016

Microsoft – SQL Client – Client Tools – Install

Background

Have a web server in our Lab environment that needs client tools access to our SQL Server.

By Client Tools we mean sqlcmd.exe and bcp.exe access.

Using Web Service we connect to a site and download text and xml data and write the downloaded data into files.  And, we now need to use bcp to feed the downloaded data into SQLServer and later using SQLCmd execute some Stored procedures to move data from the staging tables into the destination tables.

 

 

Utilities

Here are the SQL Server utilities we need and the role they play.

sqlcmd allows us to submit sql statements; and bcp allows us to populate and extract data.

Here is Microsoft’s synopsis:

Utility Summary
 sqlcmd The sqlcmd utility lets you enter Transact-SQL statements, system procedures, and script files at the command prompt, in Query Editor in SQLCMD mode, in a Windows script file or in an operating system (Cmd.exe) job step of a SQL Server Agent job. This utility uses ODBC to execute Transact-SQL batches.
 bcp The bcp utility bulk copies data between an instance of Microsoft SQL Server and a data file in a user-specified format. The bcp utility can be used to import large numbers of new rows into SQL Server tables or to export data out of tables into data files. Except when used with the queryout option, the utility requires no knowledge of Transact-SQL. To import data into a table, you must either use a format file created for that table or understand the structure of the table and the types of data that are valid for its columns.

 

 

 

ODBC & Native Client

For a while ODBC was Microsoft mainstay for client connectivity, then came OLE-DB, and now we are back to ODBC.

Read more here:

 

Here is where to find the drivers for each SQL Server Version:

Utility Summary Link
SQL Server 2014 Microsoft ODBC Driver 11 for SQL Server http://blogs.msdn.com/b/sqlnativeclient/archive/2013/01/23/introducing-the-new-microsoft-odbc-drivers-for-sql-server.aspx
 SQL Server 2012 Microsoft® SQL Server® 2012 Native Client http://www.microsoft.com/en-us/download/details.aspx?id=29065
SQL Server 2008-R2 Microsoft® SQL Server® 2008-R2 Native Client http://www.microsoft.com/en-us/download/details.aspx?id=16978

 

 

 

 

Command Line Tools

Here are the URLs for the command line tools

Utility Summary Link Prerequisite
SQL Server 2014 Microsoft Command Line Utilities 11 for SQL Server http://www.microsoft.com/en-us/download/details.aspx?id=36433 Microsoft ODBC Driver 11 for SQL Server
 SQL Server 2012 Microsoft® SQL Server® 2012 Command Line Utilities http://www.microsoft.com/en-us/download/details.aspx?id=29065 Microsoft® SQL Server® 2012 Native Client
SQL Server 2008-R2 Microsoft® SQL Server® 2008 R2 Command Line Utilities http://www.microsoft.com/en-us/download/details.aspx?id=16978 Microsoft® SQL Server® 2008 R2 Native Client

 

 

 

Summary

Again, simple stuff, but if you have not done it a while, it can make you head swirl.