SQLServer – BulkCopy – Optimization – Reviewing Impact of Block Size via the Transaction Log Monitoring – Take 2

Background

As a follow-up to SQLServer – BulkCopy – Optimization – Take 1 ( https://danieladeniji.wordpress.com/2015/06/05/sqlserver-bulkcopy-optimization-take-1/ ), let us see how much churn occurs in our Transaction Log as we vary our Bulk Copy Program ( BCP ) block sizes.

Assumption

Let us make the following assumptions:

  • Database Recovery Model
    • Recovery Model – Simple or Bulk Logged
    • Recovery Model – If the recovery model is Full, all changes, both row allocations and data changes, are fully logged (a quick check of the current setting follows this list)
  • We can truncate the destination table
  • We can lock the destination table
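
As a minimal sketch, assuming our lab database is named DBLAB (the name used later in this post), we can confirm and, if need be, switch the recovery model:

select name, recovery_model_desc
from   sys.databases
where  name = 'DBLAB';

alter database [DBLAB] set recovery bulk_logged;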

Diagnostic Tool

There are a few barometers we can use to read Transaction Log utilization:

  1. DBCC SQLPERF ( LOGSPACE )
  2. sys.dm_db_log_space_usage
  3. fn_dblog
    • An undocumented table-valued function
    • Takes two parameters; the beginning and the ending LSN

DBCC SQLPERF and sys.dm_db_log_space_usage offer database-level information, whereas fn_dblog works at the detail level.
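
As a quick illustration, here is a minimal sketch of reading log utilization; the fn_dblog count is merely a rough gauge of how many log records are currently readable:

-- Database level
dbcc sqlperf(logspace);

-- Current database only
select
          tblLogSpace.total_log_size_in_bytes
        , tblLogSpace.used_log_space_in_bytes
        , tblLogSpace.used_log_space_in_percent
from   sys.dm_db_log_space_usage tblLogSpace;

-- Detail level
select [NumberofLogRecords] = count(*)
from   fn_dblog(null, null);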

Table Definition

Table - DBLab.NumbersSource

set noexec off;
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO

if schema_id('DBLab') is null
begin

    exec('create schema [DBLab] ')

end;

go

if object_id('[DBLab].[NumbersSource]') is not null
begin

    set noexec on;

end
go

CREATE TABLE [DBLab].[NumbersSource]
(
	[Number] [int] NOT NULL,

     CONSTRAINT [DBLab.PK_NumbersSource] PRIMARY KEY CLUSTERED
        (
	        [Number] ASC
        )

) ON [PRIMARY]

GO

set noexec off;
go


Table - DBLab.Numbers


set noexec off;
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO

if schema_id('DBLab') is null
begin

    exec('create schema [DBLab] ')

end;

go

if object_id('[DBLab].[Numbers]') is not null
begin

    set noexec on;

end
go

CREATE TABLE [DBLab].[Numbers]
(
	[Number] [int] NOT NULL,

     CONSTRAINT [DBLab.PK_Numbers] PRIMARY KEY CLUSTERED
        (
	        [Number] ASC
        )

) ON [PRIMARY]

GO

set noexec off;
go

Populate Data


SET NOCOUNT ON
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
set noexec off
go

if schema_id('dblab') is null
begin

    exec('create schema [dblab] authorization [dbo]')

end
go

set noexec off
go

if exists
(
	select 1
	from   [dblab].[NumbersSource]
)
begin

	set noexec on

end
go
-- Load the [dblab].[NumbersSource] table with sequential numbers
-- generated via stacked, cross-joined CTEs
WITH t1 AS
(
        SELECT 0 AS n UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0
)
, t2 AS (SELECT 0 AS n FROM t1 t1a, t1 t1b, t1 t1c, t1 t1d)
, t3 AS (SELECT 0 AS n FROM t2 t2a, t2 t2b, t2 t2c)
, tblVarNumbers AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 AS number FROM t3)
INSERT INTO [dblab].[NumbersSource] WITH (TABLOCKX)
([Number])
SELECT number
FROM   tblVarNumbers
GO

set noexec off
go

LAB

To make things more palatable, let us use a small dataset; a thousand records or so.

Data Extract


SETLOCAL

	set "DBSourceServer=.\SQLEXPRESS_V2014"
	set "TEMP_SESSION=%TEMP%"

	echo TEMP_SESSION is %TEMP_SESSION%	

	rem If folder does not exist, then create folder
	if not exist %TEMP_SESSION% mkdir %TEMP_SESSION%

	Rem Remove existing files
	if exist  %TEMP_SESSION%\*.data del %TEMP_SESSION%\*.data /Q

	bcp "select top 1000 * from DBLAB.dblab.NumbersSource" queryout %TEMP_SESSION%\dbo.NumbersSmall.data -T -n -S%DBSourceServer%

ENDLOCAL

Clean up Transaction Log

Let us clean up our transaction Log by doing the following:

  1. Set Recovery Mode to “Bulk Logged”
  2. Perform a full database backup to the NUL Device
  3. Perform a transaction log backup to the NUL Device

     alter database [DBLAB] set recovery bulk_logged;
   
     backup database [DBLAB] to disk = 'NUL';

     print '--------------------------------------------------------'
     backup log [DBLAB] to disk = 'NUL';
     print '--------------------------------------------------------'

    go
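
To confirm that the backups have indeed emptied out the log, a quick check along these lines (a sketch; run in the DBLAB database context) should show a low used-space percentage:

     use [DBLAB];
     go

     select [used_log_space_in_percent] = used_log_space_in_percent
     from   sys.dm_db_log_space_usage;
     go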

Import in blocks

Let us import records in blocks of 100 rows at a time.


SETLOCAL

   set "DBSourceServer=.\SQLEXPRESS_V2014"
   set "TEMP_SESSION=%TEMP%"
   set "BLOCK_SIZE=100"

   echo TEMP_SESSION is %TEMP_SESSION%

   rem If folder does not exist, then create folder
   if not exist %TEMP_SESSION% mkdir %TEMP_SESSION%

   sqlcmd -Q"truncate table DBLAB.DBLab.Numbers" -S%DBSourceServer%

   bcp DBLAB.DBLab.Numbers in %TEMP_SESSION%\dbo.NumbersSmall.data -T -n -S%DBSourceServer% -b%BLOCK_SIZE% -h "TABLOCK, ORDER (Number ASC)"

  ENDLOCAL

Command Shell Output

bcpDataInBatchesYes [Cmdshell]

SQL Transaction Log – Aggregate Contents

Here is the code we will use to aggregate our Transaction Log


declare @spid int

--set @spid = 60

select
          tblFNDBLog.AllocUnitName
        , tblFNDBLog.Operation
        , [NumberofRecords] = count(*)
        , [Log Record Fixed Length] = sum(tblFNDBLog.[Log Record Fixed Length])
        , [Log Record Length] = sum(tblFNDBLog.[Log Record Length])
from   fn_dblog(null, null) tblFNDBLog

--where  spid = @spid

where tblFNDBLog.AllocUnitName is not null

and   tblFNDBLog.AllocUnitName not in ('Unknown Alloc Unit')

and   tblFNDBLog.AllocUnitName not like 'sys.%'

group by
          tblFNDBLog.AllocUnitName
        , tblFNDBLog.Operation

SQL Transaction Log – Aggregate Contents – 100 records per block

bcpDataInBatches1E2

Import in blocks of 1 million

Go back and clean up the Transaction Log


   alter database [DBLAB] set recovery bulk_logged;

   backup database [DBLAB] to disk = 'NUL';  

   print '--------------------------------------------------------'
   backup log [DBLAB] to disk = 'NUL'; 
   print '--------------------------------------------------------'

   go

Code – 1 million records

Let us import a million records at a time.


SETLOCAL

set "DBSourceServer=.\SQLEXPRESS_V2014"
set "TEMP_SESSION=%TEMP%"
set "BLOCK_SIZE=1000000"

echo TEMP_SESSION is %TEMP_SESSION%

rem If folder does not exist, then create folder
if not exist %TEMP_SESSION% mkdir %TEMP_SESSION%

sqlcmd -Q"truncate table DBLAB.DBLab.Numbers" -S%DBSourceServer%

bcp DBLAB.DBLab.Numbers in %TEMP_SESSION%\dbo.NumbersSmall.data -T -n -S%DBSourceServer% -b%BLOCK_SIZE% -h "TABLOCK, ORDER (Number ASC)"

ENDLOCAL

Command Shell – Output

bcpDataInBatchesNo [Cmdshell]

Aggregate Transaction Log – 1 million records

bcpDataInBatches1Million

SQL Profiler

SQL Server Profiler is another popular tool for modeling system performance.

Batch Size of 100

Here is what we capture for Batch Sizes of 100.

bulkInsertInBatches

Batch Size of 1 Million

Here is what we capture for Batch Sizes of 1 Million.

bulkInsertInBatchesNo

Quick Comparison of SQL Profiler Result

  • It is obvious that there is a lot more dialog when we have a batch size of 100
  • It is not so obvious why our writes are so few.  It is likely that the actual BCP writes are not reflected in the captured profiler readings

fn_dblog

More on fn_dblog.

There are quite a few things one should consider when thinking of using fn_dblog.  These include:

  1. fn_dblog accesses and reads the database log file
    • It can be very expensive, as it is reading an actual physical file
    • The log file carries no indexes, and so scanning it can be taxing on the system
  2. fn_dblog accepts two arguments
    • The starting LSN number
    • The ending LSN number
  3. Log files are database specific
    • And, so one needs to change context to the specific database before querying
  4. fn_dblog exposes a lot of detailed information

Code

Here is a sample script for accessing a couple of columns.


--use [database-name];
declare @iNumberofRecordsMax int

set @iNumberofRecordsMax = 50;

; with cteAllocationUnit
as
(
    select
              tblSAU.data_space_id
            , tblSAU.type_desc

            , tblSAU.allocation_unit_id

            , tblSDF.physical_name

    from   sys.allocation_units tblSAU

            inner join sys.database_files tblSDF

                on tblSAU.data_space_id = tblSDF.data_space_id
)

, ctePartitionObject
as
(
    select
              tblSP.partition_id
            , tblSP.partition_number
            , tblSP.data_compression_desc

            , tblSP.object_id
            , tblSO.schema_id
            , [schemaName] = schema_name(tblSO.schema_id)
            , objectName = tblSO.name

    from   sys.partitions tblSP

            inner join sys.objects tblSO

                on tblSP.object_id = tblSO.object_id
)
select top (@iNumberofRecordsMax)
          tblFNDBLog.spid
        , tblFNDBLog.AllocUnitName
        , tblFNDBLog.Operation
        , [Log Record Fixed Length] = (tblFNDBLog.[Log Record Fixed Length])
        , [Log Record Length] = (tblFNDBLog.[Log Record Length])
        , tblFNDBLog.Context

        , cteAU.physical_name

        , tblFNDBLog.PartitionId
        , ctePO.object_id
        , ctePO.partition_number
        , ctePO.data_compression_desc

        , [schemaName] = ctePO.schemaName
        , [tableName] = ctePO.objectName

        , tblFNDBLog.[Article ID]

from   fn_dblog(null, null) tblFNDBLog

        left outer join cteAllocationUnit cteAU

            on tblFNDBLog.AllocUnitId = cteAU.allocation_unit_id

        left outer join ctePartitionObject ctePO

            on tblFNDBLog.PartitionId = ctePO.[partition_id]

where tblFNDBLog.AllocUnitName is not null

and   tblFNDBLog.AllocUnitName not in ('Unknown Alloc Unit')

and   tblFNDBLog.AllocUnitName not like 'sys.%'

Summary

In conclusion, if we use a high enough block size, we do not incur as much Transaction Log as we otherwise would.

Specifically, we shed the LOP_INSERT_ROWS Operation.

The reason is subtle: each time bcp starts working on a new block, it revisits the underlying table and checks whether it is still empty.

When the block size is smaller than the number of records, that check is repeated for every block after the first, and it tests negative each time; those later blocks no longer qualify for minimal logging.

When the block size is at least as large as the number of records, the check occurs only once, against an empty table, and the entire load stays minimally logged.
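
As a quick way to validate this, here is a sketch that narrows the earlier fn_dblog aggregation down to the fully logged insert operation against our destination table; with minimal logging in effect, this count should stay close to zero:

select
          tblFNDBLog.AllocUnitName
        , [NumberofRecords] = count(*)
from   fn_dblog(null, null) tblFNDBLog
where  tblFNDBLog.Operation = 'LOP_INSERT_ROWS'
and    tblFNDBLog.AllocUnitName like 'DBLab.Numbers%'
group by
          tblFNDBLog.AllocUnitName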

References

I would love to say “Life is hard”, but really?

What can be so hard when you can Google things and be led to pasture by Sunil Agarwal and Remus Rusanu?

  1. Bulk Import Optimizations (Minimal Logging) – Sunil Agarwal
    http://blogs.msdn.com/b/sqlserverstorageengine/archive/2008/02/05/bulk-logging-optimizations-minimal-logging.aspx
  2. How to read and Interpret the SQL Server Log – Remus Rusanu – ( I write database engines by day – I use databases by night ).
    http://rusanu.com/2014/03/10/how-to-read-and-interpret-the-sql-server-log/
  3. SSIS and Minimally Logged Inserts – Jakub K
    http://jakubka.blogspot.com/2014/06/ssis-and-minimally-logged-inserts.html

Microsoft – SQL Server – Bulk Copying data using XML Template File

Background

We are preparing to perform a bit of load testing against MS SQL Server, and so we downloaded Spawner, an MS Windows desktop application that allows one to generate test data.

We were able to generate about 1 million records.

 

Generate test data

 

Spawner

Here is what the Spawner desktop looks like:

Spawner

 

 

Create Table


use [DBLabInMemory]
go

set noexec off
go


/*
 drop table [dbo].[personTraditionalClustered]
*/
if object_id('[dbo].[personTraditionalClustered]') is not null
begin
 set noexec on
end
go



create table [dbo].[personTraditionalClustered]
(

 [firstname] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [lastname] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [emailAddress] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [phoneNumber] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [addressStreet] [varchar](100) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [addressCity] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [addressState] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [addressPostalCode] [varchar](40) 
      COLLATE Latin1_General_100_BIN2 NOT NULL

 , [dateofBirth] datetime null


 , [uniqueID] uniqueIdentifier not null
     constraint defaultPersonTraditionalUniqueID 
        default newid() 



 , constraint PK_PersonTraditional
     primary key ([uniqueID])


 , index idx_Firstname nonclustered
     ([firstname])

 , index idx_Lastname nonclustered
     ([lastname])

 , index idx_AddressCity nonclustered
     ([addressCity])


 , index idx_AddressState nonclustered
     ([addressState])

 , index idx_AddressPostalCode nonclustered
     ([addressPostalCode])

 , index idx_EmailAddress nonclustered
     ([emailAddress])

 , index idx_DateofBirth nonclustered
     ([dateofBirth])
 

)
go

set noexec off
go

Recommendation

As a recommendation, I suggest that one follow this strategy:

  • Create table
  • Insert test data into the table
  • Create template file
  • Customize template file (separators, column ordering, number of columns)
  • Export a few trial records
  • Experiment with importing data

 

Create Table

Create the database table; in our case, we created the table earlier in this post.

Insert test data

Inserting test data is useful, as we can better determine whether we have the correct data types, column lengths, and nullability.
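
As an illustration, a single hand-crafted row (a sketch; the values below are made up) is often enough to flush out data type, length, and nullability issues; the uniqueID column is left to its newid() default:

insert into [dbo].[personTraditionalClustered]
(
      [firstname], [lastname], [emailAddress], [phoneNumber]
    , [addressStreet], [addressCity], [addressState], [addressPostalCode]
    , [dateofBirth]
)
values
(
      'Jane', 'Doe', 'jane.doe@example.com', '555-0100'
    , '100 Main Street', 'Springfield', 'CA', '94000'
    , '1980-01-01'
);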

 

Create XML Template file

Syntax

bcp <table> format nul -x -c -f <format-file> -S <server_name> -d <database-name> -T -q 

 

Sample

bcp dbo.personTraditionalClustered format nul -x -c 
   -f targetFileTemplate.xml -S (local) -d DBLabInMemory -T -q 

This is sufficient when the incoming data file has all the columns that are in the destination table.  If not all of the columns are present, or if they are not in the same order, I suggest using an editor to adjust the created format file.

The error message that we are trying to avoid is shown below:

When we have more columns in SQL Server than in the data file

Unexpected EOF encountered in BCP data-file

 

Customize format file

Here is what our format file looks like:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="6" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="7" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="8" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="9" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="24"/>
 <FIELD ID="10" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="37"/>
 </RECORD>
 <ROW>
 <COLUMN SOURCE="1" NAME="firstname" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="2" NAME="lastname" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="3" NAME="emailAddress" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="4" NAME="phoneNumber" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="5" NAME="addressStreet" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="6" NAME="addressCity" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="7" NAME="addressState" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="8" NAME="addressPostalCode" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="9" NAME="dateofBirth" xsi:type="SQLDATETIME"/>
 <COLUMN SOURCE="10" NAME="uniqueID" xsi:type="SQLUNIQUEID"/>
 </ROW>
</BCPFORMAT>


 

There are a couple of changes we will be making to our template file:

  • Because our input data file does not have the uniqueID column in it, we will remove the corresponding lines from the created format file.
  • We will also change our Terminator from \t (tab) to ||

Here is what our new format file looks like:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="5" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="100" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="6" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="7" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="8" xsi:type="CharTerm" TERMINATOR="||" MAX_LENGTH="40" COLLATION="Latin1_General_100_BIN2"/>
 <FIELD ID="9" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="24"/>
 </RECORD>
 <ROW>
 <COLUMN SOURCE="1" NAME="firstname" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="2" NAME="lastname" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="3" NAME="emailAddress" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="4" NAME="phoneNumber" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="5" NAME="addressStreet" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="6" NAME="addressCity" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="7" NAME="addressState" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="8" NAME="addressPostalCode" xsi:type="SQLVARYCHAR"/>
 <COLUMN SOURCE="9" NAME="dateofBirth" xsi:type="SQLDATETIME"/>
 </ROW>
</BCPFORMAT>


 

Export a few trial records

Let us export a few records:

bcp "select top 10 [firstname],[lastname],[emailAddress],[phoneNumber],[addressStreet],[addressCity],[addressState],[addressPostalCode],[dateofBirth] from [dbo].[personTraditionalClustered] with (nolock)" queryout extractedData_v4.txt -c -S (local) -d DBLabInMemory -T

 

In the example above there are a couple of things one should note:

  • We are only taking out some data; just the first 10 records
  • We have also restricted the columns to the ones we have in our input file

 

Experiment with importing data

Let us perform some trial runs

sqlcmd -S (local) -d DBLabInMemory -E -Q "truncate table [dbo].[personTraditionalClustered]"

bcp [dbo].[personTraditionalClustered] in datafile_v4.txt -f fileTemplate_v4.xml -S (local) -d DBLabInMemory -e errfile.log -T -F 2 -L 4

 

  • In our sample data file, we have added a header line and so we need -F 2 to indicate that raw data starts at line 2
  • We have also added -L 4 to indicate import should end at line 4
  • And, so we are in essence only importing 3 lines (confirmed with the quick row count below)
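
A quick row count (a sketch) confirms that only the expected rows made it in:

-- With -F 2 and -L 4, the trial import should land exactly 3 rows
select [numberOfRows] = count(*)
from   [dbo].[personTraditionalClustered];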

 

Actual Run

A couple of changes we will be making in the actual run:

  • Added -b 50000 to request that rows be committed in batches of 50,000, rather than as one single batch (the default)
  • Dropped -L 4 so that the entire data file, not just the trial rows, is imported

 

sqlcmd -S (local) -d DBLabInMemory -E -Q "truncate table [dbo].[personTraditionalClustered]"

bcp [dbo].[personTraditionalClustered] in datafile_v4.txt -f fileTemplate_v4.xml -S (local) -d DBLabInMemory -e errfile.log -T -F 2 -b 50000

 

Output:

Starting copy...
50000 rows sent to SQL Server. Total sent: 50000
50000 rows sent to SQL Server. Total sent: 100000
50000 rows sent to SQL Server. Total sent: 150000
50000 rows sent to SQL Server. Total sent: 200000
50000 rows sent to SQL Server. Total sent: 250000
50000 rows sent to SQL Server. Total sent: 300000
50000 rows sent to SQL Server. Total sent: 350000
50000 rows sent to SQL Server. Total sent: 400000
50000 rows sent to SQL Server. Total sent: 450000
50000 rows sent to SQL Server. Total sent: 500000
50000 rows sent to SQL Server. Total sent: 550000
50000 rows sent to SQL Server. Total sent: 600000
50000 rows sent to SQL Server. Total sent: 650000
50000 rows sent to SQL Server. Total sent: 700000
50000 rows sent to SQL Server. Total sent: 750000
50000 rows sent to SQL Server. Total sent: 800000
50000 rows sent to SQL Server. Total sent: 850000
50000 rows sent to SQL Server. Total sent: 900000
50000 rows sent to SQL Server. Total sent: 950000
50000 rows sent to SQL Server. Total sent: 1000000
50000 rows sent to SQL Server. Total sent: 1050000
50000 rows sent to SQL Server. Total sent: 1100000
50000 rows sent to SQL Server. Total sent: 1150000

1160001 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 160759 Average : (7215.78 rows per sec.)

 

Version Control

Version Control – Github

Files posted to github @ https://github.com/DanielAdeniji/MSSSQLBCPFormatFileXML

 

We had a little hiccup, as our data file exceeded GitHub's 100 MB file size limit.



[dadeniji@adelia XML]$ git push origin master
Counting objects: 10, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (10/10), 52.08 MiB | 692 KiB/s, done.
Total 10 (delta 1), reused 0 (delta 0)
remote: error: GH001: Large files detected.
remote: error: Trace: b9a986aaae560141bf94f2213cdb5c18
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File datagenSimple.txt is 120.97 MB; this exceeds GitHub's file size limit of 100 MB
To git@github.com:DanielAdeniji/MSSSQLBCPFormatFileXML.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'git@github.com:DanielAdeniji/MSSSQLBCPFormatFileXML.git'


 

To address this, we tarred the file and removed the original file from our local git repository.

Tarred file

tar -zcvf datagenSimple.tar.gz datagenSimple.txt

Remove file

git rm --cached datagenSimple.txt

 

Added file (datagenSimple.txt) to .gitignore

vi .gitignore

 

Added file (datagenSimple.tar.gz) to local git

git add datagenSimple.tar.gz

 

Commit file

git commit

 

Output:

[dadeniji@adelia XML]$ git commit -a -m 'First commit'
[master ca1d17a] First commit
2 files changed, 0 insertions(+), 1160002 deletions(-)
create mode 100644 datagenSimple.tar.gz
delete mode 100644 datagenSimple.txt

 

Push to remote repository

git push origin master

 

But, it was still trying to add the removed file

 

Remove file from history

We used David Underhill’s git-remove-history.sh script to permanently remove the oversized file.  BTW, the shell script is available @ http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/

 

sh /tmp/git-remove-history.sh datagenSimple.txt

Output:

[dadeniji@adelia XML]$ sh /tmp/git-remove-history.sh datagenSimple.txt
Rewrite 32420099006159bf2413542475cdc556167503d7 (1/2)rm 'datagenSimple.txt'
Rewrite ca1d17a66e273b7ceca218ff2e24392913b36d22 (2/2)
Ref 'refs/heads/master' was rewritten
Counting objects: 16, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (16/16), done.
Writing objects: 100% (16/16), done.
Total 16 (delta 5), reused 0 (delta 0)

 

Push to git

Syntax:


[dadeniji@adelia XML]$ git push origin master

Output:


[dadeniji@adelia XML]$ git push origin master
Counting objects: 12, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (12/12), 52.04 MiB | 691 KiB/s, done.
Total 12 (delta 2), reused 11 (delta 2)
remote: warning: GH001: Large files detected.
remote: warning: See http://git.io/iEPt8g for more information.
remote: warning: File datagenSimple.tar.gz is 52.02 MB; 
this is larger than GitHub's recommended maximum file size of 50 MB
To git@github.com:DanielAdeniji/MSSSQLBCPFormatFileXML.git
* [new branch] master -> master

 


SQL Server – v2000 – BCP – Error – FUNCTION SEQUENCE ERROR

Introduction

We are trying to get data out of SQL Server and hoping to use the quickest path.  One traditional tool is bcp.

Code – Transact SQL

Code – Transact SQL – dbProduct

Create database dbProduct and create a lone table dbo.product.

set noexec off
go

use [master]
go

if db_id('dbproduct') is null
begin

	print 'Creating DB dbproduct ...' 	

	exec('create database [dbproduct]')
	exec('alter database [dbproduct] set recovery simple');

	print 'Created DB dbproduct' 	

end
go

use [dbproduct]
go

/*
	drop table [dbo].[product]
*/
if object_id('dbo.product') is not null
begin

	set noexec on

end
go

/*
	drop table dbo.product;
*/

create table dbo.product
(

 	  [id] bigint not null identity(1,1)
	, [productName] varchar(600) not null
	, [size] varchar(80) null
	, [price] money not null

	, constraint PK_product primary key
		([id])

	, constraint Unique_ProductName unique
		([productName])

	, [addedBy] sysname not null
		constraint defaultproductAddedBy default SYSTEM_USER

	, [addedOn] datetime not null
		constraint defaultproductAddedOn default getdate()

)

set noexec off
go

Code – Transact SQL – dbSales

Create database dbSales and create the tables dbo.order and dbo.orderDetail.


set noexec off
go

use [master]
go

if db_id('dbSales') is null
begin

	print 'Creating DB dbSales ...' 	

	exec('create database [dbSales]')
	exec('alter database [dbSales] set recovery simple');

	print 'Created DB dbSales' 	

end
go

use [dbSales]
go

if object_id('dbo.[order]') is not null
begin

	set noexec on

end
go

/*

	drop table dbo.[orderDetail];
	drop table dbo.[order];

*/

/*

	exec sp_help 'dbo.Order'

	exec sp_help 'dbo.OrderDetail'
*/

create table dbo.[order]
(
 	  [id] bigint not null identity(1,1) 

	, constraint PK_Order primary key
		([id])

	, [addedBy] sysname not null
		constraint defaultOrderAddedBy default SYSTEM_USER

	, [addedOn] datetime not null
		constraint defaultOrderAddedOn default getdate()

)
go

set noexec off
go

if object_id('dbo.[orderDetail]') is not null
begin

	set noexec on

end
go

create table dbo.[orderDetail]
(
 	   [OrderID] bigint
 	,  [sequenceID] int not null

	,  [productID] int not null

	,  [NumberofItems] int not null

		  constraint defaultOrderDetailNumbreofItems default 1

	,  [addedBy] sysname not null

		  constraint defaultOrderDetailAddedBy default SYSTEM_USER

	, [addedOn] datetime not null

		  constraint defaultOrderDetailAddedOn default getdate()

	, constraint PK_OrderDetail primary key
		(
			  [OrderID]
			, [SequenceID]
               )

	, constraint FK_OrderDetail foreign key
		(
			  [OrderID]
                )
	        references dbo.[Order]
	        (
			  [id]
                )

)
go

set noexec off
go

Code – Transact SQL – dbSales – dbo.ufn_computedCost

Create the scalar function dbo.ufn_computedCost.


use [dbSales]
go

if object_id('dbo.ufn_computedCost') is not null
begin

	set noexec on

end
go

/*

	drop function dbo.ufn_computedCost

*/

create function dbo.ufn_computedCost
(
	  @productID   int
	, @NumberofItems  int
)
returns money
as
begin

	declare @cost money

	select @cost = 
			@NumberofItems * tblProduct.[price]
	from   [dbProduct].dbo.product tblProduct
	where  tblProduct.[id] = @productID

	return (@cost)

end
go

set noexec off
go

Code – Transact SQL – dbSales – dbo.OrderDetail.cost

On the dbo.OrderDetail table, create a new column (cost) and bind it to the user function dbo.ufn_computedCost.


use [dbSales]
go

/*
	alter table dbo.orderDetail
	  drop column [cost]
*/

if not exists
	(

	    select name
	    from   syscolumns tblColumn
	    where  tblColumn.id = object_id('dbo.orderDetail')
	    and    tblColumn.name = 'cost'

	)
begin

	print 'Adding new column dbo.orderDetail - cost ... '

		alter table [dbo].[orderDetail]
		   add [cost]
			as dbo.ufn_computedCost(
                                                    [productID]
                                                  , [NumberofItems]
                                               )

	print 'Added new column dbo.orderDetail - cost'

end
go

Populate Tables

Let us populate the tables

Populate tables – [dbProduct].dbo.Product


set nocount on;
go

delete from [dbproduct].dbo.product;
go

set identity_insert [dbproduct].dbo.product on
go

	insert into [dbproduct].dbo.product
	([id], [productName], [size], [price])
	values(1, 'Frosted Flakes Cereal', '15 oz',2.98)

	insert into [dbproduct].dbo.product
	([id],[productName], [size], [price])
	values(2, 'Kellogg''s Corn Flakes Cereal', '24 oz', 4.67)

	insert into [dbproduct].dbo.product
	([id],[productName],[size], [price])
	values(3, 'Milk - Vitamin D','1 Gallon', 3.65)

	insert into [dbproduct].dbo.product
	([id],[productName],[size], [price])
	values(31, 'Chobani Non-Fat Greek Yogurt','15/6 oz', 15.99)

	insert into [dbproduct].dbo.product
	([id],[productName],[size], [price])
	values(41, 'Donsuemor Madeleines French Cakes','28 oz', 8.13)

	insert into [dbproduct].dbo.product
	([id],[productName],[size], [price])
	values(51, 'Turon','15 pieces', 11.85)

        insert into [dbProduct].[dbo].product
	([id],[productName],[size], [price])
	values(52, 'empanada','7 pieces', 7.00)

go

set identity_insert [dbproduct].dbo.product off
go

Populate tables – [dbSales].dbo.Order and [dbSales].dbo.OrderDetail


set nocount on;
go

delete from [dbSales].dbo.[orderDetail];
delete from [dbSales].dbo.[order];
go

set identity_insert [dbSales].dbo.[order] on

	insert into [dbSales].dbo.[order]
	([id])
	values (1)

	insert into [dbSales].dbo.[orderDetail]
	(
		[orderID], [sequenceID], [productID]
	)
	values
	(
		1, 1, 1
	)

	insert into [dbSales].dbo.[orderDetail]
	(
		[orderID], [sequenceID], [productID]
	)
	values
	(
		1, 2, 3
	)

        --------------------------------------------------------
	insert into [dbSales].dbo.[order]
	([id])
	values (2)

	insert into [dbSales].dbo.[orderDetail]
	(
		[orderID], [sequenceID], [productID]
	)
	values
	(
		2, 1, 1
	)

       -----------------------------------------------------------

	insert into [dbSales].dbo.[order]
	([id])
	values (3)

	insert into [dbSales].dbo.[orderDetail]
	(
		[orderID], [sequenceID], [productID], [NumberofItems]
	)
	values
	(
		3, 1, 51,1
	)

	insert into [dbSales].dbo.[orderDetail]
	(
		[orderID], [sequenceID], [productID],[NumberofItems]
	)
	values
	(
		3, 2, 52,2
	)

go

set identity_insert [dbSales].dbo.[order] off
go

Bcp Data

Bcp Data – OrderDetail

Fetch data out of the dbo.OrderDetail table.

Script:

"C:\Program Files (x86)\Microsoft SQL Server\80\Tools\Binn\bcp"  "select * from [dbSales].dbo.[orderDetail]" queryout orderDetail.csv -c -S DBLAB\MSSQL2000  -T

Output:

bcp-good

Bcp Data – Failed

Let us imagine that a few weeks later, or while we are in the process of moving to a new system, we try to bcp again; upon issuing an identical BCP command, we now get an error.

Bcp Data – OrderDetail

Fetch data out of the dbo.OrderDetail table.

Script:


"C:\Program Files (x86)\Microsoft SQL Server\80\Tools\Binn\bcp"  "select * from [dbSales].dbo.[orderDetail]" queryout orderDetail.csv -c -S DBLAB\MSSQL2000  -T

 

Error:

Error – Textual

SQLState = S1010, NativeError = 0
Error = [Microsoft][ODBC SQL Server Driver]Function sequence error

Error – Image

bcp-failed

Error Diagnosis

An error that reads “Function sequence error” seems intimidating. But, no problem.

We issued a select statement against the table our bcp command selects from:

select * from dbsales.dbo.orderDetail

And, we got back a helpful error message:

Server: Msg 208, Level 16, State 1, Procedure ufn_computedCost, Line 24
Invalid object name 'dbProduct.dbo.product'.

Fix

Our fix was to move the dbProduct database to our new system.

In Microsoft SQL Server, it is sometimes difficult to find cross-database dependencies, partly because Transact SQL does not allow us to declare those relationships as constraints.
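
As an aside, on later releases of SQL Server (2008 and up), sys.sql_expression_dependencies can at least surface cross-database references buried inside modules such as our dbo.ufn_computedCost; a minimal sketch:

use [dbSales]
go

select
          [referencingObject] = object_name(tblDependency.referencing_id)
        , tblDependency.referenced_database_name
        , tblDependency.referenced_schema_name
        , tblDependency.referenced_entity_name
from   sys.sql_expression_dependencies tblDependency
where  tblDependency.referenced_database_name is not null
go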

Using the code pasted below, we tried relating the dbo.OrderDetail.productID column in the dbSales database to the id column in the dbProduct.dbo.product table.

     use [dbSales]
     go

     alter table dbo.OrderDetail
	   add constraint FK_OrderDetail_Product foreign key
       (
   	   [productID]
       )
       references [dbProduct].dbo.[product]
       (
    	  [id]
       )

But, you will get an error stating “cross-database foreign key references are not supported”.



Server: Msg 1763, Level 16, State 1, Line 2
Cross-database foreign key references are not supported. 
Foreign key 'dbProduct.dbo.product'.
Server: Msg 1750, Level 16, State 1, Line 2
Could not create constraint. See previous errors.

Please keep in mind that if you try to reference an entirely missing database object, you will get a different error.

In the example below, we tried referencing a missing object (dbProduct.dbo.productMissing):


     use [dbSales]
     go

     alter table dbo.OrderDetail
	   add constraint FK_OrderDetail_Product foreign key
       (
   	   [productID]
       )
       references [dbProduct].dbo.[productMissing]
       (
    	  [id]
       )

And, we get an error stating “references invalid table ….”


Server: Msg 1767, Level 16, State 1, Line 1
Foreign key 'FK_OrderDetail_Product' references invalid table 'dbProduct.dbo.product1'.
Server: Msg 1750, Level 16, State 1, Line 1
Could not create constraint. See previous errors.

More research might reveal different root causes (a quick existence check follows the list below):

  • Someone might have renamed the dbProduct.dbo.product table
  • In our case, the dbProduct database had not yet been moved to our new system
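
A quick existence check (a sketch) makes it easy to tell which case we are in:

-- Is the referenced database present on this system, and does the referenced table exist?
select [dbProductDatabaseID]  = db_id('dbProduct');
select [productTableObjectID] = object_id('dbProduct.dbo.product');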

Conclusion

Yes, error messages are sometimes unpleasant.  And, quick Google searches might surface problems that are a bit worse and laden with unrelated riddles.

But, steady and progressive remediation steps win the day!

References