Guillaume Fourrat – VSS backup of AlwaysOn Secondaries

Background

Catching up a bit on other people’s posting and wanted to share this one from Guillaume Fourrat.

Guillaume titles it “VSS backup of AlwaysOn Secondaries” and it is here.

 

VSS backup of AlwaysOn Secondaries

 

Hi Everyone,

Today I’m going to highlight one of the changes brought by SQL Server 2012 SP2, which is the way we handle VSS Backup requests on AlwaysOn Secondary Databases.

Until now, any request for a FULL database backup (VSS_BT_FULL) thru VSS against a DB that is an AlwaysOn secondary was failing by design. Our VSS Writer SQLWriter would return FAILED_AT_PREPARE_SNAPSHOT (0x800423f4 – VSS_E_WRITERERROR_NONRETRYABLE).

A copy-only VSS backup (VSS_BT_COPY) would work.

The rationale being the following: a FULL backup is actually updating the target DB (reset of differential bitmap mainly), which is not possible when the DB is read only. Furthermore, because of the failover possibilities introduced by AlwaysOn, the favored option was to use Native SQL Servers backup that could rely on SQL Server variable backup location (http://msdn.microsoft.com/en-us/library/hh245119.aspx) if needed, and be ‘alwayson –aware’.

So that could be the end of the story: against an AlwaysOn Secondary DB, either use Copy_only VSS backups or use native backups.

But of course that wouldn’t make for a very interesting blog post…

Enters HyperV…

Consider the following scenario:

Large Windows HyperV Servers, hosting many HyperV virtual Servers, some of them SQL Servers in Always On architecture.

In short: a Private Cloud.

In this context, the IT usually takes care of the infrastructure at host level, and lets users deal with whatever happens within the VMs. One of the key tasks of IT is to manage backups (eg. for disaster recovery at datacenter level, or to provide restore of single VMs).

And the mainstream way to do that is to take VSS backups of the Host Disk Volumes. Microsoft System Center DPM will do exactly that.

But VSS backups are all about taking backups that are consistent: in ‘standalone’ SQL Server context you may already know all the logic SQLWriter implements to make sure that IO against the Databases that are backed up are frozen during the snapshot operation. So, back to our HyperV context, collecting a point-in-time image of a VHD without bothering with what happens within the VM would be defeating that very purpose right?

So what happens is the following: the VSS backup is propagated to Guest VMs thru HyperV integration services. That propagation hardcodes the backup type to VSS_BT_FULL, and therefore all guest VMs are initiating a VSS backup/snapshot in their own context. The purpose is to make sure that all applications are quiesced within all running VMs at the time we take the snapshot at the host level. This will enable us to generate a consistent backup of running VMs.

But let’s now put this in the context where one of the VMs is running an AlwaysOn secondary DB: you guessed it, it’s not going to work:

clip_image002

The important thing to know here is that the error returned by SQLWriter in VM3 will actually bubble up all the way to the initial VSS backup command at Host level, and will make it fail as a whole.

So we ended up in a situation where the IT infrastructure folks would see their Host backups failing from time to time for an unknown reason, depending on whether one or more of the VM present on the Host Drive being backup up had a secondary AlwaysOn DB! It could be that the AlwaysOn AG spans different HyperV hosts and therefore that the presence of a Secondary DB on a given Host is not something static over time.

Because of the complexity of the whole call chain, and because infrastructure IT operators may not have any visibility (or understanding) of the VM content, you can imagine what kind of troubleshooting challenges this would offer… And even when the situation is understood, well, what do we do? If host level backup must be somehow manually synchronized to the applicative state of Guest VMs, the Private Cloud scenario becomes quite more complicated all of a sudden.

This is the reason why SQL Server 2012 SP2 ships a code change for SQLWriter that will implement the following:

clip_image004

As you can see, SQLWriter now detects this specific situation and changes the backup type to VSS_BT_COPYONLY. This will only happen for VSS_BT_FULL backups against AlwaysOn secondary Dbs. VSS_BT_FULL backups against primary DB will happen without change.

In this case, the VSS backup will now successfully complete in VM3 and the host-level backup success state will no longer be tied to guest VM’s AlwaysOn activity. Private Cloud scenario unlocked!

Important note: the fact that VSS backup of AlwaysOn secondaries now works does not make it the preferred solution to backup SQL Server AlwaysOn architectures. The main purpose of the SP2 change is to avoid a situation where a single SQL Server in a VM fails a complete host-level backup operation that encompassing dozens of VMs.

The resulting backup for the VM hosting SQL should be considered a Disaster Recovery one, where AlwaysOn will be removed at once at restore time, not as a way to rebuild a subset of the nodes for an AlwaysOn Availability group. And for regular databases within the VM, that backup is as good any regular VSS one.

Finally, SQL Server 2012 SP2 only contains a partial fix for this issue. Servers running case-sensitive sort orders will require SQL Server 2012 SP2 Cumulative Update 2.

HTH,

Guillaume Fourrat
SQL Server Escalation Engineer
Microsoft France

Commentary

I really like the post for the reasons listed below:

  1. It is not something about a stupid query that can not be written a different way as it is embedded in the code or generated via an ORM Tool ( Entity Framework, Hibernate )
    • That is, it can be changed and optimized
    • Furthermore, we are not handicapped by what the developer thinks is fanciful and should be upheld by all means
  2. It affects a whole lot of parties
    • 3rd Party tools vendors that utilize VSS to backup SQL Server databases
    • Corporations that do not know or care what VSS is or does, they do not have a DBA and could care less that sql server datafiles can not just be backed up like other files
  3. The Application logs an error
    • The application actually logs the error and it is the hope that support staff will capture & review the errors periodically
  4. MSFT can and has provided a fix
    • MSFT has provided a fix and that means they are aware and tracking it
    • The patch will be brought into the main line code and all future versions will benefit going forward
    • If it breaks in subsequent version, we have every right to open a ticket thru Corporate Support or as a Connect Item
  5. And, above all else this is an actual technical problem
    • It is not a process problem
    • It is not a people problem
    • It is not one handicapped by long institutional or inter-department slight or grievance

SQL Server – Messages – Backup

Background

Reviewing SQL Server Agent jobs and came upon some messages that I wanted to be sure do not pose any risk.

 

SQL Server Agent

Backup Jobs

backupdatabaselogbrushed-up

 

Messages

The specific messages that we will be digging more into are:

  1. 3014
  2. 3211

 

What do the messages mean?

Code


set nocount on;

declare @language		   sysname
declare @languageID		   int	

declare @languageIDDefault int
declare @languageIDEnglish int

declare @sqlMessageID_3014 int
declare @sqlMessageID_3211 int

set @sqlMessageID_3014 = 3014
set @sqlMessageID_3211 = 3211

set @languageIDDefault = 0 
set @languageIDEnglish = 1033

set @language        = @@language

/*
	Message
		Executed as user: LABDOMAIN\sqlAgent. ...rcent processed. [SQLSTATE 01000] (Message 3211)  
		19 percent processed. [SQLSTATE 01000...  The step succeeded.
*/
if (@language is not null)
begin

	select @languageID	= tblSL.[langid]
	from   sys.syslanguages tbLSL
	where  tblSL.[name] = @language

end

if (
		(@languageID is null)
	)
begin

	set @languageID = @languageIDDefault

end

print 'Default Language ' + isNull(@language, '')
print 'Language ID ' + cast(isNull(@languageID, -1) as sysname)

select top 100 *
from   sys.messages tbLSM
where  (
			   (tblSM.language_id = @languageID)
			or (tblSM.language_id = @languageIDEnglish)
	   )		
and    tblSM.message_id	in
		 (
			  @sqlMessageID_3014		
			, @sqlMessageID_3211
		 )


 

Explanation

What we did is query the master.sys.messages table

We filtered on

  1. The messages ID
    • 3014
    • 3211
  2. And, passed along the language ID

 

Output

output-grid

 

Explanation

Message ID Severity  Interpretation
 3014 10  At the completion of backups, we are told how many pages were backed, how long it took, and throughput
 3211  10  When the stats option is passed, backup and restore commands render percentile status message on the progress

 

BTW, Severity 10 is an informational message as demonstrated here.

Database Engine Error Severities

levelsofseverity

 

Summary

If bothered by the 3211 message please remove the stats operand from the backup command.

Using the 3014 message, backup completions are logged to both the console and SQL Server’s errorlog.

Please enable the 3605 Trace Flag to suspend logging of backup completion to the errorlog.

It is not clear if one is able to dissuade logging to the console.

Symantec/Veritas – Backup Exec – Opened Files Options

Background

As a working DBA, backing up files is an area that is close to me.

Our Corporate Backup Platform is Symantec\Veritas, let us see how it handles live database files.

 

Job

Configuration

“Use Advanced Open File Option” – Unchecked

When creating a job, we can either accept the default option of not taking any special precaution with Opened Files.

In the screenshot below, we left “Use Advanced Open File Option” unchecked.

AdvancedOpenFileOption-Disabled

Report

JobHistory-FilesIsInUse

Explanation

Our SQL database files are marked “Files is in use“.

 

“Use Advanced Open File Option” – Checked

On the other hand, when we check the “Advanced Open File Option” option, we are able to decide how we want to handle Opened Files.

The available options are:

  1. Automatic – Allow VSS to select the snapshot provider
  2. System – Use Microsoft Software Shadow Copy Provider
  3. Software – Veritas Storage Foundation for Windows
  4. Hardware – Use technology provided by hardware manufacturer

 

We took Option 2 – “System use Microsoft Software Shadow Copy Provider” for a quick drive.

 

Report

BackupDetailInformation

Explanation

 

We no longer have skipped files.

 

Configuration

Exclusion List

We are handling our SQL Server Backups through the built-in SQL Server tools. In essence, we are not quite interested in backing up the live opened SQL Server data and log files.

We will thus exclude those through the Exclusion List option.

The steps are documented are.

And, they are

  1. Use the menu options ” Edit”\”Manage Selection List”
  2. Choose the “Excludes” item

 

Manage Selection List

ManageSelectionLists

 

Exclude Properties ( Before )

Here is the empty list

ExcludeProperties-EmptyList

We click the Insert button

Include/Exclude Selection

The choices we make in the “Include/Exclude Selections”  is detailed below:

  1. Resource type :- General
  2. Path :- \\*\*
    • The first * means all computers\hosts
    • The second * means all drives on the designated hosts
  3. File
    • *.mdf
  4. Selection criteria
    • Include subdirectories

 

ExcludeProperties-mdf

Clicked OK button and repeat same steps replacing *.mdf with *.ndf and *.ldf.

 

Exclude Properties ( After )

Here is our completed list

ExcludeProperties-CompletedList

 

Summary

Again, Backup Exec satisfactory and gracefully handles in-use files.

There are at minimum two avenues for handling them.

The more granular option is to handle them at individual job configuration levels.  When handled at the job level, we are able to utilize Microsoft’s Shadow Copy Option or Veritas Storage Foundation.

Also, at the job level, we are able to target specific folders or/and file types.

At a global level, we can also just exclude specific file extensions.

SQL Server – Backup – Error – The specified network name is no longer available.)’ while attempting ‘FlushFileBuffers’

Background

Attempting SQL Server backup to our Network File Server, but failing with an error message stating “The operating system returned the error ’64(The specified network name is no longer available.)’ while attempting ‘FlushFileBuffers’“.

Error Message


The operating system returned the error '64(The specified network name is no longer available.)' while attempting 'FlushFileBuffers' on '\\labfileserver\sqlbackup\hrdb_backup_2015_08_18_161953_6973966.bak'.  BACKUP DATABASE is terminating abnormally.

Resolution

Coverage

The resolution is well covered on the Internet:

  1. SESSTIMEOUT
  2. Drivers

Increase Session Timeout

Visually we can easily change the registry using regedit, but let us use command line.

Query


REG QUERY HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /s

Output:

LanmanWorkStationParametersRead

Explanation:

  1. We can see that SessTimeout is not listed
  2. And, as such we are using the default timeout of 45 seconds ( https://technet.microsoft.com/en-us/library/cc938292.aspx )

Write

REG ADD HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /v SessTimeout /t REG_DWORD /d 360 

Output:

LanmanWorkStationParametersWrite

Explanation:

  • Thanks goodness, we were successful

Re-read


REG QUERY HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /v SessTimeout

Output:

LanmanWorkStationParametersRead

Explanation:

  • We got back 0x168
  • 0x signifies Hex, and so let us covert to Decimal.  0x168 is 360 decimal

Retry Backup

The next time we retried backup, we were successful.

I think the problem is due to the way SQL Server backups database.  It allocates a big chunk of sparse file to make sure that the needed storage is available and reserved.  It does a lot of work and then goes back and writes out the backup file in chunks.

My thinking is that this causes network timeouts.

Using Compression

Let us make sure that we are using SQL Server Backup Compression:

ServerBackupCompression

We checked the “Compress backup” setting.

And, that saved us some storage the next time we attempted a backup.

SQLBackupFiles

The initial backup was 263 MB and the new one is 185 MB.

We thus saved 30%.

Other Errors

The good thing about writing is that it forces one into a hole.  By it’s very nature, you almost have to check out more things out.  In our case, there are numerous errors being logged in the Event Viewer.

Event Viewer – System

  • Type => Warning
  • Source => MrxSmb
  • Event => 3019

EvertViewer-3019

Quick Explanation:

  1. Every minute, a few entries are being logged

It appears that the registry entry item that we just changed will be effectual in better understanding this error, as well.

This is based on a quick reading of:

MSDN Blogs > Microsoft Open Specifications Support Team Blog > CIFS and SMB Timeouts in Windows
http://blogs.msdn.com/b/openspecification/archive/2013/03/19/cifs-and-smb-timeouts-in-windows.aspx

Summary

Again, thanks goodness for the Internet.

I really thought it was just out of date drivers (disk and network) and was spending time seeking the latest files out over the Internet.

But, since the registry patch worked without us having to restart the MS SQL Server Instance nor the host, life is good!

References

Registry / Command Line

  1. Reg /query
    https://technet.microsoft.com/en-us/library/Cc742028.aspx
  2. Reg /add
    https://technet.microsoft.com/en-us/library/Cc742162.aspx

SQL Server – What version of SQL did backup file originate?

Background

Unfortunately, dealing with a hard-drive crash and now trying to review and possibly restore the backup files.

Question

So now the question is what version of SQL Server was installed on the crashed server.

 

Investing using Backup file

Transact SQL – Restore HeaderOnly

SQL:


restore headeronly
from  disk ='e:\tmp\DBLAB_backup_2015_08_04_001526_4046147.bak'

 

Output:

SQLServerFromBackupfile

 

Explanation:

  1. In our case
    • Database Version is 661 –> SQL Server 2008
    • SoftwareVersionMajor and SoftwareVersionMinor are 10 and 50; and that transpose to MS SQL Server v2008/R2

 

Documentation:

https://msdn.microsoft.com/en-us/library/ms178536.aspx

 

Column Name Description Sample
Backup Name Backup up set name  DBLAB_backup_2015_08_04_001526_4046147
Backup Type Backup type ( 1 = Database, 2 = Transaction log, 4 = File, 5 = Differential database, 6 = Differential file, 7 = Partial, 8 = Differential partial )  1
Compressed Whether the backup set is compressed using software-based compression ( 0 = No, 1 = Yes )  1
Username  User name that performed the backup operation. LABDomain\MSSQLService
Computer  Name of the server that wrote the backup set.  LABDB
Database Name  Name of the database that was backed up.  LABDB
DatabaseVersion Version of the database from which the backup was created.  661 ( SQL Server 2008 )
DatabaseCreationDate Date and time the database was created.  2013-10-17 23:02:52.000
BackupSize Size of the backup, in bytes.  14864384
 SoftwareVersionMajor  Major version number of the server that created the backup set. 10
 SoftwareVersionMinor  Minor version number of the server that created the backup set.  50
 MachineName  Name of the computer that performed the backup operation.  DBLAB
 HasBackupChecksums  1 = Backup contains backup checksums.  0
 IsDamaged  1 = Database was damaged when backed up, but the backup operation was requested to continue despite errors.  0
 CompressedBackupSize  Byte count of the backup set. For uncompressed backups, this value is the same as BackupSize.To calculate the compression ratio, use CompressedBackupSize and BackupSize.  4769366
 containment  Indicates the containment status of the database ( 0 = database containment is off, 1 = database is in partial containment )  0

 

 

Summary

In retrospect, the server crash is causing us to review things that we had an operational blindside to.

As an example, are we using compression, checksum, etc?