SQL Server – Transaction Log Management – Recovery Mode

Background

It is finally time to revisit a blog posting I started on Dec 21st, 2016.

That post is titled “SQL Server – Error – Cannot shrink log file because the logical file located at the end of the file is in use“.

And, it is here.

Issue

Received an alarm that our database files are growing and so sought to find out why.

 

TroubleShooting

Transaction Log – Utilization Metrics from Performance Counter

transactionLog.IUDF_UtilizationMetricsFromPerformanceCounters

Code

use master
go

if schema_id('transactionLog') is null
begin

	exec('create schema [transactionLog] authorization [dbo]')

end
go

if object_id('[transactionLog].[IUDF_UtilizationMetricsFromPerformanceCounters]') is null
begin

	exec(
			'create function [transactionLog].[IUDF_UtilizationMetricsFromPerformanceCounters]
			()
			RETURNS table
			AS
			RETURN 
			(
				SELECT 
					[shell] = 1/0
		   );
	   	   '
		)

end
go


alter function [transactionLog].[IUDF_UtilizationMetricsFromPerformanceCounters]
()
returns TABLE
as
return
(

	with cte
	(
		  [dbid]
		, [database]
		, [counter]
		, [size]
	)
	as
	(
 
		SELECT
				  [dbid]
					= db_id(RTRIM(instance_name)) 
 
				, [database]
					= RTRIM(instance_name) 
 
				, tblDMOPC.counter_name
 
				, cntr_value log_size_kb
 
		FROM  sys.dm_os_performance_counters tblDMOPC
 
		WHERE tblDMOPC.object_name = 'SQLServer:Databases'
 
		AND   tblDMOPC.counter_name like 'Log File' + '%'
 
		AND   tblDMOPC.instance_name <> '_Total'
 
	)
	, ctePivot
	as
	(
		select *
 
		from   cte
		pivot 
				(
					avg ([size]) 
							for [counter] in
								(
									  [Log File(s) Used Size (KB)]                                                                                                           
									, [Log File(s) Size (KB)]                                                                                                           
								)
				) as avgSize
 
	)
 
	select 

			  [cteP].[dbid]

			, [cteP].[database]
 
			, [usedMB] = [Log File(s) Used Size (KB)]
							/ 1000
 
			, [totalMB] =  [Log File(s) Size (KB)]
							/ 1000
 
			, [%]
				= cast
					(
						( [Log File(s) Used Size (KB)] * 100.00)
						/ ( [Log File(s) Size (KB)])
						as decimal(10,2 )
					)

			, [recoveryModel]
				= tblSD.recovery_model_desc

			, [log reuse wait]
				= tblSD.log_reuse_wait_desc
			
	from   ctePivot cteP

	inner join sys.databases tblSD

		on cteP.[dbid] = tblSD.database_id
 

)
go


Output

qa_20170207_0958pm

Explanation

  1. Before capturing the screenshot above, our target database, QA_rbpivr1, was in FULL RECOVERY
    • It’s transaction log size was 8 GB
    • And, 3.2 GB of the 8GB is in use

 

Why so much Transaction Log is being held up?

Recovery Mode

Again for QA_rbpiv1  3.2 GB is being used.

Let us review how database is configured

qa_recoverymode_20170207_1009am

We see that the Recovery Mode for our database is Full.

Backup Dates

databaseproperty_options_general_backupdate_qa_20170207_1013am

 

Explanation
  1. The last full backup was taken last night Feb 2nd, 2017 and it completed at 8:35 PM
  2. And, the last Transaction backup was taken on Sunday Feb 5th, 2017 at 6 PM

 

 

Transaction Backup History

Let us review our Transaction Backup History

 

sqlserveragent_transactionbackuphistory_qa_20170207_1019am

Explanation:

  1. Before today, transaction backups were occurred on 2/5/2017, 1/29/2017, 1/22/2017
  2. And, they were occurring each hour
  3. Looked back at the Calendar and those days fell on Sundays

 

 

Review Scheduled Transaction Backup

jobproperties-databasebackup-userdatabases-log-schedules-sunday-eachhour

 

Explanation:

  1. Schedule is Weekly on Sundays
  2. And, the frequency is each hour

 

Correction

Our bad.

Revise Schedule

sqlserveragent_transactionbackuphistory_qa_20170207_1028am

Explanation:

  1. Changed Schedule’s frequency from Weekly to Daily

 

Metadata

Review Checkpoints in Error Log File

Enable Logging of Checkpoints in ErrorLog file

Enable Trace Flags

Code

dbcc traceon (3502, 3504, 3605, -1 )

Review

Code

dbcc tracestatus

Output

dbcctracestatus

Review ErrorLog file

Let us review our error logs and make sure that Checkpoints are indeed been logged.

logfileviewer_20170207_1220pm

 

transactionLog.sp_checkPointsInErrorLogFileAggregate

 

Code

 


use [master]
go

if object_id('transactionLog.sp_checkPointsInErrorLogFileAggregate') is null
begin

	exec('create procedure [transactionLog].[sp_checkPointsInErrorLogFileAggregate] as ')

end
go

alter procedure [transactionLog].[sp_checkPointsInErrorLogFileAggregate] 
as  

begin

	set nocount on;
  
	declare @p1  INT
	declare @p2  INT
	declare @p3  VARCHAR(255)
	declare @p4  VARCHAR(255)
  
	declare @tblErrorLog     TABLE
	(
		  [id] int not null
			identity(1,1)
  
		, [LogDate] datetime
  
		, [ProcessInfo] sysname
  
		, [Text]        varchar(600) null
  
		, primary key
			(
				  [LogDate]
				, [id]
			)
  
		, INDEX INDX_ID
			(
				[id]
			)
  
	)
  
	set @p1 = 0 -- current
	set @p2 = 1 -- error log
	set @p3 = 'Ckpt'
	set @p3 = 'Ckpt dbid'
	set @p4  = ''
  
	insert into @tblErrorLog
	(
		  [LogDate]
		, [ProcessInfo]
		, [Text]        
	)
	exec [sys].[sp_readerrorlog]
			 @p1   = @p1  
		   , @p2   = @p2
		   , @p3   = @p3
		   , @p4   = @p4
  
  
	select
			  tblSD.[name]
  
			, tblSD.log_reuse_wait_desc
  
			, [count] 
				= count(tblEL.LogDate)
  
			, [LogDateMin]
				= min(tblEL.LogDate)
  
			, [LogDateMax]
				= max(tblEL.LogDate)
   
	from  sys.databases tblSD
  
	left outer join @tblErrorLog tblEL
  
			on   tblEL.[Text]  like '%' + cast(tblSD.database_id as varchar(10)) + '%'
          
			and  tblEL.[Text] like 'Ckpt dbid ' + '%' + 'started'
 
	group by
  
			   tblSD.[name]
			 , tblSD.log_reuse_wait_desc
  
	order by
  
			   tblSD.log_reuse_wait_desc asc
			 , count(tblEL.LogDate) desc
			 , tblSD.[name]

end
go



 

Output

errorlogfile_checkpoints_transactionlogcheckpoint_20170207_1036am

Explanation

  1. Database
    • QA_rbpivr1
      • LogDateMax
        • From digging in the Transaction Log File, we can see that the last Transaction Log on 2017-02-06 20:34
        • The timeline is Yesterday, Monday Feb 6th, 2017 8:34 PM
      • LogDateMin
        • LogDateMin is 2017-01-15 04:00
        • In readable term Jan 15th, 2017 @ 4 AM

 

Source Control

GitHub

  1. DanielAdeniji/SQLServerTransactionLogMetrics
    Link

 

Summary

In summary, our transaction size is growing and not being pruned because unfortunately our scheduled Transaction backup job was incorrectly set to only occur on Sundays.

Please keep in mind this only affects databases set for FULL RECOVERY.

From the metrics above, we can see that our identified database Transaction Log Checkpoints are only occurring at the tail end of Full Backups.

And, as aforementioned on Sundays.

 

SQL Server – Error – Cannot shrink log file because the logical file located at the end of the file is in use

Background

Received a trouble ticket overnight that indicates that one of the disks on our server is running low.

Removed some of the outdated backup files and now wanted to shrink a log file that was relatively big.

Log File Shrink Command

Shrink Log File

Command


use [rbpivr1]
go

dbcc shrinkfile(2)
go

 

Error Message

Textual


Cannot shrink log file 2 (xxxxx_log) because the logical log file located at the end of the file is in use.


Image

errormessage

 

TroubleShoot

Let us do a bit of a troubleshooting….

Metadata

sys.databases – log_reuse_wait_desc

Code


select 
		  tblSD.name
		, tblSD.database_id
		, tblSD.recovery_model_desc
		, tblSD.log_reuse_wait_desc 

from   sys.databases tblSD

Output

sys-databases-20161221-1013am

 

Explanation

  • There are two databases that have other than NOTHING for their log re-use
  • The two databases are
    • msdb
      • wait reason is DATABASE_SNAPSHOT_CREATION
    • rbpivr1
      • wait reason is CHECKPOINT

 

sys.dm_exec_requests

Code



select 
		 [isCheckpointSession]
			= case
				  when (tblSDER.command = 'CHECKPOINT') then 'Yes'
				  when (tblSDER.command like '%CKPT%') then 'Yes'
				  else 'No'
			  end	

		, tblSDER.[status]

		, tblSDER.session_id

		, tblSDER.command

		, tblSDER.database_id

		, [database]
			= db_name(tblSDER.database_id)

		, tblSDER.wait_type

		, tblSDER.last_wait_type

		--, tblSDER.estimated_completion_time

		, tblSSP.[last_batch]

		, tblSDER.start_time

		, [duration]
			= datediff(minute, tblSSP.last_batch, getdate())

		, tblSDER.wait_resource

		, tblSDER.wait_time

		, tblSDER.blocking_session_id

		, tblSDER.cpu_time
		, tblSDER.[reads]
		, tblSDER.[writes]

from   sys.dm_exec_requests tblSDER

inner join master.dbo.sysprocesses tblSSP

		on tblSDER.[session_id] = tblSSP.[spid]
order by 
		  1 desc
		, tblSDER.[status] asc
		, tblSDER.command asc


Output

 

sys-dm_exec_requests-20161221-0233pm

 

Explanation

  1. Because of our requested ordering, the sessions that relate to CHECKPOINT are listed  first
    • CHECKPOINT
      • The traditional CHECKPOINT session
    • XTP_CKPT_AGENT
    • XTP_OFFLINE_CKPT
      • For the databases that have in-memory tables, we have unique sessions
  2. There is no blocking going on
    • And, so not quite why it appears that our database is not being checkpoint

 

Get Transaction Log Size & Actual Usage

Performance Counters

Code
Code – Function – transactionLog.IUDF_UtilizationMetricsFromPerformanceCounters


use master
go

if schema_id('transactionLog') is null
begin

	exec('create schema [transactionLog] authorization [dbo]')

end
go

if object_id('[transactionLog].[IUDF_UtilizationMetricsFromPerformanceCounters]') is null
begin

	exec(
			'create function [transactionLog].[IUDF_UtilizationMetricsFromPerformanceCounters]
			()
			RETURNS table
			AS
			RETURN 
			(
				SELECT 
					[shell] = 1/0
		   );
	   	   '
		)

end
go


alter function [transactionLog].[IUDF_UtilizationMetricsFromPerformanceCounters]
()
returns TABLE
as
return
(

	with cte
	(
		  [dbid]
		, [database]
		, [counter]
		, [size]
	)
	as
	(
 
		SELECT
				  [dbid]
					= db_id(RTRIM(instance_name)) 
 
				, [database]
					= RTRIM(instance_name) 
 
				, tblDMOPC.counter_name
 
				, cntr_value log_size_kb
 
		FROM  sys.dm_os_performance_counters tblDMOPC
 
		WHERE tblDMOPC.object_name = 'SQLServer:Databases'
 
		AND   tblDMOPC.counter_name like 'Log File' + '%'
 
		AND   tblDMOPC.instance_name <> '_Total'
 
	)
	, ctePivot
	as
	(
		select *
 
		from   cte
		pivot 
				(
					avg ([size]) 
							for [counter] in
								(
									  [Log File(s) Used Size (KB)]                                                                                                           
									, [Log File(s) Size (KB)]                                                                                                           
								)
				) as avgSize
 
	)
 
	select 

			  [cteP].[dbid]

			, [cteP].[database]
 
			, [usedMB] = [Log File(s) Used Size (KB)]
							/ 1000
 
			, [totalMB] =  [Log File(s) Size (KB)]
							/ 1000
 
			, [%]
				= cast
					(
						( [Log File(s) Used Size (KB)] * 100.00)
						/ ( [Log File(s) Size (KB)])
						as decimal(10,2 )
					)

			, [recoveryModel]
				= tblSD.recovery_model_desc

			, [log reuse wait]
				= tblSD.log_reuse_wait_desc
			
	from   ctePivot cteP

	inner join sys.databases tblSD

		on cteP.[dbid] = tblSD.database_id
 

)
go



Invoke

select iutfPC.*

from   [transactionLog].[IUDF_UtilizationMetricsFromPerformanceCounters]() iutfPC

order by 
	iutfPC.[usedMB] desc


Output

performancecounter-20161221-0246pm

 

Explanation:
  1. The database that we are digging into, ppsivr, is the one that has the most in-use storage allocated

 

 

dbcc sqlperf(logpace)

The traditional method for getting transaction in-use is dbcc sqlperf(logspace).

 

Code


dbcc sqlperf(logspace)

Output

dbcc-sqlperf-logspace-20161221-0320pm

Explanation

  1. The database we are focused on is rbpivr1
    • It’s allocated size is 3300 MB
    • And, % utilization is 30

 

To increase our repertoire of diagnostic tools, we focused on using performance counters in this post.

 

dbcc opentran

Code


use rbpivr1;

dbcc opentran

Output

dbcc opentran came back empty.

 

Trace Flags

Does the errorlog file contain interesting clues.

To go forward we must enable certain Trace Flags.

Pertinent Trace Flags

TraceFlag File Version  Source  KB
 3502 Tracks when checkpoint is occurring Paul Randal How to monitor checkpoints
Link
 3504  More detailed information about checkpoint  Paul Randal How to monitor checkpoints
Link

 

 

Set Trace Flags


dbcc traceon (3502, 3504, 3605, -1 )

Review Trace Flags


dbcc tracestatus

Output

traceflagsreview

ErrorLog

 

Aggregate Checkpoints in the current error log file

Code



use[master]
go
 
set nocount on;
go
 
declare @p1  INT
declare @p2  INT
declare @p3  VARCHAR(255)
declare @p4  VARCHAR(255)
 
declare @tblErrorLog     TABLE
(
      [id] int not null
        identity(1,1)
 
    , [LogDate] datetime
 
    , [ProcessInfo] sysname
 
    , [Text]        varchar(600) null
 
    , primary key
        (
              [LogDate]
            , [id]
        )
 
    , INDEX INDX_ID
        (
            [id]
        )
 
)
 
set @p1 = 0 -- current
set @p2 = 1 -- error log
set @p3 = 'Ckpt'
set @p3 = 'Ckpt dbid'
set @p4  = ''
 
insert into @tblErrorLog
(
      [LogDate]
    , [ProcessInfo]
    , [Text]        
)
exec [sys].[sp_readerrorlog]
         @p1   = @p1  
       , @p2   = @p2
       , @p3   = @p3
       , @p4   = @p4
 
 
select
          tblSD.[name]
 
        , tblSD.log_reuse_wait_desc
 
        , [count] 
            = count(tblEL.LogDate)
 
        , [LogDateMin]
            = min(tblEL.LogDate)
 
        , [LogDateMax]
            = max(tblEL.LogDate)
  
from  sys.databases tblSD
 
left outer join @tblErrorLog tblEL
 
        on   tblEL.[Text]  like '%' + cast(tblSD.database_id as varchar(10)) + '%'
		 
        and  tblEL.[Text] like 'Ckpt dbid ' + '%' + 'started'

group by
 
           tblSD.[name]
         , tblSD.log_reuse_wait_desc
 
order by
 
           tblSD.log_reuse_wait_desc asc
         , count(tblEL.LogDate) desc
         , tblSD.[name]


readerrorlog-20161212-0343pm

 

Output

  1. Database
    • rbpivr1
      • We have yet to log CHECKPOINTS for the rbpivr1 database
  2. Whereas, we have logged 3 checkpoints for the MDW, ess, and master databases

Summary

Went for our corporate’s White Elephant Christmas party.

Upon returning ran the same set of scripts and we are now able to shrink the transaction logs.

Also, CHECKPOINTS are occurring on the rbpivr1 database.

Again, not so sure what the initial reason for the CHECKPOINTS not completing is.

Hopefully, will get read more on background processes and successfully troubleshooting them in the future.

 

Dedicated

It has been a long time coming.

Somehow never dedicated something to a thought leader, Paul Randal.

Late till now…

Otis Redding – A Change Is Gonna Come
Link

Hate to go there, but on Post, gotta be honest.

Had better feel for Otis Redding when I listened to Lil Wayne’s track.

Lil Wayne – Long Time Coming
Link

With rhymes like “But I can’t love ya I ain’t usher I’m far southern”.

It is all of 8 years behind us; right before your boy Hussein made it into Office, so don’t SMH.

 

Source Control

GitHub

  1. SQLServerTransactionLogMetrics
    Link

 

References

  1. Paul S. Randal
    • sqlmag
      • How to monitor checkpoints
        Link
  2.  Microsoft
    • technet.microsoft.com
      • Degremont Michel
        • Unable to shrink the transaction log
          Link
    • In Memory Tables
      • Extreme Transaction Processing (XTP, Hekaton) – the solution to everything?
        Link
    • Developer Network
      • Database Features >> In-Memory OLTP (In-Memory Optimization) >> Creating and Managing Storage for Memory-Optimized Objects >> Scalability
        Link
    • Support
      • FIX: Offline checkpoint thread shuts down without providing detailed exception information in SQL Server 2014
        Link
  3. Stack Overflow
    • How can I get the size of the Transaction Log in SQL 2005 programmatically?
  4. sqlity.net
    • Log Reuse Waits Explained: CHECKPOINT
      Link