Seeing a couple of errors on our DB:
'TCP://DBSales002:5022', the remote mirroring partner for database 'InStoreSales', encountered error 829, status 1, severity 21. Database mirroring has been suspended. Resolve the error on the remote server and resume mirroring, or remove mirroring and re-establish the mirror server instance. Error: 1453, Severity: 16, State: 1.
Upon Seeing these errors opted to research them a bit more. As for Error 829, here is MS Write-up:
How to troubleshoot Error 3414 and a failed database recovery with SQL Server http://support.microsoft.com/kb/2015741
In the case of a committed transaction that should be rolled forward, the page can be marked inaccessible (any future attempts to access the page result in Msg 829) and recovery can be allowed to complete. In this situation, the error must be corrected by restoring the page from a backup or deallocating the page by using DBCC CHECKDB with repair.
So from my reading, it appears that some entries were pre-written to the log files, but a System Crash prevented those entries from being propagated to the database files. Upon system reboot, the database engine is attempting to roll forward ( read the Database Log and harden the changes on to the database datafiles), but it appears to be having problem doing so. So what to do?
- Determine the extent of the corruption
- Consider repair avenues
Determine the extent of Corruption:
There are a couple of paths that one can consider to determine extent of corruption.
- Read through SQL Server Error Log
- Perform a select against msdb.dbo.suspect_pages
Though the “suspect_pages” has been SQL Server since v2005, I only started reading about it. Yet, it offers quite a granular and very useful insight into SQL Server’s own understanding and discovery of internal database files and page corruption.
select DB_NAME(tblSuspect.database_id) as dbName , tblFile.name as fileSymbolicName , tblFile.physical_name as fileName , tblSuspect.database_id , tblSuspect.file_id , tblSuspect.page_id , tblSuspect.event_type , case when (event_type = 1) then '824 errors' when (event_type = 2) then 'Bad Checksum' when (event_type = 3) then 'Torn Page' when (event_type = 4) then 'Restored' when (event_type = 5) then 'Repaired' when (event_type = 6) then 'Deallocated by DBCC' end as EventTypeLiteral , tblSuspect.error_count , tblSuspect.last_update_date from [msdb]..suspect_pages tblSuspect inner join sys.master_files tblFile on tblSuspect.database_id = tblFile.database_id and tblSuspect.file_id = tblFile.file_id order by tblSuspect.last_update_date desc
Using “Understanding & Managing suspect pages” web page, http://msdn.microsoft.com/en-us/library/ms191301(v=sql.90).aspx, was able to determine that:
- We have at least 5 pages that are corrupted
- The corruption is limited to a single database – database id –> 5. To determine the database name, issue db_name(<database-id>)
- Also able to determine when the corruption was detected; fairly recently
- One of the errors is already corrected via “DBCC”
- The are at least 4 pages that continue to have unresolved errors — The error type reads “Bad Checksum”
What is “Bad Checksum”? CheckSum and parity bits are used extensively in electronics transmission to detect & validate errors during data transmission. They are also used during data storage & retrieval to detect tampering.
Page Verify Specify the option used to discover and report incomplete I/O transactions caused by disk I/O errors. Possible values are None, TornPageDetection, and Checksum.
So in our case, it appears that SQL Server can not vouch for the accuracy of some of its pages. Next to do is to determine which database objects are corrupted and decide whether they can can be salvaged. To determine which objects are corrupted, googled for it and settled on the “never miss” assist by Elizabeth Redei. Personally, one of my favorite Blog postings of all time is Elizabeth’s web posting:
Lies, Damned Lies And Statistics http://sqlblog.com/blogs/elisabeth_redei/archive/2009/08/10/lies-damned-lies-and-statistics-part-i.aspx
Elisabeth Redei, Jan 9, 2009
Hi, You can use the un-everything command (well documented on the web though) DBCC PAGE (dbid, fileid, pageid) as in:dbcc page (15, 1,2236) with tableresults HTH /Elisabeth
So listening to Elizabeth tried out:
declare @databaseID int declare @fileID int declare @pageID int set @databaseID = 5 set @fileID = 4 set @pageID = 39283416 dbcc page(@databaseID, @fileID, @pageID) with tableresults
based on running “DBCC Page”, was able to trace things back to Object ID 99
In our case, the database is not accessible and so one has to use a different database. From the Object 99, one is able to deduce that that the object is a system table. And, from the Index ID 0, we are targeting the data pages themselves, and not Index Pages. So trouble… Yes, a bit. But Googled a bit and knew what our options are. What did we find: Well Paul Randal’s online response. Paul wrote most of SQL Server’s DBCC Internals while at MSFT. And, so just as Moses is the law giver and brought the Messianic prophecy, once Paul says it, it is so:
DBCC Check DB
There are a variety of documents and blog posts that detail the on-disk structures (e.g. Kalen’s books, my blog series on Inside The Storage Engine, etc) but you will not be able to fix the system tables. The error you’re getting suggests that there is more corruption in the system tables –
CHECKDB/CHECKTABLE will stop after the first gross corruption is found in a system table. Your options at this point are to manually use DBCC IND and DBCC PAGE to figure out the list of pages comprising your important tables in logical order and get as much info out of the pages one-at-a-time as you can.
Unless this is critical data and the database is small, this will take a *long* time to do. Unfortunately this isn’t the kind of situation that can be worked over a forum such as this, and Microsoft Product Support won’t help with data recovery (I know this as I co-wrote the policy document that prevents them doing it, while I was at MS).
Hope this helps.
Paul Randal SQL Server MVP, Managing Director, SQLskills.com
So again, what to do:
- If a passive end of DB Mirror relationship : backup active, disengage mirroring, restore backup (on passive), and re-engage mirroring
- If no mirror or active node of Mirror, Restore from a recent backup
- How to troubleshoot error 3414
- Data Corruption
- Understanding & Managing the Suspect pages table http://msdn.microsoft.com/en-us/library/ms191301(v=sql.90).aspx
- How to find the Object Name or ObjectID using the Page ID