Stack-overflow Database ( Years 2008 through 2010 )

Background

As always looking for sample code and databases.

This time it is a sample database.

Occasionally, new users ask a Database Administrator for learning tools and materials.

 

Stack Exchange

Stack Exchange, the parent company for Stack Overflow, thankfully avails their database has XML files.

Brent Ozar and Co avails the XML files as Microsoft SQL Server Data & log files.

 

Download

Read about downloading the 10 GB data which covers years 2008 thru 2010 from here.

The actual file is  here. And, it is packaged as a 1 GB compressed file.

 

Extract

Extract using 7-Zip.

 

Attach to SQL Server Instance

Launch SSMS and attach data and log file using something like this.

SQL


USE [master]
GO

CREATE DATABASE [StackOverflow2010]
ON
(
	FILENAME = N'C:\Microsoft\SQLServer\Datafiles\StackExchange\Stackoverflow\Stackoverflow.2010\StackOverflow2010.mdf'
)
LOG ON
(
	FILENAME = N'C:\Microsoft\SQLServer\Logfiles\StackExchange\Stackoverflow\Stackoverflow2010\StackOverflow2010_log.ldf'
)

FOR ATTACH

GO

 

Output

 

Converting database 'StackOverflow2010' from version 655 to the current version 869.
Database 'StackOverflow2010' running the upgrade step from version 655 to version 668.
...
...
Database 'StackOverflow2010' running the upgrade step from version 866 to version 867.
Database 'StackOverflow2010' running the upgrade step from version 867 to version 868.
Database 'StackOverflow2010' running the upgrade step from version 868 to version 869. 

Explanation

The version of the packaged database is v2008.

When attached to a later model SQL Server, it is upgraded to that later version.

That explains the output entries “upgrade step from version [m] to version [n].

 

Create Indexes

Based on your query needs please create indexes

Here are targeted indexes for our lone test query

Table – [dbo].[Votes]

Index – [dbo].[Votes].[INDX_UserId_BountyAmount]


use [StackOverflow2010]
go

create index [INDX_UserId_BountyAmount]
on [dbo].[Votes]
(
	  [UserId] asc
	, [BountyAmount] desc
)
with
(
	  FillFactor=100
	, DROP_EXISTING=OFF
)

 

Sample Queries

Query – Users with highest bounties

Query


set transaction isolation level read uncommitted
go

use [StackOverflow2010]
go

; with [cteBountyAmount]
as
(
	select top 15
			  tblV.[UserId]
			, [BountyAmount]
				= sum(tblV.[BountyAmount])

	from [dbo].[Votes] tblV

	group by
		tblV.[UserId]

	order by
		sum(tblV.[BountyAmount]) desc

)

select
		  [UserId]
			= tblU.[Id]

		, [name]
			= tblU.DisplayName

		, [url]
			= tblU.WebsiteUrl

		, [bountyAmount]
			= cteBA.[BountyAmount]

from   [dbo].[Users] tblU

inner join [cteBountyAmount] cteBA

	on tblU.[Id] = cteBA.UserId

order by
	cteBA.[BountyAmount] desc

 

Output

One thought on “Stack-overflow Database ( Years 2008 through 2010 )

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s