Reviewing SQL Server Wait Stats is a beaten down path towards understanding and optimizing your database’s bottlenecks.
What is CXPacket wait?
When the SQL Server Engine chooses to break down a query into smaller pieces and process the individual pieces in parallel there is a lag between when the first results is received and when the results are compiled.
Craig Freedman has a very good and short design lead in:
Craig Freedman – Introduction to Parallel Query Execution
One should keep in mind that CXPacket wait stats are not wholly distinct from other wait stats. That is, because waits are marked as CXPacket, one should not over-read and think that for a specific request waits related to I/O, CPU, and Memory, Locks and Latches are not contributing to the query’s CXPacket wait stat.
In fact, it seems that once a query is parallelized, a timer is started and only stopped once all parallelized threads complete and their results are assembled.
Based on Microsoft’s published documentation if the percentile of Wait Stats attributed to CXPacket exceeds 5% one should take a good and clear eye view of the settings that are relevant to Parallelism.
Extrapolate Wait Stats
Extrapolate Wait Stats – All Wait Stats
Let us quickly look at all Wait Stats.
Publicly stealing from Paul Randal:
Tell me where it hurts
;WITH [Waits] AS (SELECT * , ROW_NUMBER() OVER(ORDER BY [ResourceS] DESC) AS [RowNum] from ( select [wait_type], [wait_time_ms] / 1000.0 AS [WaitS], ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS], [signal_wait_time_ms] / 1000.0 AS [SignalS], [waiting_tasks_count] AS [WaitCount], (100.0 * [wait_time_ms]) / SUM ([wait_time_ms]) OVER() AS [Percentage] FROM sys.dm_os_wait_stats where waiting_tasks_count != 0 ) tblIn ) SELECT [W1].[wait_type] AS [WaitType], CAST ([W1].[WaitS] AS DECIMAL(14, 2)) AS [Wait_S], CAST ([W1].[ResourceS] AS DECIMAL(14, 2)) AS [Resource_S], CAST ([W1].[SignalS] AS DECIMAL(14, 2)) AS [Signal_S], [W1].[WaitCount] AS [WaitCount], CAST ([W1].[Percentage] AS DECIMAL(4, 2)) AS [Percentage], CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgWait_S], CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgRes_S], CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgSig_S] FROM [Waits] AS [W1] INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum] GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS], [W1].[ResourceS], [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage] HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95 order by CAST ([W1].[Percentage] AS DECIMAL(4, 2)) desc ; -- percentage threshold GO
From the above we are able to see our most costly wait categories:
Extrapolate Wait Stats – Some Wait Stats
Filter out some wait stats categories, and only look at some.
Again resting on Paul Randal:
WITH [Waits] AS (SELECT [wait_type], [wait_time_ms] / 1000.0 AS [WaitS], ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS], [signal_wait_time_ms] / 1000.0 AS [SignalS], [waiting_tasks_count] AS [WaitCount], 100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage], ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum] FROM sys.dm_os_wait_stats WHERE [wait_type] NOT IN ( N'CLR_SEMAPHORE', N'LAZYWRITER_SLEEP', N'RESOURCE_QUEUE', N'SQLTRACE_BUFFER_FLUSH', N'SLEEP_TASK', N'SLEEP_SYSTEMTASK', N'WAITFOR', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT', N'XE_DISPATCHER_JOIN', N'LOGMGR_QUEUE', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT', N'CLR_AUTO_EVENT', N'DISPATCHER_QUEUE_SEMAPHORE', N'TRACEWRITE', N'XE_DISPATCHER_WAIT', N'BROKER_TO_FLUSH', N'BROKER_EVENTHANDLER', N'FT_IFTSHC_MUTEX', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'DIRTY_PAGE_POLL', N'SP_SERVER_DIAGNOSTICS_SLEEP') ) SELECT [W1].[wait_type] AS [WaitType], CAST ([W1].[WaitS] AS DECIMAL(14, 2)) AS [Wait_S], CAST ([W1].[ResourceS] AS DECIMAL(14, 2)) AS [Resource_S], CAST ([W1].[SignalS] AS DECIMAL(14, 2)) AS [Signal_S], [W1].[WaitCount] AS [WaitCount], CAST ([W1].[Percentage] AS DECIMAL(4, 2)) AS [Percentage], CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgWait_S], CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgRes_S], CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgSig_S] FROM [Waits] AS [W1] INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum] GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS], [W1].[ResourceS], [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage] HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95; GO
sys.dm_os_wait_stats (Transact SQL)
Wait Stats Decision?
Which Wait Stats shall we look at ? As our title suggests, CXPacket.
Review CXPacket Wait Stats Related Settings
- Machine → Number of CPUs
- Machine → Hyper-Threading (HT) setting
- SQL Server Instance Setting → Max degree of parallelism
- SQL Server Instance Setting → Parallelism Cost Threshold
- SQL Server Instance Setting → CPU & I/O Affinity
- SQL Server – Exposes – NUMA Characteristics
Machine / Number of CPU
The decision as to what to set SQL Server’s Instance setting for “Max degree of Parallelism” to is based on Number of Processor Cores on the box.
There are various tools for determining the number of cores on your box; our choices includes:
- CPUID (http://www.cpuid.com/)
- System Information ( built into Windows \ Accessories \ System Tools \ System Information)
My box happens to be a virtualized host and I really need to dig in and determine what my true # of processors is.
Machine / Hyper-Threading setting
Like any groundbreaking Technology, it seems that it took a few iterations for Chip manufacturers (Intel and AMD), OS Vendor (Microsoft), and Database Vendor to fully “bake” HT Technology.
Discussions about HT and how it relates to SQL Server warrant several postings, and I will suggest that you Google for more popular postings
Machine / Hyper-Threading setting – SQL Command
To get an idea on whether Hyper-Threading is being used, how many logical processors is being exposed, and how many physical processors you have, please issue a query similar to the one below.
Courtesy of Dattatray Nale
SELECT cpu_count AS NumberOfLogicalCPUs , hyperthread_ratio , ( cpu_count / hyperthread_ratio ) AS NumberOfPhysicalCPUs , CASE WHEN hyperthread_ratio = cpu_count THEN cpu_count ELSE ( ( cpu_count - hyperthread_ratio ) / ( cpu_count / hyperthread_ratio ) ) END AS NumberOfCoresInEachCPU , CASE WHEN hyperthread_ratio = cpu_count THEN cpu_count ELSE ( cpu_count / hyperthread_ratio ) * ( ( cpu_count - hyperthread_ratio ) / ( cpu_count / hyperthread_ratio ) ) END AS TotalNumberOfCores FROM sys.dm_os_sys_info
wmic cpu get NumberofCores, NumberofLogicalProcessors
To rest, determine your HT Setting and ensure that it is properly considered as you make a decision as to # of CPU Cores and what to set your “Max degree of Parallelism” to.
SQL Server Instance / Max degree of Parallelism
Issue SQL Server Query:
exec sp_configure 'max degree of parallelism'
SQL Server Instance / Cost threshold for Parallelism
Issue SQL Server Query:
exec sp_configure 'cost threshold for parallelism'
SQL Server Instance – CPU & I/O Affinity
Each SQL Server Instance can be configured to use specific processor and I/O bindings.
To determine a SQL Server Instance current setting:
- In “Management Studio”, right click on the SQL Server Instance node
- From the drop-down menu, select the “Properties” menu entry
- From the “Server Properties”window, click the “Processor” page
- Review the settings for Processor and I/O Affinity
Here is ours:
We can see that:
- Processor and I/O affinity are not set
- We have a single NUMA node
exec sp_configure 'show advanced options', 1; reconfigure with override; --Processor Mask exec sp_configure 'affinity mask'; --IO mask exec sp_configure 'affinity I/O mask';
SQL Server Instance – Exposes – NUMA Characteristics
Non-Uniform Memory Access
Non-Uniform Memory Access is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory.
Limiting the number of memory accesses provided the key to extracting high performance from a modern computer. For commodity processors, this meant installing an ever-increasing amount of high-speed cache memory and using increasingly sophisticated algorithms to avoid cache misses. But the dramatic increase in size of the operating systems and of the applications run on them has generally overwhelmed these cache-processing improvements. Multi-processor systems without NUMA make the problem considerably worse. Now a system can starve several processors at the same time, notably because only one processor can access the computer’s memory at a time.
NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems involving spread data (common for servers and similar applications), NUMA can improve the performance over a single shared memory by a factor of roughly the number of processors (or separate memory banks).
Thus, Multi-processor machines utilize NUMA to more gracefully partition and protect memory.
It is better to minimize cross NUMA Node memory access. And, so keep that in mind as you set “Max degree of Parallelism” .
Basically, you want to limit the maximum set of processors accessed to the number of processors per NUMA bank.
Once you have documented the relevant settings that might affect whether a query is ran in parallel or not, it really makes sense to gain awareness as to which queries are being ran in parallel.
Review Cached Plan:
Looked on the Net for sample code. And, quite a few of the usual public committers posted good queries:
- Pinal Dave ( http://blog.sqlauthority.com/2010/07/24/sql-server-find-queries-using-parallelism-from-cached-plan/ )
- StrateSQL (http://www.sqlservercentral.com/blogs/stratesql/2010/12/9/can-you-dig-it_3F00_-_1320_-parallelism-in-the-plan-cache/)
Settled on the one by StrateSQL; here it is:
/* StrateSQL Can You Dig It? – Parallelism in the Plan Cache http://www.sqlservercentral.com/blogs/stratesql/2010/12/9/can-you-dig-it_3F00_-_1320_-parallelism-in-the-plan-cache/ */ SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') SELECT COALESCE(DB_NAME(p.dbid) , p.query_plan.value('(//RelOp/OutputList/ColumnReference/@Database)' ,'nvarchar(128)')) AS DatabaseName , DB_NAME(p.dbid) + '.' + OBJECT_SCHEMA_NAME(p.objectid, p.dbid) + '.' + OBJECT_NAME(p.objectid, p.dbid) AS ObjectName , cp.objtype , p.query_plan , cp.UseCounts , cp.plan_handle , CAST('<!--?query -' + CHAR(13) + q.text + CHAR(13) + '-?-->' AS xml) AS SQLText FROM sys.dm_exec_cached_plans cp CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) p CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) as q WHERE cp.cacheobjtype = 'Compiled Plan' AND p.query_plan.exist('//RelOp[@Parallel = "1"]') = 1 ORDER BY COALESCE(DB_NAME(p.dbid) , p.query_plan.value( '(//RelOp/OutputList/ColumnReference/@Database)' , 'nvarchar(128)' ) ) , UseCounts DESC
Review Active Sessions:
If on a whim, you will like to see which active, ongoing queries are running in parallel try the untested SQL Script I pasted below:
;with cteParallelProcesses (spid, numberofMultipleECIDs) as ( select spid , count(ecid) as numberofMultipleECIDs from sys.sysprocesses group by spid having COUNT(ecid) > 1 ) select tblSysProcess.spid , tblCTE.numberofMultipleECIDs , suser_sname(tblSessions.security_id) as [userName] , tblSysProcess.lastwaittype , tblSysProcess.waitresource , tblSysProcess.waittime , db_name(tblSysProcess.dbid) as [databaseName] , tblSysProcess.cpu , tblSysProcess.physical_io , tblSysProcess.memusage , tblSysProcess.status , tblSysProcess.hostname , tblSysProcess.program_name -- , tblSysProcess.cmd , tblSQLText.text as sqlText , object_name(tblSQLText.objectid, tblSQLText.dbid) as objectName from sys.sysprocesses tblSysProcess inner join sys.dm_exec_connections tblConnections on tblSysProcess.spid = tblConnections.session_id inner join sys.dm_exec_sessions tblSessions on tblSysProcess.spid = tblSessions.session_id cross apply sys.dm_exec_sql_text(most_recent_sql_handle) tblSQLText inner join cteParallelProcesses tblCTE on tblSysProcess.spid = tblCTE.spid where tblSysProcess.spid > 50 order by tblCTE.numberofMultipleECIDs desc ;
Understand Parallelized Queries
I think we will like to review the batch of Queries that are flagged as Parallelized and get an idea on why?
Here are some of the recurring reasons:
- Missing Indexes (which leads to more IO)
- Missing column statistics
- Complex SQL including ones included in exec (‘–sql—‘)
- Resource pressure that could end up serializing what would otherwise have benefited from being actually ran in parallel ( crediting sqldoctor @ http://sqldoctor.idera.com/wait-stats/parallelism-hurry-up-and-wait/)
There are so many blind sides to this parallelism story.
Variable Argument vs Concrete Argument
Though estimated Query Plan might indicate that parallelism will occur, when your run the query and supply actual values for the parameters the optimized might see that the data set is narrow enough and the arguments for Parallelism get decimated.
Here is a sampler:
- In the top pane, variables were submitted (@retirementAge) in the where clause and in the bottom actual values were submitted (‘7/1/2013’)
Hardening Parallelism for specific user through Resource Group
To constrain this posting I will only briefly touch on this point. My thoughts were excited via a blog posting from Cindy Gross.
Cindy Gross – The ins and outs of MAXDOP
“… and your login is not limited by MAX_DOP settings in a Resource Governor resource pool.”
Here are some contributory postings:
- Bob Dorr – CSS SQL Server Engineer – Resource Governor and MAXDOP, Parallel plan generation
- Jeremy – SQL Tips: When MAXDOP is not the same as MAXDOP
How much of Parallelism is based on your Hardware?
I have a lot more reading and LAB work to do to determine to what extent parallelism decisions are anchored to Hardware performance…
But, I think you want to to get the sets of queries that are being ran in parallel. This should be accessed across your entire deployment systems (Development, Testing, and Production).
You also want to give a bit more attention to data from your Production box; for the following reasons:
- It will have more data
- And, the hardware (Storage and CPU) is more representative to as what your users are experiencing
- Your concurrent workload which affects whether SQL Server Engine parallelizes is also more helpful when seen through your Production lenses
- Against Production DB, consider running server-side SQL Trace and using the replay functionality. And, in turn run that against systems in the lower tier to simulate Production DB load
Adjust your Settings
Max degree of Parallelism
What should you set “Max degree of Parallelism” to:
Here is a formula that I found on the Web for calculating Max degree of Parallelism:
Sakthivel Chidambaram – Wow… We have MAXDOP calculator for SQL Server… It makes my job easier:
Get # of NUMA Nodes
From SQL Server issue:
select COUNT(distinct memory_node_id) as memory_node_id from sys.dm_os_memory_clerks where memory_node_id != 64
Get # of Processor Cores
From command line issue:
powershell -Command "& Get-WmiObject -namespace "root\CIMV2" -class Win32_Processor -Property NumberofCores | select NumberofCores"
Here is what we came up with:
Cost threshold for Parallelism
Based on Joe Chang’s published hard-work, consider adjusting your “Cost threshold for Parallelism” to match your hardware.
That is, on bigger boxes, if you are think too may queries are being parallelized determine their average cost and raise your “Cost threshold for Parallelism” a bit higher.
I found Chris Mcgowan’s blog posting to be most useful.
Cost Threshold for Parallelism
declare @PlanCacheForMaxDop TABLE ( StatementSubTreeCost FLOAT , UseCounts INT , PlanSizeInBytes INT ); -- Collect parallel plan information SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') INSERT INTO @PlanCacheForMaxDop ( StatementSubTreeCost , UseCounts , PlanSizeInBytes ) SELECT n.value('(@StatementSubTreeCost)', 'VARCHAR(128)') AS StatementSubTreeCost, ecp.usecounts, ecp.size_in_bytes FROM sys.dm_exec_cached_plans AS ecp CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n) WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1; -- Return grouped parallel plan information SELECT MAX(CASE WHEN StatementSubTreeCost BETWEEN 1 AND 5 THEN '1-5' WHEN StatementSubTreeCost BETWEEN 5 AND 6 THEN '5-6' WHEN StatementSubTreeCost BETWEEN 6 AND 7 THEN '6-7' WHEN StatementSubTreeCost BETWEEN 7 AND 8 THEN '7-8' WHEN StatementSubTreeCost BETWEEN 8 AND 9 THEN '8-9' WHEN StatementSubTreeCost BETWEEN 9 AND 10 THEN '9-10' WHEN StatementSubTreeCost BETWEEN 10 AND 11 THEN '10-11' WHEN StatementSubTreeCost BETWEEN 11 AND 12 THEN '11-12' WHEN StatementSubTreeCost BETWEEN 12 AND 13 THEN '12-13' WHEN StatementSubTreeCost BETWEEN 13 AND 14 THEN '13-14' WHEN StatementSubTreeCost BETWEEN 14 AND 15 THEN '14-15' WHEN StatementSubTreeCost BETWEEN 15 AND 16 THEN '15-16' WHEN StatementSubTreeCost BETWEEN 16 AND 17 THEN '16-17' WHEN StatementSubTreeCost BETWEEN 17 AND 18 THEN '17-18' WHEN StatementSubTreeCost BETWEEN 18 AND 19 THEN '18-19' WHEN StatementSubTreeCost BETWEEN 19 AND 20 THEN '19-20' WHEN StatementSubTreeCost BETWEEN 20 AND 25 THEN '20-25' WHEN StatementSubTreeCost BETWEEN 25 AND 30 THEN '25-30' WHEN StatementSubTreeCost BETWEEN 30 AND 35 THEN '30-35' WHEN StatementSubTreeCost BETWEEN 35 AND 40 THEN '35-40' WHEN StatementSubTreeCost BETWEEN 40 AND 45 THEN '40-45' WHEN StatementSubTreeCost BETWEEN 45 AND 50 THEN '45-50' WHEN StatementSubTreeCost > 50 THEN '>50' ELSE CAST(StatementSubTreeCost AS VARCHAR(100)) END) AS StatementSubTreeCost , COUNT(*) AS countInstance , avg(PlanSizeInBytes) /1000 avgPlanSizeInKB FROM @PlanCacheForMaxDop GROUP BY CASE WHEN StatementSubTreeCost BETWEEN 1 AND 5 THEN 2.5 WHEN StatementSubTreeCost BETWEEN 5 AND 6 THEN 5.5 WHEN StatementSubTreeCost BETWEEN 6 AND 7 THEN 6.5 WHEN StatementSubTreeCost BETWEEN 7 AND 8 THEN 7.5 WHEN StatementSubTreeCost BETWEEN 8 AND 9 THEN 8.5 WHEN StatementSubTreeCost BETWEEN 9 AND 10 THEN 9.5 WHEN StatementSubTreeCost BETWEEN 10 AND 11 THEN 10.5 WHEN StatementSubTreeCost BETWEEN 11 AND 12 THEN 11.5 WHEN StatementSubTreeCost BETWEEN 12 AND 13 THEN 12.5 WHEN StatementSubTreeCost BETWEEN 13 AND 14 THEN 13.5 WHEN StatementSubTreeCost BETWEEN 14 AND 15 THEN 14.5 WHEN StatementSubTreeCost BETWEEN 15 AND 16 THEN 15.5 WHEN StatementSubTreeCost BETWEEN 16 AND 17 THEN 16.5 WHEN StatementSubTreeCost BETWEEN 17 AND 18 THEN 17.5 WHEN StatementSubTreeCost BETWEEN 18 AND 19 THEN 18.5 WHEN StatementSubTreeCost BETWEEN 19 AND 20 THEN 19.5 WHEN StatementSubTreeCost BETWEEN 10 AND 15 THEN 12.5 WHEN StatementSubTreeCost BETWEEN 15 AND 20 THEN 17.5 WHEN StatementSubTreeCost BETWEEN 20 AND 25 THEN 22.5 WHEN StatementSubTreeCost BETWEEN 25 AND 30 THEN 27.5 WHEN StatementSubTreeCost BETWEEN 30 AND 35 THEN 32.5 WHEN StatementSubTreeCost BETWEEN 35 AND 40 THEN 37.5 WHEN StatementSubTreeCost BETWEEN 40 AND 45 THEN 42.5 WHEN StatementSubTreeCost BETWEEN 45 AND 50 THEN 47.5 WHEN StatementSubTreeCost > 50 THEN 100 ELSE StatementSubTreeCost END; GO
Interestingly enough it shows that as a group Statements costed below our default threshold of 5 are actually the highest on our list:
And, like they say that is not for now; but for another trek up the learning Tree…
My interest in this area was piqued by a couple of articles publicly published by Jimmy May. I have listed the postings in the References section.
One of the commenters, Mike B, posted:
There is a misunderstanding here. You can reduce CPU utilization even more by turning off the computer. It should be understood that parallel processing is inherently “inefficient”, but you use it because it executes faster.
With passion like this, who needs Court TV. Just like CSI you almost always learn something, once you have the requisite interest.
References – Wait Stats:
- Wait Stats
- Paul – Wait Statistics or please tell me where it hurts
- Performance Troubleshooting with Wait Stats
References – Wait Stats / SQLCLR:
- How It Works: What is behind the SQLCLR wait category in SQL Server 2008 Activity Monitor
References – Parallelism – Architecture & Design:
- Craig Freedman – Introduction to Parallel Query Execution
References – General:
- Microsoft SQL Server Performance Top Tip: Degrees of Parallelism and a Degree of Uncertainty
- Degree of Parallelism
- Parallelism – Hurry up and wait
References – Parallelism in your Plan Cache:
- SQL Strate – Parallelism in your Plan Cache http://www.sqlservercentral.com/blogs/stratesql/2010/12/9/can-you-dig-it_3F00_-_1320_-parallelism-in-the-plan-cache/
References – Max degree of Parallelism Settings
References – CPU (Processor):
- SQL Server – Find number of physical processors, cores, logical processors and RAM using T-SQL
References – Jimmy May
- Case Study: Part 1: CXPACKET Wait Stats & ‘max degree of parallelism’ Option: Introduction to Using Wait Stats to Identify & Remediate Query Parallelism Bottlenecks
- Case Study: Part 2: CXPACKET Wait Stats & ‘max degree of parallelism’ Option: Suppressing Query Parallelism Eliminated CXPACKET Waits & Liberated 30% of CPU
References – Joe Chang:
- HyperThreading performance
- Parallelism in Queries with Intermediate Plan Cost
- Parallelism Strategies and Comments
References – Maxdop Calculator:
- Wow – We have MAXDOP Calculator for SQL Server … It makes my job easier
- Maxdop Caclulator
References – Parallelism – Cost Threshold:
- Cost Threshold for Parallelism