Quantcast
Channel: SQL Server Blog
Viewing all articles
Browse latest Browse all 1849

Wait Stats Baseline Comparison – Idera Diagnostic Manager

$
0
0

Database servers have to wait on different resources, and we make changes to try to reduce those waits.  This could be anything from increasing memory, tuning a query, moving to different hardware, changing indexes, etc..  All of these things should reduce waits, but how effective were they?

The trick is to know what things looked like before so you have something to compare it to.  If you know what it looked like before then you’ll know how much of a difference you made.

My Default Monitoring Message

I’ll copy/paste this section on multiple posts, so feel free to skip it if you’ve seen it before.

The biggest issue is the word “before”.  If you weren’t watching your servers before then you have nothing to compare the current state of your server to.  Also, if you’re watching it consistently then you’ll be able to make an apples to apples comparison.

Your production servers should have some kind of monitoring software in place, and the monitoring software is already capturing this type of information to help make all the pretty charts and graphs you see through the GUI.  You need this software to run so you can get alerted when things go wrong, and pulling this data will just make it more valuable.

Monitoring software doesn’t perform magic, so you can pull this data on your own if you don’t have any.  However, that’s taking on a lot of responsibility, time, and effort.  If something is collecting the data already, take advantage of that.

For me, I use Idera Diagnostic Manager, so my queries are going to look at the data collected using this tool.  Other monitoring tools are going to collect the same basic data in a database you can read.

If anyone from one of these companies wants to provide me with scripts that fill the same need, I’ll make a write up on them and give you credit for being awesome supporting your product like that.  I can only write against what I can test against, it’s nothing personal.

The Script

There are 3 parameters at the start, and that’s typically all I ever change.  A couple things stick out with these parameters.  The times are in UTC, so I default to GetUTCDate().  I want the time range to be at least an hour so I’m not letting a single query that ran skew the results too much.  Then I’m also converting my instance name to upper case because Idera DM may very well be your only database that’s case sensitive, so I cheat by putting everything in upper case.

My upper case trick isn’t a recommended trick.  It causes scans and possible bad joins.  In this case I know it’s against small tables and that I don’t have two servers with the same name using different cases.  It’s safe here, but not everywhere.

DECLARE @StartTime DateTime
	, @EndTime DateTime
	, @InstanceName sysname
		
SET @EndTime = GetUTCDate()
SET @StartTime = DateAdd(Hour, -24, @EndTime)
SET @InstanceName = UPPER('Server\Instance')

IF OBJECT_ID('tempdb..#IgnoredWaits') IS NOT NULL BEGIN
    DROP TABLE #IgnoredWaits
END

IF OBJECT_ID('tempdb..#Now') IS NOT NULL BEGIN
    DROP TABLE #Now
END

IF OBJECT_ID('tempdb..#Yesterday') IS NOT NULL BEGIN
    DROP TABLE #Yesterday
END

IF OBJECT_ID('tempdb..#LastWeek') IS NOT NULL BEGIN
    DROP TABLE #LastWeek
END

IF OBJECT_ID('tempdb..#TwoWeeks') IS NOT NULL BEGIN
    DROP TABLE #TwoWeeks
END

CREATE TABLE #IgnoredWaits (Wait_Type NVarChar(60) PRIMARY KEY) 

--Values taken from Paul Randal of SQLskills
--http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/
INSERT INTO #IgnoredWaits 
SELECT N'BROKER_EVENTHANDLER'            
UNION SELECT N'BROKER_RECEIVE_WAITFOR'
UNION SELECT N'BROKER_TASK_STOP'               
UNION SELECT N'BROKER_TO_FLUSH'
UNION SELECT N'BROKER_TRANSMITTER'             
UNION SELECT N'CHECKPOINT_QUEUE'
UNION SELECT N'CHKPT'                          
UNION SELECT N'CLR_AUTO_EVENT'
UNION SELECT N'CLR_MANUAL_EVENT'               
UNION SELECT N'CLR_SEMAPHORE'
UNION SELECT N'DBMIRROR_DBM_EVENT'             
UNION SELECT N'DBMIRROR_EVENTS_QUEUE'
UNION SELECT N'DBMIRROR_WORKER_QUEUE'          
UNION SELECT N'DBMIRRORING_CMD'
UNION SELECT N'DIRTY_PAGE_POLL'                
UNION SELECT N'DISPATCHER_QUEUE_SEMAPHORE'
UNION SELECT N'EXECSYNC'                       
UNION SELECT N'FSAGENT'
UNION SELECT N'FT_IFTS_SCHEDULER_IDLE_WAIT'    
UNION SELECT N'FT_IFTSHC_MUTEX'
UNION SELECT N'HADR_CLUSAPI_CALL'              
UNION SELECT N'HADR_FILESTREAM_IOMGR_IOCOMPLETION'
UNION SELECT N'HADR_LOGCAPTURE_WAIT'           
UNION SELECT N'HADR_NOTIFICATION_DEQUEUE'
UNION SELECT N'HADR_TIMER_TASK'                
UNION SELECT N'HADR_WORK_QUEUE'
UNION SELECT N'KSOURCE_WAKEUP'                 
UNION SELECT N'LAZYWRITER_SLEEP'
UNION SELECT N'LOGMGR_QUEUE'                   
UNION SELECT N'ONDEMAND_TASK_QUEUE'
UNION SELECT N'PWAIT_ALL_COMPONENTS_INITIALIZED'
UNION SELECT N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP'
UNION SELECT N'QDS_SHUTDOWN_QUEUE'
UNION SELECT N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP'
UNION SELECT N'REQUEST_FOR_DEADLOCK_SEARCH'    
UNION SELECT N'RESOURCE_QUEUE'
UNION SELECT N'SERVER_IDLE_CHECK'              
UNION SELECT N'SLEEP_BPOOL_FLUSH'
UNION SELECT N'SLEEP_DBSTARTUP'                
UNION SELECT N'SLEEP_DCOMSTARTUP'
UNION SELECT N'SLEEP_MASTERDBREADY'            
UNION SELECT N'SLEEP_MASTERMDREADY'
UNION SELECT N'SLEEP_MASTERUPGRADED'           
UNION SELECT N'SLEEP_MSDBSTARTUP'
UNION SELECT N'SLEEP_SYSTEMTASK'               
UNION SELECT N'SLEEP_TASK'
UNION SELECT N'SLEEP_TEMPDBSTARTUP'            
UNION SELECT N'SNI_HTTP_ACCEPT'
UNION SELECT N'SP_SERVER_DIAGNOSTICS_SLEEP'    
UNION SELECT N'SQLTRACE_BUFFER_FLUSH'
UNION SELECT N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP'
UNION SELECT N'SQLTRACE_WAIT_ENTRIES'          
UNION SELECT N'WAIT_FOR_RESULTS'
UNION SELECT N'WAITFOR'                        
UNION SELECT N'WAITFOR_TASKSHUTDOWN'
UNION SELECT N'WAIT_XTP_HOST_WAIT'             
UNION SELECT N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG'
UNION SELECT N'WAIT_XTP_CKPT_CLOSE'            
UNION SELECT N'XE_DISPATCHER_JOIN'
UNION SELECT N'XE_DISPATCHER_WAIT'             
UNION SELECT N'XE_TIMER_EVENT'

SELECT s.InstanceName
	, t.WaitType
	, WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60
	, dStart = Min(UTCCollectionDateTime)
	, dEnd = Max(UTCCollectionDateTime)
INTO #Now
FROM SQLdmrepository..WaitStatisticsDetails wsd
	INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID
	INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID
	INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID
WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime
	AND	UPPER(s.InstanceName) = @InstanceName
	AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits)
	--AND t.WaitType LIKE 'PAGEIO%'
GROUP BY s.InstanceName, t.WaitType

SELECT @StartTime = @StartTime - 1
	, @EndTime = @EndTime - 1

SELECT s.InstanceName
	, t.WaitType
	, WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60
	, dStart = Min(UTCCollectionDateTime)
	, dEnd = Max(UTCCollectionDateTime)
INTO #Yesterday
FROM SQLdmrepository..WaitStatisticsDetails wsd
	INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID
	INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID
	INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID
WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime
	AND	UPPER(s.InstanceName) = @InstanceName	
	AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits)
	--AND t.WaitType LIKE 'PAGEIO%'
GROUP BY s.InstanceName, t.WaitType

SELECT @StartTime = @StartTime - 6
	, @EndTime = @EndTime - 6

SELECT s.InstanceName
	, t.WaitType
	, WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60
	, dStart = Min(UTCCollectionDateTime)
	, dEnd = Max(UTCCollectionDateTime)
INTO #LastWeek
FROM SQLdmrepository..WaitStatisticsDetails wsd
	INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID
	INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID
	INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID
WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime
	AND	UPPER(s.InstanceName) = @InstanceName	
	AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits)
	--AND t.WaitType LIKE 'PAGEIO%'
GROUP BY s.InstanceName, t.WaitType

SELECT @StartTime = @StartTime - 7
	, @EndTime = @EndTime - 7

SELECT s.InstanceName
	, t.WaitType
	, WaitTime_Minutes = SUM(WaitTimeInMilliseconds)/1000/60
	, dStart = Min(UTCCollectionDateTime)
	, dEnd = Max(UTCCollectionDateTime)
INTO #TwoWeeks
FROM SQLdmrepository..WaitStatisticsDetails wsd
	INNER JOIN SQLdmrepository..WaitTypes t ON wsd.WaitTypeID = t.WaitTypeID
	INNER JOIN SQLdmrepository..WaitStatistics ws ON ws.WaitStatisticsID = wsd.WaitStatisticsID
	INNER JOIN SQLdmrepository..MonitoredSQLServers s ON ws.SQLServerID = s.SQLServerID
WHERE ws.UTCCollectionDateTime BETWEEN @StartTime and @EndTime
	AND	UPPER(s.InstanceName) = @InstanceName	
	AND t.WaitType NOT IN (SELECT Wait_Type COLLATE SQL_Latin1_General_CP1_CI_AS FROM #IgnoredWaits)
	--AND t.WaitType LIKE 'PAGEIO%'
GROUP BY s.InstanceName, t.WaitType

SELECT TOP 10 n.InstanceName
	, n.WaitType
	, Now_Minutes = n.WaitTime_Minutes
	, Yesterday_Minutes = d1.WaitTime_Minutes
	, OneWeek_Minutes = d7.WaitTime_Minutes
	, TwoWeeks_Minutes = d14.WaitTime_Minutes  
FROM #Now n
	LEFT JOIN #Yesterday d1 ON n.WaitType = d1.WaitType
	LEFT JOIN #LastWeek d7 ON n.WaitType = d7.WaitType
	LEFT JOIN #TwoWeeks d14 ON n.WaitType = d14.WaitType 
ORDER BY n.WaitTime_Minutes DESC

The Results

The results pretty much speak for themselves.  What were the top 10 things you waited on recently and how much did you wait on them in the past.  It’s great to have a view of it where you’re comparing totals side-by-side.  Seeing the same thing in charts just doesn’t have the same effect for me.

These numbers are going to have natural variances to them.  Did a user run a couple large queries during one interval but not another?  Was the shared storage under different loads?  Really anything can come up.  The longer the period the more you smooth out these numbers.

So, the real question is what do these waits mean?  I’m not going to pretend like I can cover all the wait stats, their meanings, and how to affect them in this post.  What I can say is that there are good resources out there for you.

Paul Randal (b|t) has the best source I know of on his post Wait Statistics, or please tell me where it hurts.

CXPACKET

One wait type I like to talk about, and discredit a bit, is CXPACKET.  This wait type is just saying you used parallelism, nothing more.  I first covered this topic in my post Why worry about CXPACKET.

Think of it in terms of a manager who delegates a large project to multiple employees.  As the employees are hard at work accomplishing the task, the manager goes about their day and records their time as CXPACKET.  The task is typically completed in less overall duration because it was a group effort, but the total amount of work done goes up some because there’s overhead and small duplicated tasks when you delegate to a group.

That less overall duration could make or break your application, so the idea of having CXPACKET isn’t necessarily bad (setting MAXDOP = 1 is rarely a good idea).  However, a task being so large that it needed to be delegated to multiple employees may mean it’s not efficient, making query and index tuning a good approach.  Also, your manager may be too eager to delegate their tasks, so possibly adjust your Cost Threshold for Parallelism as suggested in Jeremiah Peschka’s (b|t) post Five SQL Server Settings to Change.

As I said before, I can’t cover all the wait types here, so go to Paul Randal’s post I suggested above for a better list.

When Idera Finds Out What I’m Doing….

Someone asked me for queries pulling from Diagnostic Manager, and I posted on Twitter when I made my Idera Diagnostic Manager Scripts page.  The outcry and rage from this was…very positive.

Vicky Harp (t), who is now Idera’s Corporate Strategist wrote pieces of Diagnostic Manager and its schema, took time out of her day to meet with me and talk about the details of querying the data they collect.

Scott Stone (t) who is Diagnostic Manager’s Product Manager also jumped in very quickly to see if I’d like to be a beta tester for them, bringing real power users into the loop to make their products better.

This is the level of support you want from your monitoring software.

While I don’t officially endorse or recommend any specific software at this time, I do recommend you look for responses like this.

 


Filed under: Idera, Monitoring, Scripts, SQL Server

Viewing all articles
Browse latest Browse all 1849

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>