SQL Black Box Recorder part II

A few months back I began a series, about the SQL Black Box Recorder that you can read here, with a promise of at least two more posts. black box recorderThe first article in the series laid the groundwork about the black box recorder in SQL Server. Furthermore, there was a brief introduction into the three components of the black box recorder. The three components as laid out in that article are:

  • Default Trace
  • system_health Extended Event Session
  • sp_server_diagnostics procedure

The first article went into deeper detail about the first leg of this three-legged black box recorder – the default trace. In this article, the focus will be directed to the next leg of the black box recorder – or the system_health Extended Event Session. If by some measure you are in the dark about what Extended Events is, then I recommend you read my personal blog series on the subject. There are numerous articles stepping through the subject in easy-to-digest fashion. You can find many of the articles in the following table of contents – here.

The focus of this article will not be to introduce the topic of Extended Events. Rather, it will be to introduce the system_health session and dive into it a fair bit.

SQL Black Box Recorder

What is the system_health session?

Beyond being a component of the black box for SQL Server, what exactly is this event session? The system_health is much as the name implies – it is a “trace” that attempts to gather information about various events that may affect the overall health of the SQL Server instance.

The event session will trap various events related to deadlocks, waits, clr, memory, schedulers, and reported errors. To get a better grasp of this, let’s take a look at the event session makeup based on the available metadata in the dmvs and catalog views.

DECLARE @SessionName VARCHAR(256) = 'system_health';
 
/* note: this only works if the session is running and the source is "server" not internal or something else.
*/
 
SELECT xs.name AS SessionName
		, xs.session_source
		, xse.event_name
		, xo.description AS EventDesc
		, xsea.action_name
		, xoa.description AS ActionDesc
		, xse.event_predicate
		, sese.predicate, sese.predicate_xml
	FROM sys.dm_xe_sessions xs
		INNER JOIN sys.dm_xe_session_events xse
			ON xs.address = xse.event_session_address
		LEFT OUTER JOIN sys.dm_xe_session_event_actions xsea
			ON xse.event_session_address = xsea.event_session_address
				AND xse.event_name = xsea.event_name
		INNER JOIN sys.dm_xe_objects xo
			ON xse.event_name = xo.name
				AND xse.event_package_guid = xo.package_guid
				AND xo.object_type = 'event'
		LEFT OUTER JOIN sys.dm_xe_objects xoa
			ON xsea.action_name = xoa.name
				AND xsea.action_package_guid = xoa.package_guid
				AND xoa.object_type = 'action'
		INNER JOIN sys.server_event_sessions ses
			ON xs.name = ses.name
		LEFT OUTER JOIN sys.server_event_session_events sese
			ON ses.event_session_id = sese.event_session_id
			AND xse.event_name = sese.name
	WHERE xs.name = @SessionName
	ORDER BY xs.name,xse.event_name, xsea.action_name;

In addition to the types of events I mentioned, there are also a few more interesting things to discover from the results of this particular query. For instance, the actions that are being employed help to gather even more information such as the callstack, tsql_callstack, database id, and sql text of the query that was executing at the moment of the event that is trapped.

One very peculiar event that is trapped is the sp_server_diagnostics_component_result event. Remember from earlier in this article that I noted the third leg of the black box record happens to be a very similarly named session – “sp_server_diagnostics procedure”. When I cover this third leg, it will make a little more sense. Understand, for now, that your suspicion of it being a bit more integral to the black box (than merely 1/3 of the overall black box) is confirmed.

This is a good set of information that can be trapped. Furthermore, this is a good set of information which can give a fairly good diagnosis of several different potential problems within your SQL Server Instance! (Yes it may seem repetitive, but I wanted to underscore the importance of the data that can be trapped.) That said, compare the types of events to those of the default trace. Take note of how many events actually overlap between the two components of the black box recorder.

For ease of comparison, let’s simplify that previous query to get a set of events to read without the duplication due to the actions on each event.

DECLARE @SessionName VARCHAR(256) = 'system_health';
 
SELECT xs.name AS SessionName, xse.event_name
	FROM sys.dm_xe_sessions xs
		INNER JOIN sys.dm_xe_session_events xse
			ON xs.address = xse.event_session_address
	WHERE xs.name = @SessionName
	ORDER BY xse.event_name;

And since I am feeling particularly generous, I have lumped the events from the two traces together in the following query:

DECLARE @SessionName VARCHAR(256) = 'system_health';
 
SELECT xs.name AS SessionName
		, xse.event_name AS EventName
		, mv.map_value AS CategoryName
	FROM sys.dm_xe_sessions xs
		INNER JOIN sys.dm_xe_session_events xse
			ON xs.address = xse.event_session_address
		INNER JOIN sys.dm_xe_objects xo
			ON xse.event_package_guid = xo.package_guid
			AND xse.event_name = xo.name
			AND xo.object_type = 'event'
		INNER JOIN sys.dm_xe_object_columns oc
			ON oc.OBJECT_NAME = xo.name
			AND oc.object_package_guid = xo.package_guid
			AND oc.name = 'KEYWORD'
		INNER JOIN sys.dm_xe_map_values mv
			ON oc.type_name = mv.name
			AND oc.column_value = mv.map_key
			AND oc.object_package_guid = mv.object_package_guid
	WHERE xs.name = @SessionName
UNION ALL
/* events */
SELECT 'Default Trace' AS SessionName
		, te.name AS EventName
		, tca.name AS CategoryName
	FROM sys.traces t
		CROSS APPLY ( SELECT DISTINCT gei.eventid
							FROM sys.FN_TRACE_GETEVENTINFO(t.id) gei
					) ca
		INNER JOIN sys.trace_events te
			ON te.trace_event_id = ca.eventid
		INNER JOIN sys.trace_categories tca
			ON te.category_id = tca.category_id
	WHERE t.is_default = 1
	ORDER BY EventName;

That should help to compare the 17 events from the system_health session and the 34 from the default trace with a little more ease and speed. The events from the two sessions are rather complimentary to each as they help to construct the SQL Server Black Box recorder.

How to use this session

With the events trapped by this session, there are a few ways to use the trapped data that should stand out pretty easily. To me, the quick application of the data from this session can come from the review of deadlocks, reported errors, or even waits analysis. Using the deadlock as probably the most common problem people will want to troubleshoot, this information is suddenly much more valuable and easy to access (no more need for that trace flag). Since the system_health is now trapping the deadlock by default, one could quickly access the deadlock graph from the event data and proceed to troubleshooting. More information on getting that deadlock graph can be found here and here.

Learn to use the data made available through this member of the black box recorder. Becoming familiar with this tool (and additionally with Extended Events) will make you a rock-star in the office. The information needed to troubleshoot most problems is right at your fingertips through the use of the different pieces of the black box recorder. If it is not available there, then more than likely it can be made available through a bit of tinkering with Extended Events. Get to know the tools and be that rock-star DBA.

Please share this

Leave a Reply

Related Articles

A toy machine and a microphone on a table Description automatically generated

The Right Tool for the Right Job

I’m looking forward to 2024. From many perspectives, I think we are going to see great advancements, including in technology, that on one hand will make our lives easier, but also make our decisions a bit harder.

Read More »