Better SQL Server Agent Job Failure Monitoring
An improved alternative to the native SQL Server Agent job failure alerts
SQL Server Agent has a built-in alerting process for when jobs fail, but the information it provides isn’t very useful: you’re only told which job failed, what time, who ran it and which step failed. If you want to see why it failed, you have to review the job history manually. On a busy system with a lot of frequently run jobs, the job history may already have been purged by the time you go and look at it, especially if you’ve left the job history thresholds at the SQL Server defaults (1000 rows in total and 100 rows per job). This is something we routinely pick up in our Database Reviews.
As a best practice, we recommend increasing the Maximum job history log size and the Maximum job history rows per job to 100000 and 1000 respectively (both are on the History page of the SQL Server Agent properties), to give you a better chance of getting useful job history from the Agent.
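If you'd rather script the change than click through the dialog, something like the following should work. Treat it as a sketch: `sp_set_sqlagent_properties` is an undocumented procedure in msdb, so test it on a non-production instance first.

```sql
-- Raise the Agent job history limits to the values recommended above.
-- Note: sp_set_sqlagent_properties is undocumented; behaviour may vary
-- between SQL Server versions, so verify on a test instance first.
EXEC msdb.dbo.sp_set_sqlagent_properties
    @jobhistory_max_rows         = 100000,  -- Maximum job history log size (in rows)
    @jobhistory_max_rows_per_job = 1000;    -- Maximum job history rows per job
```

You can confirm the new values afterwards on the History page of the SQL Server Agent properties.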
Additionally, for a job with multiple steps, where some steps are set to “go to next step” when they fail, the job itself can still finish successfully, so you’ll never get an alert that those steps have failed.
To remove manual investigation, overcome these shortfalls and ensure that a useful alert is generated for any job step failures, create a SQL Server Agent job that executes the SQL query included below.
The SQL query provided uses a token to identify the job it is currently running in, and then uses that to work out when the job last ran, and get you all the job failures since then. This means you can schedule the job to run as frequently as you like, and it will always get you the failures since the last time it ran.
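To illustrate the idea, here is a minimal sketch of the token-based lookup. It is not the full query, just the core of the approach under a few assumptions: it runs as a T-SQL job step (the token is only replaced inside an Agent job), it uses the undocumented `msdb.dbo.agent_datetime` function to convert the integer run date and time, and it looks back 24 hours the first time it runs.

```sql
-- Sketch only: must run inside a T-SQL job step, because SQL Server Agent
-- replaces the $(ESCAPE_NONE(JOBID)) token before the step executes.
DECLARE @this_job_id uniqueidentifier =
    CONVERT(uniqueidentifier, $(ESCAPE_NONE(JOBID)));
DECLARE @last_run datetime;

-- step_id = 0 is the job outcome row, written when a run completes,
-- so the latest one belongs to the previous run of this monitoring job.
SELECT @last_run = MAX(msdb.dbo.agent_datetime(h.run_date, h.run_time))
FROM msdb.dbo.sysjobhistory AS h
WHERE h.job_id = @this_job_id
  AND h.step_id = 0;

-- Assumption: on the very first run there is no history, so look back 24 hours.
SET @last_run = ISNULL(@last_run, DATEADD(HOUR, -24, GETDATE()));

-- Failed steps (run_status = 0) for all jobs since then. This includes steps
-- whose on-failure action is "go to the next step".
SELECT j.name AS job_name,
       h.step_id,
       h.step_name,
       msdb.dbo.agent_datetime(h.run_date, h.run_time) AS failed_at,
       h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j
  ON j.job_id = h.job_id
WHERE h.run_status = 0
  AND h.step_id > 0
  AND msdb.dbo.agent_datetime(h.run_date, h.run_time) >= @last_run
ORDER BY j.name, failed_at;
```

Because the cut-off comes from the monitoring job's own history, changing the schedule doesn't require any change to the query.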
The SQL query sends the details in an email using SQL Server Database Mail. You’ll need to have this enabled, and have a default mail profile configured (or update the SQL query to specify your mail profile). It’ll send one email per job, with all instances of that job failing contained in the email. This means you’ll have everything in one place for multiple failures of a job, and each individual job can be sent to different people or teams to look into, if required.
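As a rough sketch of the one-email-per-job pattern, the example below loops over the jobs with failures and sends a plain-text email for each. The assumptions are loud ones: a fixed one-hour window stands in for "since the last run", `dba-team@example.com` is a placeholder recipient, and `STRING_AGG` requires SQL Server 2017 or later.

```sql
-- Sketch only: one email per job, listing every failure of that job.
DECLARE @since datetime = DATEADD(HOUR, -1, GETDATE());  -- assumed window
DECLARE @crlf nchar(2) = NCHAR(13) + NCHAR(10);          -- line separator
DECLARE @job_name sysname, @subject nvarchar(255), @body nvarchar(max);

DECLARE failed_jobs CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT j.name
    FROM msdb.dbo.sysjobhistory AS h
    JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
    WHERE h.run_status = 0
      AND h.step_id > 0
      AND msdb.dbo.agent_datetime(h.run_date, h.run_time) >= @since;

OPEN failed_jobs;
FETCH NEXT FROM failed_jobs INTO @job_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Collect all failures for this job into a single message body.
    SELECT @body = STRING_AGG(
               CONVERT(nvarchar(max),
                   CONVERT(nvarchar(19), msdb.dbo.agent_datetime(h.run_date, h.run_time), 120)
                   + N' | step ' + CAST(h.step_id AS nvarchar(10))
                   + N' (' + h.step_name + N'): ' + h.message),
               @crlf) WITHIN GROUP (ORDER BY h.instance_id)
    FROM msdb.dbo.sysjobhistory AS h
    JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
    WHERE j.name = @job_name
      AND h.run_status = 0
      AND h.step_id > 0
      AND msdb.dbo.agent_datetime(h.run_date, h.run_time) >= @since;

    SET @subject = N'SQL Agent job failed: ' + @job_name;

    -- No @profile_name is passed, so the default mail profile is used.
    -- Add @profile_name = N'...' here to send through a specific profile.
    EXEC msdb.dbo.sp_send_dbmail
        @recipients = N'dba-team@example.com',  -- placeholder address
        @subject    = @subject,
        @body       = @body;

    FETCH NEXT FROM failed_jobs INTO @job_name;
END;

CLOSE failed_jobs;
DEALLOCATE failed_jobs;
```

Sending per job rather than per failure keeps the volume down, and the recipient could be looked up per job if different teams own different jobs.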
The failure emails are formatted as below: