sp_checkAG — Check Your SQL Server Availability Groups for Availability Issues

sp_checkAG is a free stored procedure that surfaces SQL Server Availability Group health issues before they become outages. Learn how to install it, interpret its output, and automate AG monitoring.


Always-On Availability Groups are one of SQL Server's most powerful high-availability features. They're also one of the most complex to monitor. A secondary replica can fall behind. A replica can be disconnected. Synchronization can silently degrade. And if you're not watching closely, you won't know until a failover attempt fails — or worse, until someone calls you at 2 AM because the app is down.

That's where sp_checkAG comes in.

sp_checkAG is a free, open-source stored procedure written by Brian Kelley (Straight Path SQL) designed specifically to surface Availability Group issues before they become outages. This post walks through what it does, how to deploy it, how to interpret its output, and what to do when it flags a problem.

What Is sp_checkAG?

sp_checkAG is a T-SQL stored procedure that queries the AG-related DMVs and rolls them into a single diagnostic output. Rather than forcing you to write custom DMV queries every time something looks wrong, it gives you a consolidated health snapshot in seconds.

The procedure checks for:

  • Replica connectivity — Is every replica in a CONNECTED state?
  • Synchronization health — Are all databases in a HEALTHY or PARTIALLY_HEALTHY state?
  • Redo queue depth — Is the secondary keeping up with the primary, or is it falling behind?
  • Send queue depth — Is the log stream being transmitted from primary to secondary without stacking up?
  • Failover readiness — For synchronous replicas, is automatic failover actually viable?
  • Role mismatches — Is a replica unexpectedly in RESOLVING state?
  • Database-level issues — Are individual databases within the AG NOT SYNCHRONIZING when they should be?

Installing sp_checkAG

Free · 2 Minutes
How healthy is your database, really?
Get your free database health score — spot risks before they become incidents.
Get my health score

sp_checkAG is available on GitHub. The recommended installation is on the primary replica.

-- Run on your primary replica (or any instance you use for DBA monitoring)\nUSE DBA;\nGO\n\n-- Paste the sp_checkAG CREATE PROCEDURE script here\n-- Source: https://github.com/bertwagner/sp_checkAG\n-- or: https://straightpathsql.com

Once installed, execution is simple:

EXEC sp_checkAG;

Basic Usage and Output

Running sp_checkAG with no parameters returns a result set covering all Availability Groups on the instance.

Key output columns to watch:

  • connected_state: CONNECTED vs DISCONNECTED — a disconnected replica isn't receiving log at all
  • synchronization_health: HEALTHY, PARTIALLY_HEALTHY, or NOT_HEALTHY
  • redo_queue_size_kb: Log waiting to be hardened on the secondary — growing redo queue means the secondary is I/O bound
  • send_queue_size_kb: Log waiting to be sent from primary — growing send queue means a network or endpoint bottleneck
  • is_failover_ready: 1 means automatic failover will fire; 0 means it won't — treat a sync replica with is_failover_ready=0 as a P1

Filtering to a Specific AG

EXEC sp_checkAG @AGName = 'AG_PROD';

Checking Database-Level Status

Beyond replica-level health, sp_checkAG also surfaces database-level synchronization states — critical because a replica can appear connected and healthy while individual databases within it are NOT SYNCHRONIZING.

EXEC sp_checkAG @ShowDatabaseDetails = 1;

This adds a second result set showing each database within each replica. A database showing NOT SYNCHRONIZING on an otherwise healthy replica is the issue most likely to be missed without database-level visibility.

Real-World Scenarios and Remediation

Redo Queue Growing on a Synchronous Secondary

The secondary can't apply log records fast enough — almost always I/O-bound on the secondary. Move AG log files to faster storage (NVMe preferred). Check for blocking on the secondary — long-running read queries can block the redo thread.

Replica DISCONNECTED After Network Event

Check endpoint status: SELECT name, state_desc FROM sys.endpoints WHERE type = 4;. Common causes: endpoint stopped (ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED), firewall blocking port 5022, or certificate expiry if using certificate auth.

is_failover_ready = 0 on a Synchronous Replica

Your automatic failover won't fire. Check synchronization health, WSFC quorum health, and whether a database was recently added to the AG but not yet fully synchronized.

Database NOT SYNCHRONIZING in an Otherwise Healthy AG

This is the silent gap. Common causes: log chain broken (re-seed needed), full transaction log on primary, or automatic seeding failure. Check sys.dm_hadr_automatic_seeding for recent failures.

Using sp_checkAG in a Monitoring Job

Set it up as a SQL Server Agent job running every 5–10 minutes. Alert when any replica shows CONNECTED_STATE != CONNECTED or synchronization_health != HEALTHY. Pair with Database Mail for alerts and reference the AG Health Playbook for response procedures.

DECLARE @issues INT = 0;\nSELECT @issues = COUNT(*) FROM sys.dm_hadr_availability_replica_states\nWHERE role_desc = 'SECONDARY' AND (connected_state_desc != 'CONNECTED' OR synchronization_health_desc != 'HEALTHY');\nIF @issues > 0 BEGIN EXEC sp_checkAG @ShowDatabaseDetails = 1; END

sp_checkAG vs. Manual DMV Queries

You can get this information from DMVs directly. The question is whether you want to write and maintain those queries, or use a community tool that already does it correctly. For teams without commercial monitoring, sp_checkAG fills the gap cleanly.

Integration with SQL Server 2025

AG behavior is largely unchanged in SQL Server 2025 at the monitoring layer — sp_checkAG works the same way it does on 2019/2022. The DMVs are stable. One area to watch: SQL Server 2025's OPTIMIZED_LOCKING can affect redo thread behavior on secondaries in high-write workloads during mixed-version AG transitions.

Key Takeaways

  • Run sp_checkAG with @ShowDatabaseDetails = 1 — database-level issues are invisible without it
  • Redo queue growth on a sync replica = secondary I/O problem. Fix the storage, not the AG
  • Send queue growth on primary = network or secondary endpoint bottleneck
  • is_failover_ready = 0 on a sync replica means your automatic failover won't fire. Treat it as P1
  • Automate it — a SQL Agent job running every 5–10 minutes catches problems before users do

The procedure is freely available from Brian Kelley at Straight Path SQL. Need help setting up AG monitoring or designing a failover runbook? Start with a free database health assessment.

Free · Takes 2 Minutes

Get your free database health score

Find out exactly where your database is vulnerable before it causes an incident. 30+ years of DBA expertise, distilled into a single assessment.

Back to all posts