June 7, 2026DBA

SQL Server Always On Availability Groups: A Deep Dive into Architecture, Replica Roles, and Failover

AGs are more than a HA feature — they're a replication and routing platform. This post goes under the hood: replica roles, synchronous vs asynchronous modes, failover mechanics, readable secondary behavior, and the T-SQL that actually controls it all.

SQL Server Always On Availability Groups: A Deep Dive into Architecture, Replica Roles, and Failover

AGs are more than a HA feature — they're a replication and routing platform. This post goes under the hood: replica roles, synchronous vs asynchronous modes, failover mechanics, readable secondary behavior, and the T-SQL that actually controls it all.

---

Always On Availability Groups have a reputation problem. They're usually introduced as "SQL Server's HA feature," and that framing causes people to miss what they actually are: a distributed replication and routing platform that happens to provide HA as a side effect. The distinction matters when you're debugging a replica that fell behind at 2am or trying to understand why your read-only query landed on the primary.

This post assumes you've already set up an AG (or are planning to) and want to understand what's happening inside it. If you're starting from scratch and want the step-by-step setup, go read the complete setup guide first — this one builds on that foundation.

The Architecture Is Layered, Not Flat

People think about AGs as "primary and secondaries" — a primary writes, secondaries replicate. That's correct at a high level, but it misses the layers that make the system work. Understanding them helps you debug problems that would otherwise look like black magic.

At the bottom: WSFC owns the cluster quorum and coordinates failover decisions. Above that: SQL Server's AG layer handles replication semantics and replica health. Above that: the listener routes client connections. Each layer has its own state machine, failure modes, and configuration knobs.

When something breaks, the error usually comes from the layer above where you're looking. An AG showing DISCONNECTED in DMVs might look like a SQL problem, but if the cluster lost quorum, that's the real cause. Conversely, a cluster showing healthy might mask a SQL-level replication stall that you'll only see in the database-level DMVs.

Replicas: Roles, Modes, and What Each State Means

Free · 2 Minutes

How healthy is your database, really?

Get your free database health score — spot risks before they become incidents.

Get my health score

Primary Replica

The primary is the write source. All writes go here and are then shipped to secondaries via the log block stream. The primary also handles the availability group listener IP registration in DNS — when failover happens, the new primary registers the listener IP with the cluster, and clients reconnect through it.

One important constraint: only one primary can exist at a time. AGs do not support multi-primary (that's a distributed AG consideration, and it's different). If your cluster loses quorum and two nodes both think they have quorum, you get a split-brain scenario — SQL Server has heuristics to prevent dual primaries from accepting writes, but the cluster's quorum mechanism is your first line of defense.

The primary also maintains the AG's configuration metadata in sys.availability\_groups and sys.availability\_replicas. Changes to the AG (adding a database, changing replica properties) happen on the primary and get pushed to secondaries through the replication stream.

Secondary Replicas

A secondary receives the log stream and applies it to its local copy of the database. What it can do with that copy depends on its configuration. There are three relevant settings:

Availability Mode: Synchronous vs Asynchronous

SYNCHRONOUS\_COMMIT means the primary waits for the secondary to harden the transaction log to disk before acknowledging the commit to the client. Full stop — no client gets "committed" until the secondary has persisted it. This is your zero-data-loss path.

The tradeoff: latency. Every transaction incurs the round-trip time to the secondary and back. For geographically close replicas (same data center, sub-millisecond latency), this is fine. For replicas 500+ km apart, synchronous commit will cap your transaction throughput at whatever your worst-case network latency can sustain. If your primary is in us-east-1 and your secondary is in eu-west-1, synchronous commit will be painful.

ASYNCHRONOUS\_COMMIT means the primary ships the log and doesn't wait for the secondary to acknowledge. The secondary receives the stream and applies it as fast as it can, typically a few seconds behind. This is your disaster recovery path — you accept a small RPO gap in exchange for unlimited distance between replicas.

Most production environments use synchronous for local HA replicas and asynchronous for DR replicas. The mode is configured per-replica, not per-AG, so you can have one synchronous secondary and one asynchronous secondary in the same AG.

-- Check availability mode on all replicas
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ars.role_desc,
       ars.connected_state_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id;

Readable Secondaries

By default, secondaries don't allow connections. You can change that:

ALTER AVAILABILITY GROUP [AG_Production]
MODIFY REPLICA ON N'SECONDARY-NODE'
WITH (SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY));
GO

Allowing connections unlocks several use cases:

Reporting offload: Move read-only reports, dashboards, ETL read phases, and BI queries to the secondary. This reduces primary load significantly for read-heavy workloads.
Backup consolidation: Take log backups from the secondary. This offloads backup I/O from the primary and lets you run backups more frequently without impacting production I/O.
Consistency caveats: This is the part people miss. A readable secondary applies the log asynchronously, which means there's always some lag. The data on the secondary is seconds behind the primary — not milliseconds. Queries that need point-in-time consistency with the primary should run on the primary. On the secondary, use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED or NOLOCK hints — blocking reads can hold up redo and cause the secondary to fall behind.

-- Verify secondary is accepting connections and how far behind it is
SELECT ar.replica_server_name,
       drs.is_primary_replica,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,
       drs.log_send_rate,
       drs.redo_queue_size,
       drs.redo_rate,
       drs.last_commit_time,
       -- Compute lag in seconds
       DATEDIFF(SECOND, drs.last_commit_time, GETDATE()) AS estimated_lag_seconds
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
WHERE drs.is_local = 0;

Readable secondaries also open up the possibility of running filtered workloads. If your application can handle slightly stale data and you have a reporting cluster that doesn't need to hit the primary, route those queries to the secondary. The connection string for this:

Server=AG-Listener,1433;Database=YourDatabase;
Integrated Security=True;
ApplicationIntent=ReadOnly;
MultiSubnetFailover=True;

The ApplicationIntent=ReadOnly connection property is the key. It tells the listener to route the connection to a readable secondary (or the primary if no readable secondary is available and AllowReadOnlyIntent is configured). Without it, all connections go to the primary regardless of read/write intent.

Readable Intent and Read-Only Routing

There's a third option between "no connections" and "fully readable": read-only intent only. Configure it like this:

ALTER AVAILABILITY GROUP [AG_Production]
MODIFY REPLICA ON N'SECONDARY-NODE'
WITH (SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY));

-- Also configure read-only routing on the primary
ALTER AVAILABILITY GROUP [AG_Production]
MODIFY REPLICA ON N'PRIMARY-NODE'
WITH (PRIMARY_ROLE(READ_ONLY_ROUTING_LIST = (N'SECONDARY-NODE', N'TERTIARY-NODE')));

ALTER AVAILABILITY GROUP [AG_Production]
MODIFY REPLICA ON N'SECONDARY-NODE'
WITH (SECONDARY_ROLE(READ_ONLY_ROUTING_URL = N'TCP://secondary-node.domain.com:1433'));

ALTER AVAILABILITY GROUP [AG_Production]
MODIFY REPLICA ON N'TERTIARY-NODE'
WITH (SECONDARY_ROLE(READ_ONLY_ROUTING_URL = N'TCP://tertiary-node.domain.com:1433'));
GO

With this in place, a connection string with ApplicationIntent=ReadOnly will be routed to the secondary — and if the secondary is unavailable, it routes to the tertiary. This is a proper read scale-out setup.

Failover Modes: Automatic, Manual, and Forced

There are three types of failover. Understanding which one applies to each replica is essential for designing your HA strategy.

Automatic Failover

Automatic failover requires three things: synchronous commit mode, the replica configured with FAILOVER\_MODE = AUTOMATIC, and cluster quorum. When the primary becomes unavailable (service down, critical SQL error, node failure), the WSFC detects the failure within the health check timeout window (controlled by HEALTH\_CHECK\_TIMEOUT in the AG definition), promotes a synchronous secondary, and clients reconnect through the listener. Target RTO: under a minute for most configurations.

The important detail: automatic failover only promotes a synchronous secondary. An asynchronous secondary cannot be automatically promoted because it might have un-hardened transactions — promoting it would mean data loss. If your only automatic failover candidate is asynchronous, you don't have an auto-failover path.

-- Check failover readiness
SELECT ag.name,
       ar.replica_server_name,
       ar.failover_mode_desc,
       ar.availability_mode_desc,
       ars.role_desc,
       CASE WHEN ars.role_desc = 'PRIMARY' THEN 'Source'
            WHEN ar.availability_mode_desc = 'SYNCHRONOUS_COMMIT'
             AND ar.failover_mode_desc = 'AUTOMATIC' THEN 'Auto-failover candidate'
            WHEN ar.availability_mode_desc = 'SYNCHRONOUS_COMMIT' THEN 'Manual-only'
            ELSE 'Async / No auto-failover' END AS failover_role
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id;

Manual Failover

Manual failover is controlled — you initiate it, the cluster performs it, and no data is lost (only valid with synchronous commit). This is what you use during maintenance windows. The command:

-- Run on the SECONDARY (the replica you want to promote)
ALTER AVAILABILITY GROUP [AG_Production] FAILOVER;
GO

The old primary becomes a secondary and starts synchronizing from the new primary. Your applications reconnect through the listener and resume operations. There is no data loss with synchronous commit because the secondary was fully synchronized before the failover.

Manual failover is your primary tool for rolling patches. Take the secondary offline for SQL Server updates, fail over to it as the new primary, update the old primary, rejoin it as a secondary. Zero downtime, zero data loss, zero disruption to applications.

Forced Failover (With Data Loss Risk)

Forced failover exists for one scenario: the primary is gone and no synchronous secondary is available. You cannot wait — the database is unreachable from the primary and you need to restore service. This is the "this is the only option left" button.

-- Run on the ASYNCHRONOUS secondary that you're promoting
-- THIS IS DANGEROUS — you will lose data
ALTER AVAILABILITY GROUP [AG_Production] FORCE_FAILOVER_ALLOW_DATA_LOSS;
GO

"Allow data loss" means exactly that: any transactions that were in the log stream but not yet hardened on the promoted secondary are gone. The old primary, when it comes back online, will see that its log sequence number is ahead of what the new primary has. SQL Server will mark the old primary's database as "restoring" and you will need to manually reconcile data or restore from backup.

Design your environment so forced failover is a last resort, not a plan. If you have only asynchronous secondaries in production, you have no zero-RPO path — accept that and plan your RPO accordingly, but don't pretend forced failover is a reliable HA mechanism.

Distributed Availability Groups

Distributed AGs are a separate construct. Instead of replicating within a single cluster, they replicate between two separate availability groups, each in its own WSFC, potentially in different data centers or even cloud regions. This is the architecture that lets you stretch an AG across Azure regions or two distant data centers without forcing every transaction to wait for a cross-continent round-trip.

The primary AG (on cluster A) forwards the log to the forwarder replica on cluster B. Cluster B's secondary replicas replicate locally from the forwarder. The cross-cluster replication can be synchronous or asynchronous; the local replication within cluster B is independent.

-- On the PRIMARY CLUSTER:
CREATE AVAILABILITY GROUP [AG_PrimaryCluster]
WITH (CLUSTER_TYPE = NONE)  -- External cluster type for distributed AG
FOR DATABASE [YourDatabase]
REPLICA ON
    N'PrimaryNode' WITH (
        ENDPOINT_URL = N'TCP://PrimaryNode.domain.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        SEEDING_MODE = AUTOMATIC
    );

-- On the SECONDARY CLUSTER:
CREATE AVAILABILITY GROUP [AG_SecondaryCluster]
WITH (CLUSTER_TYPE = NONE)
FOR DATABASE [YourDatabase]
REPLICA ON
    N'SecondaryNode' WITH (
        ENDPOINT_URL = N'TCP://SecondaryNode.domain.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        SEEDING_MODE = AUTOMATIC
    );

-- On the PRIMARY CLUSTER, create the distributed AG:
CREATE AVAILABILITY GROUP [DistributedAG]
WITH (DISTRIBUTED, CLUSTER_TYPE = NONE)
AVAILABILITY GROUP ON
    'AG_PrimaryCluster' WITH (
        LISTENER_URL = 'TCP://PrimaryListener.domain.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        SEEDING_MODE = AUTOMATIC
    ),
    'AG_SecondaryCluster' WITH (
        LISTENER_URL = 'TCP://SecondaryListener.domain.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        SEEDING_MODE = AUTOMATIC
    );

The CLUSTER\_TYPE = NONE setting is key — it tells SQL Server this is a distributed AG not tied to a specific WSFC. You can also use CLUSTER\_TYPE = NONE for clusterless AGs (no WSFC at all), which I'll cover below.

Distributed AGs are best suited for:

Cross-region DR where you can't tolerate cross-continent synchronous commit latency
Migration scenarios where you need to keep two sites synchronized during a cutover window
Scale-out read workloads that need to span two data centers without a single point of failure

They add operational complexity — failover is manual and must be coordinated across both clusters. Know what you're getting into before deploying them.

Clusterless Availability Groups

SQL Server 2019 introduced CLUSTER\_TYPE = NONE — the ability to create an AG without WSFC at all. This eliminates the cluster dependency and simplifies deployment in environments where you don't have Windows Server licensing for cluster nodes, or where your infrastructure team doesn't manage clusters.

The tradeoff: no automatic failover. Without a cluster to coordinate failover, you must manually initiate any failover. There is also no listener — you must manage connection routing yourself or use a connection string that lists all replica IPs.

-- Create a clusterless AG (no WSFC dependency)
CREATE AVAILABILITY GROUP [AG_Clusterless]
WITH (CLUSTER_TYPE = NONE)
FOR DATABASE [YourDatabase]
REPLICA ON
    N'Node1' WITH (
        ENDPOINT_URL = N'TCP://Node1.domain.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = YES)
    ),
    N'Node2' WITH (
        ENDPOINT_URL = N'TCP://Node2.domain.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = YES)
    );
GO

-- Join secondary on Node2
ALTER AVAILABILITY GROUP [AG_Clusterless] JOIN;
ALTER AVAILABILITY GROUP [AG_Clusterless] GRANT CREATE ANY DATABASE;

For applications that can't tolerate a listener, you can configure a read-scale-out AG with multiple readable replicas. The application connects using MultiSubnetFailover=True and lists all replicas in the connection string's server list:

Server=Node1,5022;Node2,5022;Database=YourDatabase;
MultiSubnetFailover=True;
ApplicationIntent=ReadOnly;

With MultiSubnetFailover, the client attempts connections to all IPs in parallel — whichever responds first gets the connection. This gives you load balancing across read replicas without a listener.

Monitoring AG Health: DMVs That Actually Tell You Something

Most people look at the SSMS dashboard and call it done. The dashboard is useful for a quick sanity check, but it hides the detail you need when something goes wrong. These queries give you the full picture.

Replica-level health

SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ars.role_desc,
       ars.operational_state_desc,       -- ONLINE, PENDING, OFFLINE, etc.
       ars.recovery_health_desc,         -- ONLINE, DEGRADED, OFFLINE
       ars.synchronization_health_desc,  -- HEALTHY, PARTIALLY_HEALTHY, UNHEALTHY
       ars.connected_state_desc,         -- CONNECTED, DISCONNECTED, RE_CONNECTING
       DATEDIFF(SECOND, ars.last_commit_time, GETDATE()) AS last_commit_lag_s
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
ORDER BY ag.name, ar.replica_server_name;

The column you watch most: synchronization\_health\_desc. HEALTHY means the replica is fully synchronized and keeping up. UNHEALTHY means there's a problem — usually the replica is disconnected or the log stream has stalled. PARTIALLY\_HEALTHY is the gray zone — the replica is connected but falling behind.

Database-level detail

SELECT ag.name AS ag_name,
       ar.replica_server_name,
       d.name AS database_name,
       drs.is_local,
       drs.is_primary_replica,
       drs.synchronization_state_desc,   -- SYNCHRONIZED, SYNCHRONIZING, NOT SYNCHRONIZING
       drs.synchronization_health_desc,
       drs.log_send_queue_size / 1024.0 AS log_send_queue_mb,
       drs.log_send_rate / 1024.0 AS log_send_rate_mbps,
       drs.redo_queue_size / 1024.0 AS redo_queue_mb,
       drs.redo_rate / 1024.0 AS redo_rate_mbps,
       drs.last_commit_time,
       DATEDIFF(SECOND, drs.last_commit_time, GETDATE()) AS estimated_redo_lag_s
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON drs.group_id = ag.group_id
JOIN sys.databases d ON drs.database_id = d.database_id
ORDER BY ag.name, ar.replica_server_name, d.name;

log\_send\_queue\_size and redo\_queue\_size are your lag indicators. These are measured in KB of log not yet sent and KB of log not yet redone on the secondary. When these start growing, your replica is falling behind the primary. When they don't shrink, you have a problem.

For a healthy synchronous replica under normal load, both queues should be near zero. Growth that persists (not a brief spike during peak load) means either the secondary can't keep up with redo (CPU, I/O, or memory pressure on the secondary) or the network can't ship log fast enough.

Listener and routing health

-- Check listener status
SELECT agl.dns_name,
       agl.port,
       agl.ip_configuration_string_from_cluster,
       ag.name AS ag_name,
       ar.replica_server_name,
       ars.role_desc
FROM sys.availability_group_listeners agl
JOIN sys.availability_groups ag ON agl.group_id = ag.group_id
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
WHERE ars.role_desc = 'PRIMARY';

-- Check read-only routing configuration
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.role_desc,
       rrl.routing_priority,
       rrl.routing_list,
       ar.read_only_routing_url
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
LEFT JOIN (
    SELECT replica_id, string_agg(replica_server_name, ', ') WITHIN GROUP (ORDER BY routing_priority) AS routing_list
    FROM sys.availability_read_only_routing_lists
    GROUP BY replica_id
) rrl ON ar.replica_id = rrl.replica_id;

A Practical Multi-Replica Configuration Walkthrough

Let's put all of this together with a real-world scenario: a three-replica AG with a primary for writes, a synchronous readable secondary for reports, and an asynchronous secondary for DR. This is a common production configuration.

Prerequisites recap

All nodes are in the same AD domain, running SQL Server 2022 Enterprise Edition, Windows Server 2022, and have been added to a WSFC with a file share witness for quorum. The service accounts are domain accounts with permissions to create computer objects in AD (for the listener).

All three nodes have Always On enabled and the mirroring endpoints created and running on port 5022 with Windows authentication and AES encryption.

Step 1: Create the AG with three replicas

-- Run on PRIMARY (Node1)
CREATE AVAILABILITY GROUP [AG_Production]
WITH (
    AUTOMATED_BACKUP_PREFERENCE = SECONDARY,  -- Backup from secondary if available
    FAILURE_CONDITION_LEVEL = 3,              -- Failover on critical conditions
    HEALTH_CHECK_TIMEOUT = 30000,            -- 30 second health check timeout
    DB_FAILOVER = ON                          -- Database-level failover
)
FOR DATABASE [YourAppDB]
REPLICA ON
    -- PRIMARY: all writes happen here
    N'Node1.domain.com' WITH (
        ENDPOINT_URL = N'TCP://Node1.domain.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,            -- Auto-failover to sync secondaries only
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)  -- No read access on primary
    ),
    -- SECONDARY SYNC: readable, auto-failover target, backup source
    N'Node2.domain.com' WITH (
        ENDPOINT_URL = N'TCP://Node2.domain.com:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,            -- Can be auto-promoted
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(
            ALLOW_CONNECTIONS = READ_ONLY,   -- Report queries only
            READ_ONLY_ROUTING_URL = N'TCP://Node2.domain.com:1433'
        )
    ),
    -- SECONDARY ASYNC: DR replica, NOT auto-failover candidate, manual failover only
    N'Node3.domain.com' WITH (
        ENDPOINT_URL = N'TCP://Node3.domain.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,  -- No auto-failover (data loss risk)
        FAILOVER_MODE = MANUAL,                -- Manual-only
        SEEDING_MODE = AUTOMATIC,
        SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)  -- No connections during DR testing
    );
GO

Step 2: Configure read-only routing on the primary

The primary needs to know where to route ApplicationIntent=ReadOnly connections. Without this, read-only routing won't work even if the secondaries allow connections.

-- Configure read-only routing list (read-only queries go to this list in order)
ALTER AVAILABILITY GROUP [AG_Production]
MODIFY REPLICA ON N'Node1.domain.com'
WITH (PRIMARY_ROLE(READ_ONLY_ROUTING_LIST = (N'Node2.domain.com')));
GO

-- Grant routing permissions on the secondary
ALTER AVAILABILITY GROUP [AG_Production]
MODIFY REPLICA ON N'Node2.domain.com'
WITH (SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY));
GO

Step 3: Join secondaries and start seeding

-- Run on Node2 (sync secondary)
ALTER AVAILABILITY GROUP [AG_Production] JOIN;
ALTER AVAILABILITY GROUP [AG_Production] GRANT CREATE ANY DATABASE;

-- Run on Node3 (async secondary)
ALTER AVAILABILITY GROUP [AG_Production] JOIN;
ALTER AVAILABILITY GROUP [AG_Production] GRANT CREATE ANY DATABASE;

Step 4: Create the listener

-- Run on primary
ALTER AVAILABILITY GROUP [AG_Production]
ADD LISTENER N'AG-Production-Listener' (
    WITH IP (
        (N'192.168.1.50', N'255.255.255.0'),   -- Node1/Node2 subnet
        (N'10.0.1.50', N'255.255.255.0')        -- Node3 DR subnet
    ),
    PORT = 1433
);
GO

Two IPs in the listener for multi-subnet support. Applications connect with MultiSubnetFailover=True and will connect to whichever IP responds first. The cluster owns the listener IP — whichever node is primary will register its local IP with the listener DNS name.

Step 5: Verify the full picture

-- Verify all three replicas are healthy
SELECT ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ars.role_desc,
       ars.synchronization_health_desc,
       ars.connected_state_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
WHERE ag.name = 'AG_Production';

-- Verify database synchronization
SELECT d.name AS database_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size / 1024.0 AS log_send_queue_mb,
       drs.redo_queue_size / 1024.0 AS redo_queue_mb,
       DATEDIFF(SECOND, drs.last_commit_time, GETDATE()) AS lag_seconds
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.databases d ON drs.database_id = d.database_id
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON drs.group_id = ag.group_id
WHERE ag.name = 'AG_Production';

-- Verify read-only routing works
SELECT ar.replica_server_name,
       ar.secondary_role_allow_connections_desc,
       ar.read_only_routing_url,
       ars.role_desc
FROM sys.availability_replicas ar
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id
WHERE ag.name = 'AG_Production';

Failover test sequence

After verification, test in this order:

1. Manual failover to Node2 (synchronous secondary): ALTER AVAILABILITY GROUP [AG\_Production] FAILOVER; on Node2. Verify Node2 is primary, Node1 is synchronized secondary, Node3 (async) is still catching up. Applications reconnect transparently via listener.

2. Manual failback to Node1: Run the same failover command from Node1 (now secondary) back to Node1. Verify all three replicas return to SYNCHRONIZED.

3. Automatic failover test: Stop SQL Server service on the primary (Node1). Verify Node2 automatically promotes to primary within the health check timeout. Verify Node1, when restarted, joins as a synchronized secondary.

4. Async DR path test: Verify Node3 (async) can be manually failed to if needed — this is your forced failover drill. Confirm you understand the data loss window by comparing log\_send\_queue\_size on Node3 at the time of the test.

What Happens During Failover: The Sequence

When an automatic failover occurs, here's what happens end-to-end:

1. Health check fails: WSFC health check (every HEALTH\_CHECK\_TIMEOUT milliseconds, default 30000ms) finds the primary unresponsive.

2. Quorum confirms: The cluster confirms it has quorum (majority of votes). If quorum is lost, no failover happens — the cluster stops.

3. Cluster initiates failover: WSFC promotes the target synchronous secondary. SQL Server validates the secondary is in SYNCHRONIZED state (if not, failover is blocked to prevent data loss).

4. DNS registration moves: The listener IP is deregistered from the old primary's network interface and registered on the new primary's interface.

5. Clients reconnect: Applications with MultiSubnetFailover=True detect the connection drop, attempt reconnect, and the listener DNS resolves to the new primary IP. Non-MultiSubnetFailover clients wait for DNS TTL to expire.

6. Old primary recovers: When the old primary comes back online, it rejoins the cluster as a secondary and begins synchronizing from the new primary. RTO ends when the old primary is SYNCHRONIZED.

The total time is typically 20-60 seconds from service stop to application reconnection, depending on network latency, health check timeout settings, and cluster failover speed. If your health check timeout is 30 seconds and your application timeout is 15 seconds, you will see application failures during failover — increase the application connection timeout to 60+ seconds to ride through the window.

Standard Edition vs Enterprise: What You Get and What You Don't

If you're on Standard Edition, the feature set is different. Enterprise Edition gets up to 8 replicas (including the primary). Standard Edition gets 2 replicas total (primary + 1 secondary). The replicas in Standard Edition can be synchronous commit and support automatic failover — but you only have one secondary, so there's no read offload capability unless you use the single secondary for read-only traffic.

Distributed Availability Groups are Enterprise-only. If you're on Standard and want cross-region replication, you're looking at log shipping or database mirroring (which is deprecated but still functional in all editions) — not AGs.

Basic Availability Groups in Standard Edition 2016+ support a single database on a single replica pair, which is essentially a basic mirroring replacement. It's simpler to manage but doesn't give you the multi-replica flexibility of Enterprise.

For a detailed comparison of what Standard Edition supports vs Enterprise — including the distributed and clusterless AG configurations — see the deep dive on Standard Edition, Distributed AGs, and Clusterless AGs.

The DMVs You Need in Your Monitoring Scripts

Here's a production-ready monitoring query you can schedule as a SQL Agent job or wire into your monitoring platform:

-- Production AG health check — run as an alert query
-- Alert if any replica is unhealthy or any database is out of sync

-- Replica health check
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ars.role_desc,
       ars.synchronization_health_desc,
       ars.connected_state_desc,
       CASE WHEN ars.synchronization_health_desc != 'HEALTHY' THEN 1 ELSE 0 END AS is_unhealthy
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states ars ON ar.replica_id = ars.replica_id
WHERE ars.synchronization_health_desc != 'HEALTHY'
   OR ars.connected_state_desc != 'CONNECTED';

-- Database-level lag check (alert if queues are growing)
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       d.name AS database_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size / 1024.0 AS log_send_queue_mb,
       drs.redo_queue_size / 1024.0 AS redo_queue_mb,
       DATEDIFF(SECOND, drs.last_commit_time, GETDATE()) AS estimated_lag_seconds
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON drs.group_id = ag.group_id
JOIN sys.databases d ON drs.database_id = d.database_id
WHERE drs.synchronization_state_desc != 'SYNCHRONIZED'
   OR drs.redo_queue_size > 102400  -- Alert if redo queue > 100MB
   OR DATEDIFF(SECOND, drs.last_commit_time, GETDATE()) > 30;  -- Alert if lag > 30s

Replace the hard-coded thresholds with your actual SLA numbers. The 100MB redo queue threshold is a starting point — adjust based on your RPO requirements and your secondary's I/O capacity. A well-configured synchronous replica should have sub-second lag under normal load. If you have a 30-second lag during normal operations, something is wrong.

Wrapping Up

Always On Availability Groups are powerful but they have sharp edges. The key points that come up most in production support calls:

Readable secondaries are real — they offload read traffic, but they're always a few seconds behind the primary. Know your RPO and adjust expectations accordingly.
Automatic failover needs synchronous commit — if you have an async-only DR replica, you don't have an auto-failover path from it. Plan for manual failover to your async secondary if the primary goes down.
Logins and jobs don't fail over — this is the one that surprises people the most. You must synchronize them manually or through a scheduled process.
Monitor queue sizes, not just state — a SYNCHRONIZING state that doesn't clear is a problem. Watch redo\_queue\_size as your primary indicator of secondary health.
Distributed AGs and clusterless AGs are different tools — use distributed AGs for cross-cluster/cross-region replication; use clusterless AGs when you don't want WSFC dependency. Neither replaces a basic synchronous AG.

For the step-by-step setup and the prerequisites checklist, see the complete setup guide. For the failover testing runbook, see AG failover testing: a production runbook that actually works.

If you're designing an AG environment and want a second set of eyes on the replica configuration, sync vs async allocation, or failover topology — start the free environment assessment. I'll review your current configuration and flag the gaps before they become incidents.

Get your free database health score

Find out exactly where your database is vulnerable before it causes an incident. 30+ years of DBA expertise, distilled into a single assessment.

Get my free health score

Download the DBA checklist

← All posts

SQL Server Always On Availability Groups: A Deep Dive into Architecture, Replica Roles, and Failover

SQL Server Always On Availability Groups: A Deep Dive into Architecture, Replica Roles, and Failover

The Architecture Is Layered, Not Flat

Replicas: Roles, Modes, and What Each State Means

Primary Replica

Secondary Replicas

Availability Mode: Synchronous vs Asynchronous

Readable Secondaries

Readable Intent and Read-Only Routing

Failover Modes: Automatic, Manual, and Forced

Automatic Failover

Manual Failover

Forced Failover (With Data Loss Risk)

Distributed Availability Groups

Clusterless Availability Groups

Monitoring AG Health: DMVs That Actually Tell You Something

Replica-level health

Database-level detail

Listener and routing health

A Practical Multi-Replica Configuration Walkthrough

Prerequisites recap

Step 1: Create the AG with three replicas

Step 2: Configure read-only routing on the primary

Step 3: Join secondaries and start seeding

Step 4: Create the listener

Step 5: Verify the full picture

Failover test sequence

What Happens During Failover: The Sequence

Standard Edition vs Enterprise: What You Get and What You Don't

The DMVs You Need in Your Monitoring Scripts

Wrapping Up

Get your free database health score

Keep reading

Goodbye DMA, Hello SSMS Migration Extension: Migrating to SQL Server 2022 and Azure

Migrating SQL Server 2017 to 2025 Using a Distributed Availability Group — A Real-World Proof of Concept

The Ghost of TDE Past: When a One-Year-Old Certificate Still Broke My Restore