June 28, 2026DBA

Scaling Beyond the LAN: A Practical Guide to Distributed Availability Groups

Distributed Availability Groups allow you to span SQL Server clusters across disparate networks without the headache of a quorum-based cluster stretch. This guide covers the architectural fundamentals and the T-SQL required to get them running in production.

Why Distributed Availability Groups Matter

Traditional Always On Availability Groups (AGs) are tied to a single Windows Server Failover Cluster (WSFC) or a Linux Pacemaker cluster. This is fine for local high availability, but when you need to span geographic regions or migrate data between totally different domains, the 'Stretch Cluster' model becomes a nightmare. You end up fighting quorum witness issues, latency sensitivity, and complex networking.

Distributed Availability Groups (DAGs) solve this by acting as an 'AG of AGs.' It treats two independent clusters as separate entities, using a database mirroring endpoint to ship logs between them. The primary cluster handles local HA, while the secondary cluster (the forwarder) receives the stream and distributes it to its local nodes. This decoupling is the cleanest way to handle disaster recovery (DR) and large-scale migrations.

The Architecture: Primary and Forwarder

In a standard AG, every secondary node connects to the primary. In a Distributed AG, the architecture changes. You have a Global Primary on the first cluster. It sends data to a forwarder on the second cluster. The forwarder is the primary of its own local AG and is responsible for updating its own secondary replicas.

This reduces the outgoing bandwidth load on the Global Primary because it only sends one stream of data to the remote site, regardless of how many replicas are waiting on the other end.

Step 1: Preparing the Endpoints

Before you touch the DAG syntax, you must ensure your local AGs are healthy. For this example, let's assume we have AG_Local on Cluster A and AG_Remote on Cluster B.

Crucially, your endpoints must be accessible across the network. If these clusters are in different domains, you need to use certificate-based authentication for the mirroring endpoints. If they are in the same domain or trusted domains, Windows Auth works fine.

-- Run on all nodes to ensure endpoints exist
SELECT name, state_desc, port FROM sys.database_mirroring_endpoints;

Step 2: Creating the Distributed AG on Cluster A

You initiate the DAG from the current Global Primary. Note that the LISTENER_URL refers to the VNN (Virtual Network Name) of the local and remote AG listeners.

CREATE AVAILABILITY GROUP [Distro_AG]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
'AG_Local' WITH
(
    LISTENER_URL = 'tcp://AGLocalListener.domain.com:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
),
'AG_Remote' WITH
(
    LISTENER_URL = 'tcp://AGRemoteListener.domain.com:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
);

Step 4: Joining Cluster B

Now, head over to the primary node of the second cluster. You need to join the second AG to the distributed group.

ALTER AVAILABILITY GROUP [Distro_AG]
JOIN
AVAILABILITY GROUP ON
'AG_Local' WITH
(
    LISTENER_URL = 'tcp://AGLocalListener.domain.com:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
),
'AG_Remote' WITH
(
    LISTENER_URL = 'tcp://AGRemoteListener.domain.com:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
);

Common Gotchas and Production Reality

1. Seeding Failures: Automatic seeding is a godsend, but it requires the file paths to be identical on both clusters. If Cluster A uses D:\Data and Cluster B uses E:\SQLData, automatic seeding will hang. In those cases, use manual seeding (backup/restore with NORECOVERY).

2. Listener Issues: The Distributed AG communicates via the Listeners. If your listeners aren't configured with the correct static IPs or if there is a firewall blocking port 5022 between the two subnets, the DAG will stay in a 'DISCONNECTED' state.

3. Transaction Log Growth: If the link between Cluster A and Cluster B goes down, the transaction log on Cluster A will grow until the link is restored. Monitor your disk space aggressively.

4. Routing: Read-only routing stays local to the cluster by default. If you want to read from the remote cluster, you connect to the local listener of the remote AG.

Monitoring the Pipeline

You won't see the Distributed AG details clearly in the standard SSMS Dashboard. You need to rely on DMVs. The most important one is sys.dm_hadr_distributed_availability_group_states.

Use this query to check the health of the synchronization:

SELECT 
    ag.name AS [Distributed_AG_Name], 
    ar.replica_server_name AS [Remote_AG_Name], 
    ls.synchronization_state_desc, 
    ls.last_hardened_lsn
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id
JOIN sys.dm_hadr_availability_replica_states AS ls ON ar.replica_id = ls.replica_id
WHERE ag.is_distributed = 1;

If synchronization_state_desc stays at NOT CONFIGURED for too long, check sys.dm_hadr_automatic_seeding to see if the engine is struggling to move the initial bits.

Final Word

Distributed AGs are arguably one of the most powerful features added to SQL Server in the last decade. They provide a clean boundary between environments. Whether you are performing a zero-downtime migration to the cloud or setting up a robust DR site, mastering this setup is mandatory for any modern DBA. Keep your listeners reachable, your paths consistent, and your logs monitored, and you will have a rock-solid data pipeline.


← All posts

Keep reading