How to perform Index maintenance of large tables in availability groups? | Index maintenance in availability groups for 10 TB Database

This article some best practices about

How to perform Index maintenance of large tables in availability groups? OR Index maintenance in availability groups for 10 TB Database

What is index in SQL Server?

Index is an object in SQL Server database which can be used to retrieve data faster and improve query performance. Mainly there are 2 types of index Clustered and Non-Clustered index. Data get physically sorted as per clustered index column.

What is index maintenance and why it is required?

Index getting used in SQL query execution plan and help to run query faster. But when insert, delete queries execute it also changes page orders and add fragmentation in indexes. For effective use of indexes you have to manage fragmentation in index which can be done by rebuild or reorganizing indexes.

Reorganize index : As a best practice if index fragmentation is less than 30 you can reorganize the index. This just reorder the pages in index. This operation is online but single threaded so if table size is huge it takes a lot of time to complete.

Rebuild index : Rebuild index will recreate index by dropping existing index. There are 2 option to rebuild index.

Rebuild online : In this operation it create replica of index so that index not getting locked. This required double the size of index in log file drive.

Rebuild offline : This will lock entire index till it get re created. Less space required than rebuild online operation.

Best approach for How to perform Index maintenance of large tables in availability groups

How availability group works in SQL Server?

Availability group has multiple node part of it and whenever any data update happen on primary node it has to synchronize with all other node.

If AG mode is asynchronous commit then data get committed on primary node without waiting for secondary node data commit.

If AG mode is synchronous commit then data first get committed on secondary node first and then get committed on primary node.

Best Practice to perform index maintenance on availability group or Index maintenance in availability groups for 10 TB Database

If database is in availability group performing index maintenance is complex task if database size is huge like more than 5 TB with multiple indexes on large tables.

Step 1 : Identify database having size more than 5 TB and schedule index maintenance in separate SQL job. If index size is huge like 500 GB+ then you can perform maintenance for individual index in SQL job.
Step 2 : Make sure your transaction log backup jobs is working properly and able to take log backup whenever index maintenance run. This will help to manage log file growth and log drive should not get full.
Step 3 : If your database size is huge then make sure database data and log files are on separate drive and also have enough free space.
Step 5 : If your index size is more than 500 GB, 
Then make sure you have at least 1.5 to 2 TB free space on log file drive to rebuild index online. 
If rebuild offline then it required less space but object will be locked and not available for query execution till index get rebuild.

Script to check index size :

SELECTOBJECT_SCHEMA_NAME(i.OBJECT_ID) AS SchemaName,OBJECT_NAME(i.OBJECT_ID) AS TableName,i.name AS IndexName,i.index_id AS IndexID,8 * SUM(a.used_pages) AS 'Indexsize(KB)'FROM sys.indexes AS iJOIN sys.partitions AS p ON p.OBJECT_ID = i.OBJECT_ID AND p.index_id = i.index_idJOIN sys.allocation_units AS a ON a.container_id = p.partition_id
WHERE OBJECT_NAME(i.OBJECT_ID) = 'table name'GROUP BY i.OBJECT_ID,i.index_id,i.nameORDER BY OBJECT_NAME(i.OBJECT_ID),i.index_id
Step 4 : if your index size is more than 500 GB and your index job having fragmentation level less than 5 ,which perform reorganize index this process is take huge time to complete as recognize index is single thread operation. Like it may takes in days. To avoid this, rebuild index directly  if fragmentation will increase more than 50 percent and in other case you can just update the statistics.
Step 5 : If your index size is more than 500 GB, 
Then make sure you have at least 1.5 to 2 TB free space on log file drive to rebuild index online. 
If rebuild offline then it required less space but object will be locked and not available for query execution till index get rebuild.
Step 6 : If your availability group, you have any synchronous commit node.
When you rebuild index online , log file start growing and that logs need to be transfer on all availability replica.

If your secondary replica is in asynchronous mode, once data get committed on primary without waiting for data commit on secondary node. So in this case when index size is 500+ GB , then rebuild index logs quickly transfer on secondary node.
Step 6 : If your secondary replica is in synchronous mode, Data get committed only on primary when data commit on secondary node. So in this case when index size is 500+ GB , then rebuild index logs will wait till data committed on secondary node and this add latency in data sync process.

We can see HAR_SYNC_COMMIT wait type in SQL Server process. This will halt connection and will stop log backup till the data get committed and you will see below error while taking Log backup on secondary.

Msg 35250, Level 16, State 11, Line 15
The connection to the primary replica is not active.  The command cannot be processed.
Msg 3013, Level 16, State 1, Line 15
BACKUP LOG is terminating abnormally.

Once log backup stop, index rebuild will grow your log file and also size of redo logs need to be sync on AG replica and till the time your data sync completed on secondary node log will grow and there are chance your disk get full and database will go out of sync and not accessible on Primary as well.
Step 7 : If you get error in step 6  and log backup stopped and you are not able to take log backup manually on any replica then change your replica mode from synchronous to asynchronous.
Once you change the mode , data will start syncing on all replica and redo log size will reduce slowly and also your transaction log backups will start working that will help to manage log file usage and avoid disk full issue.
Step 8 : Once all redo logs sync up with all replica you can change AG mode which changed to asynchronous in step 7 can be changed to synchronous.

Leave a Comment