Hide this message
Looking to compare latest NPM features with previous versions of NPM?
The NPM new feature summary offers a comparison of new features and improvements offered with this release.
This article provides an overview of database fragmentation and why an Orion database can be over 90% fragmented. This article is part of the Quick Orion Database Health Check Guide.
Similar to disk fragmentation, database fragmentation is a condition where data is stored in various places in the database file instead of sequentially (or next to) like data within the database. This issue often occurs when non-sequential keys and constant data inserting, updating and deleting causes the data to become fragmented, including the additional data pages to store the data.
Orion performs the same process for data summarization. As a result, high database fragmentation within the tables are normal and expected behavior. See the following resources for details:
Apart from fragmentation at the file-system level and within the log file, you can also have fragmentation within the data files—in the structures that store the table and index data.
Two basic types of fragmentation can occur within a data file:
Internal fragmentation is where a lot of empty space exists in a page. As shown in Figure 1, each database page is 8KB with a 96-byte page header.
In this example, a page can store roughly 8096 bytes of table or index data. You can encounter empty if each table or index record is more than half the size of a page because only a single record can be stored per-page.
This situation can be very hard or impossible to correct because it requires a table or index schema change—for example, changing an index key to be something that doesn't cause random insertion points like a GUID.
Internal fragmentation usually occurs due to data modifications, such as inserts, updates, and deletes that can leave empty space on a page. Mismanaged fill-factor can also contribute to fragmentation (see Books Online for more details). Depending on the table or index schema and the application characteristics, this empty space may never be reused after it is created and can lead to ever-increasing amounts of unusable space in the database.
For example, if you have a 100-million-row table with an average record size of 400 bytes, the application's data modification pattern over time can leave each page with an average of 2800 bytes of free space. The total space required by the table is about 59GB, calculated as 8096-2800 / 400 = 13 records per 8KB page, then dividing 100 million by 13 to get the number of pages. If the space was not being wasted, 20 records would fit per page, bringing the total space required down to 38GB. That's a huge savings.
As a result, wasted space on data or index pages can require additional pages to hold the same amount of data. Not only does this take up more disk space, it also requires a query to issue more I/Os to read the same amount of data.
And all these extra pages occupy more space in the data cache, requiring additional server RAM.
Logical scan fragmentation is caused by an operation called a page split. This occurs when a record is inserted on a specific index page (according to the index key definition), but there is not enough space on the page to fit the new data.
The page is split in half and roughly 50 percent of the records moved to a newly-allocated page. Since this new page is usually not physically contiguous with the old page, it is called fragmented.
Extent scan fragmentation is similar in concept. Fragmentation within the table or index structures limits SQL Server from performing efficient scans—whether over an entire table/index or bounded by a query WHERE clause (such as SELECT * FROM MyTable WHERE Column1 > 100 AND Column1 < 4000).
Figure 2 shows newly-created index pages with a 100% fill-factor and no fragmentation—the pages are full and the physical order of the pages matches the logical order. Figure 3 shows the fragmentation that can occur after random inserts, updates, and deletes.
You can minimize fragmentation by changing the table or index schema, but as mentioned earlier, this may be difficult or impossible to prevent. If prevention is not an option, there are ways to remove fragmentation once it has occurred—in particular, by rebuilding or reorganizing an index.
Rebuilding an index involves creating a new copy of the index—nicely compacted and as contiguous as possible—and dropping the old, fragmented index. Before Microsoft SQL Server deletes the old index and creates a new copy, it requires free space in the data files that is approximately equivalent to the size of the index. In SQL Server 2000, rebuilding an index was always an offline operation. However, in SQL Server 2005 Enterprise Edition, index rebuilding can be performed online with a few restrictions.
Reorganizing, on the other hand, uses an in-place algorithm to compact and defragment the index. It requires only 8KB of additional space to run, and it always runs online. In fact, in SQL Server 2000, I specifically wrote the index reorganize code as an online, space-efficient alternative to rebuilding an index.
In SQL Server 2005, the commands to investigate are ALTER INDEX … REBUILD to rebuild indexes, and ALTER INDEX … REORGANIZE to reorganize them. This syntax replaces the SQL Server 2000 commands DBCC DBREINDEX and DBCC INDEXDEFRAG, respectively.
There are many trade-offs between these methods, which include
See the Microsoft TechNet article Microsoft SQL Server 2000 Index Defragmentation Best Practices for details about these trade-offs. The article is based on SQL Server 2000, but the concepts translate well to later versions.
Some IT professionals choose to rebuild or reorganize all indexes every night or every week (for example, using a maintenance plan option) rather than figuring out which indexes are fragmented and whether any benefit will come from removing the fragmentation. While this solution may be suitable for an involuntary DBA who just wishes to put something in place with minimal effort, it can be a very poor choice for larger databases or systems where resources are at a premium.
A more sophisticated approach involves using the following dynamic management view (DMV) query sys.dm_db_index_physical_stats (or DBCC SHOWCONTIG in SQL Server 2000) to periodically determine which indexes are fragmented, and then choose whether and how to operate on those indexes. The TechNet article discusses using these more targeted choices.
Additionally, you can view example code to do this filtering in Example D in the MSDN Online entry for the DMV sys.dm_db_index_physical_stats query in SQL Server 2005 or Example E in the MSDN Online entry for DBCC SHOWCONTIG in SQL Server 2000 and later.
Whichever method you use, SolarWinds recommends inspecting and defragging your databases on a regular basis.