Hi all, In this article I am trying to explain “How to define indexes in SQL Server”. For this article am going to use Products table of Northwind database.
If we don't create any indexes then the SQL engine searches every row in table (also called as table scan). As the table data grows to thousand, millons of rows and further then searching without indexing becomes much slower and becomes expensive.
eg. Following query retrieves Customer information where country is USA from Customers table of the Northwind database.
As there is no Index on this table, database engine performs table scan and reads every row to check if Country is "USA". The query result is shown below. Database engine scans 91 rows and find 13 rows.
Indexes supports to the database Engine. Proper indexing always results in considerable increase in performance and savings in time of an application and vice-versa. When SQL Server process a query then it uses Indexes to find the data. Indexes cane created on one or more columns as well as on XML columns also. We can create Index by selecting one or more columns of a table being searched. Index creates model related with the table/view and constraints created using one or more columns. It is more likely a Balanced Tree. This helps the SQL Server to find out rows with the keys specified.
Both of these may be unique. Wherever we make changes to the data table, managing of indexes is done automatically.
SQL Server allows us to add non-key column at the leaf node of the non clustered index by passing current index key limit and to execute fully covered index query.
Automatic index is created wherever we create primary key, unique key constraints to table.
Table scans uses many Input-output operations, it also uses large number of resources as all rows from the table are scanned.
Index scan used for searching index key columns to find storage location.
The index containing fewer columns results in to faster query execution and vice-versa.
We also create this index by using query. This command mentions the name of index (PK_Customers) the table name (Customers), and the column to index (CustomerID).
We can also rename indexes with the sp_rename stored procedure. The sp_rename procedure takes, at a minimum, the current name of the object and the new name for the object. While renaming indexes, the current name must include the name of the table, a dot separator, and the name of the index, as shown below:
It is very sensible to remove index from database if it is not much of worth. eg. For instance, if we know the queries are no longer searching for records on a speicific column, we can remove the index. Unneeded indexes only take up storage space and diminish SQL command is shown below.
We can use a stored procedure named sp_helpindex. This stored procedure gives all of the indexes for a table with its all relevant attributes. The only input parameter to the procedure is the name of the table, as shown below.
The database takes the columns specified in a CREATE INDEX command and sorts the values into a special data structure known as a B-tree. A B-tree structure supports fast searches with a minimum amount of disk reads, allowing the database engine to quickly find the starting and stopping points for the query we are using.
Conceptually, every index entry has the index key. Each entry also includes a references to the table rows which share that particular value and from which we can retrieve the required information.
It is much similar to the back of a book helps us to find keywords quickly, so the database is able to quickly narrow the number of records it must examine to a minimum by using the sorted list of Key values stored in the index. Thus we avoid a table scan to fetch the query results. Following some of the scenarios where indexes offer a benefit. Advantages of Indexing
Indexes work well when searching for a record in DELETE and UPDATE commands as they do for SELECT statements.
When we require sorted results, the database tries to find an index and avoids sorting the results while execution of the query. We control sorting of a dataset by specifying a field, or fields, in an ORDER BY clause, with the sort order as ascending (ASC) or descending(DESC). E.g. Query below returns all customers sorted by Country:
When there is no indexes, the database will scan the Customers table and then sort the rows to process the query. However, the index we created on Country (Country) before will provide the database with a already sorted list of Countries. The database can simply scan the index from the first record to the last record and retrieve the rows in sorted order. The same index works same with the following query, It simply scans the index in reverse.
We can use a GROUP BY clause to group records and aggregate values, e.g. for counting the number of customers in a country. To process a query with a GROUP BY clause, the database will quite ofen sort the results on the columns included in the GROUP BY. The following query counts the number of customers from every country with the same UnitPrice value.
There are few drobacks of indexes. While indexes provide a substantial performance benefit to searches, there is also a downside to indexing.
Indexes are stored on the disk, and the amount of space required will depend on the size of the table, and the number and types of columns used in the index. Disk space is generally cheap enough to trade for application performance, particularly when a database serves a large number of users. To see the space required for a table, use the sp_spaceused system stored procedure in a query window. EXEC sp_spaceused Customers
Result
From the above output, the table data uses 24 kb, while the table indexes use about 18 times as much, or 176 kilobytes. The ratio of index size to table size can vary greatly, depending on the columns, data types, and number of indexes on a table.
If the data is change or modified on regular intervals then database engine requires to update all the indexes, thus too many indexes will slows down the performance. Thus Database used for transaction processing should use fewer indexes to allow for higher throughput on insert and updates. While in DSS (Decision Support System) and datawarehousing where information is static and queries is required largely for the reporting purposes than the modification purposes then heavy indexing is required to optimze the performance.
Another downside to using an index is the performance implication on data modification statements. Any time a query modifies the data in a table (INSERT, UPDATE, or DELETE), the database needs to update all of the indexes where data has changed. As we discussed earlier, indexing can help the database during data modification statements by allowing the database to quickly locate the records to modify, however, we now caveat the discussion with the understanding that providing too many indexes to update can actually hurt the performance of data modifications. This leads to a delicate balancing act when tuning the database for performance.
In order to create effective index choice of correct columns and types is very important.
It becomes harder for database engine to work on larger an index key. E.g. An integer key is smaller in size then a character field for holding 100 characters. Keep keep clustered indexes as short as possible.
We must try to avoid using character columns in an index, particularly primary key indexes. Integer columns will always have an advantage over character fields in ability to boost the performance of a query.
Indexes with a small percentage of duplicated values are always effective.
An index with a high percentage of unique values is a selective index. Obviously, a unique index is the most selective index of all, because there are no duplicate values. SQL Server will track statistics for indexes and will know how selective each index is. The query optimizer utilizes these statistics when selecting the best index to use for a query.
This article deals with -
- Query Optimizer
- Creating an Index
- Creating Unique Index
- Creating Clustered Index
- Creating Full-Text Index
- Changing properties of Index
- Renaming an Index
- Deleting an Index
- Specifying Fill factor of Index
- Create XML Index
- Delete XML Index
- Advantages of Indexing
- Disadvantages of Indexing
- Guidelines for Indexing
Explanation
Every organization has its database and each and every day with the increase in the data volume these organizations has to deal with the problems relating to data retrieval and accessing of data. There is need of system which will results into increase in the data access speed. An index (in simple words it like index of any book eg. While searching a word in Book we use index back of book to find the occurance of that word and its relevant page numbers), which makes it easier for us to retrieval and presentation of the data. An Index is a system which provides faster access to rows and for enforcing constraints.If we don't create any indexes then the SQL engine searches every row in table (also called as table scan). As the table data grows to thousand, millons of rows and further then searching without indexing becomes much slower and becomes expensive.
eg. Following query retrieves Customer information where country is USA from Customers table of the Northwind database.
Collapse
SELECT CustomerID,ContactName,CompanyName,City FROM Customers WHERE Country ='USA'
As there is no Index on this table, database engine performs table scan and reads every row to check if Country is "USA". The query result is shown below. Database engine scans 91 rows and find 13 rows.
Indexes supports to the database Engine. Proper indexing always results in considerable increase in performance and savings in time of an application and vice-versa. When SQL Server process a query then it uses Indexes to find the data. Indexes cane created on one or more columns as well as on XML columns also. We can create Index by selecting one or more columns of a table being searched. Index creates model related with the table/view and constraints created using one or more columns. It is more likely a Balanced Tree. This helps the SQL Server to find out rows with the keys specified.
Indexes may be either Clustered or Non-Clustered.
Clustered Index
Every table can have one and only Clustered Index because index is built on unique key columns and the key values in data rows is unique. It stores the data rows in table based on its key values. Table having clustered index also called as clustered table.Non-Clustered Index
It has structure different from the data rows. Key value of non clustered index is used for pointing data rows containing key values. This value is known as row locator. Type of storage of data pages determines the structure of this row Locator. Row locator becomes pointer if these data pages stored as a heap. As well as row locator becomes a clustered index key if data page is stored in clustered table.Both of these may be unique. Wherever we make changes to the data table, managing of indexes is done automatically.
SQL Server allows us to add non-key column at the leaf node of the non clustered index by passing current index key limit and to execute fully covered index query.
Automatic index is created wherever we create primary key, unique key constraints to table.
The Query Optimizer
Query Optimizer indexes to reduce operations of disk input-output and using of system resources when we fire query on data. Data manipulation Query statements (like SELECT, DELETE OR UPDATE) need indexes for maximization of the performance. When Query fires the most efficient method for retrieval of the data is evaluated among available methods. It uses table scans or index scans.Table scans uses many Input-output operations, it also uses large number of resources as all rows from the table are scanned.
Index scan used for searching index key columns to find storage location.
The index containing fewer columns results in to faster query execution and vice-versa.
Creating an Index
- Connect to Northwind database from Object Explorer, right click on the Customers table to create an index and click on modify.
- Click on Index/Keys from Table Desinger Menu on top or right click on any column and click onIndex/Keys.
- Click on Add from Indexes/Keys dialog box.
- From Selected Primary/Unique Key or Index list, select the new index and set properties for the index in the grid on right hand side.
- Now just specify other settings if any for the index and click Close.
- When we save the table, the index is created in the database.
Collapse
CREATE INDEX Country ON Customers (Country)
Creating Unique Index
SQL Server permits us to create Unique Indexes on columns which are unique to identify. (like employee’s Reference ID, Email-id etc.) We use set of columns to create unique index.- Right-click on the Customers and click Modify in Object Explorer.
- Now, click on Indexes/Keys from Table Designer menu and click on Click Add.
- The Selected Primary/Unique Key or Index list displays the automatically generated name of the new index.
- In the grid, click on Type, from the drop-down list, Choose Index.
- Under Column name,we can choose columns we want to index and click on OK. Maximum we can setup 16 columns. For optimum performance, it is recommended that we use one or two columns per index. For every column we values of these columns are arranged in ascending or descending order.
- In the grid, click Is Unique and select select Yes.
- Null is treated as duplicate values. So, it is not possible to create unique index on one column if it contains null in more than one row. Likewise index cannot be created on multiple columns if those columns contains null in same row.
- Now select Ignore duplicate keys option. If it is required to ignore new or updated data that will lead to creation of duplicate key in the index (with the INSERT or UPDATE statement).
- When we save the table, the index is created in the database.
Collapse
CREATE UNIQUE INDEX ContactName ON Customers (CompanyName,ContactName)
Creating Clustered Index
table can have only one clustered index. In Clustered index logical order and physical order of the index key values is identical of rows in the table.- In the Object Explorer click on the Northwind database, right click on the Customres to create an index and click on modify.
- Now we have Table Designer for the table.
- From the Table Designer menu, click Indexes/Keys and from Indexes/Keys dialog box, click Add.
- Now from Selected Primary/Unique Key or Index list, Select the new index
- In the grid, select Create as Clustered, and choose Yes from the drop-down list to the right of the property.
- When we save the table, the index is created in the database.
We also create this index by using query. This command mentions the name of index (PK_Customers) the table name (Customers), and the column to index (CustomerID).
Collapse
CREATE CLUSTERED INDEX PK_Customers on Customers(CustomerID)
Creating Full Text Search
For text based columns, full text search is always required to be performed under several times. In such situations full text index is used. A regular index is required to be prepared before creating full text index as the later relies on the former. Regular index is created on single column having not null. It is recommended to create regular index on column having small values. For several occasions, SQL Server management Studio is also used to create catalog.
- In the object explorer click on the Northwind database, right click on the customers to create an index and click on modify.
- Now, we have Table Designer for the customers table and then Click Fulltext Index from the Table Designer menu.
- Dialog box for full-text index opens. (Sometimes database is not enabled for full text indexing. In such situations add button disabled. To enable it check properties for database by right clicking on database. And check the full text indexing check box)
- Now we have to right click on storage>New Full-Text catalog to create a catalog. Enter some required information in dialog box.
- Now from Table Designer menu, open the Full Text Index property dialog and then click on Add.
- Now select new index from selected full-text index list and assign properties for index in the grid.
- When we save Table the index is automatically saved in database, and this index is available for modifications.
Changing index properties
- Connect to the SQL-Server 2005, In the object explorer click on the Northwind database.
- Click Indexes/Keys from the table designer menu.
- Now select index from the selected primary/unique key or index list. And Change the properties.
- When we save Table the index is automatically saved in database.
Renaming an Index
- Right-click the table with the index you want to rename and click Modify, in Object Explorer.
- Click Indexes/Keys from the Table Designer menu.
- Now from the Selected Primary/Unique Key or Index list, select the index.
- Click Name and type a new name into the text box in the grid.
- When we save Table the index is automatically saved in database.
We can also rename indexes with the sp_rename stored procedure. The sp_rename procedure takes, at a minimum, the current name of the object and the new name for the object. While renaming indexes, the current name must include the name of the table, a dot separator, and the name of the index, as shown below:
Collapse
EXEC sp_rename 'Customers.Country', 'Countries'
Deleting an Index
- Right-click the table with indexes you want to delete and click Modify In Object Explorer.
- Click Indexes/Keys from the Table Designer menu.
- Select the index you want to delete from the Indexes/Keys dialog box and Click on Delete.
- When we save Table the index is deleted from the database.
It is very sensible to remove index from database if it is not much of worth. eg. For instance, if we know the queries are no longer searching for records on a speicific column, we can remove the index. Unneeded indexes only take up storage space and diminish SQL command is shown below.
Collapse
DROP Index Customers.Country
Specifying Fill Factor
Fill Factor Fill factor is used by SQL Server to specify how full each page index. The fill factor is the percentage of allotted free space to an index. We can specify the amount of space to be filled. It is very important as the improper selection my slow down the performance.- Right-click the table with an index for which we want to specify fill factor and click Modify in Object Explorer
- Click Indexes/Keys, from the Table Designer menu.
- From Selected Primary/Unique Key or Index list, select the index.
- Type a number from 0 to 100, in the Fill Factor box. Value 100 denotes that index will fully filled up and storage space requirement will be minimum, this is recommended in situations where there are minimum changes of change in data. data fill factor. If there is regular modification and addition to the data, then set this value to minimum. Storage space is proportionate to the value set.
Creating XML Index
There somewhat different way to create XML indexes, we cannot create XML using Index/Keys dialog box. We create XML index from xml data type columns those are based on primary XML index. When we delete the primary XML index, then all the XML index will be deleted.- In Object Explorer, right-click the customers table to create an XML index and click Modify.
- Select the xml column for the index for the table opened in Table Designer.
- From the Table Designer menu, click XML Index,
- Click on add, in the XML Indexes dialog box
Deleting XML Indexes
- Right-click the customers table with the XML index you want to delete and click Modify in Object Explorer.
- click on XML Index, from the Table Designer menu.
- From selected XML Index column, Click the index you want to delete. And then Click on Delete.
Viewing Existing Indexes
We can view list of all indexes on a table in the dialog box we used to create an index. Just Click on the Selected index drop down control and scroll through the available indexes.We can use a stored procedure named sp_helpindex. This stored procedure gives all of the indexes for a table with its all relevant attributes. The only input parameter to the procedure is the name of the table, as shown below.
Collapse
EXEC sp_helpindex Customers
How Index works
The columns specified in the CREATE INDEX COMMAND taken by the database engine and sorts the values in Balanced Tree(B-Tree) data structure. B-Tree structure supports faster search with minimum dist reads, and allows the database engine to find quick start and end point for the stated query.The database takes the columns specified in a CREATE INDEX command and sorts the values into a special data structure known as a B-tree. A B-tree structure supports fast searches with a minimum amount of disk reads, allowing the database engine to quickly find the starting and stopping points for the query we are using.
Conceptually, every index entry has the index key. Each entry also includes a references to the table rows which share that particular value and from which we can retrieve the required information.
It is much similar to the back of a book helps us to find keywords quickly, so the database is able to quickly narrow the number of records it must examine to a minimum by using the sorted list of Key values stored in the index. Thus we avoid a table scan to fetch the query results. Following some of the scenarios where indexes offer a benefit. Advantages of Indexing
Searching For Records
The most important use for an index is in finding a record or set of records matching a WHERE clause. Indexes can help queries with speicfic range. as well as queries looking for a specific value. E.g the following queries can all benefit from an index on UnitPrice: Collapse
DELETE FROM Customers WHERE Country = "USA" UPDATE Customers SET Region = "Pacific" WHERE Country = "USA" SELECT * FROM Customers WHERE Country="USA" or "Brasil"
Indexes work well when searching for a record in DELETE and UPDATE commands as they do for SELECT statements.
Sorting Records
When we require sorted results, the database tries to find an index and avoids sorting the results while execution of the query. We control sorting of a dataset by specifying a field, or fields, in an ORDER BY clause, with the sort order as ascending (ASC) or descending(DESC). E.g. Query below returns all customers sorted by Country:
Collapse
SELECT * FROM Customers ORDER BY Country ASC
When there is no indexes, the database will scan the Customers table and then sort the rows to process the query. However, the index we created on Country (Country) before will provide the database with a already sorted list of Countries. The database can simply scan the index from the first record to the last record and retrieve the rows in sorted order. The same index works same with the following query, It simply scans the index in reverse.
Collapse
SELECT * FROM Customers ORDER BY Country DESC;
Grouping Records
We can use a GROUP BY clause to group records and aggregate values, e.g. for counting the number of customers in a country. To process a query with a GROUP BY clause, the database will quite ofen sort the results on the columns included in the GROUP BY. The following query counts the number of customers from every country with the same UnitPrice value.
Collapse
SELECT Count(*) FROM Products GROUP BY UnitPrice
Index Drawbacks
There are few drobacks of indexes. While indexes provide a substantial performance benefit to searches, there is also a downside to indexing.
Indexes and Disk Space
Indexes are stored on the disk, and the amount of space required will depend on the size of the table, and the number and types of columns used in the index. Disk space is generally cheap enough to trade for application performance, particularly when a database serves a large number of users. To see the space required for a table, use the sp_spaceused system stored procedure in a query window. EXEC sp_spaceused Customers
Result
Collapse
name rows reserved data index_size unused --------- ----------- ------------------ ------------------ ------------------ Customers 91 200 KB 24 KB 176 KB 0 KB
From the above output, the table data uses 24 kb, while the table indexes use about 18 times as much, or 176 kilobytes. The ratio of index size to table size can vary greatly, depending on the columns, data types, and number of indexes on a table.
Indexes and Data Modification
If the data is change or modified on regular intervals then database engine requires to update all the indexes, thus too many indexes will slows down the performance. Thus Database used for transaction processing should use fewer indexes to allow for higher throughput on insert and updates. While in DSS (Decision Support System) and datawarehousing where information is static and queries is required largely for the reporting purposes than the modification purposes then heavy indexing is required to optimze the performance.
Another downside to using an index is the performance implication on data modification statements. Any time a query modifies the data in a table (INSERT, UPDATE, or DELETE), the database needs to update all of the indexes where data has changed. As we discussed earlier, indexing can help the database during data modification statements by allowing the database to quickly locate the records to modify, however, we now caveat the discussion with the understanding that providing too many indexes to update can actually hurt the performance of data modifications. This leads to a delicate balancing act when tuning the database for performance.
Additional Index Guidelines
In order to create effective index choice of correct columns and types is very important.
Keeping Index Keys Short
It becomes harder for database engine to work on larger an index key. E.g. An integer key is smaller in size then a character field for holding 100 characters. Keep keep clustered indexes as short as possible.
We must try to avoid using character columns in an index, particularly primary key indexes. Integer columns will always have an advantage over character fields in ability to boost the performance of a query.
Distinct Index Keys
Indexes with a small percentage of duplicated values are always effective.
An index with a high percentage of unique values is a selective index. Obviously, a unique index is the most selective index of all, because there are no duplicate values. SQL Server will track statistics for indexes and will know how selective each index is. The query optimizer utilizes these statistics when selecting the best index to use for a query.
No comments:
Post a Comment