by Manish Rawat Indexing is one of the most crucial elements in increasing the performance of SQL Server. A well-written query will not show its effectiveness unless powered by an appropriate index or indexes on the table(s) used in a query, especially if the tables are large. Indexes exist to make data selection faster, so the focus of this article is on ways you can select the best indexes for faster data retrieval. This is done in a two-step process. (인덱스는 데이터의 검색을 빠르게 한다. 그래서, 이 글에선 빠른 데이터 검색을 위한 가장 좋은 인덱스를 선택하는 방법에 촛점을 맞춘다. 이것은 두 단계로 이루어진다.) - Step One: Gathering Information
(첫번째 단계 : 정보 수집)
- Step Two: Taking Actions on the Information Gathered
- (두번째 단계 : 수집된 정보를 바탕으로 작업 수행)
Indexing can be quite a challenging task if you are not familiar with your databases, the relationships between tables, and how queries are written in stored procedures and embedded SQL. How many companies you have worked for have a proper ERD diagram of their databases and have followed the textbook method style of programming? In the real world, time is often limited, resulting in poor SQL Server database performance. If you have been tasked with optimizing a database's performance (at least to a respectable level), or you want to be proactive with your databases to prevent potential future performance issues, following these steps should help you in tuning tables, just as they have helped me. These steps are applicable at any stage of project, even if a deadline is just around the corner. Step One (Gathering Information) Interact with the people who know about the database and its table structures. (데이터베이스와 그 데이터베이스의 테이블 구조에 대해 아는 사람들과 대화를 나눈다.) If you know it already, that’s great. This is very important and makes your life easier. (만약 이미 알고 있다면 좋은 일이다. 이것은 매우 중요하며, 작업을 쉽게 만들어 준다.) 1) Identify key tables, based on: (아래의 내용을 기반으로 KEY 테이블을 찾는다.) - Static tables (often called master tables).
(정적 테이블 - 마스터 테이블이라 불린다.)
- Highly transactional tables.
(많은 트랜잭션이 발생하는 테이블)
- Tables used within a lot of stored procedures or embedded SQL.
(임베디드 SQL이나 저장프로시저에서 많이 사용된 테이블)
- Tables with an index size greater then its data's size. You can use sp_spaceused with the table name to find table space usage.
(데이터 크기보다 큰 사이트의 인덱스를 갖는 테이블. 'sp_spaceused'를 이용해서 찾을 수 있다.)
- Top 10 or 15 big size tables. See a prior year database if available or applicable. The idea is to identify the largest tables in the database after it is in production. There is a script available at http://www.sqlteam.com/item.asp?ItemID=282 that you may find useful.
- (전체 테이블 중 10~15위권 안의 사이즈가 큰 테이블.)
2) Identify the most frequently called stored procedures/queries and list all of the tables used by them. (가장 자주 사용되는 저장 프로시저나 쿼리들을 찾아내고, 이 저장 프로시저나 쿼리에서 사용되는 테이블을 목록화한다.) 3) Get the SQL Profiler trace of : - Production site (if available/applicable). Running a trace on the production box during typical activity is worth the effort and will be fruitful in later analysis.
- Testing site (if one is available/applicable).
- Otherwise, get if from your development server.
It is advisable to write down information you collect in a document for later retrieval. 4) Before we dive into analyzing the information gathered, here are few things to keep in mind while tuning your tables: - To see the Query/Execution plans of queries, highlight them in SQL Query Analyzer (isqlw.exe) and select Display Estimated Query Plan (Cntl+L) from the Query menu. If you want to see the query plan of a stored procedure, select Show Execution Plan (Cntl+k) and execute the stored procedure. Also, turn on the “Set Statistics IO on “ command. Examining Query/Execution plans can be a bit time consuming. But you will find it easier if you really understand the database and its tables before you begin.
- You need to have a good foundation on how clustered and non-clustered indexes work. See this URL for a good foundation on this topic: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_8sit.asp
Preferred SQL Server Index Types When you use Query Analyzer to produce a graphical execution plan, you will notice that there are several different ways SQL Server uses indexes. 1) Clustered Index Seek: A Clustered Index Seek uses the seeking ability of indexes to retrieve rows directly from a clustered index. In most cases, they provide the best performance on SELECT statements. In Query Analyzer, go to pubs database. Type following query: SELECT * FROM authors WHERE au_id LIKE'2%' Highlight the query and press. (Cntl + L) or highlight the query and press F5. You will see the following in the “Estimated Execution Plan” tab. Take a close look at the Argument section of the above illustration. Notice that the “UPKCL_auidind” clustered index is used to retrieve the data. 2) Index Seek: An Index Seek uses a non-clustered index to retrieve data, and in some ways, acts like a clustered index. This is because all of the data retrieved is fetched from the leaf layer of the non-clustered index, not from any of the data pages. You often see this behavior in a covering index. In Query Analyzer, go to pubs database and type following query: SELECT title_id, title FROM titles WHERE title LIKE 't%' Highlight the query and press. (Cntl + L) or highlight the query and press F5. You will see the following in the “Estimated Execution Plan” tab: In the Argument section in the above illustration, note that the “titleind” non-clustered index is used to retrieve the data. 3) Bookmark Lookup: A Bookmark Lookup uses a non-clustered index to select the data. It starts with an index seek in the leaf nodes of the non-clustered index to identify the location of the data from the data pages, then it retrieves the necessary data directly from the data pages. Leaf nodes of non-clustered indexes contain row locator that point to the actual data on data pages. In Query Analyzer, go to the pubs database. Type following query: SELECT * FROM titles WHERE title LIKE 't%' Highlight the query and press. (Cntl + L) or highlight the query and press F5. You will see the following in the “Estimated Execution Plan” tab. In the Argument section of the Index Seek, notice that the "titlecind" non-clustered index is used, but once the data pages are identified from looking them up in the leaf pages of the non-clustered index, then a Bookmark Lookup must be performed. Again, a Bookmark Lookup is when the Query Optimizer has to lookup the data from the data pages in order to retrieve it. In the Argument section of the Bookmark Lookup, note that a Bookmark Lookup called "Bmk1000" is used. This name is assigned automatically by the Query Optimizer. 4) Scans: Scans (Table scans, Index scan, and Clustered Index scans) are usually bad unless the table has very few rows and the Query Optimizer determines that a table scan will outperform the use of an available index. Watch out for scans in your execution plans. In Query Analyzer, go to pubs database and type the following query: SELECT * FROM employee WHERE hire_date > '1992-08-01' Highlight the query and press. (Cntl + L) or highlight the query and press F5. You will see the following in the “Estimated Execution Plan” tab: Notice that in this case, a Clustered Index Scan was performed, which means that every row in the clustered index had to be examined to fulfill the requirements of the query. Now that we understand some of the basics of how to read Query Execution Plans, lets take a look at some additional information that you will find useful when analyzing queries for proper index use: - If you create multiple query or a stored procedure execution plans at the same time in Query Analyzer, you can compare the cost of each query or stored procedure to see which is more efficient. This is useful for comparing different versions of the same query or stored procedure.
(쿼리분석기에서 다수의 쿼리나 저장프로시저에 대한 실행계획을 동시에 보면, 각각의 쿼리나 저장 프로시저의 비용을 보다 효과적으로 비교해 불 수 있다. 이런 방법은 같은 쿼리나 저장 프로시저의 다른 버젼들 간의 비교에 보다 유용하게 활용할 수 있다.)
- Primary Key constraints create clustered indexes automatically if no clustered index already exists on the table and a non-clustered index is not specified when you create the PRIMARY KEY constraint.
(기본키 제약조건은 테이블에 클러스터드 인덱스가 존재하지 않는 경우 클러스터드 인덱스를 자동적으로 생성한다.(역자주-7.0의 경우 넌클러스터드 인덱스였다. 2000이 되면서 바뀐 부분) 그리고 기본키 제약조건을 만들 때 넌클러스터드인덱스는 만들어지지 않는다.)
- Non-clustered indexes store clustered index keys as their row locators. This overhead can be used as a benefit by creating a covering index (explained later). Using covering indexes requires caution.
- A table's size comprises both the table’s data and the size of any indexes on that table.
- Adding too many indexes on a table increases the total index size of atable and can often degrade performance.
- Always add a clustered index to every table, unless there is a valid reason not to, like the table has few rows.
- Seeks shown in Query/Execution plans for SELECT statements are good for performance, while scans should be avoided.
- A table's size (number of rows) is also a major factor used up by Query Optimizer when determining best query plan.
- Index order plays an important role in many query plans. For example, in the authors table of the pubs database, a non-clustered index is defined in the order of au_lname, then au_fname.
Fine Query A
SELECT * FROM authors WHERE au_lname like 'r%' This uses a Bookmark Lookup and an Index seek. Fine Query B select * FROM authors WHERE au_lname LIKE 'r%' AND au_fname like ‘a’ This uses a Bookmark Lookup and an Index Seek. Not so Fine Query C SELECT * FROM authors WHERE au_fname LIKE ‘a’ This uses a Clustered Index Scan.
- SQL Server 2000 (not earlier versions) offers both ascending and descending sort orders to be specified in an index. This can be useful for queries, which uses the ORDER BY DESC clause.
- To find a particular word (for e.g. a table name) used in all stored procedure code, you can use the following code to identify it. For example, you can use this to find out a list of SP’s using a table.
SELECT DISTINCT a.name AS SPName FROM syscomments b, sysobjects a WHERE b.text LIKE '%authors%' AND a.id=b.id AND a.type='P'
This query brings all SP’s having text “authors” in their code. Note that this query might bring extra SP’s, for example, if a stored procedure uses text in a comment. Step Two: What to Do Once You Have Gathered the Necessary Information Actions for Key Tables For static tables (tables that rarely, if ever change), you can be liberal on the number of indexes that can be added. As mentioned earlier, too many indexes can degrade the performance of highly transactional tables, but this does not apply to tables whose data will not change. The only consideration possible could be disk space. Set all index fill factors on static tables to 100 in order to minimize disk I/O for even better performance. For highly transactional tables, try to limit the number of indexes. Always keep in mind that a non-clustered index contains the clustered index key. Because of this, limit the number of columns on your clustered index in order to keep their size small. Any index for busy transactional tables has to be highly justifiable. Choose the fill factor with caution (usually 80 to 90%) for these indexes in order to avoid potential page splitting. For tables used lot in stored procedures/embedded SQL, these tables play an important role in the total application lifetime as they are called most often. So they require special attention. What is important is look at how tables are being accessed in queries in order to eliminate scans and convert them into seeks. Watch the logical I/O used by "Set Statistics IO on" to help you determine which queries are accessing the most data. Less logical I/O is better than more. Choose clustered index with caution. Depending on how transactional the table is, choose a higher fill factor. For tables with index size greater then data size implies a lot of indexes, so review indexes and make sure their existence is useful and justified. For the Top 10 or 15 largest tables, keep this fact in mind when creating indexes for these types of tables, as their indexes will also be large. Also check to see if the tables are static or non-static, which is helpful information when deciding what columns need to be indexed. For the most frequently called Stored procedures/Embedded SQL, See the Query plans and Logical I/O page use. SQL Profiler Trace is a very good tool. It tracks calls getting executed in SQL Server at any given point of time, their execution time, I/O reads, user logins, executing SQL statement, etc. It can also be used as debugging tool. An analysis of a Profiler trace is important to identify slow running queries. You can set the duration to > 100ms to see queries which take more then 100 milliseconds to execute. Using a Covering Index + Non-clustered Index Uses Clustered Index as a Row Locator
One can leverage the fact that non-clustered indexes store clustered index keys as their row locators. Meaning that a non-clustered index can behave as a clustered index if the index has all of the columns referenced in SELECT list, WHERE clause, and JOIN conditions of a query. Example 1 In the Orders table the NorthWind database, there currently is a non-clustered index on the ShippedDate column. Try running the following: SELECT ShippedDate, shipcity FROM orders WHERE ShippedDate > '8/6/1996' The query plan of statement produces a Clustered Index Scan. Now add the column shipcity to the non-clustered index on ShippedDate. CREATE INDEX [ShippedDate] ON [dbo].[Orders] ([ShippedDate], [ShipCity]) WITH DROP_EXISTING Now run the query again. This time, the query plan of statement produces an Index Seek. This magic happened because all fields (ShippedDate and ShipCity) in the SELECT and the WHERE clauses are part of an index. Example 2 In the Titles table of the Pubs database, check out the following execution plan for this query: SELECT title_id, title FROM titles WHERE title LIKE 't%' Notice that the execution plan shows an Index Seek, not a Bookmark Lookup (which is what you usually find with a non-clustered index). This is because the non-clustered index on the Title column contains a clustered index key Title_Id, and this SELECT has only Title_Id, Title in the SELECT and in the WHERE clause. Now the Hard Part Starts By following the simple steps outlined here, you get useful pieces of information in no time that can help you improve the performance of your database. After you have data in front of you put it to work. These minor inputs will bring a majority of your results. Published with express written permission of the author. Copyright 2002.
|