Friday, March 30, 2007

writing OLAP queries - best practices

this is a list of some query tuning best practices i’ve compiled, with the general the goal being to avoid the problems of inefficient report queries that take a ton of time to execute.
  • begin each sproc with "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED". this allows you to avoid NOLOCK hints within the query and ensures you don't accidentally forget one
  • write the query one join at a time
  • turn on ‘Show Execution Plan’ and study the plan with each join you add
    o look for the item taking the greatest % of execution time
    o check for thick lines (lots of data)
    o check for table scans/bookmark lookups

  • turn on 'Set Statistics Time' and 'Set statistics IO' from the Tools...Options...Connection Properties menu
    o look for extremely high logical reads (in general, < 10,000 logical reads per table is ok)
    o total execution time for a query of average complexity on a server under average load should not exceed 60 seconds for one day's worth of data

  • MAKE USE OF INDEXES! if you expected an index to be used that wasn't, figure out why
  • it could be that you are missing columns in your join; for example, an index on Table1 on presenceid,userid,starttime will not be used for "select starttime from Table1 where presenceid = sp.presenceid", but will be used for "select starttime from Table1 where presenceid = sp.presenceid and userid = sl.userid"
  • if there is no way to make use of an existing index, request that a new one be added or add columns to an existing one
  • use a "covering index" to avoid bookmark lookups
    o a covering index is an index that is both for the seek (where/join clause) and the selected data (select clause)
    o include the columns in your where/join clause first, then the columns you are selecting. for example, with an index on Table1 on presenceid,userid,starttime, no bookmark lookup will be needed when you "select starttime from Table1 where presenceid = sp.presenceid and userid = sl.userid", since presenceid and userid are in the where/join clause and starttime is in the select clause
    o in general, make the covering index as wide as you need to avoid bookmark lookups (the exception to this is extremely wide varchar columns that will inflate the size of your index too much)

  • if you still can't get the query execution time down, figure out another way to get the same data
  • use different tables/joins
  • use "EXISTS" instead of join
  • use subquery instead of join
  • use "=" not "<>", and "NOT EXISTS" instead of "NOT IN", whenever possible
  • follow examples in existing sprocs that perform well

lots of other good tips here: