Improve SQL Server query performance on large tables

Simple Answer: NO. You cannot help ad hoc queries on a 238-column table with a 50% Fill Factor on the Clustered Index.

Detailed Answer:

As I have stated in other answers on this topic, index design is both art and science, and there are so many factors to consider that there are few, if any, hard-and-fast rules. You need to consider: the volume of DML operations vs SELECTs, the disk subsystem, other indexes and triggers on the table, the distribution of data within the table, whether queries use SARGable WHERE conditions, and several other things that I can’t even remember right now.

I can say that no help can be given on questions like this without an understanding of the table itself, its indexes, triggers, etc. Now that you have posted the table definition (I am still waiting on the indexes, but the table definition alone points to 99% of the issue), I can offer some suggestions.

First, if the table definition is accurate (238 columns, 50% Fill Factor), then you can pretty much ignore the rest of the answers / advice here ;-). Sorry to be less-than-politic here, but seriously, without knowing the specifics it’s a wild goose chase. And now that we can see the table definition, it becomes quite a bit clearer why a simple query would take so long, even when the test queries (Update #1) ran so quickly.

The main problem here (and in many poor-performance situations) is bad data modeling. 238 columns is not prohibited, just as having 999 indexes is not prohibited, but it is generally not very wise.

Recommendations:

  1. First, this table really needs to be remodeled. If it is a data warehouse table then a wide design might be acceptable, but if not, these fields really should be broken up into several tables that all share the same PK: a master record table plus child tables holding groups of commonly associated attributes. The PK of each child table is the same as the PK of the master table, and hence is also an FK to the master table, giving a 1-to-1 relationship between the master and every child table (a sketch follows this list).
  2. The use of ANSI_PADDING OFF is disturbing, not to mention inconsistent within the table due to the various column additions over time. I am not sure whether you can fix the existing columns now, but ideally you would always have ANSI_PADDING ON, or at the very least have the same setting across all ALTER TABLE statements (a sketch follows this list).
  3. Consider creating two additional File Groups: [Tables] and [Indexes]. It is best not to put your objects in [PRIMARY], as that is where SQL Server stores all of its data and metadata about your objects. Create your table and Clustered Index (as that is the data of the table) on [Tables], and all Non-Clustered indexes on [Indexes] (a sketch follows this list).
  4. Increase the Fill Factor from 50%. This low number is likely why your index space is larger than your data space: at 50%, an index rebuild recreates the data pages with a maximum of 4k (out of the 8k page size) used for your data, so your table is spread out over roughly twice as many pages as it needs (a sketch follows this list).
  5. If most or all queries have “ER101_ORG_CODE” in the WHERE condition, then consider moving it to the leading column of the Clustered Index, assuming it is used more often than “ER101_ORD_NBR” (if “ER101_ORD_NBR” is used more often, keep the current order). Assuming the field names mean “OrganizationCode” and “OrderNumber”, “OrgCode” seems the better grouping, as a single “OrgCode” might have multiple “OrderNumbers” within it (a sketch follows this list).
  6. Minor point, but if “ER101_ORG_CODE” is always 2 characters, then use CHAR(2) instead of VARCHAR(2): it saves the column’s entry in the portion of the row header that tracks variable-width sizes, which adds up over millions of rows (a sketch follows this list).
  7. As others here have mentioned, using SELECT * will hurt performance, not only because it requires SQL Server to return all columns (and hence makes a Clustered Index Scan more likely regardless of your other indexes), but also because it takes SQL Server time to go to the table definition and translate * into all of the column names. Specifying all 238 column names in the SELECT list should be slightly faster, though it won’t help the scan issue. But do you ever really need all 238 columns at the same time anyway?
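
For recommendation 1, here is a minimal sketch of the master / child split. All table and column names below are hypothetical; the real grouping of the 238 columns depends on which attributes actually belong together:

```sql
-- Hypothetical master table: identity and frequently used attributes only.
CREATE TABLE dbo.OrderMaster
(
    OrderID   INT      NOT NULL,
    OrgCode   CHAR(2)  NOT NULL,
    OrderDate DATETIME NOT NULL,
    CONSTRAINT PK_OrderMaster PRIMARY KEY CLUSTERED (OrderID)
);

-- Hypothetical child table: a group of commonly associated attributes.
-- Its PK holds the same value as the master's PK and is also an FK to it,
-- which enforces the 1-to-1 relationship.
CREATE TABLE dbo.OrderShipping
(
    OrderID       INT          NOT NULL,
    ShipToAddress VARCHAR(100) NULL,
    ShipMethod    VARCHAR(20)  NULL,
    CONSTRAINT PK_OrderShipping PRIMARY KEY CLUSTERED (OrderID),
    CONSTRAINT FK_OrderShipping_OrderMaster
        FOREIGN KEY (OrderID) REFERENCES dbo.OrderMaster (OrderID)
);
```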
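For recommendation 2, ANSI_PADDING is captured per column at creation time, so at minimum make sure the session setting is ON before any future column additions, and check what the existing columns were created with. A small sketch (the table name is a placeholder for your real table):

```sql
-- Ensure the setting is ON before any future ALTER TABLE ... ADD:
SET ANSI_PADDING ON;

-- Find existing columns that were created with ANSI_PADDING OFF:
SELECT c.name, c.is_ansi_padded
FROM   sys.columns c
WHERE  c.[object_id] = OBJECT_ID(N'dbo.YourBigTable')
AND    c.is_ansi_padded = 0;
```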
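For recommendation 3, a sketch of creating the two File Groups and placing objects on them. Database name, file paths, sizes, and object names are all placeholders to adjust:

```sql
-- Add the two filegroups, with one file each (paths / sizes are placeholders):
ALTER DATABASE [YourDatabase] ADD FILEGROUP [Tables];
ALTER DATABASE [YourDatabase] ADD FILE
    (NAME = N'Tables1', FILENAME = N'D:\Data\Tables1.ndf', SIZE = 1GB)
    TO FILEGROUP [Tables];

ALTER DATABASE [YourDatabase] ADD FILEGROUP [Indexes];
ALTER DATABASE [YourDatabase] ADD FILE
    (NAME = N'Indexes1', FILENAME = N'E:\Data\Indexes1.ndf', SIZE = 1GB)
    TO FILEGROUP [Indexes];

-- The table and its Clustered Index (i.e. the data) go on [Tables]:
CREATE TABLE dbo.NewBigTable
(
    ID      INT NOT NULL CONSTRAINT PK_NewBigTable PRIMARY KEY CLUSTERED,
    SomeCol VARCHAR(50) NULL
) ON [Tables];

-- Non-Clustered indexes go on [Indexes]:
CREATE NONCLUSTERED INDEX IX_NewBigTable_SomeCol
    ON dbo.NewBigTable (SomeCol) ON [Indexes];
```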
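For recommendation 4, the Fill Factor is raised as part of an index rebuild. A minimal sketch with placeholder index and table names (90 is just a common starting point for a mostly-read table, not a prescription):

```sql
-- Rebuild with a higher Fill Factor so each 8k page is ~90% full
-- (index and table names are placeholders):
ALTER INDEX PK_YourBigTable
    ON dbo.YourBigTable
    REBUILD WITH (FILLFACTOR = 90);
```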
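For recommendation 5, if the Clustered Index is a plain index (and not the index backing the PRIMARY KEY constraint, in which case you would drop and re-add the constraint instead), the key order can be changed in one step with DROP_EXISTING. Index name, table name, and the UNIQUE property are assumptions here:

```sql
-- Recreate the Clustered Index with ER101_ORG_CODE as the leading column.
-- DROP_EXISTING = ON avoids rebuilding the Non-Clustered indexes twice.
-- (Names are placeholders; UNIQUE is assumed from the original key.)
CREATE UNIQUE CLUSTERED INDEX CIX_YourBigTable
    ON dbo.YourBigTable (ER101_ORG_CODE, ER101_ORD_NBR)
    WITH (DROP_EXISTING = ON);
```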
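For recommendation 6, the change itself is a single column alteration, though note that if the column is part of an index (such as the Clustered Index), that index has to be dropped first and recreated afterwards. Table name and NULLability below are assumptions:

```sql
-- Switch from variable-length to fixed-length storage
-- (table name is a placeholder; adjust NULL / NOT NULL to match reality):
ALTER TABLE dbo.YourBigTable
    ALTER COLUMN ER101_ORG_CODE CHAR(2) NOT NULL;
```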

Good luck!

UPDATE
For the sake of completeness on the question “how to improve performance on a large table for ad hoc queries”, it should be noted that, while it will not help in this specific case, IF someone is using SQL Server 2012 (or newer) and IF the table is not being updated, then using Columnstore Indexes is an option. For more details on that feature, look here:
http://msdn.microsoft.com/en-us/library/gg492088.aspx (Clustered Columnstore Indexes became updateable in SQL Server 2014, and Nonclustered ones in SQL Server 2016). A sketch follows.
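
A minimal sketch of what that would look like. Index, table, and column names are placeholders, and keep in mind that on SQL Server 2012 adding a Nonclustered Columnstore Index makes the table read-only:

```sql
-- SQL Server 2012+: Nonclustered Columnstore Index over the columns
-- that the ad hoc queries actually touch (all names are placeholders):
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_YourBigTable
    ON dbo.YourBigTable (ER101_ORG_CODE, ER101_ORD_NBR, SomeMeasureCol);
```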

UPDATE 2
Additional considerations are:

  • Enable Data Compression on the Clustered Index. This option became available in SQL Server 2008, but as an Enterprise Edition-only feature. However, as of SQL Server 2016 SP1, Data Compression is available in all editions! Please see the MSDN page for Data Compression for details on Row and Page Compression (a sketch follows this list).
  • If you cannot use Data Compression, or if it won’t provide much benefit for a particular table, then IF you have a column of a fixed-length type (INT, BIGINT, TINYINT, SMALLINT, CHAR, NCHAR, BINARY, DATETIME, SMALLDATETIME, MONEY, etc.) in which well over 50% of the rows are NULL, consider enabling the SPARSE option, which became available in SQL Server 2008. Please see the MSDN page for Use Sparse Columns for details (a sketch follows this list).
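
For the compression point, you can estimate the benefit before committing to it, then enable it as part of a rebuild. A sketch using placeholder object names:

```sql
-- Estimate how much Page Compression would save (built-in system proc):
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'YourBigTable',
    @index_id         = NULL,  -- all indexes
    @partition_number = NULL,  -- all partitions
    @data_compression = N'PAGE';

-- Enable Page Compression on the Clustered Index via a rebuild
-- (index and table names are placeholders):
ALTER INDEX PK_YourBigTable
    ON dbo.YourBigTable
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```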
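For the SPARSE point, marking a mostly-NULL fixed-length column as sparse is also a single column alteration. The table and column names here are hypothetical:

```sql
-- Hypothetical mostly-NULL fixed-length column.
-- SPARSE columns must be NULLable:
ALTER TABLE dbo.YourBigTable
    ALTER COLUMN SomeRarelyUsedInt INT SPARSE NULL;
```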
