In this example I play with the performance of a query that groups the data.
Clone database.
USE [master]; GO IF (DB_ID(N'PL_AdventureWorks2008R2') IS NOT NULL) BEGIN ALTER DATABASE [PL_AdventureWorks2008R2] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [PL_AdventureWorks2008R2]; END GO SELECT [name] FROM sys.databases WHERE [name] = N'PL_AdventureWorks2008R2'; GO BACKUP DATABASE [AdventureWorks2008R2] TO DISK = N'C:\Temp\Backups\PL_AdventureWorks2008R2_Data.bak' WITH INIT; GO RESTORE DATABASE [PL_AdventureWorks2008R2] FROM DISK = N'C:\Temp\Backups\PL_AdventureWorks2008R2_Data.bak' WITH RECOVERY , MOVE 'AdventureWorks2008R2_Data' TO 'C:\Temp\Databases\PL_AdventureWorks2008R2_Data.mdf' , MOVE 'AdventureWorks2008R2_Log' TO 'C:\Temp\Databases\PL_AdventureWorks2008R2_Log.ldf'; GO ALTER DATABASE [PL_AdventureWorks2008R2] MODIFY FILE ( NAME = N'AdventureWorks2008R2_Data' , NEWNAME = N'PL_AdventureWorks2008R2_Data' ); GO ALTER DATABASE [PL_AdventureWorks2008R2] MODIFY FILE ( NAME = N'AdventureWorks2008R2_Log' , NEWNAME = N'PL_AdventureWorks2008R2_Log' ); GO SELECT [name], [physical_name], [type_desc] FROM [PL_AdventureWorks2008R2].sys.database_files; GO
Create source table.
USE PL_AdventureWorks2008R2; GO CREATE SCHEMA pl; GO IF (OBJECT_ID(N'PL_AdventureWorks2008R2.pl.TestTableSource', N'U') IS NOT NULL) BEGIN DROP TABLE PL_AdventureWorks2008R2.pl.TestTableSource; END GO SELECT SOH.CustomerID, SOD.ProductID , SOD.OrderQty, SOD.LineTotal INTO PL_AdventureWorks2008R2.pl.TestTableSource FROM Sales.SalesOrderHeader AS SOH JOIN Sales.SalesOrderDetail As SOD ON SOH.SalesOrderID = SOD.SalesOrderID; GO -- (121317 row(s) affected) -- Add a few fake rows INSERT PL_AdventureWorks2008R2.pl.TestTableSource ( CustomerID, ProductID , OrderQty, LineTotal ) SELECT CustomerID, ProductID , (OrderQty + (101 * RAND(CONVERT(VARBINARY, NEWID())))) , (LineTotal + (101 * RAND(CONVERT(VARBINARY, NEWID())))) FROM PL_AdventureWorks2008R2.pl.TestTableSource; GO 6 /* Beginning execution loop (121317 row(s) affected) (242634 row(s) affected) (485268 row(s) affected) (970536 row(s) affected) (1941072 row(s) affected) (3882144 row(s) affected) Batch execution completed 6 times. */ -- http://sqlperformance.com/2014/10/t-sql-queries/bad-habits-count-the-hard-way SELECT SUM(P.[rows]) FROM PL_AdventureWorks2008R2.sys.partitions AS P JOIN PL_AdventureWorks2008R2.sys.tables AS T ON P.[object_id] = T.[object_id] JOIN PL_AdventureWorks2008R2.sys.schemas AS S ON T.[schema_id] = S.[schema_id] WHERE P.[index_id] IN (0, 1) -- heap or clustered index AND T.[name] = N'TestTableSource' AND S.[name] = N'pl'; GO -- 7764288 -- Space used EXEC sp_spaceused 'PL_AdventureWorks2008R2.pl.TestTableSource'; GO
Turn on the statistics and the execution plan to know what is going on under the hood.
Turn them off while testing manually the execution time.
SET STATISTICS IO ON; -- OFF SET STATISTICS TIME ON; -- OFF GO -- Ctrl + M (Include Actual Execution Plan)
Test query.
IF (OBJECT_ID(N'tempdb.dbo.#TestTableResult', N'U') IS NOT NULL) BEGIN DROP TABLE #TestTableResult; END SELECT CustomerID, ProductID , SUM(OrderQty) AS Sum_OrderQty , SUM(LineTotal) AS Sum_LineTotal INTO #TestTableResult FROM PL_AdventureWorks2008R2.pl.TestTableSource GROUP BY CustomerID, ProductID; GO
Turn off the statistics and the actual execution plan and run the tests multiple times to observe the execution time.
Test 1: Heap
Primary key column = No, Indexes = No.
(79433 row(s) affected)
Execution Time (sec): 7, 7, 7, 7, 7
Test 2: Primary Key (clustered index)
Add column ID – identity, primary key
(79433 row(s) affected)
Execution Time (sec): 7, 7, 7, 7, 7
ALTER TABLE PL_AdventureWorks2008R2.pl.TestTableSource ADD ID INT IDENTITY(1, 1); ALTER TABLE PL_AdventureWorks2008R2.pl.TestTableSource ADD CONSTRAINT PK_TestTableSource PRIMARY KEY (ID); GO
Test 3: PK = Yes; Nonclustered Index = Yes
Key columns are the columns that will be used for grouping.
(79433 row(s) affected)
Execution Time (sec): 7, 7, 7, 7, 7
CREATE NONCLUSTERED INDEX IX_Test3 ON PL_AdventureWorks2008R2.pl.TestTableSource (CustomerID, ProductID); GO
Test 4: PK = Yes; Nonclustered Indes with Included Columns = Yes
Delete the previously created index and create a new one.
Key columns are the columns that will be used for grouping.
Included columns are the ones that will be aggregated.
(79433 row(s) affected)
Execution Time (sec): 4, 3, 3, 3, 3
DROP NONCLUSTERED INDEX IX_Test3 ON PL_AdventureWorks2008R2.pl.TestTableSource; GO CREATE NONCLUSTERED INDEX IX_Test4 ON PL_AdventureWorks2008R2.pl.TestTableSource (CustomerID, ProductID) INCLUDE (OrderQty, LineTotal); GO
Test 5: PK = No; Nonclustered Indes with Included Columns = Yes
Delete column ID
(79433 row(s) affected)
Execution Time (sec): 3, 3, 3, 3, 3
ALTER TABLE PL_AdventureWorks2008R2.pl.TestTableSource DROP CONSTRAINT PK_TestTableSource; ALTER TABLE PL_AdventureWorks2008R2.pl.TestTableSource DROP COLUMN ID; GO
Space used by the nonclustered index is The Price I Pay.
-- Space used EXEC sp_spaceused 'PL_AdventureWorks2008R2.pl.TestTableSource'; GO
Delete database.
USE [master]; GO IF (DB_ID(N'PL_AdventureWorks2008R2') IS NOT NULL) BEGIN ALTER DATABASE [PL_AdventureWorks2008R2] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [PL_AdventureWorks2008R2]; END GO
Delete C:\Temp\Backups\PL_AdventureWorks2008R2_Data.bak
Conclusion:
- Adding a noncluster index with key columns the grouped columns and included the aggregated columns is dividing the execution time by 2.33.
- The primary key (Column ID) is not adding a value to the performance.
- In this example the nonclustered index doubles the disk space used.
Keep it simple :-)





