In this example I play with the performance of a query that groups the data.
Clone database.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
USE [master]; GO IF (DB_ID(N'PL_AdventureWorks2008R2') IS NOT NULL) BEGIN ALTER DATABASE [PL_AdventureWorks2008R2] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [PL_AdventureWorks2008R2]; END GO SELECT [name] FROM sys.databases WHERE [name] = N'PL_AdventureWorks2008R2'; GO BACKUP DATABASE [AdventureWorks2008R2] TO DISK = N'C:\Temp\Backups\PL_AdventureWorks2008R2_Data.bak' WITH INIT; GO RESTORE DATABASE [PL_AdventureWorks2008R2] FROM DISK = N'C:\Temp\Backups\PL_AdventureWorks2008R2_Data.bak' WITH RECOVERY , MOVE 'AdventureWorks2008R2_Data' TO 'C:\Temp\Databases\PL_AdventureWorks2008R2_Data.mdf' , MOVE 'AdventureWorks2008R2_Log' TO 'C:\Temp\Databases\PL_AdventureWorks2008R2_Log.ldf'; GO ALTER DATABASE [PL_AdventureWorks2008R2] MODIFY FILE ( NAME = N'AdventureWorks2008R2_Data' , NEWNAME = N'PL_AdventureWorks2008R2_Data' ); GO ALTER DATABASE [PL_AdventureWorks2008R2] MODIFY FILE ( NAME = N'AdventureWorks2008R2_Log' , NEWNAME = N'PL_AdventureWorks2008R2_Log' ); GO SELECT [name], [physical_name], [type_desc] FROM [PL_AdventureWorks2008R2].sys.database_files; GO |
Create source table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 |
USE PL_AdventureWorks2008R2; GO CREATE SCHEMA pl; GO IF (OBJECT_ID(N'PL_AdventureWorks2008R2.pl.TestTableSource', N'U') IS NOT NULL) BEGIN DROP TABLE PL_AdventureWorks2008R2.pl.TestTableSource; END GO SELECT SOH.CustomerID, SOD.ProductID , SOD.OrderQty, SOD.LineTotal INTO PL_AdventureWorks2008R2.pl.TestTableSource FROM Sales.SalesOrderHeader AS SOH JOIN Sales.SalesOrderDetail As SOD ON SOH.SalesOrderID = SOD.SalesOrderID; GO -- (121317 row(s) affected) -- Add a few fake rows INSERT PL_AdventureWorks2008R2.pl.TestTableSource ( CustomerID, ProductID , OrderQty, LineTotal ) SELECT CustomerID, ProductID , (OrderQty + (101 * RAND(CONVERT(VARBINARY, NEWID())))) , (LineTotal + (101 * RAND(CONVERT(VARBINARY, NEWID())))) FROM PL_AdventureWorks2008R2.pl.TestTableSource; GO 6 /* Beginning execution loop (121317 row(s) affected) (242634 row(s) affected) (485268 row(s) affected) (970536 row(s) affected) (1941072 row(s) affected) (3882144 row(s) affected) Batch execution completed 6 times. */ -- http://sqlperformance.com/2014/10/t-sql-queries/bad-habits-count-the-hard-way SELECT SUM(P.[rows]) FROM PL_AdventureWorks2008R2.sys.partitions AS P JOIN PL_AdventureWorks2008R2.sys.tables AS T ON P.[object_id] = T.[object_id] JOIN PL_AdventureWorks2008R2.sys.schemas AS S ON T.[schema_id] = S.[schema_id] WHERE P.[index_id] IN (0, 1) -- heap or clustered index AND T.[name] = N'TestTableSource' AND S.[name] = N'pl'; GO -- 7764288 -- Space used EXEC sp_spaceused 'PL_AdventureWorks2008R2.pl.TestTableSource'; GO |
Turn on the statistics and the execution plan to know what is going on under the hood.
Turn them off while testing manually the execution time.
1 2 3 4 5 |
SET STATISTICS IO ON; -- OFF SET STATISTICS TIME ON; -- OFF GO -- Ctrl + M (Include Actual Execution Plan) |
Test query.
1 2 3 4 5 6 7 8 9 10 11 |
IF (OBJECT_ID(N'tempdb.dbo.#TestTableResult', N'U') IS NOT NULL) BEGIN DROP TABLE #TestTableResult; END SELECT CustomerID, ProductID , SUM(OrderQty) AS Sum_OrderQty , SUM(LineTotal) AS Sum_LineTotal INTO #TestTableResult FROM PL_AdventureWorks2008R2.pl.TestTableSource GROUP BY CustomerID, ProductID; GO |
Turn off the statistics and the actual execution plan and run the tests multiple times to observe the execution time.
Test 1: Heap
Primary key column = No, Indexes = No.
(79433 row(s) affected)
Execution Time (sec): 7, 7, 7, 7, 7
Test 2: Primary Key (clustered index)
Add column ID – identity, primary key
(79433 row(s) affected)
Execution Time (sec): 7, 7, 7, 7, 7
1 2 3 4 5 6 |
ALTER TABLE PL_AdventureWorks2008R2.pl.TestTableSource ADD ID INT IDENTITY(1, 1); ALTER TABLE PL_AdventureWorks2008R2.pl.TestTableSource ADD CONSTRAINT PK_TestTableSource PRIMARY KEY (ID); GO |
Test 3: PK = Yes; Nonclustered Index = Yes
Key columns are the columns that will be used for grouping.
(79433 row(s) affected)
Execution Time (sec): 7, 7, 7, 7, 7
1 2 3 |
CREATE NONCLUSTERED INDEX IX_Test3 ON PL_AdventureWorks2008R2.pl.TestTableSource (CustomerID, ProductID); GO |
Test 4: PK = Yes; Nonclustered Indes with Included Columns = Yes
Delete the previously created index and create a new one.
Key columns are the columns that will be used for grouping.
Included columns are the ones that will be aggregated.
(79433 row(s) affected)
Execution Time (sec): 4, 3, 3, 3, 3
1 2 3 4 5 6 7 8 |
DROP NONCLUSTERED INDEX IX_Test3 ON PL_AdventureWorks2008R2.pl.TestTableSource; GO CREATE NONCLUSTERED INDEX IX_Test4 ON PL_AdventureWorks2008R2.pl.TestTableSource (CustomerID, ProductID) INCLUDE (OrderQty, LineTotal); GO |
Test 5: PK = No; Nonclustered Indes with Included Columns = Yes
Delete column ID
(79433 row(s) affected)
Execution Time (sec): 3, 3, 3, 3, 3
1 2 3 4 5 6 |
ALTER TABLE PL_AdventureWorks2008R2.pl.TestTableSource DROP CONSTRAINT PK_TestTableSource; ALTER TABLE PL_AdventureWorks2008R2.pl.TestTableSource DROP COLUMN ID; GO |
Space used by the nonclustered index is The Price I Pay.
1 2 3 |
-- Space used EXEC sp_spaceused 'PL_AdventureWorks2008R2.pl.TestTableSource'; GO |
Delete database.
1 2 3 4 5 6 7 8 9 |
USE [master]; GO IF (DB_ID(N'PL_AdventureWorks2008R2') IS NOT NULL) BEGIN ALTER DATABASE [PL_AdventureWorks2008R2] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [PL_AdventureWorks2008R2]; END GO |
Delete C:\Temp\Backups\PL_AdventureWorks2008R2_Data.bak
Conclusion:
- Adding a noncluster index with key columns the grouped columns and included the aggregated columns is dividing the execution time by 2.33.
- The primary key (Column ID) is not adding a value to the performance.
- In this example the nonclustered index doubles the disk space used.
Keep it simple :-)