T-SQL: Aggregate Columns (CASE or Normalization + Aggregation)


In this example I will compare the execution of two possible options to aggregate multiple columns.
I have data columns A, B and C and I need to show the average.

  • “CASE” is using a formula ((A + B + C) / (1, 2 or 3)), simulating the scenarios when one or more columns (A, B, C) is NULL (divide by 1, 2 or 3).
  • “Normalization and Aggregation” is using a subquery to normalize the columns (A, B, C) and calculates the AVG().

A logic to replace 0 with NULL and not include it in the aggregation is applied in both cases.

I will execute the queries five times to check manualy the execution times.

Create test database, test table and insert 1,000,000 random rows.

Create database.

Create SourceTable.

Source Table

Aggregate Data1, Data2 and Data3 by using CASE statement.

CASE Result In the estimated execution and Actual execution plans we see that 1,000,000 rows are selected and calculated with no joins. Estimated Execution Plan. CASE Estimated Execution Plan Actual Execution Plan. CASE Actual Execution Plan Show/hide STATISTICS

CASE STATISTICS

Aggregate Data1, Data2 and Data3 by normalizing with subquery and AVG().

Normalization and Aggregation Result

The estimated execution plan shows that the aggregations are on every line (3 rows looped on every line).

Normalization and Aggregation Estimated Execution Plan

The actual execution plan shows that the aggregations are done once (1,000,000 rows * 3 columns = 3,000,000 rows). The Query optimizer decides to use a set-based operation instead of aggregation on every row.

Normalization and Aggregation Actual Execution Plan

Comparing the reads shows the same and the timea are close. It’s little slower when we normalize, because of the join of the data.

Normalization and Aggregation STATISTICS

Delete database.

 Conclusion:

  • The query optimizer gave me the answer to use “Normalization and Aggregation” option.
  • When using the “CASE” code, make sure that all the possible scenarios are covered.
  • In both cases, check if the data table is using 0 for NULL and decide to not/exclude 0 in the aggregation.
  • This exact example acts like a table-valued function, which is executed once. The scalar function is executed in every line.

Keep it simple :-)

Leave a comment

Your email address will not be published. Required fields are marked *