NULL and Aggregation (AVG())


While we create the database architecture, we decide to use the special value NULL or replace it with empty string or zero. Later on the development stage we manipulate the data and when we need to aggregate, we have to keep in mind the NULL or “blank” columns. The funny part is while using AVG().

Create a temp table in my playground.

-- Create temp table for the example
IF (OBJECT_ID('tempdb.dbo.#PL1') IS NOT NULL)
	BEGIN DROP TABLE #PL1; END
CREATE TABLE #PL1 (A FLOAT(1), B FLOAT(1), C FLOAT(1))
GO

INSERT #PL1
SELECT 5 AS A, NULL AS B, NULL AS C
UNION SELECT 15, NULL, NULL
UNION SELECT NULL, 1, 0
UNION SELECT NULL, 10, NULL
UNION SELECT NULL, NULL, 3
UNION SELECT NULL, NULL, 9
GO

Aggregate the data in a table that respects the NULL value.
SELECT *
FROM #PL1
GO

SELECT 1 AS [Order], MIN(A) AS A, MIN(B) AS B, MIN(C) AS C, 'MIN' AS [Type] FROM #PL1
UNION SELECT 2, MAX(A), MAX(B), MAX(C), 'MAX' AS [Type] FROM #PL1
UNION SELECT 3, AVG(A), AVG(B), AVG(C), 'AVG' AS [Type] FROM #PL1
UNION SELECT 4, SUM(A), SUM(B), SUM(C), 'SUM' AS [Type] FROM #PL1
ORDER BY [Order];
GO
-- Warning: Null value is eliminated by an aggregate or other SET operation.

1

A message “Warning: Null value is eliminated by an aggregate or other SET operation.” appears to tell us that the NULL value is not included in the aggregations. The numbers are good:

  • MIN of column A is 5; MAX is 15
  • AVG of column B is 5.5 –> ((1 + 10) / 2)…

Simulate a table that doesn’t respect the NULL value and uses empty string and zero instead.

UPDATE #PL1
	SET
		A = ISNULL(A, 0)
		, B = ISNULL(B, 0)
		, C = ISNULL(C, 0);
GO

Run the same example
SELECT *
FROM #PL1
GO

SELECT 1 AS [Order], MIN(A) AS A, MIN(B) AS B, MIN(C) AS C, 'MIN' AS [Type] FROM #PL1
UNION SELECT 2, MAX(A), MAX(B), MAX(C), 'MAX' AS [Type] FROM #PL1
UNION SELECT 3, AVG(A), AVG(B), AVG(C), 'AVG' AS [Type] FROM #PL1
UNION SELECT 4, SUM(A), SUM(B), SUM(C), 'SUM' AS [Type] FROM #PL1
ORDER BY [Order]
-- No Warning Message

2

There is no warning message, but the numbers…

  • MIN of A is 0, but 0 replaces NULL (Nothing) and infact it is not an actual number, representing the data; MAX is OK (if we store only negative values, all of them less than 0, it will be wrong)
  • AVG of B is wrong, as 0 isn’t representing a real data again –> (0 + 1 + 10) / 6 = 1.83

Conclusion:
SUM is OK as adding 0 to any summing doesn’t change it.
MIN and MAX are OK in some cases, when they don’t contradict the business logic.
AVG is the aggregate that needs a special attention as we are not sure which of the values are real zero and which are replacements of NULL.

Keep it simple :-)