T-SQL: Parse “regular” and XML Column(s) to Table


XML is dead, long live JSON!

Even before i knew JSON, i thought “XML contains more metadata than data”. Today in my new development, i prefer JSON and avoid using XML, but in some cases we need to fight with XML too.

In this quick example i show how to query table columns, XML columns and parse to table format.

Create database:

USE [master];
GO
IF (DB_ID(N'PL') IS NOT NULL)
  BEGIN
    ALTER DATABASE PL SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE PL;
  END
GO

CREATE DATABASE PL;
GO

Create table with regular and XML column(s):

USE [master];
GO
IF (DB_ID(N'PL') IS NOT NULL)
  BEGIN
    ALTER DATABASE PL SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE PL;
  END
GO

CREATE DATABASE PL;
GO

USE PL;
GO

DROP TABLE IF EXISTS XMLTest;
GO

CREATE TABLE dbo.XMLTest
(
  ID INT IDENTITY(1, 1) NOT NULL
  , [Name] VARCHAR(128) NOT NULL
  , XMLColumn XML
  , CONSTRAINT PK__XMLTest PRIMARY KEY (ID)
);
GO

INSERT dbo.XMLTest ([Name], XMLColumn)
SELECT 'Test 1', '<Customers>
	<Customer CustomerID="C001" CustomerName="Arshad Ali">
		<Orders>
			<Order OrderID="10248" OrderDate="2012-07-04T00:00:00">
				<OrderDetail ProductID="10" Quantity="5" />
				<OrderDetail ProductID="11" Quantity="12" />
				<OrderDetail ProductID="42" Quantity="10" />
			</Order>
		</Orders>
		<Address>Address line 1, 2, 3</Address>
    <Phone>+1 555-6666</Phone>
	</Customer>
	<Customer CustomerID="C002" CustomerName="Paul Henriot">
		<Orders>
			<Order OrderID="10245" OrderDate="2011-07-04T00:00:00">
				<OrderDetail ProductID="11" Quantity="17" />
				<OrderDetail ProductID="42" Quantity="6" />
			</Order>
		</Orders>
		<Address></Address>
    <Phone>+1 555-7777</Phone>
	</Customer>
		<Customer CustomerID="C003" CustomerName="Carlos Gonzlez">
		<Orders>
			<Order OrderID="10283" OrderDate="2012-08-16T00:00:00">
				<OrderDetail ProductID="72" Quantity="3" />
			</Order>
		</Orders>
		<Address>Address line 1, 4, 5</Address>
    <Phone>+1 555-8888</Phone>
	</Customer>
</Customers>'
UNION ALL SELECT 'Test 2', '<Customers>
	<Customer CustomerID="C004" CustomerName="Ali Baba">
		<Orders>
			<Order OrderID="51324" OrderDate="2017-08-04T00:00:00">
				<OrderDetail ProductID="17" Quantity="50" />
				<OrderDetail ProductID="1" Quantity="3" />
				<OrderDetail ProductID="2" Quantity="15" />
			</Order>
		</Orders>
		<Address>Address line 5, 2, 8</Address>
    <Phone></Phone>
	</Customer>
	<Customer CustomerID="C005" CustomerName="Sven Tarsen">
		<Orders>
			<Order OrderID="53276" OrderDate="2019-12-27T00:00:00">
				<OrderDetail ProductID="113" Quantity="8" />
				<OrderDetail ProductID="426" Quantity="84" />
			</Order>
		</Orders>
    <Phone>+1 555-4444</Phone>
	</Customer>
		<Customer CustomerID="C006" CustomerName="Chi Hui">
		<Orders>
			<Order OrderID="43267" OrderDate="2021-05-17T00:00:00">
				<OrderDetail ProductID="512" Quantity="51" />
			</Order>
		</Orders>
		<Address>Address line 1, 4, 5</Address>
    <Phone>+1 555-3333</Phone>
	</Customer>
</Customers>';
GO

SELECT *
FROM dbo.XMLTest;
GO

Query the table:

SELECT
  X.ID
  , X.[Name]
  , 'XMLColumn -->' AS [XMLColumn -->]
  , NULLIF(OD.value('../../../@CustomerID', 'VARCHAR(128)'), '') AS CustomerID -- @ is attribute
  , NULLIF(OD.value('../../../@CustomerName', 'VARCHAR(128)'), '') AS CustomerName -- NULLIF() to replace empty strings
  , NULLIF(OD.value('../../../Address[1]', 'VARCHAR(128)'), '') AS [Address] -- [1] is The first occurence of element
  , NULLIF(OD.value('../../../Phone[1]', 'VARCHAR(128)'), '') AS [Phone] -- ../ is 'select from the parent node'
  , NULLIF(OD.value('../@OrderID', 'INT'), '') AS OrderID
  , NULLIF(OD.value('../@OrderDate', 'DATETIME'), '') AS OrderDate
  , NULLIF(OD.value('@ProductID', 'INT'), '') AS ProductID
  , NULLIF(OD.value('@Quantity', 'INT'), '') AS Quantity
FROM
  XMLTest AS X
  CROSS APPLY XMLColumn.nodes('/Customers/Customer/Orders/Order/OrderDetail') AS O(OD); -- JOIN 'Order Detail';
GO

Result:

Explanation

When i look at the XML column schema:

I understand that it was extracted from this model:

First: there are two 1:M relations. That means that i need to CROSS APPLY the related tables to each other and to the regular columns.

Second: i get the XML attributes with @ and the elements with an index. I bravely use index 1, because if Address or Phone are multiple, we need a nested structure as shown above (separate table for Address or Phone).

Third: To keep NULL, instead of empty string for the missing values, i use function NULLIF(). I use it everywhere, but this is not needed for the NOT NULL columns in the source tables. As NULL is “Unknown”, you can ignore this and return NULL for the missing elements or attributes and empty string for the empty ones.

Forth: I JOIN OrderDetail (the last child) and get all the parent attributes/elements from the parent nodes.

Fifth: Customer Sven Tarsen doesn’t have element “Address” in the XML.

Keep it simple :-)

Leave a comment

Your email address will not be published. Required fields are marked *