Keep it simple
Keep it simple

Data Architecture: Design of a Transactional Database That Will Serve a Data Warehouse

In my projects i need to design a transactional database and since the beginning i know that it will be the source of an ETL, feeding a Data Warehouse.

I add Primary Key (single column or composite) and standardized metadata columns in all the tables. The date-tracking columns:

  • date_created
  • date_last_modified

Option 1

  • date_created = now()
  • date_last_modified = NULL

In MariaDB my CREATE TABLE statement includes:

…
, date_created DATETIME NOT NULL DEFAULT current_timestamp()
, date_last_modified DATETIME DEFAULT NULL ON UPDATE current_timestamp()

Later, when i need to query these columns, the code is:

SELECT …
FROM …
ORDER BY COALESCE(date_last_modified, date_created) DESC;

Option 2

  • date_created = now()
  • date_last_modified = now()

In this case, i can use column date_last_modified independently and when both column equal, i know that this row was never updated.

Option 3

  • date_last_modified = now()

In this case i use only column date_last_modified and i lose track of when this row was created.

Data Warehouse

On the Data Warehouse side, when ingesting into Bronze, i use column date_last_modified to select only the UPSERTed rows since the last execution:

SELECT <columns list>
FROM <table name>
WHERE
	date_last_modified     >= <datetime of the last execution>
	AND date_last_modified < <datetime now>;

Side note: CDC option in SQL Server is Change Tracking. I wrote some posts on this topic.

When i create the silver layer, i determine the last version of the row:

SELECT T.*
FROM
	(
		SELECT ROW_NUMBER() OVER 
			(
				PARTITION BY '<pk columns list>'
				ORDER BY date_last_modified DESC
			) AS R
			, *
		FROM <schema name>.<table name>
	) AS T
WHERE T.R = 1;

To generate the list of the Primary Key columns:

SELECT
    STRING_AGG(KU.COLUMN_NAME, CONCAT(CHAR(39), ',', CHAR(39))) AS columns_list
FROM
    INFORMATION_SCHEMA.TABLE_CONSTRAINTS     AS TC
    JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS KU ON  TC.CONSTRAINT_TYPE = 'PRIMARY KEY'
                                                   AND TC.CONSTRAINT_NAME = KU.CONSTRAINT_NAME
WHERE KU.TABLE_NAME = @table_name;

Keep it simple :-)

Leave a comment

Your email address will not be published. Required fields are marked *

Time limit exceeded. Please complete the captcha once again.