Azure: ETL with Synapse Serverless Database


Azure Synapse Analytics offers the so called “serverless” database. It keeps the data in .parquet files. One thing i was suspicious with was the data types in the .parquet file. I decided to test if they are all NVARCHAR(4000) or we can use the standard SQL Server data types.

In this example i create a little POC about these data types and show you in general the structure of an ETL with Synapse Serverless Database.

ADLS

In Azure Data Lake create the following containers:

  • bronze
  • silver
  • gold

Describe bronze, silver, and gold architecture

In each container create folders for each table that will be manipulated.

Synapse

SQL Databases

In Azure Synapse create the destination serverless database and its objects.

Execute the statements one by one (highlight and click Run).

CREATE DATABASE my_database_name;
USE my_database_name;

-- Schema
CREATE SCHEMA bronze;
CREATE SCHEMA silver;
CREATE SCHEMA gold;

-- External Data Source
IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE [name] = 'eds_bronze')
    BEGIN
        CREATE EXTERNAL DATA SOURCE eds_bronze
        WITH (LOCATION = 'https://<storage-account-name>.dfs.core.windows.net/bronze');
    END

IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE [name] = 'eds_silver')
    BEGIN
        CREATE EXTERNAL DATA SOURCE eds_silver
        WITH (LOCATION = 'https://<storage-account-name>.dfs.core.windows.net/silver');
    END

IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE [name] = 'eds_gold')
    BEGIN
        CREATE EXTERNAL DATA SOURCE eds_gold
        WITH (LOCATION = 'https://<storage-account-name>.dfs.core.windows.net/gold');
    END

-- External File Format
IF NOT EXISTS (SELECT 1 FROM sys.external_file_formats WHERE [name] = 'eff_parquet')
    BEGIN
        CREATE EXTERNAL FILE FORMAT eff_parquet
        WITH (FORMAT_TYPE = PARQUET);
    END

-- External Table
-- silver.table_1
IF (OBJECT_ID(N'silver.table_1') IS NOT NULL)
    BEGIN DROP EXTERNAL TABLE silver.table_1; END

CREATE EXTERNAL TABLE silver.table_1 (
    [score] NVARCHAR(4000)
    , [title] NVARCHAR(4000)
    , [date_created] NVARCHAR(4000)
)
WITH (
    LOCATION = 'table_1/table_1.parquet',
    DATA_SOURCE = [eds_silver],
    FILE_FORMAT = [eff_parquet]
);

SELECT TOP (100) *
FROM silver.table_1;

-- gold.table_1
IF (OBJECT_ID(N'gold.table_1') IS NOT NULL)
    BEGIN DROP EXTERNAL TABLE gold.table_1; END

CREATE EXTERNAL TABLE gold.table_1 (
    [score] DECIMAL(18, 10)
    , [title] NVARCHAR(4000)
    , [date_created] DATETIME
)
WITH (
    LOCATION = 'table_1/table_1.parquet',
    DATA_SOURCE = [eds_gold],
    FILE_FORMAT = [eff_parquet]
);

SELECT TOP (100) *
FROM gold.table_1;

Linked Services

From Azure Portal , copy Serverless SQL endpoint.

In Synapse create linked service.:

Search for Azure Synapse Analytics:

Click Continue.

Integration datasets

In Synapse create integration dataset:

Pipelines

Extract from source to bronze

Skipped in this document.

Copy from bronze to silver

Source is the file in bronze.

Sink in parquet file in silver.

The destination datatypes are all UTF8, because all the columns in table silver.table_1 are NVARCHAR(4000).

Copy from silver to gold

We use the dataset that we created.

The source query:

SELECT
    CAST([score]          AS DECIMAL(18, 10)) AS [score]
    , CAST([title]        AS NVARCHAR(4000))  AS [title]
    , CAST([date_created] AS DATETIME)        AS date_created
FROM silver.result_1;

Sink in parquet file in gold.

The destination datatypes are the same as the columns in table gold.table_1.

Keep it simple :-)


About Peter Lalovsky

I am Microsoft SQL Server certified professional, creating with T-SQL, SSRS, SSIS, ASP.NET/C#, Azure, Python, PowerShell and more on a daily basis since year 2006. In 2016 i wrote a book for beginner and intermediate T-SQL programmers which you can download here. This blog is something like my personal programming documentation. When i am not in front of a computer, i am around my paper car – Trabant 601.

Leave a comment

Your email address will not be published.