When working with Azure Synapse Serverless, we store the data in Parquet Files, not in a database. These Parquet Files in Data Lake are external tables in the serverless database. One limitation related to the external tables exists – we can’t DML them, i.e. we can SELECT, but not INSERT or DELETE. The error is “DML Operations are not supported with external tables”
A possible runaround is:
- SELECT from the external table (the Parquet file)
- Manipulate the data:
- Append new row – INSERT
- Put the result in temp table – UPDATE (for relatively small tables)
- Save back the external table (the Parquet file)
Table of Contents
Create EXTERNAL TABLES
USE PL_1; -- Schema CREATE SCHEMA cfg; -- External Data Source IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE [name] = 'eds_cfg') BEGIN CREATE EXTERNAL DATA SOURCE eds_cfg WITH (LOCATION = 'https://plstorageaccount1.dfs.core.windows.net/cfg'); END -- External File Format IF NOT EXISTS (SELECT 1 FROM sys.external_file_formats WHERE [name] = 'eff_parquet') BEGIN CREATE EXTERNAL FILE FORMAT eff_parquet WITH (FORMAT_TYPE = PARQUET); END -- External Table -- cfg.table_to_extract IF (OBJECT_ID(N'cfg.table_to_extract') IS NOT NULL) BEGIN DROP EXTERNAL TABLE cfg.table_to_extract; END CREATE EXTERNAL TABLE cfg.table_to_extract ( process_name SYSNAME -- To use this table for more than one process (ETL) , server_name SYSNAME -- Server to extract from , [database_name] SYSNAME -- Database to extract from , [schema_name] SYSNAME -- Schema to extract from , table_name SYSNAME -- Table to extract from , parameter VARCHAR(512) DEFAULT ('') -- Injected in the WHERE clause , [query] VARCHAR(8000) DEFAULT ('') -- If not blank, use it instead of the default query , file_system_name VARCHAR(128) -- The ADLS container name , directory_name_bronze VARCHAR(128) -- The ADLS directory where the extracted files will be saved (bronze) , [file_name] VARCHAR(128) -- The name of the extracted file (<filename>.<extension>) , is_active BIT DEFAULT (1) -- Only the active roes are manilupated ) WITH ( LOCATION = 'table_to_extract.parquet' , DATA_SOURCE = [eds_cfg] , FILE_FORMAT = [eff_parquet] );
Azure Synapse Pipelines
Click to see the JSON.
create_parquet_files
Source is just an UNIONed SELECT statement:
Sink is the Parquet file:
insert_in_parquet_file
Source is the external table (cfg.table_to_extract) and UNIONed row to INSERT:
Sink is the same as above.
update_parquet_file
Source is the following query:
IF NOT EXISTS ( SELECT 1 FROM sys.procedures WHERE SCHEMA_NAME([schema_id]) = 'cfg' AND [name] = 'get_table_to_extract' ) BEGIN EXEC('CREATE PROC cfg.get_table_to_extract AS BEGIN SELECT [process_name], [server_name], [database_name], [schema_name], [table_name] , [parameter], [query], [file_system_name], [directory_name_bronze], [file_name], [is_active] FROM cfg.table_to_extract END;'); END CREATE TABLE #table_to_extract ( process_name SYSNAME , server_name SYSNAME , [database_name] SYSNAME , [schema_name] SYSNAME , table_name SYSNAME , parameter VARCHAR(512) , [query] VARCHAR(8000) , file_system_name VARCHAR(128) , directory_name_bronze VARCHAR(128) , [file_name] VARCHAR(128) , is_active BIT ); INSERT INTO #table_to_extract ([process_name], [server_name], [database_name], [schema_name], [table_name] , [parameter], [query], [file_system_name], [directory_name_bronze], [file_name], [is_active]) EXEC cfg.get_table_to_extract UPDATE #table_to_extract SET server_name = 'updated_server_name' WHERE process_name = 'PL_Test'; SELECT * FROM #table_to_extract;
The stored procedure can be created once and removed from this statement.
Sink is the same as above.
Keep it simple :-)