When working with Azure Synapse Serverless, we store the data in Parquet Files, not in a database. These Parquet Files in Data Lake are external tables in the serverless database. One limitation related to the external tables exists – we can’t DML them, i.e. we can SELECT, but not INSERT or DELETE. The error is “DML Operations are not supported with external tables”
A possible runaround is:
- SELECT from the external table (the Parquet file)
- Manipulate the data:
- Append new row – INSERT
- Put the result in temp table – UPDATE (for relatively small tables)
- Save back the external table (the Parquet file)
Create EXTERNAL TABLES
USE PL_1;
-- Schema
CREATE SCHEMA cfg;
-- External Data Source
IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE [name] = 'eds_cfg')
BEGIN
CREATE EXTERNAL DATA SOURCE eds_cfg
WITH (LOCATION = 'https://plstorageaccount1.dfs.core.windows.net/cfg');
END
-- External File Format
IF NOT EXISTS (SELECT 1 FROM sys.external_file_formats WHERE [name] = 'eff_parquet')
BEGIN
CREATE EXTERNAL FILE FORMAT eff_parquet
WITH (FORMAT_TYPE = PARQUET);
END
-- External Table
-- cfg.table_to_extract
IF (OBJECT_ID(N'cfg.table_to_extract') IS NOT NULL)
BEGIN DROP EXTERNAL TABLE cfg.table_to_extract; END
CREATE EXTERNAL TABLE cfg.table_to_extract
(
process_name SYSNAME -- To use this table for more than one process (ETL)
, server_name SYSNAME -- Server to extract from
, [database_name] SYSNAME -- Database to extract from
, [schema_name] SYSNAME -- Schema to extract from
, table_name SYSNAME -- Table to extract from
, parameter VARCHAR(512) DEFAULT ('') -- Injected in the WHERE clause
, [query] VARCHAR(8000) DEFAULT ('') -- If not blank, use it instead of the default query
, file_system_name VARCHAR(128) -- The ADLS container name
, directory_name_bronze VARCHAR(128) -- The ADLS directory where the extracted files will be saved (bronze)
, [file_name] VARCHAR(128) -- The name of the extracted file (<filename>.<extension>)
, is_active BIT DEFAULT (1) -- Only the active roes are manilupated
)
WITH (
LOCATION = 'table_to_extract.parquet'
, DATA_SOURCE = [eds_cfg]
, FILE_FORMAT = [eff_parquet]
);
Azure Synapse Pipelines
Click to see the JSON.
create_parquet_files
Source is just an UNIONed SELECT statement:
Sink is the Parquet file:
insert_in_parquet_file
Source is the external table (cfg.table_to_extract) and UNIONed row to INSERT:
Sink is the same as above.
update_parquet_file
Source is the following query:
IF NOT EXISTS
(
SELECT 1
FROM sys.procedures
WHERE
SCHEMA_NAME([schema_id]) = 'cfg'
AND [name] = 'get_table_to_extract'
)
BEGIN
EXEC('CREATE PROC cfg.get_table_to_extract
AS
BEGIN
SELECT
[process_name], [server_name], [database_name], [schema_name], [table_name]
, [parameter], [query], [file_system_name], [directory_name_bronze], [file_name], [is_active]
FROM cfg.table_to_extract
END;');
END
CREATE TABLE #table_to_extract
(
process_name SYSNAME
, server_name SYSNAME
, [database_name] SYSNAME
, [schema_name] SYSNAME
, table_name SYSNAME
, parameter VARCHAR(512)
, [query] VARCHAR(8000)
, file_system_name VARCHAR(128)
, directory_name_bronze VARCHAR(128)
, [file_name] VARCHAR(128)
, is_active BIT
);
INSERT INTO #table_to_extract
([process_name], [server_name], [database_name], [schema_name], [table_name]
, [parameter], [query], [file_system_name], [directory_name_bronze], [file_name], [is_active])
EXEC cfg.get_table_to_extract
UPDATE #table_to_extract
SET server_name = 'updated_server_name'
WHERE process_name = 'PL_Test';
SELECT *
FROM #table_to_extract;
The stored procedure can be created once and removed from this statement.
Sink is the same as above.
Keep it simple :-)




