Azure Synapse: Runaround “DML Operations are not supported with external tables”


When working with Azure Synapse Serverless, we store the data in Parquet Files, not in a database. These Parquet Files in Data Lake are external tables in the serverless database. One limitation related to the external tables exists – we can’t DML them, i.e. we can SELECT, but not INSERT or DELETE. The error is “DML Operations are not supported with external tables”

A possible runaround is:

  1. SELECT from the external table (the Parquet file)
  2. Manipulate the data:
    1. Append new row – INSERT
    2. Put the result in temp table – UPDATE (for relatively small tables)
  3. Save back the external table (the Parquet file)

Create EXTERNAL TABLES

USE PL_1;

-- Schema
CREATE SCHEMA cfg;

-- External Data Source
IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE [name] = 'eds_cfg')
  BEGIN
    CREATE EXTERNAL DATA SOURCE eds_cfg
    WITH (LOCATION = 'https://plstorageaccount1.dfs.core.windows.net/cfg');
  END

-- External File Format
IF NOT EXISTS (SELECT 1 FROM sys.external_file_formats WHERE [name] = 'eff_parquet')
  BEGIN
    CREATE EXTERNAL FILE FORMAT eff_parquet
    WITH (FORMAT_TYPE = PARQUET);
  END

-- External Table
-- cfg.table_to_extract
IF (OBJECT_ID(N'cfg.table_to_extract') IS NOT NULL)
  BEGIN DROP EXTERNAL TABLE cfg.table_to_extract; END

CREATE EXTERNAL TABLE cfg.table_to_extract
(
  process_name            SYSNAME                    -- To use this table for more than one process (ETL)
  , server_name           SYSNAME                    -- Server to extract from
  , [database_name]       SYSNAME                    -- Database to extract from
  , [schema_name]         SYSNAME                    -- Schema to extract from
  , table_name            SYSNAME                    -- Table to extract from
  , parameter             VARCHAR(512)  DEFAULT ('') -- Injected in the WHERE clause
  , [query]               VARCHAR(8000) DEFAULT ('') -- If not blank, use it instead of the default query
  , file_system_name      VARCHAR(128)               -- The ADLS container name
  , directory_name_bronze VARCHAR(128)               -- The ADLS directory where the extracted files will be saved (bronze)
  , [file_name]           VARCHAR(128)               -- The name of the extracted file (<filename>.<extension>)
  , is_active             BIT           DEFAULT (1)  -- Only the active roes are manilupated
)
WITH (
  LOCATION = 'table_to_extract.parquet'
  , DATA_SOURCE = [eds_cfg]
  , FILE_FORMAT = [eff_parquet]
);

Azure Synapse Pipelines

Click to see the JSON.

create_parquet_files

Source is just an UNIONed SELECT statement:

Sink is the Parquet file:

insert_in_parquet_file

Source is the external table (cfg.table_to_extract) and UNIONed row to INSERT:

Sink is the same as above.

update_parquet_file

Source is the following query:

IF NOT EXISTS
(
  SELECT 1
  FROM sys.procedures
    WHERE
      SCHEMA_NAME([schema_id]) = 'cfg'
      AND [name] = 'get_table_to_extract'
)
BEGIN
  EXEC('CREATE PROC cfg.get_table_to_extract
    AS
    BEGIN
      SELECT
        [process_name], [server_name], [database_name], [schema_name], [table_name]
        , [parameter], [query], [file_system_name], [directory_name_bronze], [file_name], [is_active]
      FROM cfg.table_to_extract
    END;');
END

CREATE TABLE #table_to_extract
(
  process_name            SYSNAME
  , server_name           SYSNAME
  , [database_name]       SYSNAME
  , [schema_name]         SYSNAME
  , table_name            SYSNAME
  , parameter             VARCHAR(512)
  , [query]               VARCHAR(8000)
  , file_system_name      VARCHAR(128)
  , directory_name_bronze VARCHAR(128)
  , [file_name]           VARCHAR(128)
  , is_active             BIT
);


INSERT INTO #table_to_extract
  ([process_name], [server_name], [database_name], [schema_name], [table_name]
  , [parameter], [query], [file_system_name], [directory_name_bronze], [file_name], [is_active])
EXEC cfg.get_table_to_extract

UPDATE #table_to_extract
SET server_name = 'updated_server_name'
WHERE process_name = 'PL_Test';

SELECT *
FROM #table_to_extract;

The stored procedure can be created once and removed from this statement.

Sink is the same as above.

Keep it simple :-)


About Peter Lalovsky

I am Microsoft SQL Server certified professional, creating with T-SQL, SSRS, SSIS, ASP.NET/C#, Azure, Python, PowerShell and more on a daily basis since year 2006. In 2016 i wrote a book for beginner and intermediate T-SQL programmers which you can download here. This blog is something like my personal programming documentation. When i am not in front of a computer, i am around my paper car – Trabant 601.

Leave a comment

Your email address will not be published.