Databricks


Python: Extract from Kafka with Azure Data Factory (Synapse) and Databricks

This Python code can be used to extract two files from Kafka into Azure Data Lake (ADLS):
- extract/kafka/topic/topic_{YYYYMMDD_HHMMSS}.json – no duplicates (PK: parentId|id)
- extract/kafka/topic_history/topic_{YYYYMMDD_HHMMSS}.json – all rows (PK: parentId|id|date_created)
In case of error, the KafkaException is exported to a file named error_topic_{YYYYMMDD_HHMMSS}.txt. ADF determines if there is an error, […]
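A minimal sketch of what such an extract could look like, assuming the confluent-kafka and azure-storage-file-datalake packages; the broker address, consumer group, storage account, container, and access key are hypothetical placeholders, and the deduplication here keeps the latest row per parentId|id:

```python
# Sketch: consume a Kafka topic and write two JSON extracts to ADLS.
# Assumptions (not confirmed by the post): confluent-kafka and
# azure-storage-file-datalake packages; all connection values are placeholders.
import json
from datetime import datetime

from confluent_kafka import Consumer, KafkaException
from azure.storage.filedatalake import DataLakeServiceClient

TIMESTAMP = datetime.now().strftime("%Y%m%d_%H%M%S")

def consume_topic(topic, bootstrap_servers, timeout=5.0):
    """Read all available messages from the topic; raise KafkaException on error."""
    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "adf-extract",          # placeholder consumer group
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([topic])
    rows = []
    try:
        while True:
            msg = consumer.poll(timeout)
            if msg is None:
                break                        # no more messages within the timeout
            if msg.error():
                raise KafkaException(msg.error())
            rows.append(json.loads(msg.value().decode("utf-8")))
    finally:
        consumer.close()
    return rows

def upload_to_adls(service_client, file_system, path, data):
    """Upload a string payload to ADLS, overwriting any existing file."""
    fs = service_client.get_file_system_client(file_system)
    fs.get_file_client(path).upload_data(data, overwrite=True)

service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential="<access-key>",               # placeholder key
)

try:
    rows = consume_topic("topic", "broker:9092")
    # History file: every row (PK: parentId|id|date_created).
    upload_to_adls(service, "mycontainer",
                   f"extract/kafka/topic_history/topic_{TIMESTAMP}.json",
                   json.dumps(rows))
    # Deduplicated file: keep the latest row per parentId|id.
    latest = {}
    for r in sorted(rows, key=lambda r: r["date_created"]):
        latest[(r["parentId"], r["id"])] = r
    upload_to_adls(service, "mycontainer",
                   f"extract/kafka/topic/topic_{TIMESTAMP}.json",
                   json.dumps(list(latest.values())))
except KafkaException as exc:
    # Export the error so ADF can detect the failure; the folder is assumed.
    upload_to_adls(service, "mycontainer",
                   f"extract/kafka/topic/error_topic_{TIMESTAMP}.txt", str(exc))
```

ADF can then check the landing folder for an error_topic_*.txt file to decide whether the extract failed.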


Azure Databricks: Read/Write files from/to Azure Data Lake

Upload the file result.csv into Azure Data Lake (in mystorageaccount/mycontainer). Databricks commands: spark.conf.set() defines the access key for the connection to the Data Lake; the access key can be found in the Azure Portal. To list a folder in ADLS, use dbutils.fs.ls(). To read a file in ADLS, use spark.read. The result is […]
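A minimal sketch of those commands as they might appear in a Databricks notebook (spark, dbutils, and display are provided by the notebook runtime; the account name, container, and key are placeholders):

```python
# Access key for the storage account, taken from the Azure Portal
# (storage account > Access keys); value below is a placeholder.
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    "<access-key>",
)

base = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net"

# List a folder in ADLS.
display(dbutils.fs.ls(f"{base}/"))

# Read result.csv from ADLS into a DataFrame.
df = spark.read.csv(f"{base}/result.csv", header=True, inferSchema=True)
display(df)

# Write the DataFrame back to ADLS as CSV (output folder is a placeholder).
df.write.mode("overwrite").csv(f"{base}/output/result_out")
```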


Azure Databricks: Extract from REST API and save JSON file in Azure Data Lake

Databricks commands: Import library requests to be able to run HTTP requests. Define the parameters, the Basic Authentication attributes (username, password) and execute GET request. spark.conf.set() define the access key for the connection to Data Lake. The access key can be found in Azure Portal. Define the destination folder path […]
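A minimal sketch of that flow, assuming a hypothetical REST endpoint and placeholder credentials; dbutils.fs.put is used here to write the raw JSON payload as a single file in ADLS:

```python
# Sketch of the steps described above; the API URL, credentials,
# and storage names are placeholders.
import requests

# Execute a GET request with Basic Authentication.
response = requests.get(
    "https://api.example.com/v1/items",   # hypothetical endpoint
    params={"page": 1},                   # hypothetical parameters
    auth=("username", "password"),
)
response.raise_for_status()

# Access key for the connection to the Data Lake (from the Azure Portal).
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    "<access-key>",
)

# Destination folder path and file name in ADLS.
path = ("abfss://mycontainer@mystorageaccount.dfs.core.windows.net"
        "/extract/api/items.json")

# Save the raw JSON response as a single text file.
dbutils.fs.put(path, response.text, overwrite=True)
```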