This Python code can be used to extract two files from Kafka into Azure Data Lake (ADLS):

extract/kafka/topic/topic_{YYYYMMDD_HHMMSS}.json – no duplicates (PK: parentId|id)
extract/kafka/topic_history/topic_{YYYYMMDD_HHMMSS}.json – all the rows (PK: parentId|id|date_created)

In case of error, the KafkaException is exported to a file named error_topic_{YYYYMMDD_HHMMSS}.txt. ADF determines if there is an error, […]
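A minimal sketch of that extract, assuming confluent-kafka for the consumer and azure-storage-file-datalake for writing to ADLS; the broker address, storage account, credential and the folder holding the error file are placeholder assumptions, not values from the post:

```python
import json
from datetime import datetime

from confluent_kafka import Consumer, KafkaException
from azure.storage.filedatalake import DataLakeServiceClient

TOPIC = "topic"                              # hypothetical topic name
stamp = datetime.utcnow().strftime("%Y%m%d_%H%M%S")

service = DataLakeServiceClient(
    account_url="https://mystorageaccount.dfs.core.windows.net",
    credential="<access-key>")               # placeholder credential
fs = service.get_file_system_client("mycontainer")

def upload(path: str, data: str) -> None:
    """Create/overwrite a file in ADLS with the given text content."""
    fs.get_file_client(path).upload_data(data, overwrite=True)

consumer = Consumer({"bootstrap.servers": "<broker>",
                     "group.id": "adls-extract",
                     "auto.offset.reset": "earliest"})
consumer.subscribe([TOPIC])

latest, history = {}, []                     # latest: one row per parentId|id; history: all rows
try:
    while True:
        msg = consumer.poll(10.0)
        if msg is None:                      # no more messages within the timeout
            break
        if msg.error():
            raise KafkaException(msg.error())
        row = json.loads(msg.value())
        latest[f"{row['parentId']}|{row['id']}"] = row   # dedupe on PK parentId|id
        history.append(row)                  # keep every row (PK includes date_created)
    upload(f"extract/kafka/{TOPIC}/{TOPIC}_{stamp}.json",
           json.dumps(list(latest.values())))
    upload(f"extract/kafka/{TOPIC}_history/{TOPIC}_{stamp}.json",
           json.dumps(history))
except KafkaException as exc:
    # ADF checks for this file to decide whether the extract failed
    upload(f"extract/kafka/{TOPIC}/error_{TOPIC}_{stamp}.txt", str(exc))
finally:
    consumer.close()
```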
Azure Data Lake (ADLS)
In this example we create an ADF pipeline that extracts data from SQL Server and saves it as CSV files in Data Lake. The point is just to demonstrate the logic so you can adapt it as needed. The extract is driven by a SQL Server table with all the parameters for a […]
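The real implementation is an ADF pipeline, but the driving logic can be sketched in Python; the table dbo.ExtractParameters and its columns query and target_file below are illustrative assumptions, not the schema from the post:

```python
import csv
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};"
                      "SERVER=myserver.database.windows.net;DATABASE=mydb;"
                      "UID=myuser;PWD=<password>")

# Hypothetical driving table: one row per extract, holding the source
# query and the target file name.
params = conn.execute("SELECT query, target_file FROM dbo.ExtractParameters "
                      "WHERE enabled = 1").fetchall()

for query, target_file in params:
    rows = conn.execute(query)
    with open(target_file, "w", newline="") as f:        # in ADF this lands in Data Lake
        writer = csv.writer(f)
        writer.writerow([c[0] for c in rows.description])  # header row
        writer.writerows(rows.fetchall())
```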
Upload the file result.csv into Azure Data Lake (in mystorageaccount/mycontainer). Databricks commands: spark.conf.set() defines the access key for the connection to Data Lake. The access key can be found in the Azure Portal. To list a folder in ADLS, use dbutils.fs.ls(). To read a file in ADLS, use spark.read (e.g. spark.read.csv()). The result is […]
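A sketch of those Databricks commands, assuming ADLS Gen2 access via the storage account key; the key value and the abfss path are placeholders:

```python
# Set the access key for the storage account (value from the Azure Portal).
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    "<access-key>")

base = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net"

display(dbutils.fs.ls(base))                 # list the container root

# Read the uploaded CSV into a DataFrame.
df = spark.read.option("header", "true").csv(f"{base}/result.csv")
display(df)
```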
Databricks commands: import the requests library to be able to run HTTP requests. Define the parameters and the Basic Authentication attributes (username, password), and execute a GET request. spark.conf.set() defines the access key for the connection to Data Lake. The access key can be found in the Azure Portal. Define the destination folder path […]
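A sketch of that flow; the URL, credentials and destination path are placeholders, and dbutils.fs.put is used here as one way to land the response body in ADLS:

```python
import requests

url = "https://api.example.com/data"                     # hypothetical endpoint
resp = requests.get(url, auth=("myuser", "<password>"))  # Basic Authentication
resp.raise_for_status()

# Access key for the Data Lake connection (value from the Azure Portal).
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    "<access-key>")

# Destination folder path in ADLS.
dest = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/extract/api/data.json"
dbutils.fs.put(dest, resp.text, True)                    # True = overwrite
```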
Create a table and insert data. Databricks commands: the values for hostname, port, database and username can be found in the connection string of the database. Create a widget to pass parameters from Azure Data Factory. As many parameters as needed can be declared and used in a dynamic SQL query. […]
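A sketch of the widget-driven query in a Databricks notebook; the widget name, JDBC URL and credentials are placeholder assumptions:

```python
# Widget that ADF populates when it calls the notebook; more widgets can be
# declared the same way for additional parameters.
dbutils.widgets.text("table_name", "dbo.MyTable")
table_name = dbutils.widgets.get("table_name")

# Hostname, port and database come from the database connection string.
jdbc_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"
            "database=mydb")

# Use the widget value in a dynamic SQL query over JDBC.
df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("user", "myuser")
      .option("password", "<password>")
      .option("query", f"SELECT * FROM {table_name}")
      .load())
display(df)
```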