NOTE: Incomplete, ETC – 12 May 2019
Many people who read my Cosmos DB articles are looking for an effective way to export data to SQL, either on-demand or in real-time. After performing a search term analysis for my blog earlier this year, I had made up my mind about posting a solid article on exporting data from Cosmos DB to SQL Server.
Real-time ETL using Cosmos DB Change Feed and Azure Functions
In this article, we will focus on creating a data pipeline to ETL (Extract, Transform and Load) Cosmos DB container changes to a SQL Server database. My main requirements or design considerations are:
- Fault-tolerant and near real-time processing
- Incur minimum additional cost
- Simple to implement and maintain
Cosmos DB Change Feed
Cosmos DB Change Feed listens to Cosmos DB containers for changes and outputs the list of items that were changed in the chronological order of their modification. Cosmos DB Change Feed enables building efficient and scalable solutions for the following use cases:
- Triggering a notification or calling an API
- Real-time stream processing
- Downstream data movement or archiving
Types of operations
- Change feed tracks only inserts and updates, deletes are not tracked yet
- Cannot control change feed to track only one kind of operation, for example only inserts
- For tracking deletes in the Change Feed, workaround is to soft-delete and assign a small TTL (Time To Live) value of “n” to automatically delete the item after “n” seconds
- Change Feed can be read for historic items, as long as the items have not been deleted
- Change Feed items are available in order of their modification time, per logical partition key
Read more about Azure Cosmos DB Change Feed from Microsoft docs to gain a thorough understanding. Change Feed can be processed using Azure Functions or Change Feed Processor Library. In this article, we will use Azure Functions.
Azure Functions is an event-driven, serverless compute platform for easily running small pieces of code in Azure. Key points to note are:
- Write specific code for a problem without worrying about an application or the infrastructure to run it
- Use either C#, F#, Node.js, Java, or PHP for coding
- Pay only for the time your code runs and trust Azure to scale
Read more from Microsoft docs to understand full capabilities of Azure Functions.
If you use Consumption plan pricing, it includes a monthly free grant of 1 million requests and 400,000 GBs of resource consumption per month per subscription in pay-as-you-go pricing across all function apps in that subscription, as per MS docs.