Azure Cosmos DB – Real-time ETL using Change Feed and Azure Functions

NOTE: Incomplete, ETC – 12 May 2019

Many people who read my Cosmos DB articles are looking for an effective way to export data to SQL, either on-demand or in real-time. After performing a search term analysis for my blog earlier this year, I had made up my mind about posting a solid article on exporting data from Cosmos DB to SQL Server.

CosmosDBSearches_SQLRoadie.jpg

Real-time ETL using Cosmos DB Change Feed and Azure Functions

In this article, we will focus on creating a data pipeline to ETL (Extract, Transform and Load) Cosmos DB container changes to a SQL Server database. My main requirements or design considerations are:

  • Fault-tolerant and near real-time processing
  • Incur minimum additional cost
  • Simple to implement and maintain

Cosmos DB Change Feed

Cosmos DB Change Feed listens to Cosmos DB containers for changes and outputs the list of items that were changed in the chronological order of their modification. Cosmos DB Change Feed enables building efficient and scalable solutions for the following use cases:

  • Triggering a notification or calling an API
  • Real-time stream processing
  • Downstream data movement or archiving

Types of operations

  • Change feed tracks only inserts and updates, deletes are not tracked yet
  • Cannot control change feed to track only one kind of operation, for example only inserts
  • For tracking deletes in the Change Feed, workaround is to soft-delete and assign a small TTL (Time To Live) value of “n” to automatically delete the item after “n” seconds
  • Change Feed can be read for historic items, as long as the items have not been deleted
  • Change Feed items are available in order of their modification time, per logical partition key

Read more about Azure Cosmos DB Change Feed from Microsoft docs to gain a thorough understanding. Change Feed can be processed using Azure Functions or Change Feed Processor Library. In this article, we will use Azure Functions.

Azure Functions

Azure Functions is an event-driven, serverless compute platform for easily running small pieces of code in Azure. Key points to note are:

  • Write specific code for a problem without worrying about an application or the infrastructure to run it
  • Use either C#, F#, Node.js, Java, or PHP for coding
  • Pay only for the time your code runs and trust Azure to scale

Read more from Microsoft docs to understand full capabilities of Azure Functions.

If you use Consumption plan pricing, it includes a monthly free grant of 1 million requests and 400,000 GBs of resource consumption per month per subscription in pay-as-you-go pricing across all function apps in that subscription, as per MS docs.

Compare hosting plans and check out pricing details for Azure Functions at the Functions pricing page to gain a thorough understanding of pricing options.

One thought on “Azure Cosmos DB – Real-time ETL using Change Feed and Azure Functions

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s