Export data from Cosmos DB to SQL Server

Cosmos DB is Microsoft’s globally distributed NoSQL database, offering low latency, high scalability and geo-distribution to any Azure region. In its infancy, Cosmos DB was known as DocumentDB; renaming was inevitable as it evolved beyond just a document store.

Following is a simple note on how to export data from Cosmos DB to SQL Server. You may want a dump of data from a Cosmos DB collection in SQL Server for analysis, data integrity checks, troubleshooting a production issue, or deriving insights.

There are a few ways to export data from Cosmos DB. The quickest is the DocumentDB / Cosmos DB Migration Tool, provided by Microsoft to migrate data to and from Cosmos DB and various sources such as MongoDB, JSON files, CSV files and SQL Server.

1. Use the Azure Cosmos DB Migration Tool to export data to JSON files:

  • Install the tool and launch it as Administrator (use an Azure VM for best performance). Be mindful of the spike in RU consumption when exporting data from your collection. To avoid throttling, scale your collection up as required just before you do the export, or export in off-peak hours, if any. Leave a comment if you need any pointers on this.

LaunchCosmosDBMigrationTool
  • Choose source – DocumentDB (the tool still uses Cosmos DB’s old name)

CosmosDBMigrationTool-Source

  • Specify the connection string. You can find the endpoint and key in the Keys section of your Cosmos DB account in the Azure portal. Please note that Cosmos DB is case-sensitive, so take care with the database and collection names in the connection string.

CosmosDBMigrationTool-ConnectionString

  • A peek into Advanced options
    • Include internal fields – for each document, Cosmos DB maintains a set of auto-generated internal/system fields such as _ts and _self. Their names start with an underscore, making it easy to differentiate them from user fields. This option lets you include internal fields in the output. This is handy, especially the _ts field, which indicates when the document was last updated (a sample document with these fields appears after this list).
    • Number of retries on failure & Retry interval – Set a reasonable number of retries. In this case, I have used the value 1.
    • Connection Mode – Gateway connects over standard HTTPS (port 443), which makes it the easiest mode to get working through firewalls; DirectTcp generally gives better performance when your network allows it.

CosmosDBMigrationTool-AdvancedOptions
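
For reference, here is a rough sketch of what an exported document looks like with internal fields included. The user fields mirror the product-nutrition data used in the OPENJSON example later in this post; all values below are made up for illustration.

{
  "id": "product-001",
  "description": "Sample product",
  "foodGroup": "Sample group",
  "servings": [ { "amount": 1, "description": "serving" } ],
  "_rid": "hypotheticalRid==",
  "_self": "dbs/hypotheticalDb/colls/hypotheticalColl/docs/hypotheticalDoc/",
  "_etag": "\"00000000-0000-0000-0000-000000000000\"",
  "_attachments": "attachments/",
  "_ts": 1524787200
}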

  • Specify target information. In this case, we want to export to a JSON document. We can output either to a local file or to a blob.

CosmosDBMigrationTool-TargetInformation

  • Error Logging. Set these options to enable error logging.

CosmosDBMigrationTool-ErrorLogging

Hit Import, and if there are no errors, you will soon have a new JSON file with all the data from your collection. If you want only a subset of the data, all you need to do is modify your source query, as in the sketch below.
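
For example, a source query like the one below exports only a slice of the collection. This is just a sketch: the foodGroup filter value is made up, and _ts (the internal last-updated field) is compared in epoch seconds, here 2018-01-01 UTC.

SELECT *
FROM c
WHERE c.foodGroup = 'Snacks' AND c._ts >= 1514764800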

2. Import JSON files to SQL Server:

SQL Server 2016 introduced a JSON parser, including the OPENJSON table-valued function. OPENJSON requires database compatibility level 130 or higher, so if none of your user databases are upgraded to 2016 yet but you have a 2016 engine, context-switch to a system database (which runs at the engine’s native compatibility level) to use OPENJSON.
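
A quick way to check the compatibility level, and to raise it if the database is yours to change (YourDatabase is a placeholder):

--Check the compatibility level of the current database
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

--OPENJSON needs 130 or higher
ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 130;

With that in place, reading the exported file is straightforward: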

DECLARE @productNutrition varchar(max);

--Read from the json file using OPENROWSET
SELECT @productNutrition = BulkColumn
FROM OPENROWSET(BULK 'C:\Users\Smruthi\Downloads\Arjun\productnutrition_20180427.json', SINGLE_BLOB) AS j;

--Pass the variable containing json as a parameter to the OPENJSON function
SELECT *
FROM OPENJSON (@productNutrition)
WITH
(
ProductID varchar(20) '$.id',
ProductDescription varchar(100) '$.description',
ProductGroup varchar(200) '$.foodGroup',
ServingAmount float '$.servings[1].amount', --JSON arrays are zero-based; [1] is the second element
ServingUnit varchar(10) '$.servings[1].description',
nutrients nvarchar(max) AS JSON --note that JSON property paths are case-sensitive
);
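
To persist the parsed rows rather than just view them, wrap the same OPENJSON call in a SELECT ... INTO. A minimal sketch, to be run in the same batch as the DECLARE above (dbo.ProductNutrition is a placeholder table name):

--Materialize the parsed JSON into a table for further analysis
SELECT *
INTO dbo.ProductNutrition
FROM OPENJSON (@productNutrition)
WITH
(
ProductID varchar(20) '$.id',
ProductDescription varchar(100) '$.description',
ProductGroup varchar(200) '$.foodGroup'
);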

Give it a go yourself. If you have any questions, leave a comment and I will be happy to assist.

jsonfileSample

openjson Example


Live Query Statistics – SQL Server 2016

SQL Server Management Studio (SSMS) 2016 has a handy new feature: Live Query Statistics. It also works when connected to a SQL Server 2014 instance using SSMS 2016.

Why use it?

Previously, similar statistics were available by running:

SET STATISTICS PROFILE ON;

With the new feature, the main difference is that the statistics refresh as the query progresses, providing live coverage of what SQL Server is up to. This is a fantastic feature for learning about query processing and execution plans, and it helps identify performance problems. Only yesterday, I used it to identify a “bad plan” where SQL Server was doing an Eager Spool, cross-joining two million-row tables and filling up tempdb.

How to use it?

To use the feature, simply click the “Include Live Query Statistics” button in the toolbar, or use the menu Query -> Include Live Query Statistics.
IncludeLiveQueryStatistics-SQLServer2016

Watch the magic unfold when you execute a query.

LiveQueryStatistics-Example.png


If you are not using SQL Sentry Plan Explorer already, install the PRO version, available for free from SentryOne. I find it amazing, and it is probably the best free tool out there for performance analysis. Install the SSMS add-in, and you can right-click a plan and view it in SQL Sentry Plan Explorer. It becomes incredibly useful for complex plans that are cumbersome to navigate in SSMS, not to mention the host of other information that is presented nicely to make performance tuning enjoyable.

SQLSentryPlanExplorer-AddIn

SQLSentryPlanExplorer.jpg

Watch out for that..

Live Query Statistics is an interesting feature, but I had a nervous few moments when my long-running ETL stored procedure began to produce a plan that was far too big for SSMS to handle. I eventually had to cancel execution and view the plan from a DMV instead. So, while it is a nice feature, watch out for a few gotchas. Microsoft also advises that there may be a mild performance dip in the execution of your query while you are using this feature.
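
When SSMS cannot render the live plan, the underlying data is still queryable. Below is a minimal sketch against sys.dm_exec_query_profiles, the DMV that backs Live Query Statistics; the session id 53 is a placeholder for the spid of the query you are watching.

--Per-operator progress of a running query; the target session must have
--Live Query Statistics enabled or SET STATISTICS PROFILE/XML ON
SELECT node_id,
physical_operator_name,
row_count, --rows produced so far
estimate_row_count --optimizer estimate, useful for spotting bad plans
FROM sys.dm_exec_query_profiles
WHERE session_id = 53 --placeholder
ORDER BY node_id;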

Who can use it?

You need to have SHOWPLAN and VIEW SERVER STATE permissions to be able to use this feature effectively.
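
Granting these looks like the following sketch (the principal name is a placeholder; SHOWPLAN is a database-level permission, VIEW SERVER STATE a server-level one):

--Database-level permission; run in the relevant user database
GRANT SHOWPLAN TO [Domain\SomeUser];

--Server-level permission, granted to the login
GRANT VIEW SERVER STATE TO [Domain\SomeUser];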

Overall, this is a great feature that puts information at your fingertips when you are scrambling to address that performance issue that is driving everyone nuts!

XQuery

XQuery support was introduced in SQL Server 2005. It is a powerful feature, and when used together with a CTE, it helps keep code simple and clean. I use XML extensively in my code, and many a time I find myself looking up old code, searching the internet or trying things out myself. I hope this post will serve as a single point of reference for the most common XQuery coding.

Following is a simple script to read from an XML variable. The same code can be used to read from an XML column as well; a sketch of that follows the output below.

NOTE: I will append more XQuery code samples to this post.

DECLARE @xmlProduct XML = 
'<Catalog>
	<Product>
		<Name>Selle Italia Road</Name>
		<Code>SIR-1</Code>
		<Category>Saddle</Category>
		<Description>Sleek saddle from Selle Italia</Description>
		<UnitPrice>50</UnitPrice>
		<Currency>USD</Currency>
	</Product>
	<Product>
		<Name>Brooks Tourer</Name>
		<Code>BT-T</Code>
		<Category>Saddle</Category>
		<Description>Brooks leather saddle for touring</Description>
		<UnitPrice>120</UnitPrice>
		<Currency>USD</Currency>
	</Product>
</Catalog>
'
--Read from nodes
SELECT
c.value('Name[1]','varchar(50)') [ProductName],
c.value('Code[1]','varchar(50)') [Code],
c.value('Category[1]','varchar(50)') [Category],
c.value('UnitPrice[1]','varchar(50)') + CHAR(32) + c.value('Currency[1]','varchar(50)') [Price],
c.value('Description[1]','varchar(50)') [Description]
FROM 
@xmlProduct.nodes('//Catalog/Product') AS TAB(c)

Output:
XQueryOutput
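
As mentioned above, the same pattern works against an XML column. A minimal sketch, assuming a hypothetical table dbo.ProductCatalog with an XML column named CatalogXml:

--Shred an XML column with CROSS APPLY; table and column names are hypothetical
SELECT
c.value('Name[1]','varchar(50)') [ProductName],
c.value('Code[1]','varchar(50)') [Code],
c.value('Category[1]','varchar(50)') [Category]
FROM dbo.ProductCatalog pc
CROSS APPLY pc.CatalogXml.nodes('/Catalog/Product') AS TAB(c);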

Retrieve unsaved query

I was working on a remote machine, and while I was away the machine was restarted. I lost the scripts I had not saved – or so I thought – until I came up with the query below. It was a life saver!

SELECT TOP 10
EST.text
FROM sys.dm_exec_query_stats EQS
CROSS APPLY sys.dm_exec_sql_text(EQS.sql_handle) EST
WHERE EST.text LIKE '%text%' --replace 'text' with a fragment you remember from the lost script
ORDER BY EQS.last_execution_time DESC
--Note: this finds only queries that actually executed and are still in the plan cache

Find tables which are most commonly used in Stored Procedures

When you start working on a new project, one of the important tasks as a database developer is to get used to the table names. Here is a script to help you identify the tables most commonly referenced in stored procedures (assuming that the most commonly used are the most important).

SELECT TOP 50
t.name, COUNT(1) tcount
FROM INFORMATION_SCHEMA.ROUTINES r
INNER JOIN sys.tables t ON r.ROUTINE_DEFINITION LIKE ('%' + t.name + '%') --match table names inside routine bodies
GROUP BY t.name
ORDER BY tcount DESC
--Note: ROUTINE_DEFINITION holds only the first 4000 characters of each routine
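
The LIKE match above is approximate: a table name that happens to appear inside a comment or another identifier still counts. A more precise alternative, sketched below, uses sys.sql_expression_dependencies, which tracks object references directly:

--Count how many stored procedures reference each table
SELECT TOP 50
t.name, COUNT(DISTINCT d.referencing_id) proc_count
FROM sys.sql_expression_dependencies d
INNER JOIN sys.objects p ON p.object_id = d.referencing_id AND p.type = 'P' --stored procedures only
INNER JOIN sys.tables t ON t.object_id = d.referenced_id
GROUP BY t.name
ORDER BY proc_count DESC;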