Kusto Chronicles: Turning Your Mutable Cosmos DB Data into an Immutable Audit Trail
Kusto, also known as Azure Data Explorer, is an append-only database optimized for fast queries and analytics. Unlike Cosmos DB, which is designed for transactional workloads with mutable data, Kusto provides a scalable, immutable record of data over time. This makes it an ideal choice for scenarios requiring historical tracking, auditing, and large-scale data analysis.
By integrating Cosmos DB with Kusto, we can create a historical record of changes made to data in Cosmos DB. Transactional applications continue to scale on Cosmos DB, while Kusto serves as a Day 2 reporting database that supports auditable change tracking and analysis over time.
Configuring the Managed Identity
To ingest data from Cosmos DB into Kusto, I first set up a user-assigned managed identity on my Kusto cluster. This identity would allow the Kusto cluster to authenticate and access Cosmos DB without requiring explicit credentials. The Terraform configuration for this looked like this:
resource "azurerm_user_assigned_identity" "kusto" {
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  name                = "mi-${var.application_name}-${var.environment_name}-kusto"
}
Once the identity was created, I attached it to the Kusto cluster so that it could operate using the assigned permissions:
resource "azurerm_kusto_cluster" "main" {
  name                = "${var.application_name}-${var.environment_name}"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  sku {
    name     = "Standard_E16d_v5"
    capacity = 2
  }

  identity {
    type = "UserAssigned"
    identity_ids = [
      azurerm_user_assigned_identity.kusto.id
    ]
  }
}
Granting Cosmos DB Access
The next step was to ensure that the managed identity had the necessary permissions to read data from Cosmos DB. Initially, I assigned it the Cosmos DB Built-in Data Reader role (role definition 00000000-0000-0000-0000-000000000001) through a data plane role assignment:
data "azurerm_cosmosdb_sql_role_definition" "reader" {
  resource_group_name = azurerm_cosmosdb_account.main.resource_group_name
  account_name        = azurerm_cosmosdb_account.main.name
  role_definition_id  = "00000000-0000-0000-0000-000000000001"
}

resource "azurerm_cosmosdb_sql_role_assignment" "kusto_reader" {
  resource_group_name = azurerm_resource_group.main.name
  account_name        = azurerm_cosmosdb_account.main.name
  role_definition_id  = data.azurerm_cosmosdb_sql_role_definition.reader.id
  principal_id        = azurerm_user_assigned_identity.kusto.principal_id
  scope               = azurerm_cosmosdb_account.main.id
}
Creating the Cosmos DB Connection
With permissions in place, I attempted to create a Cosmos DB connection for Kusto:
resource "azurerm_kusto_cosmosdb_data_connection" "tasks" {
  name                  = "kusto-cosmos-ingestion-tasks"
  location              = azurerm_resource_group.main.location
  cosmosdb_container_id = azurerm_cosmosdb_sql_container.tasks.id
  kusto_database_id     = azurerm_kusto_database.assessment.id
  managed_identity_id   = azurerm_user_assigned_identity.kusto.id
  table_name            = "Tasks"
  mapping_rule_name     = "TasksJsonMapping"
  retrieval_start_date  = "2025-02-01T00:00:00.0Z"
}
The table_name and mapping_rule_name values refer to the table and mapping rule that I had already created in the Kusto cluster:
.create-merge table Tasks (id:string, createdts:datetime, lastupdatedts:datetime, tenantId:string, status:string, source:string, rawdata:string) with (folder = "", docstring = "")
.create table Tasks ingestion json mapping "TasksJsonMapping" '[
{"Column": "id", "Properties": {"Path": "$.id"}},
{"Column": "createdts", "Properties": {"Path": "$.createdOn"}},
{"Column": "lastupdatedts", "Properties": {"Path": "$.lastUpdatedOn"}},
{"Column": "tenantId", "Properties": {"Path": "$.tenantId"}},
{"Column": "status", "Properties": {"Path": "$.status"}},
{"Column": "source", "Properties": {"Path": "$.source"}},
{"Column": "rawdata", "Properties": {"Path": "$"}}
]'
The table mapping defines how JSON data from Cosmos DB is transformed into structured columns in Kusto. Each entry in the mapping specifies a column name in Kusto and the corresponding JSON path in Cosmos DB where the data originates. This ensures that data ingested from Cosmos DB retains its structure and can be queried efficiently.
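If the table and mapping should live in the same Terraform codebase as the connection, one option is the azurerm_kusto_script resource. The following is only a sketch under assumptions, not the setup described above: the resource label tasks_schema and the script name are hypothetical, and the mapping is collapsed onto a single line purely to keep the script simple.

```hcl
# Hypothetical: manage the Tasks table and its JSON mapping from Terraform.
# "tasks_schema" and "tasks-schema" are illustrative names, not from the original setup.
resource "azurerm_kusto_script" "tasks_schema" {
  name        = "tasks-schema"
  database_id = azurerm_kusto_database.assessment.id

  script_content = <<-KQL
    // Table that receives the ingested Cosmos DB documents
    .create-merge table Tasks (id:string, createdts:datetime, lastupdatedts:datetime, tenantId:string, status:string, source:string, rawdata:string)
    //
    // JSON mapping referenced by mapping_rule_name on the data connection
    .create-or-alter table Tasks ingestion json mapping "TasksJsonMapping" '[{"Column":"id","Properties":{"Path":"$.id"}},{"Column":"createdts","Properties":{"Path":"$.createdOn"}},{"Column":"lastupdatedts","Properties":{"Path":"$.lastUpdatedOn"}},{"Column":"tenantId","Properties":{"Path":"$.tenantId"}},{"Column":"status","Properties":{"Path":"$.status"}},{"Column":"source","Properties":{"Path":"$.source"}},{"Column":"rawdata","Properties":{"Path":"$"}}]'
  KQL
}
```

With something like this in place, the data connection could list the script in depends_on so that the table and mapping exist before ingestion starts.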
Authorization Failure: Management Read vs. Data Plane Read
At this point, I hit an error:
Error: creating Data Connection (Subscription: "a8dc551f-cbe8-47e9-87c1-d9570ac6d69d"
│ Resource Group Name: "rg-tsg-core-workload-dev"
│ Cluster Name: "tsg-core-workload-dev"
│ Database Name: "tsg-svc-assessment"
│ Data Connection Name: "kusto-cosmos-ingestion-tasks"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:
│
│ Status: "Failed"
│ Code: "BadInput"
│ Message: "[BadRequest] {\"ErrorCode\":\"CosmosDbAccountUnauthorized\",\"ErrorMessage\":\"The Managed Identity is not authorized to perform management read actions over the Cosmos DB account '{0}'\",\"ErrorParameters\":[\"/subscriptions/a8dc551f-cbe8-47e9-87c1-d9570ac6d69d/resourceGroups/rg-tsg-core-workload-dev/providers/Microsoft.DocumentDB/databaseAccounts/cosmos-tsg-core-workload-dev-5nwafc\"],\"HttpStatus\":403}"
│ Activity Id: ""
This was odd because the managed identity already had access to the Cosmos DB account via the data plane role assignment. However, the error message points at something the data plane role does not grant:
The Managed Identity is not authorized to perform management read actions over the Cosmos DB account
“Management read actions” are control plane operations, which are separate from data plane operations. After reviewing the available roles, I found two potential options:
- Cosmos DB Account Reader Role
- Cosmos DB Operator
Once I assigned the Cosmos DB Account Reader Role, I was able to create the data connection.
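A minimal sketch of that control plane assignment, assuming it is made with the standard azurerm_role_assignment resource at the scope of the Cosmos DB account. The resource label kusto_cosmos_account_reader is hypothetical; the other references reuse names from the snippets above.

```hcl
# Control plane (Azure RBAC) role: allows management reads on the Cosmos DB account.
# This is in addition to the data plane role assignment, not a replacement for it.
# The label "kusto_cosmos_account_reader" is illustrative.
resource "azurerm_role_assignment" "kusto_cosmos_account_reader" {
  scope                = azurerm_cosmosdb_account.main.id
  role_definition_name = "Cosmos DB Account Reader Role"
  principal_id         = azurerm_user_assigned_identity.kusto.principal_id
}
```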
Encountering Service Maintenance Issues
While adding additional connections, I hit a different issue:
Error: creating Data Connection (Subscription: "a8dc551f-cbe8-47e9-87c1-d9570ac6d69d"
│ Resource Group Name: "rg-tsg-core-workload-dev"
│ Cluster Name: "tsg-core-workload-dev"
│ Database Name: "tsg-svc-assessment"
│ Data Connection Name: "kusto-cosmos-ingestion-results"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:
│
│ Status: "Failed"
│ Code: "ServiceIsInMaintenance"
│ Message: "[Conflict] Cluster 'tsg-core-workload-dev' is in process of maintenance for a short period. You may retry to invoke the operation in a few minutes."
│ Activity Id: ""
│
│ ---
│
│ API Response:
│
│ ----[start]----
│ {"id":"/subscriptions/a8dc551f-cbe8-47e9-87c1-d9570ac6d69d/providers/Microsoft.Kusto/locations/westus/operationresults/a6cf0673-cefd-4e07-8670-1736ddc4691b","name":"a6cf0673-cefd-4e07-8670-1736ddc4691b","status":"Failed","startTime":"2025-02-24T23:48:07.2473803Z","endTime":"2025-02-24T23:48:07.2473813Z","percentComplete":1.0,"properties":{"operationKind":"DmServiceCosmosDbDataConnectionAddOrUpdateCommand","provisioningState":"Failed","operationState":"BadInput"},"error":{"code":"ServiceIsInMaintenance","message":"[Conflict] Cluster 'tsg-core-workload-dev' is in process of maintenance for a short period. You may retry to invoke the operation in a few minutes."}}
│ -----[end]-----
│
│
│   with azurerm_kusto_cosmosdb_data_connection.results,
│   on kusto.tf line 39, in resource "azurerm_kusto_cosmosdb_data_connection" "results":
│   39: resource "azurerm_kusto_cosmosdb_data_connection" "results" {
This was suspicious. It made me think that Terraform was trying to create the connections too quickly in succession. So I brought them back one at a time.
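One way to express "one at a time" in the configuration itself is an explicit depends_on between the connections. This is a sketch, not a confirmed fix for the maintenance error: the container reference, table name, and mapping name for the results connection are assumptions made for illustration.

```hcl
# Hypothetical second connection; the container, table, and mapping names are assumed.
resource "azurerm_kusto_cosmosdb_data_connection" "results" {
  name                  = "kusto-cosmos-ingestion-results"
  location              = azurerm_resource_group.main.location
  cosmosdb_container_id = azurerm_cosmosdb_sql_container.results.id
  kusto_database_id     = azurerm_kusto_database.assessment.id
  managed_identity_id   = azurerm_user_assigned_identity.kusto.id
  table_name            = "Results"
  mapping_rule_name     = "ResultsJsonMapping"
  retrieval_start_date  = "2025-02-01T00:00:00Z"

  # Force Terraform to finish creating the tasks connection before starting this one.
  depends_on = [azurerm_kusto_cosmosdb_data_connection.tasks]
}
```

Running terraform apply -parallelism=1 is a blunter alternative that serializes every operation in the run, at the cost of a slower apply.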
Along the way, the plan showed that a small change in the retrieval start date would cause an existing connection to be destroyed and recreated:
  # azurerm_kusto_cosmosdb_data_connection.tasks must be replaced
-/+ resource "azurerm_kusto_cosmosdb_data_connection" "tasks" {
      ~ id                   = "/subscriptions/a8dc551f-cbe8-47e9-87c1-d9570ac6d69d/resourceGroups/rg-tsg-core-workload-dev/providers/Microsoft.Kusto/clusters/tsg-core-workload-dev/databases/tsg-svc-assessment/dataConnections/kusto-cosmos-ingestion-tasks" -> (known after apply)
        name                 = "kusto-cosmos-ingestion-tasks"
      ~ retrieval_start_date = "2025-02-01T00:00:00Z" -> "2025-02-01T00:00:00.0Z" # forces replacement
        # (6 unchanged attributes hidden)
    }
A simple formatting inconsistency in the date-time string led Terraform to treat the resource as needing replacement, which was completely unnecessary. Once I corrected the format to match the value already in state (“00Z” instead of “00.0Z”), I was able to keep my existing setup intact. Outrageous!
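To guard against this kind of drift, the start date could be passed through a validated variable. A minimal sketch follows; the variable name is hypothetical, and the assumption that the whole-second form is what ends up in state comes only from the plan output above.

```hcl
# Hypothetical input variable; rejects fractional seconds such as "00.0Z".
variable "retrieval_start_date" {
  type        = string
  default     = "2025-02-01T00:00:00Z"
  description = "Start of Cosmos DB retrieval, in UTC with whole seconds."

  validation {
    # Accept only YYYY-MM-DDTHH:MM:SSZ, the form the plan showed as already stored.
    condition     = can(regex("^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z$", var.retrieval_start_date))
    error_message = "Use the form YYYY-MM-DDTHH:MM:SSZ with no fractional seconds."
  }
}
```

Each data connection would then set retrieval_start_date = var.retrieval_start_date, so a stray ".0" fails validation at plan time instead of forcing a replacement.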
Granting Kusto Permissions
Even after everything was configured, data wasn’t flowing. The issue? The User Assigned Identity also needed access to Kusto itself:
resource "azurerm_kusto_cluster_principal_assignment" "kusto_managed_id" {
  name                = "managed-id"
  resource_group_name = azurerm_resource_group.main.name
  cluster_name        = azurerm_kusto_cluster.main.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  principal_id        = azurerm_user_assigned_identity.kusto.principal_id
  principal_type      = "App"
  role                = "AllDatabasesAdmin"
}
Once I added this role assignment, my data connections were in place and ready.
Conclusion
By setting up Cosmos DB ingestion into Kusto, we created a scalable solution for capturing a historical record of changes in transactional data. This enables organizations to preserve mutable data from Cosmos DB in an immutable, queryable format, supporting analytics, compliance, and auditability.
Throughout this process, we encountered several common pitfalls in automating this setup:
- User Assigned Identity vs. System Assigned Identity: When using a system-assigned identity, additional permissions via the data plane did not seem necessary. However, with a user-assigned identity, explicit role assignments were required to grant proper access.
- Cosmos DB Data Plane vs. Control Plane Role Assignments: Initially, we granted access via the data plane role assignment, but this was not sufficient. The error message revealed that management read actions required an additional control plane role assignment, such as the Cosmos DB Account Reader Role.
- Multiple Cosmos DB Connections in One Terraform Apply: Attempting to create multiple Cosmos DB connections in a single Terraform apply run appeared to trigger maintenance mode, preventing the provisioning of additional connections.
- Timestamp Formatting in Retrieval Start Date: A seemingly small formatting difference in the retrieval start date (2025-02-01T00:00:00Z vs. 2025-02-01T00:00:00.0Z) forced Terraform to delete and recreate the resource, leading to unnecessary disruption.
Despite these challenges, once the necessary permissions and configurations were set correctly, we achieved a fully repeatable Terraform codebase that can manage this ingestion pipeline going forward. This ensures a scalable, automated approach to maintaining an immutable, analytics-ready dataset in Kusto while supporting transactional workloads in Cosmos DB.
Happy Azure Terraforming!!!