Change Data Capture Engine and Kusto Query
The Change Data Capture (CDC) Engine provides scalable detection and processing of data changes that trigger lifecycle events in EmpowerID's identity lifecycle automation. Rather than manually coding change detection logic for each workflow event, the CDC Engine leverages SQL Server's native Change Data Capture capabilities combined with Azure Log Analytics and Kusto Query Language to automatically track, analyze, and respond to identity attribute changes.
This architecture enables EmpowerID to detect complex lifecycle events—such as mover events triggered by multiple attribute changes within specific timeframes—without requiring custom code for each scenario.
The Challenge of Capturing Data Changes
EmpowerID's event engine enables users to create and configure various flow events for joiners, movers, new mailbox discoveries, and security breach events without coding. After defining these events, the engine evaluates them against established policies to determine appropriate workflows to execute in response. During these workflows, data changes occur throughout the Identity Warehouse, and capturing exactly what changed presents significant challenges:
Complexity of Event Detection
- Some events cannot be mapped through simple rule-based processes
- A "mover" event might occur when two out of three attributes (manager, department, or job title) change within 24 hours
- Different organizations define lifecycle events differently based on their business requirements
- Writing custom code for each event type is neither scalable nor maintainable
Scale and Performance Requirements
- EmpowerID manages thousands of identity changes across large organizations
- Real-time or near-real-time detection is required for security and compliance
- Change tracking must not impact transactional performance of the Identity Warehouse
- Historical change data must be retained for audit and compliance purposes
The CDC Engine Solution
EmpowerID addresses these challenges by implementing a scalable Change Data Capture solution that:
- Applies CDC selectively to specific tables and attributes in the Identity Warehouse
- Generates comprehensive changelogs of all modifications automatically
- Transmits change data to Azure Log Analytics for advanced querying
- Leverages Kusto Query Language for flexible event detection logic
- Triggers workflows when specific change patterns are identified
This approach eliminates the need for manual coding of change detection logic while providing the flexibility to define complex event criteria through queries rather than code.
Architecture Components
The CDC Engine consists of several integrated Azure and SQL Server components working together to detect and process identity changes:
EmpowerID Identity Warehouse
The Identity Warehouse is hosted in Azure SQL Database and serves as the central repository for identity data. When CDC is enabled on specific tables, SQL Server automatically creates:
- CDC History Tables - Named with "_CT" (Change Tracking) suffix, these tables store all insert, update, and delete operations with metadata columns tracking change type, time, and sequence
- Deleted Records Tables - Store records of deleted entries for audit purposes
- System Functions - Provide queryable interfaces to retrieve changes within specific time ranges or sequence numbers
CDC operates at the database level and requires minimal configuration once enabled. The feature is available by default in Azure SQL Database.
PowerShell Scripts
PowerShell scripts deployed in Azure Kubernetes Cluster as containers perform the data transformation and transfer functions:
Master Script
- Runs an infinite loop with 5-minute intervals
- Triggers the change streaming script on schedule
- Manages container lifecycle and error recovery
Change Streaming Script (change_streaming_script.ps1)
- Loads required PowerShell modules (SQLServer, Az.Accounts, Az.OperationalInsights)
- Retrieves configuration from Container Secrets (credentials, workspace IDs)
- Verifies CDC is enabled at database level
- Reads configuration from config.json to identify tables and columns to monitor
- Executes the ChangeStreaming stored procedure to retrieve and transform CDC data
- Pushes transformed JSON data to Azure Log Analytics Workspace
- Executes Kusto queries against Log Analytics data
- Inserts qualifying results into BusinessRequestFlowEventInbox table
Configuration File (config.json)
- Specifies tables and columns for CDC monitoring
- Allows developers to extend CDC coverage by adding entries
- Provides flexibility to customize change detection scope without code changes
Azure Log Analytics Workspace
Log Analytics serves as the intermediate storage and querying layer for change data:
- Data Ingestion - Receives JSON-formatted change data from PowerShell scripts every 5 minutes
- Automatic Schema Creation - Converts JSON data into queryable tables automatically
- Retention Policies - Maintains change history according to configured retention periods
- Query Performance - Optimized for time-series and log data queries
Kusto Query Language
Kusto Query Language (KQL) provides powerful data analysis capabilities similar to SQL but optimized for log and time-series data:
- Complex Event Detection - Identify patterns across multiple attribute changes and time windows
- Aggregation and Filtering - Group changes by person, attribute type, or time period
- Real-time Analysis - Execute queries against streaming change data
- Flexible Criteria - Define event detection logic through queries rather than code
Example Kusto queries can identify scenarios such as:
- Department changes within 24 hours of manager changes (mover event)
- Multiple failed login attempts followed by successful authentication (security event)
- New accounts created without associated Person objects (joiner event requiring resolution)
BusinessRequestFlowEventInbox Table
The BusinessRequestFlowEventInbox table serves as the integration point between change detection and workflow execution:
- Query results from Kusto are inserted as records with specific event types
- Each record includes the PersonID affected and event-specific data
- The Initiator is set to 'EmpowerIDSystem' to indicate automated detection
- EmpowerID jobs monitor this table and trigger configured workflows based on event type
- Workflows execute business logic, notifications, and provisioning actions
CDC Engine Architecture Flow
Detailed Process Flow
-
Data Modification by EmpowerID Workflows
- EmpowerID workflows or flow events execute Insert, Delete, or Update (DML) statements against the Identity Warehouse
- These operations modify tables configured for Change Data Capture
-
Change Data Capture in SQL Database
- SQL Server CDC automatically records changes in dedicated history tables
- Metadata columns track operation type (insert/update/delete), timestamp, and sequence number
- Changes are recorded transactionally with zero impact on application performance
-
PowerShell Script Execution
- Master script triggers change streaming script every 5 minutes
- Script authenticates to Azure using credentials from Container Secrets
- Verifies CDC is enabled at database level before proceeding
-
Configuration Reading
- Script reads config.json to determine which tables and columns to process
- Configuration allows selective CDC monitoring without code changes
-
Stored Procedure Execution
- ChangeStreaming stored procedure is called with table name and transfer mode parameters
- Procedure dynamically generates queries to retrieve changes from CDC tables
- Data is transformed into user-readable JSON format including before/after values
-
Data Transfer to Log Analytics
- JSON change data is pushed to Azure Log Analytics Workspace via REST API
- Log Analytics automatically converts JSON to tabular format for querying
- Sequence numbers are tracked to prevent data loss or duplication during interruptions
-
Kusto Query Execution
- Predefined Kusto queries execute against Log Analytics data
- Queries identify changes matching specific lifecycle event criteria
- Results include PersonID and relevant change details
-
Event Inbox Population
- Query results are inserted into BusinessRequestFlowEventInbox table
- Records are typed according to event category (e.g., 'Person Mover')
- Initiator is set to 'EmpowerIDSystem' to indicate automated detection
-
Workflow Triggering
- EmpowerID jobs continuously monitor the BusinessRequestFlowEventInbox table
- When new records are detected, corresponding workflows are triggered
- Workflows execute business logic, approvals, notifications, and provisioning actions
Data Transfer Modes
The CDC Engine supports three transfer modes to manage data flow from SQL Server to Log Analytics, balancing performance, reliability, and flexibility:
Transfer Mode 1: Automatic (Recommended)
Configuration: Set runmode parameter to 1
Behavior:
- System identifies the last sequence number successfully transferred to Log Analytics
- Only processes changes generated since that sequence number
- Prevents redundant data processing and duplicate transfers
- Automatically resumes from interruption point if script or container restarts
Use Case: Normal production operation with continuous change processing
Transfer Mode 2: Viewing
Configuration: Set runmode parameter to 2
Behavior:
- Retrieves and displays change data without transferring to Log Analytics
- Used for inspection, testing, and validation purposes
- Does not update sequence tracking or modify any data
Use Case: Development, testing, and troubleshooting of CDC queries
Transfer Mode 3: Custom Range
Configuration: Set runmode parameter to 3
Behavior:
- Allows manual specification of starting and ending sequence numbers
- Transfers a specific range of changes to Log Analytics
- Useful for backfilling data or reprocessing specific time periods
Use Case: Historical data transfer, error recovery, or selective reprocessing
The sequence-based tracking ensures no data is lost during interruptions and prevents duplicate processing across container restarts.
Example: Mover Event Detection
The following example illustrates how the CDC Engine detects and processes a mover event:
Scenario
A Person's Department attribute changes in the Identity Warehouse, indicating a potential organizational transfer:
-
Attribute Change Occurs
- HR system updates employee department from "Sales" to "Marketing"
- EmpowerID Attribute Flow synchronizes change to Person table
- SQL CDC records the change in Person_CT table
-
Change Detection
- PowerShell script executes ChangeStreaming procedure every 5 minutes
- Procedure queries Person_CT table for recent changes
- Transforms CDC data showing Department change from "Sales" to "Marketing"
-
Log Analytics Processing
- Change data pushed to Log Analytics as JSON
- Kusto query executes to identify mover patterns:
ChangeData_CL
| where TableName == "Person"
| where ColumnName == "Department"
| where PreviousValue != NewValue
| where PreviousValue != "" // Must have had department before
| where TimeGenerated > ago(5m)
| project PersonID, PreviousValue, NewValue, TimeGenerated -
Event Creation
- Query result inserted into BusinessRequestFlowEventInbox
- EventType set to "Person Mover"
- Initiator set to "EmpowerIDSystem"
- EventJsonData includes department change details
-
Workflow Execution
- Mover workflow triggered for the Person
- Business Role and Location Recompiler recalculates role assignments
- Provisioning policies adjust access based on new department
- Manager and IT team receive notifications
- Access to previous department resources reviewed for revocation
Advanced Mover Detection
More complex mover scenarios can be detected through sophisticated Kusto queries:
Multiple Attribute Changes in Time Window:
ChangeData_CL
| where TableName == "Person"
| where ColumnName in ("Department", "Manager", "JobTitle")
| where TimeGenerated > ago(24h)
| summarize ChangeCount = count(), Changes = make_set(ColumnName) by PersonID
| where ChangeCount >= 2
| project PersonID, ChangeCount, Changes
This query identifies Persons with two or more significant attribute changes within 24 hours, triggering comprehensive mover workflows rather than simple attribute updates.
Implementation Considerations
Performance Optimization
CDC Impact on Database:
- SQL Server CDC operates asynchronously with minimal transaction impact
- CDC history tables consume additional storage proportional to change volume
- Consider retention policies and cleanup jobs for CDC tables in high-change environments
PowerShell Script Execution:
- 5-minute execution interval balances timeliness with resource utilization
- Can be adjusted based on organizational requirements for change detection latency
- Container resource allocation should account for peak change volumes
Log Analytics Costs:
- Data ingestion and retention costs scale with change volume
- Consider data retention policies aligned with audit requirements
- Use Kusto query optimization to minimize compute costs
Extensibility
Adding Tables to CDC:
- Update config.json to include additional tables and columns
- Enable CDC on the table at database level
- Customize ChangeStreaming stored procedure if special transformations needed
- Define new Kusto queries for event detection patterns
Custom Event Types:
- Define new event types beyond joiner/mover/leaver patterns
- Create corresponding Kusto queries for detection logic
- Configure workflows to respond to new event types
- Update BusinessRequestFlowEventInbox processing to handle new types
Monitoring and Troubleshooting
Verification Queries:
Check recent mover events detected by CDC:
SELECT * FROM dbo.BusinessRequestFlowEventInbox
WHERE EventJsonData LIKE '%TimeGenerated%'
ORDER BY CreatedDate DESC
Verify CDC is enabled on database:
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'EmpowerID'
Check CDC tables for specific table:
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Person(@from_lsn, @to_lsn, 'all')
Common Issues:
- CDC not enabled: Verify CDC is activated at database level before table level
- Script failures: Check Container Secret values for authentication credentials
- Missing changes: Verify config.json includes desired tables and columns
- Duplicate events: Confirm sequence tracking is functioning correctly in Automatic mode
Integration with Identity Lifecycle
The CDC Engine integrates with identity lifecycle management at multiple points:
Joiner Processes
- Detection of new Person objects created via Account Inbox
- Identification of initial role assignments requiring approval
- Monitoring of onboarding workflow completion status
Mover Processes
- Detection of attribute changes indicating role or location changes
- Identification of manager changes requiring workflow initiation
- Monitoring of access changes requiring review or approval
Leaver Processes
- Detection of termination date population in Person records
- Identification of account disabling events
- Monitoring of cleanup workflow execution status
Compliance and Audit
- Complete change history for identity attributes
- Audit trail of automated workflow triggers
- Compliance evidence for policy-based provisioning
Related Documentation
For additional context on components referenced in the CDC Engine architecture:
- EmpowerID Identity Warehouse - Central repository architecture
- Inventory - Account discovery and Person provisioning
- Account Inbox - Join and Provision rules for new accounts
- Attribute Flow - Bidirectional synchronization with connected systems
- Business Role and Location Assignments - Role recalculation triggered by CDC events
For workflow configuration using detected events:
- No Code Flows - Workflow orchestration responding to CDC events
- Provisioning Policies - Automated provisioning triggered by role changes