cveBuster — Sentinel MCP + Data Lake + Spark + Security Copilot Demo

This repo captures a hands-on spike to exercise four things end-to-end:

- Sentinel MCP server features (querying the Data Lake via MCP tools)
- Data Lake tables with KQL analytics and Spark notebooks
- Security Copilot agents that use Sentinel MCP tools
- A small vulnerability reporting demo app we built called “cveBuster” to generate, ingest, and analyze synthetic vuln data

What follows documents the artifacts we created, what they do, and how they fit together so you can reproduce or extend the demo.

1) cveBuster data model and ingestion (fake data → Data Lake table)

Files

tableCreation/cvebuster_v4.json — Synthetic vulnerability records used for testing.
tableCreation/sendData.py and tableCreation/sendDataCliCred.py — Python scripts that batch-ingest JSON into a Data Collection Endpoint/Rule (DCE/DCR) stream for a custom table.
tableCreation/tableCreator.ps1 — PowerShell helper to create a new Sentinel/Log Analytics custom table by copying schema (or using a BYOS JSON schema) and selecting the right plan (Analytics, Auxiliary/Data Lake, or Basic).

What we achieved

Defined a simple record shape for cveBuster, then generated fake data to simulate vulnerabilities and assets.
Created a custom Data Lake/Auxiliary table to land the data, using the table creator script to mirror schema and set retention.
Implemented two ingestion flows:
- App (client credentials) flow (sendData.py) to push batches to the DCR stream.
- User (Azure CLI) flow (sendDataCliCred.py) to avoid storing secrets during development.
Normalized a few fields on ingest to keep the schema consistent (for example, coercing CVSS to string for certain streams; normalizing EicarVM IP used in test scenarios).
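
The normalization step could look roughly like the sketch below. It is illustrative only: the helper name normalize_record, the ip_overrides parameter, and the sample IP addresses are assumptions, not the actual code in the send scripts.

```python
from typing import Any, Dict, Optional


def normalize_record(
    record: Dict[str, Any],
    cvss_as_string: bool = True,
    ip_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Return a copy of `record` with CVSS and IPAddress normalized.

    Hypothetical helper: the real scripts may normalize differently.
    """
    out = dict(record)  # never mutate the caller's record

    # Some DCR streams declare CVSS as string, so coerce numbers to text.
    if cvss_as_string and out.get("CVSS") is not None:
        out["CVSS"] = str(out["CVSS"])

    # Pin known test hosts to a single, predictable IP address.
    overrides = ip_overrides or {}
    machine = out.get("MachineName")
    if machine in overrides:
        out["IPAddress"] = overrides[machine]

    return out


# Placeholder values; 192.0.2.10 is from a documentation-only address range.
row = {"MachineName": "EicarVM", "IPAddress": "10.0.0.99", "CVSS": 9.8}
fixed = normalize_record(row, ip_overrides={"EicarVM": "192.0.2.10"})
print(fixed["CVSS"], fixed["IPAddress"])
```

Returning a copy keeps the source JSON untouched, which makes it easier to re-run the same input against a stream that expects a numeric CVSS.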

Data model (expected fields)

MachineName, HostId, IPAddress, OSFamily
Application, AppFilePath
VulnId, VulnTitle, Severity, CVSS
ExploitAvailable, ExploitedInWild, PatchAvailable
FirstSeen, LastSeen, LastScanTime
AssetCriticality, BusinessOwner, Source
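
A quick way to catch schema drift before ingestion is to check each record against the field list above. The sketch below is an assumption about how such a check might be written; the name missing_fields is hypothetical, and field types are deliberately left unchecked because the source does not state them.

```python
from typing import Any, Dict, List

# Field names come from the data model above; grouping mirrors that list.
EXPECTED_FIELDS = (
    "MachineName", "HostId", "IPAddress", "OSFamily",
    "Application", "AppFilePath",
    "VulnId", "VulnTitle", "Severity", "CVSS",
    "ExploitAvailable", "ExploitedInWild", "PatchAvailable",
    "FirstSeen", "LastSeen", "LastScanTime",
    "AssetCriticality", "BusinessOwner", "Source",
)


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """List expected fields that are absent from `record`, in model order."""
    return [name for name in EXPECTED_FIELDS if name not in record]


# Build a complete record, then drop one field to show the check firing.
sample = {name: None for name in EXPECTED_FIELDS}
del sample["PatchAvailable"]
print(missing_fields(sample))
```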

Notes and tips

Table creation: tableCreator.ps1 can copy an existing table’s schema or consume a JSON schema via -SchemaFile. It supports plans: Analytics, Auxiliary/Data Lake, and Basic, plus retention (-retention, -totalRetention). For Auxiliary/Data Lake tables that don’t support dynamic, use -ConvertToString to emit *_str string columns and include the transformKql snippet in your DCR.
Ingestion: Both send scripts POST JSON arrays to your DCR stream in batches. Set these values per environment: DCE_INGEST_URL, DCR_IMMUTABLE_ID, STREAM_NAME, INPUT_PATH, BATCH.
- sendData.py uses MSAL client credentials; never commit secrets. Prefer environment variables or a secret store.
- sendDataCliCred.py uses AzureCliCredential and requests a token for the scope https://monitor.azure.com/.default (note “monitor”, with no “ing”). This is convenient in dev where you’re already signed in with Azure CLI.
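
The batching and POST flow described above can be sketched as follows. This is not the code from either script: the constant values are placeholders, the stream name Custom-cveBuster_CL and the helper names are hypothetical, the api-version is an assumption, and urllib is used here in place of whatever HTTP client the scripts use so the sketch needs only the standard library plus azure-identity.

```python
import json
import urllib.request
from typing import Any, Dict, Iterator, List

# Placeholder settings: replace every value with your environment's own.
DCE_INGEST_URL = "https://example-dce.eastus-1.ingest.monitor.azure.com"
DCR_IMMUTABLE_ID = "dcr-00000000000000000000000000000000"
STREAM_NAME = "Custom-cveBuster_CL"  # hypothetical stream name
BATCH = 500
API_VERSION = "2023-01-01"  # assumed Logs Ingestion API version
SCOPE = "https://monitor.azure.com/.default"


def stream_url() -> str:
    """Build the DCR stream endpoint from the settings above."""
    base = DCE_INGEST_URL.rstrip("/")
    return (
        f"{base}/dataCollectionRules/{DCR_IMMUTABLE_ID}"
        f"/streams/{STREAM_NAME}?api-version={API_VERSION}"
    )


def chunked(rows: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def send_all(rows: List[Dict[str, Any]]) -> None:
    """POST every batch as a JSON array using the Azure CLI sign-in."""
    # Imported lazily so the helpers above work without azure-identity.
    from azure.identity import AzureCliCredential

    token = AzureCliCredential().get_token(SCOPE).token
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    for batch in chunked(rows, BATCH):
        body = json.dumps(batch).encode("utf-8")
        req = urllib.request.Request(
            stream_url(), data=body, headers=headers, method="POST"
        )
        with urllib.request.urlopen(req) as resp:
            print(f"sent {len(batch)} rows, HTTP {resp.status}")


print(stream_url())
print([len(b) for b in chunked([{"n": i} for i in range(5)], 2)])
```

Calling send_all(rows) requires a prior az login and real endpoint values; the two print lines run offline and only show the URL shape and batch sizes.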

2) KQL analytics (cveBusterQueries.kql)

File

tableCreation/cveBusterQueries.kql

What we achieved

Verified table shape and row volume (total rows, distinct hosts/apps/CVEs) and built quick views of data freshness.
Profiled severity, CVSS distribution, exploitability, and patchability.
Produced a “prioritization” score per host combining CVSS, exploit flags, and simple asset weighting (e.g., IIS/Oracle boost) to rank top candidates for remediation.
Provided “quick wins” by grouping patchable high-CVSS items per owner and host.
Added targeted slices for a known test host/IP (EicarVM) to validate the pipeline end-to-end.
Summarized “first seen” CVEs and owner-level dashboards to support reporting.
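
To make the “quick wins” grouping concrete, here is a plain-Python restatement of that logic. It is a sketch, not a translation of the KQL file: the function name quick_wins, the 7.0 default threshold, and the sample rows are assumptions, and PatchAvailable is assumed to be a real boolean.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def quick_wins(
    rows: List[Dict[str, Any]], min_cvss: float = 7.0
) -> Dict[Tuple[str, str], List[str]]:
    """Group patchable, high-CVSS VulnIds by (BusinessOwner, MachineName)."""
    wins: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for row in rows:
        cvss = float(row["CVSS"])  # CVSS may arrive as a string
        if row["PatchAvailable"] is True and cvss >= min_cvss:
            key = (row["BusinessOwner"], row["MachineName"])
            wins[key].append(row["VulnId"])
    return dict(wins)


# Synthetic rows for illustration only.
rows = [
    {"BusinessOwner": "Finance", "MachineName": "db01", "VulnId": "CVE-A",
     "CVSS": "9.1", "PatchAvailable": True},
    {"BusinessOwner": "Finance", "MachineName": "db01", "VulnId": "CVE-B",
     "CVSS": 8.0, "PatchAvailable": False},
    {"BusinessOwner": "Web", "MachineName": "web01", "VulnId": "CVE-C",
     "CVSS": 5.0, "PatchAvailable": True},
]
print(quick_wins(rows))
```

Only CVE-A qualifies here: CVE-B has no patch and CVE-C falls below the threshold.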

Highlights (query themes)

Inventory and time series: counts, distinct dimensions, hourly bins
Severity breakdown and CVSS histogram (bucketing by 1.0)
Risk scoring roll-up by MachineName using RowScore = CVSS*10 + weights
Quick wins: PatchAvailable == true and CVSS >= threshold grouped by BusinessOwner, MachineName
Host spotlight: machine/IP focus including counts, max CVSS, latest timestamps, exploit flags
CVE “first seen” and owner summaries
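
The risk-scoring roll-up can be restated in Python to check the arithmetic outside KQL. Only the CVSS*10 base term and the idea of exploit and IIS/Oracle weights come from this README; the specific weight values, the choice to sum row scores per machine, and all names below are assumptions for illustration. Exploit flags are assumed to be real booleans.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical weights: tune these to match your own KQL.
EXPLOIT_AVAILABLE_WEIGHT = 15
EXPLOITED_IN_WILD_WEIGHT = 25
APP_BOOST = 10
BOOSTED_APPS = ("iis", "oracle")


def row_score(row: Dict[str, Any]) -> float:
    """RowScore = CVSS*10 + weights for exploit flags and boosted apps."""
    score = float(row["CVSS"]) * 10
    if row.get("ExploitAvailable"):
        score += EXPLOIT_AVAILABLE_WEIGHT
    if row.get("ExploitedInWild"):
        score += EXPLOITED_IN_WILD_WEIGHT
    app = str(row.get("Application", "")).lower()
    if any(name in app for name in BOOSTED_APPS):
        score += APP_BOOST
    return score


def rank_machines(
    rows: List[Dict[str, Any]], top_n: int = 10
) -> List[Tuple[str, Dict[str, float]]]:
    """Sum row scores per MachineName and return the top-N, highest first."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"Score": 0.0, "MaxCvss": 0.0, "Vulns": 0}
    )
    for row in rows:
        agg = totals[row["MachineName"]]
        agg["Score"] += row_score(row)
        agg["MaxCvss"] = max(agg["MaxCvss"], float(row["CVSS"]))
        agg["Vulns"] += 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["Score"], reverse=True)
    return ranked[:top_n]


# Synthetic rows for illustration only.
rows = [
    {"MachineName": "web01", "Application": "IIS", "CVSS": 9.0,
     "ExploitAvailable": True, "ExploitedInWild": False},
    {"MachineName": "web01", "Application": "OpenSSL", "CVSS": 5.0,
     "ExploitAvailable": False, "ExploitedInWild": False},
    {"MachineName": "db01", "Application": "Oracle Database", "CVSS": 7.5,
     "ExploitAvailable": True, "ExploitedInWild": True},
]
for name, agg in rank_machines(rows):
    print(name, agg)
```

Summing favors hosts with many findings; if a single severe finding should dominate, taking the maximum row score per machine is an alternative worth comparing.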

These queries served double duty: validating the ingestion/schema and becoming the basis for agent prompts and notebook exploration.

3) Notebooks + Spark exploration (exploreScorev2)

Files

Notebooks+Spark/exploreScorev2.ipynb
Notebooks+Spark/exploreScorev2.job.yaml

What we achieved

Exercised a Spark-backed notebook workflow to explore the same cveBuster dataset at scale. The notebook is configured to run via a Livy-backed session (evidence in cell outputs), suitable for testing Spark against Data Lake content.
Built exploratory transformations and simple visuals (PNG outputs present) around the same scoring logic used in KQL, letting us compare approaches and validate business rules outside the KQL context.
Proved we can schedule or operationalize the notebook via the job YAML for repeatable runs.

Typical steps in the notebook

Session/bootstrap: initialize Spark session via Livy
Load and prep: read vulnerability rows (e.g., from the exported lake table or a derived dataset)
Feature/score: compute numeric CVSS, exploit/asset weights, and a per-row score; aggregate per machine
Visualize: quick distributions and top-N plots to validate scoring behavior

Run notes

The notebook includes multiple code cells with Livy metadata and image outputs; it’s meant for a Spark-enabled environment. Use your preferred Spark runtime and align paths/auth to your workspace.

4) Security Copilot agents using Sentinel MCP tools

Files

Agents/cveBusterServiceAgent.yaml and Agents/cveBusterServiceAgentV6.yaml
Agents/cveBusterQuickWinsAgent.yaml
Agents/cveBusterCriticalIntelAgent.yaml
Reference: sentinelMCP/MCPToolAvailableforAgentBuilding.md

What we achieved

Authored three Security Copilot agents that rely on Sentinel MCP tooling to query the data lake and produce actionable outputs:
1. Service Agent (V3/V6 variants)
  - Skills: Prioritize (Top-N hosts to remediate), QuickWins (by owner), MachineDetails (per-host/IP details). V6 also adds a “Dashboard” skill combining sections and includes optional SecurityAlert overlay.
  - Tooling: Uses list_sentinel_workspaces and query_lake; preflight checks ensure tools are attached and data is present for the lookback window.
  - Output: Clear, tabular summaries with rationale columns (Score, MaxCvss, Vulns, exploitability, patchability, sample IP), with friendly guidance when no data is returned.
2. Quick Wins Agent
  - Focuses solely on patchable, high-CVSS findings grouped by owner/host, offering a concise target list for fast remediation.
3. Critical Intel Agent
  - Lists the most critical CVEs observed in your environment and, when available, enriches them via a Threat Intelligence briefing tool, adding short IOC/context notes per CVE.

Design patterns we used

Fail-fast preflight in instructions: verify tool bindings and minimal data presence; otherwise return a clear action message.
Let MCP compose KQL when possible by describing the required filters/aggregations in instructions; include concrete KQL only when helpful.
Keep outputs predictable: a small set of labeled columns with ordered results suits both downstream use and human scanning.
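
The fail-fast preflight lives in the agents' natural-language instructions, not in code, but its logic can be modeled in a few lines. The sketch below is only that model: the function name preflight, its parameters, and the message wording are assumptions; the two tool names are the ones listed in this README.

```python
from typing import Iterable, Tuple

# Tools the agents need at minimum, per this README.
REQUIRED_TOOLS = ("list_sentinel_workspaces", "query_lake")


def preflight(
    attached_tools: Iterable[str], row_count: int, lookback_days: int
) -> Tuple[bool, str]:
    """Return (ok, message); stop early with a clear action when not ok."""
    attached = set(attached_tools)
    missing = [tool for tool in REQUIRED_TOOLS if tool not in attached]
    if missing:
        return False, "Attach the missing Sentinel MCP tools: " + ", ".join(missing)
    if row_count == 0:
        return False, (
            f"No cveBuster rows found in the last {lookback_days} days. "
            "Check ingestion or widen the lookback window."
        )
    return True, "ok"


print(preflight(["query_lake"], row_count=0, lookback_days=7))
print(preflight(REQUIRED_TOOLS, row_count=120, lookback_days=7))
```

Checking tool bindings before data presence means the operator sees the most fundamental problem first.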

Deploying/testing

Use the Security Copilot portal to create and deploy agents using the provided YAML definitions. Ensure the “Descriptor: Name” matches exactly when deploying, and attach the required Sentinel MCP tools (at minimum: list_sentinel_workspaces and query_lake; add ThreatIntelligenceBriefing for the Intel agent).
See sentinelMCP/MCPToolAvailableforAgentBuilding.md for tool workflow, including start/compose/deploy flows and evaluation handling.

How the pieces connect

Generate synthetic vuln data → Ingest to custom table via DCR stream → Analyze with KQL
Explore scoring and validate assumptions in Spark notebook → adjust logic if needed
Encode the same logic into Security Copilot agents backed by Sentinel MCP tools → get repeatable, guided outputs for operators

Repository guide

Agents
- cveBusterServiceAgent.yaml, cveBusterServiceAgentV6.yaml — Service agent variants (priorities, quick wins, details, dashboard)
- cveBusterQuickWinsAgent.yaml — Standalone quick-wins agent
- cveBusterCriticalIntelAgent.yaml — Critical CVE + Threat Intel enrichment
Notebooks+Spark
- exploreScorev2.ipynb — Spark notebook for scoring exploration and visuals
- exploreScorev2.job.yaml — Example job config to schedule the notebook
sentinelMCP
- MCPToolAvailableforAgentBuilding.md — Reference notes for Security Copilot MCP agent flows/tools
tableCreation
- cvebuster_v4.json — Synthetic data used for ingestion
- sendData.py — Ingest via client credentials (use secrets securely)
- sendDataCliCred.py — Ingest via Azure CLI credential (no stored secrets; dev-friendly)
- cveBusterQueries.kql — KQL library used throughout the demo
- tableCreator.ps1 — Create/clone tables and adjust plan/retention; supports BYOS schema

Security and operational notes

Do not commit secrets. The sample sendData.py shows client credential flow for illustration only; replace with environment variables or a secret store.
Validate DCR stream schemas: some streams treat CVSS as string. The scripts include options to coerce CVSS to string to match the stream’s expected type.
Auxiliary/Data Lake tables do not support dynamic columns; either avoid them or use -ConvertToString and include the transformKql in your DCR.
For agent runs, bind the Sentinel MCP tools and test the smallest lookback window first to catch ingestion issues early.

Possible next step: a small end-to-end “Try it” guide that wires up one environment’s DCR/stream, runs an ingestion batch on Windows PowerShell, executes the KQLs, and shows the agent outputs in Security Copilot.
