This repo captures a hands-on spike to exercise four things end-to-end:
- Sentinel MCP server features (querying the Data Lake via MCP tools)
- Data Lake tables with KQL analytics and Spark notebooks
- Security Copilot agents that use Sentinel MCP tools
- A small vulnerability reporting demo app we built called “cveBuster” to generate, ingest, and analyze synthetic vuln data
What follows documents the artifacts we created, what they do, and how they fit together so you can reproduce or extend the demo.
Files
- `tableCreation/cvebuster_v4.json`: Synthetic vulnerability records used for testing.
- `tableCreation/sendData.py` and `tableCreation/sendDataCliCred.py`: Python scripts that batch-ingest JSON into a Data Collection Endpoint/Rule (DCE/DCR) stream for a custom table.
- `tableCreation/tableCreator.ps1`: PowerShell helper to create a new Sentinel/Log Analytics custom table by copying schema (or using a BYOS JSON schema) and selecting the right plan (Analytics, Auxiliary/Data Lake, or Basic).
What we achieved
- Defined a simple record shape for cveBuster, then generated fake data to simulate vulnerabilities and assets.
- Created a custom Data Lake/Auxiliary table to land the data, using the table creator script to mirror schema and set retention.
- Implemented two ingestion flows:
  - App (client credentials) flow (`sendData.py`) to push batches to the DCR stream.
  - User (Azure CLI) flow (`sendDataCliCred.py`) to avoid storing secrets during development.
- Normalized a few fields on ingest to keep the schema consistent (for example, coercing CVSS to string for certain streams; normalizing EicarVM IP used in test scenarios).
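The normalize-then-batch step can be sketched roughly as follows. This is a minimal illustration, not the code in `sendData.py`: the helper names `normalize_record` and `batched` are hypothetical, and the EicarVM IP normalization is left out because the target value is environment-specific.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def normalize_record(record: Record, cvss_as_string: bool = True) -> Record:
    """Return a copy of the record with CVSS coerced to a string if requested.

    Hypothetical helper: some DCR streams declare CVSS as string, so the
    value is converted before sending to match the stream schema.
    """
    out = dict(record)  # copy so the caller's data is not mutated
    if cvss_as_string and out.get("CVSS") is not None:
        out["CVSS"] = str(out["CVSS"])
    return out


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` records, preserving input order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


if __name__ == "__main__":
    rows = [{"MachineName": f"host{i}", "CVSS": 7.5 + i} for i in range(5)]
    normalized = [normalize_record(r) for r in rows]
    for chunk in batched(normalized, 2):
        print(len(chunk), [r["CVSS"] for r in chunk])
```

Each yielded batch corresponds to one POST to the DCR stream; the batch size plays the role of the `BATCH` setting.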
Data model (expected fields)
- MachineName, HostId, IPAddress, OSFamily
- Application, AppFilePath
- VulnId, VulnTitle, Severity, CVSS
- ExploitAvailable, ExploitedInWild, PatchAvailable
- FirstSeen, LastSeen, LastScanTime
- AssetCriticality, BusinessOwner, Source
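For reference, the record shape can be written down as a Python dataclass. The field names come from the list above; the types, the ISO 8601 timestamp format, and every sample value (including the placeholder CVE ID and the documentation-range IP) are assumptions for illustration, so check them against your actual table schema.

```python
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class CveBusterRecord:
    # Field names follow the data model; the types are assumed.
    MachineName: str
    HostId: str
    IPAddress: str
    OSFamily: str
    Application: str
    AppFilePath: str
    VulnId: str
    VulnTitle: str
    Severity: str
    CVSS: float
    ExploitAvailable: bool
    ExploitedInWild: bool
    PatchAvailable: bool
    FirstSeen: str      # assumed ISO 8601 timestamp
    LastSeen: str       # assumed ISO 8601 timestamp
    LastScanTime: str   # assumed ISO 8601 timestamp
    AssetCriticality: str
    BusinessOwner: str
    Source: str


EXPECTED_FIELDS: List[str] = [f.name for f in fields(CveBusterRecord)]


def missing_fields(row: dict) -> List[str]:
    """Return the expected field names that are absent from a raw row."""
    return [name for name in EXPECTED_FIELDS if name not in row]


# Entirely synthetic sample; none of these values come from the repo data.
SAMPLE = CveBusterRecord(
    MachineName="web01", HostId="host-0001", IPAddress="192.0.2.10",
    OSFamily="Windows", Application="Microsoft IIS",
    AppFilePath="C:\\inetpub\\example", VulnId="CVE-0000-00000",
    VulnTitle="Placeholder vulnerability", Severity="Critical", CVSS=9.8,
    ExploitAvailable=True, ExploitedInWild=False, PatchAvailable=True,
    FirstSeen="2025-01-01T00:00:00Z", LastSeen="2025-01-02T00:00:00Z",
    LastScanTime="2025-01-02T00:00:00Z", AssetCriticality="High",
    BusinessOwner="ops", Source="cveBuster",
)

if __name__ == "__main__":
    print(missing_fields(asdict(SAMPLE)))          # no fields missing
    print(missing_fields({"MachineName": "web01"}))  # everything else missing
```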
Notes and tips
- Table creation: `tableCreator.ps1` can copy an existing table’s schema or consume a JSON schema via `-SchemaFile`. It supports plans: Analytics, Auxiliary/Data Lake, and Basic, plus retention (`-retention`, `-totalRetention`). For Auxiliary/Data Lake tables that don’t support `dynamic`, use `-ConvertToString` to emit `*_str` string columns and include the `transformKql` snippet in your DCR.
- Ingestion: Both send scripts POST JSON arrays to your DCR stream in batches. Set these values per environment: `DCE_INGEST_URL`, `DCR_IMMUTABLE_ID`, `STREAM_NAME`, `INPUT_PATH`, `BATCH`.
  - `sendData.py` uses MSAL client credentials; never commit secrets. Prefer environment variables or a secret store.
  - `sendDataCliCred.py` uses `AzureCliCredential` and requests a token for the audience `https://monitor.azure.com/.default` (that is, `monitor`, not `monitoring`). This is convenient in dev where you’re already signed in with Azure CLI.
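As a rough sketch of what one batch POST looks like, the snippet below builds (but does not send) the HTTP request using only the standard library. The function name `build_ingest_request`, the example endpoint, the DCR ID, and the stream name are hypothetical, and the `api-version` value is an assumption about the Logs Ingestion API rather than something taken from the scripts. Token acquisition is left to the caller.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumed Logs Ingestion API version; confirm against your environment.
API_VERSION = "2023-01-01"


def build_ingest_request(
    dce_ingest_url: str,
    dcr_immutable_id: str,
    stream_name: str,
    batch: List[Dict[str, Any]],
    bearer_token: str,
) -> urllib.request.Request:
    """Build the POST request for one batch of records (nothing is sent)."""
    url = (
        f"{dce_ingest_url.rstrip('/')}/dataCollectionRules/{dcr_immutable_id}"
        f"/streams/{stream_name}?api-version={API_VERSION}"
    )
    body = json.dumps(batch).encode("utf-8")  # the body is a JSON array
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # In the CLI flow the token would come from azure-identity, e.g.:
    #   AzureCliCredential().get_token("https://monitor.azure.com/.default").token
    request = build_ingest_request(
        "https://example-dce.ingest.monitor.azure.com",  # hypothetical DCE
        "dcr-00000000000000000000000000000000",          # hypothetical DCR ID
        "Custom-cveBuster_CL",                           # hypothetical stream
        [{"MachineName": "web01", "CVSS": "9.8"}],
        "<token>",
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would perform the actual call.
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before pointing the script at a real DCE.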
File
- `tableCreation/cveBusterQueries.kql`
What we achieved
- Verified table shape and row volume (total rows, distinct hosts/apps/CVEs) and built quick views of data freshness.
- Profiled severity, CVSS distribution, exploitability, and patchability.
- Produced a “prioritization” score per host combining CVSS, exploit flags, and simple asset weighting (e.g., IIS/Oracle boost) to rank top candidates for remediation.
- Provided “quick wins” by grouping patchable high-CVSS items per owner and host.
- Added targeted slices for a known test host/IP (EicarVM) to validate the pipeline end-to-end.
- Summarized “first seen” CVEs and owner-level dashboards to support reporting.
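To make the “quick wins” idea concrete, here is a plain-Python sketch of the same filter and grouping: keep rows where a patch is available and CVSS meets a threshold, then count per owner and host. It is not a translation of the actual KQL; the function name `quick_wins`, the default threshold of 7.0, and the sort order are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def quick_wins(
    rows: Iterable[Dict[str, Any]], cvss_threshold: float = 7.0
) -> List[Dict[str, Any]]:
    """Group patchable, high-CVSS rows by (BusinessOwner, MachineName)."""
    groups: Dict[tuple, Dict[str, Any]] = defaultdict(
        lambda: {"Vulns": 0, "MaxCvss": 0.0}
    )
    for row in rows:
        cvss = float(row["CVSS"])  # CVSS may arrive as a string
        if row["PatchAvailable"] and cvss >= cvss_threshold:
            agg = groups[(row["BusinessOwner"], row["MachineName"])]
            agg["Vulns"] += 1
            agg["MaxCvss"] = max(agg["MaxCvss"], cvss)
    results = [
        {"BusinessOwner": owner, "MachineName": machine, **agg}
        for (owner, machine), agg in groups.items()
    ]
    # Assumed ordering: most patchable findings first, then highest CVSS.
    results.sort(
        key=lambda r: (-r["Vulns"], -r["MaxCvss"], r["BusinessOwner"], r["MachineName"])
    )
    return results


if __name__ == "__main__":
    sample = [
        {"BusinessOwner": "ops", "MachineName": "web01", "CVSS": "9.8", "PatchAvailable": True},
        {"BusinessOwner": "ops", "MachineName": "web01", "CVSS": "7.5", "PatchAvailable": True},
        {"BusinessOwner": "finance", "MachineName": "db01", "CVSS": "9.0", "PatchAvailable": False},
    ]
    for line in quick_wins(sample):
        print(line)
```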
Highlights (query themes)
- Inventory and time series: counts, distinct dimensions, hourly bins
- Severity breakdown and CVSS histogram (bucketing by 1.0)
- Risk scoring roll-up by MachineName using RowScore = CVSS*10 + weights
- Quick wins: PatchAvailable == true and CVSS >= threshold grouped by BusinessOwner, MachineName
- Host spotlight: machine/IP focus including counts, max CVSS, latest timestamps, exploit flags
- CVE “first seen” and owner summaries
These queries served double duty: validating the ingestion/schema and becoming the basis for agent prompts and notebook exploration.
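The risk-scoring roll-up can be illustrated in plain Python. Only the `RowScore = CVSS*10 + weights` shape and the IIS/Oracle boost come from the queries; the specific weight values, the substring matching on `Application`, and summing row scores per machine are assumptions chosen to keep the sketch small.

```python
from typing import Any, Dict, Iterable, List

# Assumed weights; the real values live in cveBusterQueries.kql.
EXPLOIT_AVAILABLE_WEIGHT = 10
EXPLOITED_IN_WILD_WEIGHT = 20
BOOSTED_APP_WEIGHT = 5
BOOSTED_APPS = ("iis", "oracle")


def row_score(row: Dict[str, Any]) -> float:
    """RowScore = CVSS * 10 plus exploit and asset weights."""
    score = float(row["CVSS"]) * 10
    if row.get("ExploitAvailable"):
        score += EXPLOIT_AVAILABLE_WEIGHT
    if row.get("ExploitedInWild"):
        score += EXPLOITED_IN_WILD_WEIGHT
    app = str(row.get("Application", "")).lower()
    if any(name in app for name in BOOSTED_APPS):
        score += BOOSTED_APP_WEIGHT
    return score


def prioritize(rows: Iterable[Dict[str, Any]], top_n: int = 10) -> List[Dict[str, Any]]:
    """Roll row scores up per MachineName and return the top-N hosts."""
    per_host: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        host = per_host.setdefault(
            row["MachineName"],
            {"MachineName": row["MachineName"], "Score": 0.0, "MaxCvss": 0.0, "Vulns": 0},
        )
        host["Score"] += row_score(row)  # assumed: host score is the sum
        host["MaxCvss"] = max(host["MaxCvss"], float(row["CVSS"]))
        host["Vulns"] += 1
    ranked = sorted(per_host.values(), key=lambda h: h["Score"], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    sample = [
        {"MachineName": "web01", "Application": "Microsoft IIS", "CVSS": "9.8",
         "ExploitAvailable": True, "ExploitedInWild": False},
        {"MachineName": "db01", "Application": "Oracle Database", "CVSS": "7.0",
         "ExploitAvailable": True, "ExploitedInWild": True},
    ]
    for host in prioritize(sample, top_n=5):
        print(host)
```

Summing favors hosts with many findings; taking the maximum row score instead would favor hosts with a single severe finding, so pick whichever matches your remediation policy.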
Files
- `Notebooks+Spark/exploreScorev2.ipynb`
- `Notebooks+Spark/exploreScorev2.job.yaml`
What we achieved
- Exercised a Spark-backed notebook workflow to explore the same cveBuster dataset at scale. The notebook is configured to run via a Livy-backed session (evidence in cell outputs), suitable for testing Spark against Data Lake content.
- Built exploratory transformations and simple visuals (PNG outputs present) around the same scoring logic used in KQL, letting us compare approaches and validate business rules outside the KQL context.
- Proved we can schedule or operationalize the notebook via the job YAML for repeatable runs.
Typical steps in the notebook
- Session/bootstrap: initialize Spark session via Livy
- Load and prep: read vulnerability rows (e.g., from the exported lake table or a derived dataset)
- Feature/score: compute numeric CVSS, exploit/asset weights, and a per-row score; aggregate per machine
- Visualize: quick distributions and top-N plots to validate scoring behavior
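The “compute numeric CVSS” and “quick distributions” steps can be sketched without Spark. The snippet below is a plain-Python stand-in, not the notebook’s code: it parses CVSS values that may arrive as strings and counts them in buckets of width 1.0. The helper names and the choice to drop values outside 0 to 10 are assumptions.

```python
import math
from collections import Counter
from typing import Any, Dict, Iterable, Optional


def to_numeric_cvss(value: Any) -> Optional[float]:
    """Parse a CVSS value that may be a string; return None if unusable."""
    try:
        score = float(value)
    except (TypeError, ValueError):
        return None
    # Assumed rule: treat NaN and out-of-range values as missing.
    if math.isnan(score) or not 0.0 <= score <= 10.0:
        return None
    return score


def cvss_histogram(values: Iterable[Any], bucket_width: float = 1.0) -> Dict[float, int]:
    """Count scores per bucket, keyed by the bucket's lower bound."""
    counts: Counter = Counter()
    for value in values:
        score = to_numeric_cvss(value)
        if score is not None:
            bucket = math.floor(score / bucket_width) * bucket_width
            counts[bucket] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(cvss_histogram(["9.8", 9.1, "7.5", "n/a", None, "11"]))
```

Flooring to the bucket’s lower bound is intended to match how a 1.0-wide bin groups values in the KQL histogram, which makes it easier to compare counts between the two approaches.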
Run notes
- The notebook includes multiple code cells with Livy metadata and image outputs; it’s meant for a Spark-enabled environment. Use your preferred Spark runtime and align paths/auth to your workspace.
Files
- `Agents/cveBusterServiceAgent.yaml` and `Agents/cveBusterServiceAgentV6.yaml`
- `Agents/cveBusterQuickWinsAgent.yaml`
- `Agents/cveBusterCriticalIntelAgent.yaml`
- Reference: `sentinelMCP/MCPToolAvailableforAgentBuilding.md`
What we achieved
- Authored three Security Copilot agents that rely on Sentinel MCP tooling to query the data lake and produce actionable outputs:
  - Service Agent (V3/V6 variants)
    - Skills: Prioritize (Top-N hosts to remediate), QuickWins (by owner), MachineDetails (per-host/IP details). V6 also adds a “Dashboard” skill that combines these sections and includes an optional SecurityAlert overlay.
    - Tooling: Uses `list_sentinel_workspaces` and `query_lake`; preflight checks ensure tools are attached and data is present for the lookback window.
    - Output: Clear, tabular summaries with rationale columns (Score, MaxCvss, Vulns, exploitability, patchability, sample IP), with friendly guidance when no data is returned.
  - Quick Wins Agent
    - Focuses solely on patchable, high-CVSS findings grouped by owner/host, offering a concise target list for fast remediation.
  - Critical Intel Agent
    - Lists the most critical CVEs observed in your environment and, when available, enriches them via a Threat Intelligence briefing tool, adding short IOC/context notes per CVE.
Design patterns we used
- Fail-fast preflight in instructions: verify tool bindings and minimal data presence; otherwise return a clear action message.
- Let MCP compose KQL when possible by describing the required filters/aggregations in instructions; include concrete KQL only when helpful.
- Keep outputs predictable: a small set of labeled columns and ordered results works well for both downstream use and human scanning.
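The fail-fast preflight pattern is expressed as natural-language instructions in the agent YAML, but its logic is easy to show in code. The sketch below is only an analogy for that logic: the `preflight` function, its inputs, and the message wording are hypothetical, while the two tool names come from the agents’ tooling.

```python
from typing import Iterable, Tuple

# Tools the service agent relies on.
REQUIRED_TOOLS = {"list_sentinel_workspaces", "query_lake"}


def preflight(
    attached_tools: Iterable[str], row_count: int, lookback_days: int
) -> Tuple[bool, str]:
    """Return (ok, message); fail fast with an actionable message."""
    missing = sorted(REQUIRED_TOOLS - set(attached_tools))
    if missing:
        return False, "Attach the missing Sentinel MCP tools: " + ", ".join(missing)
    if row_count == 0:
        return False, (
            f"No cveBuster rows found in the last {lookback_days} day(s); "
            "check ingestion or widen the lookback window."
        )
    return True, "Preflight passed."


if __name__ == "__main__":
    print(preflight(["query_lake"], row_count=0, lookback_days=7))
    print(preflight(["query_lake", "list_sentinel_workspaces"], 0, 7))
    print(preflight(["query_lake", "list_sentinel_workspaces"], 120, 7))
```

Checking tool bindings before data presence means the operator sees the most fundamental problem first, rather than an empty result that hides a missing tool.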
Deploying/testing
- Use the Security Copilot portal to create and deploy agents using the provided YAML definitions. Ensure the “Descriptor: Name” matches exactly when deploying, and attach the required Sentinel MCP tools (at minimum: `list_sentinel_workspaces` and `query_lake`; add ThreatIntelligenceBriefing for the Intel agent).
- See `sentinelMCP/MCPToolAvailableforAgentBuilding.md` for tool workflow, including start/compose/deploy flows and evaluation handling.
- Generate synthetic vuln data → Ingest to custom table via DCR stream → Analyze with KQL
- Explore scoring and validate assumptions in Spark notebook → adjust logic if needed
- Encode the same logic into Security Copilot agents backed by Sentinel MCP tools → get repeatable, guided outputs for operators
- Agents
- cveBusterServiceAgent.yaml, cveBusterServiceAgentV6.yaml — Service agent variants (priorities, quick wins, details, dashboard)
- cveBusterQuickWinsAgent.yaml — Standalone quick-wins agent
- cveBusterCriticalIntelAgent.yaml — Critical CVE + Threat Intel enrichment
- Notebooks+Spark
- exploreScorev2.ipynb — Spark notebook for scoring exploration and visuals
- exploreScorev2.job.yaml — Example job config to schedule the notebook
- sentinelMCP
- MCPToolAvailableforAgentBuilding.md — Reference notes for Security Copilot MCP agent flows/tools
- tableCreation
- cvebuster_v4.json — Synthetic data used for ingestion
- sendData.py — Ingest via client credentials (use secrets securely)
- sendDataCliCred.py — Ingest via Azure CLI credential (no stored secrets; dev-friendly)
- cveBusterQueries.kql — KQL library used throughout the demo
- tableCreator.ps1 — Create/clone tables and adjust plan/retention; supports BYOS schema
- Do not commit secrets. The sample `sendData.py` shows client credential flow for illustration only; replace with environment variables or a secret store.
- Validate DCR stream schemas: some streams treat CVSS as string. The scripts include options to coerce CVSS to string to match the stream’s expected type.
- Auxiliary/Data Lake tables do not support `dynamic` columns; either avoid them or use `-ConvertToString` and include the `transformKql` in your DCR.
- For agent runs, bind the Sentinel MCP tools and test the smallest lookback window first to catch ingestion issues early.