Feature Request: Cloud Monitoring & Logging Skill
Summary
Production workloads on Google Cloud require robust observability, yet no existing skill covers Cloud Monitoring (metrics, alerts, SLOs, dashboards) or Cloud Logging (log routing, query syntax, cost optimization). This leaves a critical Day-2 operations gap for agents assisting with GCP deployments.
The Gap
Today, if a user asks an agent:
- 'How do I set up alerts for my Cloud Run service?'
- 'Why did my GKE pod crash? Where are the logs?'
- 'How do I reduce my Cloud Logging bill?'
The agent must fall back to generic knowledge or unrelated skills. There is no single skill that documents:
- Creating Alerting Policies via gcloud, Terraform, or the Console
- Writing PromQL / MQL queries for Cloud Monitoring
- Exporting logs to BigQuery or Cloud Storage for long-term retention
- Configuring log-based metrics for custom business KPIs
- Using Error Reporting to aggregate and track exceptions across services
- Distributed tracing with Cloud Trace (OpenTelemetry integration)
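As a rough illustration of the reference material such a skill could carry, the sketch below builds an alerting policy body for Cloud Run 5xx responses and a Logs Explorer filter for a failing GKE pod, using only the Python standard library. The function names, the `checkout` service, the pod name, and the threshold and duration values are hypothetical. The metric and resource types are the ones I believe Cloud Monitoring and Cloud Logging use, so they should be verified against the official docs.

```python
import json


def cloud_run_5xx_policy(service_name: str, threshold_per_second: float = 1.0) -> dict:
    """Build an AlertPolicy-style body that fires on sustained 5xx responses.

    Assumption: field names follow the Cloud Monitoring AlertPolicy JSON format.
    Notification channels are left out and would need to be added.
    """
    metric_filter = (
        'resource.type = "cloud_run_revision" AND '
        f'resource.labels.service_name = "{service_name}" AND '
        'metric.type = "run.googleapis.com/request_count" AND '
        'metric.labels.response_code_class = "5xx"'
    )
    return {
        "displayName": f"{service_name}: 5xx rate",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "5xx responses per second",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    # The threshold is evaluated per time series after alignment.
                    "thresholdValue": threshold_per_second,
                    "duration": "300s",
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_RATE"}
                    ],
                },
            }
        ],
    }


def gke_error_log_filter(namespace: str, pod_name: str) -> str:
    """Build a Logging query language filter for error logs from one pod."""
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.namespace_name="{namespace}"',
        f'resource.labels.pod_name="{pod_name}"',
        "severity>=ERROR",
    ]
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(json.dumps(cloud_run_5xx_policy("checkout"), indent=2))
    print(gke_error_log_filter("default", "api-7d9f"))
```

The JSON output is meant to be saved to a file and passed to something like `gcloud alpha monitoring policies create --policy-from-file=policy.json`, and the filter string to `gcloud logging read`. Both commands should be checked against the installed gcloud version.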
Proposed Skill
A cloud-observability-basics skill that agents load when users mention: monitoring, alerting, logs, metrics, SLO, error reporting, tracing, Cloud Monitoring, Cloud Logging, or observability.
Suggested SKILL.md frontmatter
---
name: cloud-observability-basics
description: >
  Use when the user asks about monitoring, logging, alerting, tracing, or observability
  for Google Cloud services. Covers Cloud Monitoring (metrics, dashboards, alerting policies,
  SLOs), Cloud Logging (log routing, log-based metrics, excluded logs), Cloud Trace
  (distributed tracing, OpenTelemetry), and Error Reporting. WHEN: set up alert, create
  dashboard, view logs, reduce logging cost, trace request, debug crash, observability.
compatibility: Requires the roles/monitoring.metricWriter, roles/logging.logWriter, and roles/cloudtrace.agent IAM roles.
---
Key reference topics
- Cloud Monitoring
  - Alerting policies: metric thresholds, uptime checks, log-based alerts
  - Dashboards: JSON model, MQL vs PromQL
  - SLOs: defining SLIs, error budgets, burn-rate alerts
  - Custom metrics: OpenTelemetry instrumentation (OpenCensus for legacy code)
- Cloud Logging
  - Logs Explorer query syntax (Logging query language), regex, JSON subfield extraction
  - Log buckets, log views, and IAM
  - Log sinks: BigQuery, Cloud Storage, Pub/Sub export
  - Exclusion filters to control ingestion cost
  - Log-based metrics (counter & distribution)
- Cloud Trace
  - OpenTelemetry auto-instrumentation for Cloud Run, GKE, App Engine
  - Trace span analysis and latency debugging
- Error Reporting
  - Automatic exception grouping from Cloud Functions, Cloud Run, GKE
  - Notifications via email / Pub/Sub / Slack
- Cost Optimization
  - Logging ingestion pricing tiers
  - When to use _Default vs custom log buckets
  - Scheduled queries vs streaming exports
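Two of the topics above, burn-rate alerts and ingestion cost control, reduce to simple arithmetic that the skill could spell out. The following is a minimal sketch under stated assumptions: all function names are hypothetical, and the `free_gib` and `price_per_gib` defaults are placeholders to be replaced with figures from the current Cloud Logging pricing page.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 means the budget is used up exactly at the end of the
    SLO period; higher values exhaust it proportionally sooner.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return (bad_events / total_events) / error_budget(slo_target)


def hours_to_exhaustion(rate: float, period_days: int = 30) -> float:
    """Hours until a full budget is spent if the burn rate stays constant."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return period_days * 24 / rate


def monthly_ingestion_cost(
    ingested_gib: float,
    excluded_fraction: float = 0.0,
    free_gib: float = 50.0,      # assumption: placeholder free allotment
    price_per_gib: float = 0.50,  # assumption: placeholder price per GiB
) -> float:
    """Estimate the monthly charge after exclusion filters drop some logs."""
    if not 0.0 <= excluded_fraction <= 1.0:
        raise ValueError("excluded_fraction must be between 0 and 1")
    billable = max(0.0, ingested_gib * (1.0 - excluded_fraction) - free_gib)
    return billable * price_per_gib


if __name__ == "__main__":
    rate = burn_rate(bad_events=5, total_events=1000, slo_target=0.999)
    print(f"burn rate: {rate:.2f}, budget gone in {hours_to_exhaustion(rate):.0f} h")
    print(f"cost without exclusions: {monthly_ingestion_cost(250):.2f}")
    print(f"cost with 40% excluded: {monthly_ingestion_cost(250, 0.4):.2f}")
```

With 5 failures in 1,000 requests against a 99.9% target, the burn rate comes out at roughly 5, so a 30-day budget would last about six days. The cost function only models ingestion; storage beyond the default retention and sink destinations such as BigQuery are billed separately and are not covered here.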
Why Now?
- All existing compute skills (GKE, Cloud Run, Cloud Functions) stop at 'deploy successfully' with no Day-2 operational guidance.
- Cloud Billing and cost optimization are recurring user concerns; logging is often the surprise cost driver.
- Google is strongly pushing OpenTelemetry as the unified observability standard; a skill should bridge GCP-native tools with OTel.
Reference Implementation
Google Cloud official docs:
Happy to contribute a SKILL.md draft if this direction is accepted.