Documentation

From zero to a working panel in under 10 minutes.

A single-page quickstart. Connect a metrics source, write a chartlessops.yml, and push it to your repo. The panel updates within a minute of the YAML landing.

Your first panel

Sign up and connect a metrics source (we need a read-only API token or a Prometheus endpoint). The first signal lights up about 90 seconds later.

From there, write your chartlessops.yml (one entry per service) and commit it to your infra repo. We watch the file via a GitHub or GitLab webhook and apply changes within a minute.
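To make the webhook step concrete, here is a minimal sketch of how a receiver might decide whether a GitHub push touched chartlessops.yml. It assumes GitHub's push-event payload (a `commits` list whose entries carry `added`, `modified` and `removed` paths) and its `X-Hub-Signature-256` header; the helper names are hypothetical and this is not ChartlessOps' actual implementation.

```python
import hashlib
import hmac

CONFIG_NAME = "chartlessops.yml"

def signature_ok(secret: bytes, body: bytes, header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header ("sha256=<hex HMAC of the body>")."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(expected, header)

def touches_config(payload: dict) -> bool:
    """True if any commit in the push added, modified or removed the config file."""
    for commit in payload.get("commits", []):
        for key in ("added", "modified", "removed"):
            for path in commit.get(key, []):
                # match the file at the repo root or in any subdirectory
                if path.rsplit("/", 1)[-1] == CONFIG_NAME:
                    return True
    return False
```

Verifying the signature before parsing the payload means a forged request can never trigger a config reload.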

chartlessops.yml

The whole panel is configured by one YAML file. Here’s a minimal version:

# chartlessops.yml
workspace: acme
sources:
  - name: prod-prometheus
    type: prometheus
    url: https://prom.acme.dev
    auth: ${PROM_TOKEN}

services:
  - name: api-gateway
    signal:
      type: p99_latency_ms
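      # sum by (le) merges per-instance buckets into one p99; * 1000 converts seconds to ms
      query: histogram_quantile(0.99, sum by (le) (rate(http_duration_seconds_bucket{svc="api"}[5m]))) * 1000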
    slo:
      target: 99.95%
      window: 30d
    alert:
      routes:
        - pagerduty:p-acme-platform
        - slack:#oncall-platform

The signal becomes one row on the panel. The SLO budget shows under the row. The alert routes fire when the signal crosses a threshold or the SLO budget burns past a configured rate.
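The `auth: ${PROM_TOKEN}` line keeps the credential out of the repo. This page doesn't say how the placeholder is resolved; assuming it is read from an environment variable at load time, Python's `string.Template` understands the same `${NAME}` syntax and shows the idea. `expand_secrets` is a hypothetical helper, not part of ChartlessOps.

```python
import os
from string import Template

def expand_secrets(value: str, env=None) -> str:
    """Replace ${NAME} placeholders with values from the environment.

    Raises KeyError if a referenced variable is unset, so a missing
    token fails loudly instead of silently becoming an empty string.
    """
    env = os.environ if env is None else env
    return Template(value).substitute(env)
```

Failing on a missing variable is the safer default: an empty bearer token would otherwise surface later as a confusing 401 from the metrics source.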

Defining a signal

Each service's signal is the one metric that answers “is this service OK?”. Built-in signal types:

p50_latency_ms / p95_latency_ms / p99_latency_ms — latency percentiles
error_rate — ratio of bad responses
availability — ratio of good responses
queue_depth — backlog size
throughput — rate / second
custom — you provide a query and a threshold
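error_rate and availability are two views of the same traffic: bad and good responses, each divided by the total. A minimal sketch, assuming you already have response counts for the evaluation window and that a signal is “not OK” once it crosses its threshold in the bad direction (all names are hypothetical):

```python
def error_rate(bad: int, total: int) -> float:
    """Fraction of responses that were bad; 0.0 when there was no traffic."""
    return bad / total if total else 0.0

def availability(good: int, total: int) -> float:
    """Fraction of responses that were good; 1.0 when there was no traffic."""
    return good / total if total else 1.0

def breaches(value: float, threshold: float, higher_is_worse: bool = True) -> bool:
    """Latency, error_rate and queue_depth go bad upward; availability goes bad downward."""
    return value > threshold if higher_is_worse else value < threshold
```

Treating zero traffic as healthy is a design choice in this sketch; for a service that should always receive traffic, you may prefer to flag an empty window instead.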

One signal per service. Always. If you want a second, that’s a second service.
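Because each service carries exactly one signal, a config that tries to squeeze in two should be rejected before it is pushed. A minimal pre-commit check, assuming the file has already been parsed into a dict (for example with a YAML library); `check_services` and its messages are hypothetical, not the product's own validator:

```python
def check_services(config: dict) -> list:
    """Return a list of problems; an empty list means the services look valid."""
    problems = []
    seen = set()
    for i, svc in enumerate(config.get("services", [])):
        name = svc.get("name")
        if not name:
            problems.append(f"services[{i}]: missing name")
        elif name in seen:
            problems.append(f"{name}: duplicate service name")
        seen.add(name)
        signal = svc.get("signal")
        # exactly one signal: a single mapping, not a list of mappings
        if not isinstance(signal, dict):
            problems.append(f"{name or i}: needs exactly one signal mapping")
        elif "type" not in signal:
            problems.append(f"{name or i}: signal has no type")
    return problems
```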

SLO budgets

SLO budgets are calculated over a rolling window of the configured length (typically 30 days). When the budget is burning faster than the window allows, meaning it would run out before the window ends, the row shows “burning fast” with an estimated time to budget exhaustion.

slo:
  target: 99.95%           # availability target
  window: 30d              # rolling window
  burn_alert_threshold: 14d # alert if budget will exhaust in <14d at current rate
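The arithmetic behind “burning fast” and burn_alert_threshold, as a sketch. It assumes a time-based budget (a 99.95% target over 30 days allows 0.0005 × 43,200 = 21.6 bad minutes) and a burn rate equal to the observed error ratio divided by the allowed one. The exact formula ChartlessOps uses isn't documented here, and every helper name is hypothetical.

```python
import math
import re

def parse_target(text: str) -> float:
    """'99.95%' -> 0.9995"""
    return float(text.rstrip("%")) / 100

def parse_days(text: str) -> float:
    """'30d' -> 30.0; only whole days are handled in this sketch."""
    m = re.fullmatch(r"(\d+)d", text.strip())
    if not m:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(m.group(1))

def budget_minutes(target: float, window_days: float) -> float:
    """Total 'bad' minutes the target allows over the window."""
    return (1 - target) * window_days * 24 * 60

def burn_rate(observed_error_ratio: float, target: float) -> float:
    """1.0 means the budget would last exactly one window."""
    return observed_error_ratio / (1 - target)

def days_to_exhaustion(remaining_fraction: float, rate: float, window_days: float) -> float:
    """Days until the remaining budget is gone at the current burn rate."""
    if rate <= 0:
        return math.inf
    return remaining_fraction * window_days / rate

def should_alert(slo: dict, remaining_fraction: float, observed_error_ratio: float) -> bool:
    """Alert when projected exhaustion comes sooner than burn_alert_threshold."""
    target = parse_target(slo["target"])
    window = parse_days(slo["window"])
    rate = burn_rate(observed_error_ratio, target)
    projected = days_to_exhaustion(remaining_fraction, rate, window)
    return projected < parse_days(slo["burn_alert_threshold"])
```

For example, with half the budget left and a 0.5% error ratio against a 99.95% target, the burn rate is 10×, so the remaining budget lasts 0.5 × 30 / 10 = 1.5 days, well inside the 14-day threshold.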

Prometheus

Provide a base URL and an optional bearer token. We send GET /api/v1/query requests on your configured interval. Multi-cluster federation is supported: list several URLs, and we query them in parallel and roll up the results.
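Prometheus's instant-query endpoint is GET /api/v1/query with a `query` parameter; results come back as JSON under `data.result`, each sample carrying `value: [timestamp, "number as string"]`. A sketch of the fan-out-and-roll-up pattern, assuming a bearer token and taking the worst (maximum) value across clusters as the roll-up; the actual roll-up rule isn't documented here, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def build_request(base_url: str, promql: str, token=None) -> urllib.request.Request:
    """GET <base>/api/v1/query?query=<promql>, with an optional bearer token."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(url, headers=headers, method="GET")

def parse_values(body: bytes) -> list:
    """Pull the sample values out of an instant-vector response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    # each result looks like {"metric": {...}, "value": [<unix ts>, "<number>"]}
    return [float(r["value"][1]) for r in doc["data"]["result"]]

def http_fetch(req: urllib.request.Request) -> bytes:
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

def query_all(base_urls, promql, token=None, fetcher=http_fetch, rollup=max) -> float:
    """Query every cluster in parallel, then roll all samples up into one number."""
    reqs = [build_request(u, promql, token) for u in base_urls]
    with ThreadPoolExecutor(max_workers=max(len(reqs), 1)) as pool:
        bodies = list(pool.map(fetcher, reqs))
    samples = [v for body in bodies for v in parse_values(body)]
    return rollup(samples)  # max suits latency; pass min for availability
```

Injecting `fetcher` keeps the network call swappable, which makes the parallel roll-up easy to exercise without a live Prometheus.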

Datadog

Provide an API key and an application key from the Datadog UI; we query the metrics API. Existing Datadog SLO definitions are supported: reference one directly with datadog_slo:<id>.
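The two keys travel as the `DD-API-KEY` and `DD-APPLICATION-KEY` headers on Datadog's v1 API. A sketch that only builds the requests (nothing is sent), assuming the `datadoghq.com` site (other regions use a different domain) and the v1 metrics query and SLO endpoints; the helper names are hypothetical.

```python
import time
import urllib.parse
import urllib.request

def dd_headers(api_key: str, app_key: str) -> dict:
    # Datadog authenticates API calls with these two headers
    return {"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key}

def metrics_query(site, api_key, app_key, query, lookback_s=300, now=None):
    """Build a GET /api/v1/query request covering the last `lookback_s` seconds."""
    to = int(time.time() if now is None else now)
    params = urllib.parse.urlencode({"from": to - lookback_s, "to": to, "query": query})
    return urllib.request.Request(f"https://api.{site}/api/v1/query?{params}",
                                  headers=dd_headers(api_key, app_key))

def parse_slo_ref(ref: str) -> str:
    """'datadog_slo:abc123' -> 'abc123'"""
    prefix = "datadog_slo:"
    if not ref.startswith(prefix) or len(ref) == len(prefix):
        raise ValueError(f"not a datadog_slo reference: {ref!r}")
    return ref[len(prefix):]

def slo_request(site, api_key, app_key, ref):
    """Build a GET /api/v1/slo/<id> request for a referenced SLO."""
    slo_id = urllib.parse.quote(parse_slo_ref(ref), safe="")
    return urllib.request.Request(f"https://api.{site}/api/v1/slo/{slo_id}",
                                  headers=dd_headers(api_key, app_key))
```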

CloudWatch

Provide an IAM role with the cloudwatch:GetMetricData permission; we assume the role using STS. For multiple accounts, provide one IAM role per account and prefix service names with a namespace to disambiguate them.
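Under the hood this is the standard STS-then-CloudWatch flow. A sketch with boto3, assuming one role per account; the role name `chartlessops-read`, the `namespace/service` prefix format and all helper names are hypothetical, not ChartlessOps settings. Only `fetch_metrics` needs boto3 and AWS credentials; the other helpers are plain Python.

```python
from datetime import datetime, timedelta, timezone

def role_arn(account_id: str, role_name: str) -> str:
    return f"arn:aws:iam::{account_id}:role/{role_name}"

def prefixed_service(namespace: str, service: str) -> str:
    # e.g. "payments-prod/api-gateway" vs "payments-staging/api-gateway"
    return f"{namespace}/{service}"

def metric_query(query_id, namespace, metric, dimensions, stat="p99", period=60):
    """One entry for GetMetricData's MetricDataQueries list."""
    return {
        "Id": query_id,  # must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": namespace,
                "MetricName": metric,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
            },
            "Period": period,
            "Stat": stat,
        },
    }

def fetch_metrics(account_id, role_name, region, queries, minutes=15):
    """Assume the account's role via STS, then call GetMetricData (needs boto3)."""
    import boto3  # imported lazily so the helpers above work without it
    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn(account_id, role_name), RoleSessionName="chartlessops-sketch"
    )["Credentials"]
    cw = boto3.client(
        "cloudwatch",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    end = datetime.now(timezone.utc)
    return cw.get_metric_data(
        MetricDataQueries=queries, StartTime=end - timedelta(minutes=minutes), EndTime=end
    )["MetricDataResults"]
```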

OpenTelemetry

Push or pull. Push: point your OTel collector's OTLP exporter at our ingest endpoint. Pull: we periodically poll a Prometheus-compatible endpoint exposed by your collector.
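In pull mode, a Prometheus-compatible endpoint serves the plain-text exposition format: one `name{labels} value [timestamp]` sample per line, plus `#` comment lines for HELP and TYPE. A minimal parser sketch, assuming label values contain no `}` and ignoring timestamps; it illustrates the format and is not a complete implementation of it.

```python
def parse_exposition(text: str) -> dict:
    """Map 'name{labels}' (or a bare 'name') to its sample value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        if "{" in line:
            end = line.rindex("}")  # assumes no '}' inside label values
            key, rest = line[: end + 1], line[end + 1 :].split()
        else:
            key, *rest = line.split()
        samples[key] = float(rest[0])  # rest[1], if present, is a timestamp
    return samples
```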

Alert routing

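Routes are listed per service in chartlessops.yml. When a signal crosses a threshold, every route listed for that state fires (on_amber or on_red, plus on_slo_budget_burn when the budget burns too fast). Recovery fires the same routes with the resolved state.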

alert:
  on_amber:
    - slack:#oncall-platform
  on_red:
    - pagerduty:p-acme-platform
    - slack:#oncall-platform
    - slack:#leadership-incidents
  on_slo_budget_burn:
    - email:platform-leads@acme.dev
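A sketch of these fan-out rules: a state change fires the routes for the new state, and recovery fires the routes of the state being left, marked resolved. The state names ok/amber/red, the `(kind, target, status)` shape and splitting a route on its first colon are assumptions for illustration; the helper names are hypothetical.

```python
ROUTE_KEYS = {"amber": "on_amber", "red": "on_red"}

def parse_route(route: str) -> tuple:
    """'slack:#oncall-platform' -> ('slack', '#oncall-platform')"""
    kind, sep, target = route.partition(":")
    if not sep or not target:
        raise ValueError(f"bad route: {route!r}")
    return kind, target

def notifications(alert: dict, old: str, new: str) -> list:
    """Return (kind, target, status) tuples for one signal state change."""
    if old == new:
        return []
    if new == "ok":
        # recovery: notify the routes of the state we are leaving
        routes, status = alert.get(ROUTE_KEYS.get(old, ""), []), "resolved"
    else:
        routes, status = alert.get(ROUTE_KEYS[new], []), new
    return [(*parse_route(r), status) for r in routes]
```

Splitting on the first colon only matters for targets that contain colons themselves; everything after the first colon is kept as the target.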

Multi-region rollups

If a service runs in multiple regions, the panel shows one row with the worst region driving the status. Click into the row to see per-region detail.
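“Worst region drives the status” amounts to a maximum over a severity order. A sketch, assuming the three states ok < amber < red used in the alert config; the helper name and region names are hypothetical.

```python
SEVERITY = {"ok": 0, "amber": 1, "red": 2}

def region_rollup(region_states: dict) -> tuple:
    """Return the row status and the (sorted) regions responsible for it."""
    worst = max(region_states.values(), key=SEVERITY.__getitem__)
    culprits = sorted(r for r, s in region_states.items() if s == worst)
    return worst, culprits
```

Returning the culprit regions alongside the status gives the click-through view somewhere to start.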

On-prem deploy

Enterprise plans can run ChartlessOps in your own VPC. It ships as containers, deploys via Helm or Docker Compose, and is configured with environment variables plus the same chartlessops.yml. Data never leaves your network.

Getting help

Email support@chartlessops.com. For incident-time issues, incident@chartlessops.com routes to oncall.


Ready to wire it up?

Start the 14-day trial — no credit card. Create a workspace →