Welcome back, Maya
Acme Platform · 42 collectors across 4 environments
Remote write rejecting 2.3% of metric samples
k8s-daemon pipelines exporting to prometheusrw are dropping samples with out-of-order timestamps, likely because of clock drift on 3 nodes.
✦ AI insight · spans 7 collectors · 6m ago
memory_limiter trips daily at 03:14 UTC on ingest-gateway
Spikes correlate with nightly batch jobs from prod-east. Bumping limit_mib to 6144 should stop drops without changing the rest of the pipeline.
✦ AI insight · otel-gateway-01.iad · 22m ago
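A minimal sketch of the suggested change, assuming ingest-gateway uses the OpenTelemetry Collector memory_limiter processor. Only limit_mib comes from the insight above; check_interval and spike_limit_mib are illustrative values, not read from the live config.

```yaml
processors:
  memory_limiter:
    # Assumed polling interval; keep whatever the gateway already uses.
    check_interval: 1s
    # Hard limit suggested by the insight.
    limit_mib: 6144
    # Assumed headroom (about 20% of limit_mib); tune to the nightly spike.
    spike_limit_mib: 1228
```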
Tail-sampling decision_wait can drop from 10s to 6s
Current setting catches 94% of error spans. Lowering decision_wait to 6s would barely change recall and would free ~1.2GB of heap on the gateway.
✦ AI suggestion · otel-gateway-01.iad · p2 · 1h ago
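A sketch of what the change could look like, assuming the gateway uses the OpenTelemetry Collector tail_sampling processor. The policy shown is hypothetical and stands in for whatever error-span policy is already configured; only the decision_wait value comes from the suggestion.

```yaml
processors:
  tail_sampling:
    # Lowered from 10s per the suggestion.
    decision_wait: 6s
    policies:
      # Hypothetical policy name; keeps traces that contain an error span.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
```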
k8s_events scoped to all 37 namespaces
Only 4 namespaces produced events in the last 7 days. Scoping the receiver to those 4 would cut log volume by ~80%.
✦ AI suggestion · k8s-daemon (12 collectors) · 3h ago
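A sketch of the narrower scope, assuming the k8s_events receiver's namespaces setting is what controls it. The namespace names are placeholders, since the suggestion does not list the 4 active namespaces.

```yaml
receivers:
  k8s_events:
    # Placeholder names; substitute the 4 namespaces that produced events.
    namespaces: [namespace-a, namespace-b, namespace-c, namespace-d]
```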
Healthy · 38 · +2 vs yesterday
Degraded · 3 · queue pressure on 2
Offline · 1 · otel-gateway-02
Configurations · 7 · 2 git-backed
Active rollouts
—
coming soon
via commit a91f3b2 · maya · 6m ago
rolling
APPLIED 33 · PENDING 8 · FAILED 0
Managed configurations 7 / 25
GitHub repos 2 / 3
Team members 6 / 10
| When | Actor | Action | Target | Result |
|---|---|---|---|---|
| 6m ago | maya | started rollout | prod-ingress@v14 | success |
| 14m ago | — | collector offline | otel-gateway-02 | err |
| 1h ago | jordan | edited config (UI) | staging-edge | v8 → v9 |
| 2h ago | github | commit synced | k8s-daemon@a91f3b2 | applied |
| 5h ago | sam | invited member | [email protected] | pending |