Active connections, throughput, and queue depth exposed to observability layer
Why it matters
Without exported metrics, you cannot answer the most basic operational questions: how many connections are active right now, is message throughput degrading, is the outbound queue backing up? Invisible failures — a Redis adapter that silently stops publishing, a connection leak growing at 10 sockets per hour — go undetected until they cause an outage. ISO 25010 reliability includes the ability to monitor system health; a real-time service with no observability is operationally blind.
Severity rationale
Low because missing metrics delay detection of connection leaks and queue backlogs, turning slow-burn failures into surprise outages rather than paged alerts.
Remediation
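Export at minimum three metrics (active connections, messages processed, and queue depth) to a Prometheus-compatible endpoint or your existing observability backend. Active connections and queue depth are gauges, since they rise and fall; messages processed is a counter, since it only increases.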
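import promClient from 'prom-client';

// `io` (Socket.IO server) and `app` (Express-style HTTP app) are assumed
// to be created elsewhere in the service.

// Gauge: rises and falls with the number of live connections.
const activeConns = new promClient.Gauge({
  name: 'ws_active_connections',
  help: 'Live WebSocket connections',
});

// Counter: only ever increases; throughput is derived from its rate.
const msgTotal = new promClient.Counter({
  name: 'ws_messages_total',
  help: 'Messages processed since startup',
});

io.on('connection', (socket) => {
  activeConns.inc();
  socket.on('disconnect', () => activeConns.dec());
  socket.on('send_message', () => msgTotal.inc());
});

// Scrape endpoint in the Prometheus text format.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.end(await promClient.register.metrics());
});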
Configure Prometheus to scrape /metrics, or forward the same metrics to Datadog or CloudWatch. Add an alert that fires when ws_active_connections exceeds your tested ceiling.
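The snippet above registers a connection gauge and a message counter but no queue depth metric. The following is a minimal sketch of one way to cover it, assuming the outbound queue is a simple in-memory FIFO. `TrackedQueue`, the stand-in `queueDepth` object, and the metric name `ws_outbound_queue_depth` are all hypothetical; the only thing borrowed from prom-client is the fact that a `Gauge` has a `set(value)` method, so a real gauge could be passed in place of the stand-in.

```javascript
// Hypothetical helper: a FIFO queue that reports its depth to a gauge-like
// object. Any object with a set(number) method works, for example a
// prom-client Gauge.
class TrackedQueue {
  constructor(gauge) {
    this.items = [];
    this.gauge = gauge;
    this.gauge.set(0); // publish an initial value so the series exists
  }

  enqueue(message) {
    this.items.push(message);
    this.gauge.set(this.items.length);
  }

  dequeue() {
    const message = this.items.shift(); // undefined when the queue is empty
    this.gauge.set(this.items.length);
    return message;
  }

  get depth() {
    return this.items.length;
  }
}

// Stand-in gauge so the sketch runs without dependencies. In the service this
// could instead be (name is an assumption):
//   new promClient.Gauge({ name: 'ws_outbound_queue_depth', help: 'Outbound queue depth' })
const queueDepth = {
  value: 0,
  set(n) {
    this.value = n;
  },
};

const outbound = new TrackedQueue(queueDepth);
outbound.enqueue({ room: 'lobby', body: 'hello' });
outbound.enqueue({ room: 'lobby', body: 'world' });
outbound.dequeue();
console.log('queue depth:', queueDepth.value);
```

Updating the gauge on every enqueue and dequeue keeps the exported value current at scrape time. If the queue lives in an external broker instead, the depth would have to be read from that broker when the metric is collected, which this sketch does not cover.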
Detection
- ID: observability-metrics
- Severity: low
- What to look for: Count all metrics and observability integrations. Enumerate the metric types tracked: active connection count, message throughput, queue depth, error rates, latency percentiles. Count the observability backends: Prometheus, Datadog, CloudWatch, or equivalent.
- Pass criteria: The system exposes at least 3 real-time metrics to an observability backend (Prometheus, Datadog, CloudWatch, etc.) including at minimum active connection count.
- Fail criteria: No metrics are exposed, or metrics are only logged without being sent to an observability system.
- Skip (N/A) when: Never. Observability is essential for production systems.
- Cross-reference: For broader monitoring patterns and error tracking, the SaaS Error Handling Audit covers observability infrastructure.
- Detail on fail: "No metrics infrastructure. Unable to monitor connection count, throughput, or queue health."
- Remediation: Export real-time metrics to an observability system:
import promClient from 'prom-client';

const activeConnections = new promClient.Gauge({
  name: 'websocket_active_connections',
  help: 'Number of active WebSocket connections',
});

const messagesThroughput = new promClient.Counter({
  name: 'websocket_messages_total',
  help: 'Total number of messages processed',
});

io.on('connection', (socket) => {
  activeConnections.inc();
  socket.on('disconnect', () => activeConnections.dec());
  // The message handler must be registered here, where `socket` is in scope.
  socket.on('message', () => {
    messagesThroughput.inc();
  });
});

// Expose Prometheus endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  // register.metrics() returns a Promise, so it must be awaited.
  res.end(await promClient.register.metrics());
});
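The pass and fail criteria above can be sketched as a small check. This is only an illustration: the function name `evaluateObservability`, its input shape, and the rule of matching the connection metric by the substring `active_connections` are assumptions, not part of the audit definition. It also treats "fewer than three metrics" as a fail, which the criteria imply but do not state outright.

```javascript
// Hypothetical evaluator for the criteria in this section.
// metricNames: names of metrics the system exposes (assumed input shape)
// backends: observability systems the metrics are sent to (assumed input shape)
function evaluateObservability({ metricNames, backends }) {
  if (metricNames.length === 0) {
    return { result: 'fail', reason: 'no metrics are exposed' };
  }
  if (backends.length === 0) {
    return { result: 'fail', reason: 'metrics are not sent to an observability backend' };
  }
  if (metricNames.length < 3) {
    return { result: 'fail', reason: 'fewer than 3 real-time metrics are exposed' };
  }
  // Assumption: the connection gauge is recognisable by its name.
  const hasConnectionCount = metricNames.some((name) => name.includes('active_connections'));
  if (!hasConnectionCount) {
    return { result: 'fail', reason: 'active connection count is not exposed' };
  }
  return { result: 'pass', reason: 'at least 3 metrics, including active connections, reach a backend' };
}

const verdict = evaluateObservability({
  metricNames: ['websocket_active_connections', 'websocket_messages_total'],
  backends: ['prometheus'],
});
console.log(verdict.result, '-', verdict.reason);
```

With only the two metrics registered in the remediation snippet, this check reports a fail; adding a third metric such as queue depth would be needed to meet the threshold.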
External references
- iso-25010:2011 · reliability.maturity — Maturity — real-time service metrics exposed for operational monitoring
Taxons
History
- 2026-04-18·v1.0.0·Initial import from community-realtime·automated