Active connections, throughput, and queue depth exposed to observability layer

ab-000754 · community-realtime.realtime-ux.observability-metrics

Severity: low · Status: active

Why it matters

Without exported metrics, you cannot answer the most basic operational questions: how many connections are active right now, is message throughput degrading, is the outbound queue backing up? Invisible failures — a Redis adapter that silently stops publishing, a connection leak growing at 10 sockets per hour — go undetected until they cause an outage. ISO 25010 reliability includes the ability to monitor system health; a real-time service with no observability is operationally blind.

Severity rationale

Low because missing metrics delay detection of connection leaks and queue backlogs, turning slow-burn failures into surprise outages rather than paged alerts.

Remediation

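Export at minimum three metrics to a Prometheus-compatible endpoint or your existing observability backend: an active-connections gauge, a messages-processed counter, and a queue-depth gauge. The snippet below covers the first two; where you measure queue depth depends on where your outbound queue lives.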

import promClient from 'prom-client';

// Assumes `io` is your Socket.IO server and `app` is your Express app.

// Gauge: rises and falls with the number of open sockets.
const activeConns = new promClient.Gauge({
  name: 'ws_active_connections',
  help: 'Live WebSocket connections',
});
// Counter: only ever increases; derive throughput with rate() at query time.
const msgTotal = new promClient.Counter({
  name: 'ws_messages_total',
  help: 'Messages processed since startup',
});

io.on('connection', (socket) => {
  activeConns.inc();
  socket.on('disconnect', () => activeConns.dec());
  socket.on('send_message', () => msgTotal.inc());
});

// Scrape endpoint in Prometheus text format.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.end(await promClient.register.metrics());
});

Scrape /metrics from Prometheus or forward to Datadog/CloudWatch. Add an alert on ws_active_connections exceeding your tested ceiling.
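
Queue depth is the one metric the remediation names without showing. A minimal sketch follows, assuming the outbound queue is an in-process array; if yours lives in Redis or a message broker, you would sample its length instead. The names OutboundQueue, depthGauge, and recordingGauge are hypothetical and do not come from any library. The queue accepts any object with a set(number) method, which is the shape of a prom-client Gauge.

```javascript
// Sketch: an outbound queue that reports its depth to a gauge-like object.
// `OutboundQueue` and `depthGauge` are hypothetical names for illustration.
class OutboundQueue {
  constructor(depthGauge) {
    this.items = [];
    this.depthGauge = depthGauge;
    this.depthGauge.set(0); // publish an initial value so the series exists
  }

  // Add a message awaiting delivery and publish the new depth.
  enqueue(message) {
    this.items.push(message);
    this.depthGauge.set(this.items.length);
  }

  // Remove the oldest message (undefined if empty) and publish the new depth.
  dequeue() {
    const message = this.items.shift();
    this.depthGauge.set(this.items.length);
    return message;
  }
}

// Stand-in gauge so the sketch runs without prom-client installed.
const recordingGauge = {
  value: null,
  set(v) { this.value = v; },
};

const queue = new OutboundQueue(recordingGauge);
queue.enqueue({ room: 'lobby', body: 'hello' });
queue.enqueue({ room: 'lobby', body: 'world' });
queue.dequeue();
console.log(recordingGauge.value); // 1
```

In a real service you would pass a prom-client gauge in place of recordingGauge, for example new promClient.Gauge({ name: 'ws_queue_depth', help: 'Messages waiting to be sent' }), where ws_queue_depth is an assumed metric name. A depth that keeps rising while ws_messages_total flattens is the backlog signal described under "Why it matters".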

Detection

ID: observability-metrics
Severity: low
What to look for: Count all metrics and observability integrations. Enumerate the metric types tracked: active connection count, message throughput, queue depth, error rates, latency percentiles. Count the observability backends: Prometheus, Datadog, CloudWatch, or equivalent.
Pass criteria: The system exposes at least 3 real-time metrics to an observability backend (Prometheus, Datadog, CloudWatch, etc.) including at minimum active connection count.
Fail criteria: No metrics are exposed, or metrics are only logged without being sent to an observability system.
Skip (N/A) when: Never — observability is essential for production systems.
Cross-reference: For broader monitoring patterns and error tracking, the SaaS Error Handling Audit covers observability infrastructure.
Detail on fail: "No metrics infrastructure. Unable to monitor connection count, throughput, or queue health."
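
One way to spot-check the pass criteria against a running service is to read its /metrics output and count the declared metric families. The sketch below is an illustration rather than part of the audit tooling: checkMetricsExposure and the name pattern used to recognise the connection gauge are assumptions, and it only covers Prometheus text-format output. It cannot tell whether metrics are merely logged, so the fail criteria still need a code review.

```javascript
// Sketch: evaluate the pass criteria against Prometheus text-format output.
// `checkMetricsExposure` is a hypothetical helper, not part of any library.
function checkMetricsExposure(expositionText, minMetrics = 3) {
  const names = new Set();
  for (const line of expositionText.split('\n')) {
    // Metric families are declared as: # TYPE <name> <type>
    const match = /^# TYPE (\S+) \S+/.exec(line.trim());
    if (match) names.add(match[1]);
  }
  // Assumed heuristic: the connection gauge has "active" followed by
  // "connections" somewhere in its name.
  const hasActiveConnections = [...names].some((n) =>
    /active.*connections/.test(n)
  );
  return {
    metricCount: names.size,
    hasActiveConnections,
    pass: names.size >= minMetrics && hasActiveConnections,
  };
}

// Sample output with only two metric families, so the check fails.
const sample = [
  '# HELP ws_active_connections Live WebSocket connections',
  '# TYPE ws_active_connections gauge',
  'ws_active_connections 12',
  '# TYPE ws_messages_total counter',
  'ws_messages_total 3400',
].join('\n');

const result = checkMetricsExposure(sample);
console.log(result.metricCount, result.hasActiveConnections, result.pass);
// 2 true false
```

Adding a third family, such as a queue-depth gauge, would flip pass to true for this sample.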

Remediation: Export real-time metrics to an observability system:

import promClient from 'prom-client';

const activeConnections = new promClient.Gauge({
  name: 'websocket_active_connections',
  help: 'Number of active WebSocket connections',
});

const messagesThroughput = new promClient.Counter({
  name: 'websocket_messages_total',
  help: 'Total number of messages processed',
});

io.on('connection', (socket) => {
  activeConnections.inc();
  socket.on('disconnect', () => activeConnections.dec());
  // Per-socket handlers must be registered here, where `socket` is in scope.
  socket.on('message', () => messagesThroughput.inc());
});

// Expose Prometheus endpoint.
// register.metrics() returns a Promise in current prom-client releases, so await it.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.end(await promClient.register.metrics());
});

External references

iso-25010:2011 · reliability.maturity — Maturity — real-time service metrics exposed for operational monitoring

Taxons

observability

History

2026-04-18·v1.0.0·Initial import from community-realtime·automated