v1.1.0 · Pro · 20 checks

Every Real-Time Communication Audit check

All 20 checks with why-it-matters prose, severity, and cross-references to related audits.

3 critical · 6 high · 6 medium · 5 low

Connection Management

5 checks

WebSocket connections require authentication before streaming

critical

An unauthenticated WebSocket endpoint is an open data tap. Any client — including malicious actors scanning for open sockets — can connect and receive the full message stream without possessing a valid session. OWASP A07 (Identification & Authentication Failures) and CWE-287 both flag authentication bypasses as severe because they eliminate every downstream access control you've built. In a community platform this means private channel conversations, direct messages, and presence events are exposed to anonymous consumers the moment the server boots.

Why this severity: Critical because any unauthenticated client receives the full real-time message stream, bypassing every downstream access control.

community-realtime.connection-management.websocket-auth-required
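A minimal sketch of the gate this check describes, assuming the caller has already extracted a token from the handshake. `authenticateHandshake`, `TokenVerifier`, and the result shape are hypothetical names rather than any library's API; 1008 is the standard WebSocket policy-violation close code.

```typescript
// Returns the user ID for a valid token, or null.
// Stand-in (assumption) for real session or JWT validation.
type TokenVerifier = (token: string) => string | null;

type HandshakeResult =
  | { ok: true; userId: string }
  | { ok: false; closeCode: number; reason: string };

// Decide whether a connection may proceed, before any channel data is sent.
function authenticateHandshake(
  token: string | undefined,
  verify: TokenVerifier,
): HandshakeResult {
  if (!token) {
    return { ok: false, closeCode: 1008, reason: "missing token" };
  }
  const userId = verify(token);
  if (userId === null) {
    return { ok: false, closeCode: 1008, reason: "invalid token" };
  }
  return { ok: true, userId };
}
```

The server would call this first and close the socket with the returned code when `ok` is false, so nothing is streamed to an anonymous client.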

Per-user connection limits enforced to prevent slot exhaustion

high

Without a per-user connection ceiling, a single authenticated user (or a compromised account) can open thousands of simultaneous WebSocket connections. Each consumes a file descriptor, memory, and CPU for keepalive processing. At scale this constitutes a low-cost denial-of-service attack that degrades or crashes the server for every other user. CWE-770 (Allocation of Resources Without Limits or Throttling) captures exactly this failure mode, and ISO 25010 performance-efficiency requires that resource consumption be bounded under adversarial load.

Why this severity: High because an unbounded connection count per user can exhaust server file descriptors and memory, causing service degradation for all users.

community-realtime.connection-management.per-user-connection-limits
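One way to bound connections per user is a counter keyed by user ID, sketched below. `ConnectionLimiter` and the default cap of 5 are illustrative assumptions; in a multi-instance deployment the counts would need to live in shared storage rather than process memory.

```typescript
// Hypothetical per-user connection limiter (in-memory, single instance).
class ConnectionLimiter {
  private readonly counts = new Map<string, number>();
  private readonly maxPerUser: number;

  constructor(maxPerUser: number = 5) {
    this.maxPerUser = maxPerUser;
  }

  // Returns true and records the connection if the user is under the cap.
  tryAcquire(userId: string): boolean {
    const current = this.counts.get(userId) ?? 0;
    if (current >= this.maxPerUser) return false;
    this.counts.set(userId, current + 1);
    return true;
  }

  // Call from the socket's close handler so the slot is returned.
  release(userId: string): void {
    const current = this.counts.get(userId) ?? 0;
    if (current <= 1) this.counts.delete(userId);
    else this.counts.set(userId, current - 1);
  }
}
```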

Heartbeat or ping on idle connections; zombie connections closed after configurable timeout

high

When a TCP connection is broken by a NAT timeout, a mobile network switch, or a client device that crashes or loses power, the server typically never receives a FIN or RST packet. Without a heartbeat, the server holds those socket objects open indefinitely, consuming memory and file descriptors for clients that will never respond again. Enough zombie connections degrade throughput for active users and can prevent new connections from being accepted. ISO 25010 reliability requires that the system detect and recover from failed connections rather than accumulate them silently.

Why this severity: High because zombie connections from abruptly-closed clients accumulate over time, degrading server throughput and exhausting connection capacity.

community-realtime.connection-management.heartbeat-timeout
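The zombie-detection logic can be sketched independently of any socket library. `HeartbeatTracker` is a hypothetical helper: the server calls `touch` whenever a pong arrives and runs `sweep` on a timer, passing the current time in milliseconds.

```typescript
// Hypothetical tracker for the last time each connection was heard from.
class HeartbeatTracker {
  private readonly lastSeen = new Map<string, number>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Record a new connection or a pong from an existing one.
  touch(connectionId: string, nowMs: number): void {
    this.lastSeen.set(connectionId, nowMs);
  }

  // Returns IDs silent for longer than the timeout and forgets them.
  // The caller is expected to terminate the corresponding sockets.
  sweep(nowMs: number): string[] {
    const expired: string[] = [];
    this.lastSeen.forEach((seenAt, id) => {
      if (nowMs - seenAt > this.timeoutMs) expired.push(id);
    });
    for (const id of expired) this.lastSeen.delete(id);
    return expired;
  }
}
```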

Exponential backoff with jitter on client reconnection, not fixed retry interval

high

When a server restarts or drops a connection, every client attempts to reconnect. If all clients retry at the same fixed interval — say, 5 seconds — they form a synchronized stampede that hits the server simultaneously on each retry cycle. This thundering-herd effect can prevent the server from recovering under its own load. Jitter randomizes reconnect timing so the server receives a spread of reconnect requests it can absorb incrementally. ISO 25010 reliability includes resistance to self-inflicted load spikes of exactly this type.

Why this severity: High because synchronized fixed-interval retries across many clients produce a thundering-herd stampede that can prevent server recovery after an outage.

community-realtime.connection-management.exponential-backoff-jitter
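A sketch of the delay calculation using the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function name, the 500 ms base, and the 30 s cap are assumptions chosen for illustration; the random source is injectable only to make the function testable.

```typescript
// Full-jitter exponential backoff. All defaults are illustrative.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 30000,
  random: () => number = Math.random,
): number {
  // The ceiling doubles with each attempt and is clamped to capMs.
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Pick uniformly in [0, ceiling) so clients spread out over time.
  return Math.floor(random() * ceiling);
}
```

A client would wait `reconnectDelayMs(n)` before its n-th retry (counting from zero) and reset the counter after a successful connection.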

Channel authorization re-validated on subscribe, not only at initial handshake

critical

A handshake-only authorization model is susceptible to time-of-check/time-of-use attacks: a user whose channel access is revoked after connecting can continue subscribing to restricted channels until the connection drops. CWE-285 (Improper Authorization) and OWASP A01 (Broken Access Control) both describe this failure — authorization decisions must be re-evaluated at the point of access, not cached from an earlier check. In a community app, this means a banned user or a user whose subscription lapses can still join private channels by issuing subscribe events on an existing connection.

Why this severity: Critical because a user whose access was revoked post-handshake can keep subscribing to restricted channels until the TCP connection drops, a Broken Access Control failure under OWASP A01.

community-realtime.connection-management.channel-auth-revalidation
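The following sketch shows authorization evaluated on every subscribe event instead of being cached from the handshake. `SubscriptionManager` and `AccessCheck` are hypothetical; in practice the check would consult the application's current permission data.

```typescript
// Assumption: returns the user's access to a channel as of right now.
type AccessCheck = (userId: string, channelId: string) => boolean;

class SubscriptionManager {
  private readonly subscriptions = new Map<string, Set<string>>();
  private readonly canAccess: AccessCheck;

  constructor(canAccess: AccessCheck) {
    this.canAccess = canAccess;
  }

  // Re-validates access at the moment of subscription.
  subscribe(userId: string, channelId: string): boolean {
    if (!this.canAccess(userId, channelId)) return false;
    const channels = this.subscriptions.get(userId) ?? new Set<string>();
    channels.add(channelId);
    this.subscriptions.set(userId, channels);
    return true;
  }

  // Call when access is revoked so an existing subscription is dropped too.
  revoke(userId: string, channelId: string): void {
    this.subscriptions.get(userId)?.delete(channelId);
  }

  isSubscribed(userId: string, channelId: string): boolean {
    return this.subscriptions.get(userId)?.has(channelId) ?? false;
  }
}
```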

Message Delivery & Ordering

5 checks

Each message carries a monotonic sequence number enabling gap detection

high

Without monotonic sequence numbers, clients have no way to detect that messages were dropped during a network disruption. A user rejoining after a brief outage will see the conversation with silent gaps — no indication that messages are missing. CWE-354 (Improper Validation of Integrity Check Value) covers this class of missing-integrity-signal failure. In a community app, undetected gaps corrupt conversation context, cause replies to appear to reference messages that were never received, and erode user trust in the platform's reliability.

Why this severity: High because clients cannot detect dropped messages during network outages, silently corrupting conversation state and breaking reply threading.

community-realtime.message-delivery.monotonic-sequence-numbers
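Client-side gap detection can be sketched as below, assuming the server assigns consecutive integer sequence numbers per channel. `SequenceTracker` and the result shape are hypothetical names.

```typescript
interface SequenceResult {
  status: "ok" | "stale" | "gap";
  missingFrom?: number;
  missingTo?: number;
}

// Tracks the newest sequence number seen on one channel.
class SequenceTracker {
  private lastSeq: number | null = null;

  observe(seq: number): SequenceResult {
    const last = this.lastSeq;
    if (last === null || seq === last + 1) {
      this.lastSeq = seq;
      return { status: "ok" };
    }
    // Already seen, or older than the newest message received so far.
    if (seq <= last) return { status: "stale" };
    // Numbers were skipped: report the inclusive missing range.
    this.lastSeq = seq;
    return { status: "gap", missingFrom: last + 1, missingTo: seq - 1 };
  }
}
```

On a `gap` result the client could request the missing range from the server or show a "messages may be missing" marker.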

Duplicate messages deduplicated via client-generated ID before rendering

high

Real-time transports retry delivery on network errors. Without deduplication, a message sent before a disconnect and resent after reconnect renders twice in the UI — the user sees duplicate chat lines, duplicate notifications, and potentially double-processed side effects. CWE-694 (Use of Multiple Resources with Duplicate Identifier) captures this integrity failure. The fix is straightforward but frequently omitted in AI-generated WebSocket code that copies the happy path without handling retry semantics.

Why this severity: High because network retries cause duplicate message renders and side effects without client-side deduplication, directly corrupting the conversation view.

community-realtime.message-delivery.client-deduplication
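A minimal deduplication sketch keyed on the client-generated message ID. `MessageDeduplicator` is a hypothetical helper; the bounded capacity is an assumption added so the set cannot grow without limit.

```typescript
// Remembers recently seen message IDs, evicting the oldest beyond capacity.
class MessageDeduplicator {
  private readonly seen = new Set<string>();
  private readonly order: string[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  // Returns true the first time an ID is seen, false for repeats.
  shouldRender(clientMessageId: string): boolean {
    if (this.seen.has(clientMessageId)) return false;
    this.seen.add(clientMessageId);
    this.order.push(clientMessageId);
    if (this.order.length > this.capacity) {
      const oldest = this.order.shift();
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }
}
```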

Pub/sub layer such as Redis or NATS ensures cross-instance message delivery

high

In-process broadcast with `io.to(channel).emit()` only reaches clients connected to the same server instance. The moment you deploy two instances — behind a load balancer or via auto-scaling — users on different instances stop receiving each other's messages. This is a latent failure: the app works perfectly in development and single-instance staging, then silently breaks in production the first time the process count exceeds one. ISO 25010 reliability requires the system to function correctly under expected deployment configurations, including horizontal scaling.

Why this severity: High because in-process-only broadcast silently partitions users across server instances, causing message loss that is invisible in single-instance testing.

community-realtime.message-delivery.pubsub-cross-instance
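The sketch below illustrates the publish-then-fan-out pattern with an in-memory stand-in for the broker. `InMemoryBroker` and `ServerInstance` are hypothetical; a real deployment would put Redis, NATS, or a similar service in the broker's place, and Socket.IO applications typically do this through an adapter package instead of hand-written code.

```typescript
type BrokerHandler = (channel: string, payload: string) => void;

// Stand-in for an external broker. In-memory, for illustration only.
class InMemoryBroker {
  private readonly handlers: BrokerHandler[] = [];

  subscribe(handler: BrokerHandler): void {
    this.handlers.push(handler);
  }

  publish(channel: string, payload: string): void {
    for (const handler of this.handlers) handler(channel, payload);
  }
}

// One server process. It never broadcasts directly: it publishes to the
// broker and delivers whatever the broker fans back to its local clients.
class ServerInstance {
  readonly delivered: string[] = [];
  private readonly localChannels = new Set<string>();
  private readonly broker: InMemoryBroker;

  constructor(broker: InMemoryBroker) {
    this.broker = broker;
    broker.subscribe((channel, payload) => {
      if (this.localChannels.has(channel)) this.delivered.push(payload);
    });
  }

  // A local client joined this channel.
  join(channel: string): void {
    this.localChannels.add(channel);
  }

  send(channel: string, payload: string): void {
    this.broker.publish(channel, payload);
  }
}
```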

Messages queued server-side during disconnect and delivered in order on reconnection

medium

A user who loses connectivity for 30 seconds and reconnects finds the conversation with a gap — every message sent during their outage is gone with no indication it was missed. This is especially damaging in moderation workflows, incident channels, or any context where missing a message has operational consequences. ISO 25010 reliability specifically covers data persistence across failure events; dropping messages on disconnect is a reliability failure regardless of how brief the outage.

Why this severity: Medium because messages sent to disconnected users are permanently lost without a server-side queue, creating silent data gaps on reconnect.

community-realtime.message-delivery.offline-queue-on-reconnect
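One possible shape for the server-side buffer is a bounded per-channel backlog that a reconnecting client queries with the last sequence number it saw. `ChannelBacklog` and its default size of 500 are assumptions for illustration.

```typescript
interface StoredMessage {
  seq: number;
  body: string;
}

// Bounded in-order history for one channel.
class ChannelBacklog {
  private readonly messages: StoredMessage[] = [];
  private nextSeq = 1;
  private readonly maxSize: number;

  constructor(maxSize: number = 500) {
    this.maxSize = maxSize;
  }

  // Assigns the next sequence number and stores the message.
  append(body: string): StoredMessage {
    const message: StoredMessage = { seq: this.nextSeq, body };
    this.nextSeq += 1;
    this.messages.push(message);
    if (this.messages.length > this.maxSize) this.messages.shift();
    return message;
  }

  // Messages newer than the client's last seen sequence number, in order.
  since(lastSeenSeq: number): StoredMessage[] {
    return this.messages.filter((m) => m.seq > lastSeenSeq);
  }
}
```

Messages older than the retained window cannot be replayed from this buffer, so a client that has been away longer would need a separate history fetch.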

Max message payload size enforced server-side; oversized frames rejected with error

medium

Without a server-side payload cap, any client can send multi-megabyte WebSocket frames. Processing an oversized frame allocates memory for the whole frame on that connection, and if multiple clients do this simultaneously, the server runs out of heap before the OS-level connection limit is reached. CWE-770 (Allocation of Resources Without Limits or Throttling) and CWE-400 (Uncontrolled Resource Consumption) both apply. This is a trivial denial-of-service vector in community platforms where registration is open and any user can author a message.

Why this severity: Medium because oversized frames cause unbounded memory allocation per connection, enabling a low-effort denial-of-service by any authenticated user.

community-realtime.message-delivery.max-payload-enforced
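An application-level size check might look like the sketch below. `checkFrameSize` and the 64 KiB cap are illustrative assumptions. A check like this runs after the frame is already in memory, so it complements a transport-level limit and does not replace one (for example, the `maxPayload` server option in the `ws` package or `maxHttpBufferSize` in Socket.IO).

```typescript
// Illustrative cap, not a value taken from the audit.
const MAX_PAYLOAD_BYTES = 64 * 1024;

type FrameDecision =
  | { accepted: true }
  | { accepted: false; error: string };

// Rejects frames larger than maxBytes with an explanatory error.
function checkFrameSize(
  frame: Uint8Array,
  maxBytes: number = MAX_PAYLOAD_BYTES,
): FrameDecision {
  if (frame.byteLength > maxBytes) {
    return {
      accepted: false,
      error: `payload of ${frame.byteLength} bytes exceeds limit of ${maxBytes}`,
    };
  }
  return { accepted: true };
}
```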

Presence & State Sync

5 checks

Presence state computed from server connection state, not forgeable client heartbeats

critical

Presence derived from client-controlled heartbeats can be forged: a disconnected or banned user can keep pinging to appear online, and a connected user can simply stop sending heartbeats to appear offline. CWE-287 (Improper Authentication) and OWASP A07 identify any system that trusts client-reported state for security decisions as broken. In a community platform, forged presence undermines moderation dashboards, violates user expectations about who can see them, and breaks features like "last seen" timestamps that users rely on to make trust decisions.

Why this severity: Critical because client-controlled presence signals can be forged indefinitely, allowing banned users to appear online, an authentication failure under OWASP A07.

community-realtime.presence.server-computed-presence
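Presence derived from server-observed socket lifecycle events can be sketched as follows. `PresenceRegistry` is a hypothetical name; the key point is that only the server's own open and close handlers call it, never a client message.

```typescript
// A user is online while the server holds at least one open socket for them.
class PresenceRegistry {
  private readonly socketsByUser = new Map<string, Set<string>>();

  // Call from the server's connection handler after authentication.
  onSocketOpen(userId: string, socketId: string): void {
    const sockets = this.socketsByUser.get(userId) ?? new Set<string>();
    sockets.add(socketId);
    this.socketsByUser.set(userId, sockets);
  }

  // Call from the server's close handler or heartbeat timeout.
  onSocketClose(userId: string, socketId: string): void {
    const sockets = this.socketsByUser.get(userId);
    if (!sockets) return;
    sockets.delete(socketId);
    if (sockets.size === 0) this.socketsByUser.delete(userId);
  }

  isOnline(userId: string): boolean {
    return this.socketsByUser.has(userId);
  }
}
```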

Typing indicators debounced at client and expire server-side on timeout

medium

A typing indicator fired on every keystroke with no debounce generates roughly one WebSocket event per 100ms per active user. For a community with 100 simultaneous typists, that is 1,000 events per second — before any real messages are sent. Without server-side expiry, users who close their browser mid-compose are shown as typing indefinitely. Both failure modes inflate bandwidth, CPU cost, and connection noise for zero user-visible benefit, violating ISO 25010 performance-efficiency requirements.

Why this severity: Medium because undebounced typing events generate excessive message volume that inflates server load and costs in proportion to the active user count.

community-realtime.presence.typing-indicator-debounce
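Both halves of this check can be sketched with explicit timestamps. Note that the client half below is a throttle (at most one event per interval), which is one common way to meet the intent of the check and not a strict debounce. `TypingThrottle`, `TypingRegistry`, and the 3 s and 5 s defaults are assumptions.

```typescript
// Client side: allow at most one "typing" event per interval.
class TypingThrottle {
  private lastSentAt = Number.NEGATIVE_INFINITY;
  private readonly intervalMs: number;

  constructor(intervalMs: number = 3000) {
    this.intervalMs = intervalMs;
  }

  // Call on each keystroke; emit the event only when this returns true.
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentAt < this.intervalMs) return false;
    this.lastSentAt = nowMs;
    return true;
  }
}

// Server side: a typing flag disappears unless refreshed within ttlMs.
class TypingRegistry {
  private readonly expiresAt = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 5000) {
    this.ttlMs = ttlMs;
  }

  mark(userId: string, nowMs: number): void {
    this.expiresAt.set(userId, nowMs + this.ttlMs);
  }

  // Returns users still typing and purges expired entries.
  activeTypists(nowMs: number): string[] {
    const active: string[] = [];
    const expired: string[] = [];
    this.expiresAt.forEach((expiry, userId) => {
      if (expiry > nowMs) active.push(userId);
      else expired.push(userId);
    });
    for (const userId of expired) this.expiresAt.delete(userId);
    return active;
  }
}
```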

Read receipts stored as explicit user acknowledgment, not inferred from delivery

medium

Inferring read status from message delivery lies to users about what their conversation partners have actually seen. A message marked read the instant it lands on a device claims acknowledgment the recipient never gave, which breaks trust in the product, corrupts engagement analytics built on receipt events, and creates evidentiary problems in regulated workflows (harassment reports, support SLAs, legal discovery) where read receipts are treated as proof the user saw the content. Data-integrity failures here cascade: downstream features like unread counts, notification suppression, and retention funnels all key off a signal that does not represent reality.

Why this severity: Medium because the integrity breach misleads users and distorts analytics but does not by itself expose credentials or enable account takeover.

community-realtime.presence.read-receipts-explicit

Presence payloads contain only state and user ID; no IP, device, or session data leaked

medium

Presence payloads broadcast to every connected client in a channel. Embedding IP addresses, user-agent strings, device names, or session tokens in those payloads exposes that data to all channel members — not just the server. This violates GDPR Article 5(1)(c) (data minimisation) and CWE-359 (Exposure of Private Personal Information to an Unauthorized Actor). A chat participant should not be able to infer another user's device, IP range, or session identifier simply by opening the developer console.

Why this severity: Medium because presence payloads are broadcast to all channel members, and embedding IP or device fields exposes PII to every connected peer in violation of GDPR Art. 5(1)(c).

community-realtime.presence.presence-data-privacy
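A simple way to keep internal fields off the wire is to build the broadcast payload from an explicit allowlist, as sketched here. The interfaces and `toPublicPresence` are hypothetical; the internal record's fields are examples of data the check says must not be broadcast.

```typescript
// What the server may track internally (example fields).
interface InternalPresence {
  userId: string;
  state: "online" | "away" | "offline";
  ip: string;
  userAgent: string;
  sessionId: string;
}

// The only fields that should ever be broadcast to channel members.
interface PublicPresence {
  userId: string;
  state: "online" | "away" | "offline";
}

// Copies allowlisted fields explicitly, so new internal fields never leak.
function toPublicPresence(presence: InternalPresence): PublicPresence {
  return { userId: presence.userId, state: presence.state };
}
```

Constructing a fresh object is deliberate: spreading the internal record and deleting fields would leak any field added later.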

Real-time fan-out uses channel membership cache, not per-message permission reads

medium

Querying the database for channel membership on every message caps fan-out throughput at the database's query rate. At modest scale (50 messages per second across 10 channels) that is at least 50 synchronous permission queries per second in the hot path, turning a real-time feature into a slow polling system. ISO 25010 performance-efficiency and scalability require that read-hot paths use caching rather than per-operation round trips to persistent storage.

Why this severity: Medium because per-message permission queries add a synchronous database round-trip for every broadcast, collapsing throughput at moderate message rates.

community-realtime.presence.fanout-cache-not-per-message
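A membership cache with explicit invalidation can be sketched as below. `MembershipCache` and `MembershipLoader` are hypothetical; the loader stands in for the database query, and the sketch assumes the application calls `invalidate` whenever membership changes.

```typescript
// Assumption: performs the expensive lookup (for example a database query).
type MembershipLoader = (channelId: string) => string[];

class MembershipCache {
  private readonly cache = new Map<string, Set<string>>();
  private readonly load: MembershipLoader;

  constructor(load: MembershipLoader) {
    this.load = load;
  }

  // Hits the loader only on a cache miss.
  members(channelId: string): Set<string> {
    let cached = this.cache.get(channelId);
    if (!cached) {
      cached = new Set<string>(this.load(channelId));
      this.cache.set(channelId, cached);
    }
    return cached;
  }

  // Call on join, leave, ban, or role change so stale grants do not linger.
  invalidate(channelId: string): void {
    this.cache.delete(channelId);
  }
}
```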

Real-Time UX & Resilience

5 checks

Connection state exposed to UI as connected, connecting, or disconnected indicator

low

When the realtime socket drops and the UI keeps rendering as if everything is fine, users type messages into a dead channel, assume their posts sent, and discover hours later that nothing reached the server. That silent failure mode destroys trust in the product, generates duplicate support tickets, and corrupts conversation ordering once reconnection flushes a backlog of stale sends. A visible connection indicator is the primary user-experience affordance that distinguishes a working realtime feature from one the user has to guess at, and its absence is a common source of reported bugs in chat and collaboration tools.

Why this severity: Low because the defect is a UX gap with no security impact, though it directly drives support load and silent message loss.

community-realtime.realtime-ux.connection-state-ui

Messages maintain causal order within a single thread; reply not emitted before parent

low

A reply delivered before its parent message renders as an orphaned response with no visible context. Users see `Re: [message not found]` or a floating reply with no thread anchor, which destroys comprehension and trust in the threading system. CWE-362 (Concurrent Execution Using Shared Resource with Improper Synchronization) applies when two related writes (parent then child) can arrive out of order due to race conditions in asynchronous handlers. Threading is a key value-add for community platforms; broken ordering negates that value.

Why this severity: Low because parent-before-reply ordering failures corrupt thread readability, but they surface only under concurrent send conditions rather than on every message.

community-realtime.realtime-ux.causal-ordering
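One way to enforce parent-before-reply ordering on the receiving side is to hold a reply until its parent has been released, as in this sketch. `CausalBuffer` and `ThreadMessage` are hypothetical names, and the sketch ignores the case where a parent never arrives, which a real implementation would need to time out.

```typescript
interface ThreadMessage {
  id: string;
  parentId: string | null;
  body: string;
}

class CausalBuffer {
  private readonly released = new Set<string>();
  private readonly waiting = new Map<string, ThreadMessage[]>();

  // Returns the messages that are now safe to render, parents first.
  receive(message: ThreadMessage): ThreadMessage[] {
    const parentId = message.parentId;
    if (parentId !== null && !this.released.has(parentId)) {
      // Parent not seen yet: park the reply under its parent's ID.
      const queue = this.waiting.get(parentId) ?? [];
      queue.push(message);
      this.waiting.set(parentId, queue);
      return [];
    }
    const ready: ThreadMessage[] = [];
    const pending: ThreadMessage[] = [message];
    while (pending.length > 0) {
      const next = pending.shift();
      if (next === undefined) break;
      this.released.add(next.id);
      ready.push(next);
      // Releasing a message may unblock replies that were waiting on it.
      const children = this.waiting.get(next.id);
      if (children) {
        this.waiting.delete(next.id);
        pending.push(...children);
      }
    }
    return ready;
  }
}
```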

Client queues outbound messages while offline and re-sends on reconnect

low

A user who types a message while their phone switches from WiFi to cellular loses the message silently — the send fires against a disconnected socket and the error is swallowed. No error message, no retry prompt, no indication the message was lost. This erodes trust in the platform's basic promise of delivery. ISO 25010 reliability covers graceful degradation under transient connectivity loss, and offline-aware apps are now a baseline expectation for communication tools.

Why this severity: Low because silently dropped outbound messages occur only during transient disconnects, but they produce irreversible data loss with no user-visible signal.

community-realtime.realtime-ux.offline-send-queue
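A client-side outbox can be sketched as a queue that buffers while offline and flushes in order on reconnect. `Outbox` is a hypothetical name and the `transmit` callback stands in for the actual socket send; the sketch assumes the application calls `setOnline` from its connection event handlers.

```typescript
class Outbox {
  private readonly pending: string[] = [];
  private online = false;
  private readonly transmit: (payload: string) => void;

  constructor(transmit: (payload: string) => void) {
    this.transmit = transmit;
  }

  // Sends immediately when online, otherwise queues the payload.
  send(payload: string): void {
    if (this.online) this.transmit(payload);
    else this.pending.push(payload);
  }

  // Call from the socket's open and close handlers.
  setOnline(online: boolean): void {
    this.online = online;
    if (!online) return;
    // Flush in the order the user wrote the messages.
    while (this.pending.length > 0) {
      const next = this.pending.shift();
      if (next !== undefined) this.transmit(next);
    }
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

Because a resend can duplicate a message that did reach the server before the drop, queued payloads would normally carry the client-generated ID used for deduplication.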

Horizontal scaling confirmed: real-time state not held solely in process memory

low

Storing presence, channel subscriptions, or message history exclusively in process memory means that restarting a single server instance silently resets that state for all users connected to it. In production, auto-scaling and rolling deploys trigger instance restarts routinely: users are dropped from channels, presence shows them offline, and in-flight message queues are lost. ISO 25010 reliability requires that state survival not be contingent on any individual process's uptime.

Why this severity: Low because in-process-only state loss occurs on restart or scale events — a failure mode that is invisible in development but routine in production deployments.

community-realtime.realtime-ux.horizontal-scaling-ready

Active connections, throughput, and queue depth exposed to observability layer

low

Without exported metrics, you cannot answer the most basic operational questions: how many connections are active right now, is message throughput degrading, is the outbound queue backing up? Invisible failures — a Redis adapter that silently stops publishing, a connection leak growing at 10 sockets per hour — go undetected until they cause an outage. ISO 25010 reliability includes the ability to monitor system health; a real-time service with no observability is operationally blind.

Why this severity: Low because missing metrics delay detection of connection leaks and queue backlogs, turning slow-burn failures into surprise outages rather than paged alerts.

community-realtime.realtime-ux.observability-metrics
