All 20 checks with why-it-matters prose, severity, and cross-references to related audits.
An unauthenticated WebSocket endpoint is an open data tap. Any client — including malicious actors scanning for open sockets — can connect and receive the full message stream without possessing a valid session. OWASP A07 (Identification & Authentication Failures) and CWE-287 both flag authentication bypasses as severe because they eliminate every downstream access control you've built. In a community platform this means private channel conversations, direct messages, and presence events are exposed to anonymous consumers the moment the server boots.
Why this severity: Critical because any unauthenticated client receives the full real-time message stream, bypassing every downstream access control.
community-realtime.connection-management.websocket-auth-required
Without a per-user connection ceiling, a single authenticated user (or a compromised account) can open thousands of simultaneous WebSocket connections. Each consumes a file descriptor, memory, and CPU for keepalive processing. At scale this constitutes a low-cost denial-of-service attack that degrades or crashes the server for every other user. CWE-770 (Allocation of Resources Without Limits or Throttling) captures exactly this failure mode, and ISO 25010 performance-efficiency requires that resource consumption be bounded under adversarial load.
Why this severity: High because an unbounded connection count per user can exhaust server file descriptors and memory, causing service degradation for all users.
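A per-user ceiling can be sketched as a small in-memory counter. This is a minimal illustration, not a definitive implementation: the `ConnectionLimiter` class, its method names, and the default limit of 5 are all assumptions, and a multi-instance deployment would likely need the counts kept in a shared store instead of process memory.

```typescript
// Hypothetical default; tune it to how many devices and tabs a real user needs.
const MAX_CONNECTIONS_PER_USER = 5;

class ConnectionLimiter {
  private readonly counts = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number = MAX_CONNECTIONS_PER_USER) {
    this.limit = limit;
  }

  /** Records the connection and returns true if the user is under the limit. */
  tryAcquire(userId: string): boolean {
    const current = this.counts.get(userId) ?? 0;
    if (current >= this.limit) return false;
    this.counts.set(userId, current + 1);
    return true;
  }

  /** Call from the socket's close handler so the slot is freed. */
  release(userId: string): void {
    const current = this.counts.get(userId) ?? 0;
    if (current <= 1) this.counts.delete(userId);
    else this.counts.set(userId, current - 1);
  }
}
```

The server would call `tryAcquire` right after authenticating the handshake and close the socket when it returns false.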
community-realtime.connection-management.per-user-connection-limits
TCP connections broken by NAT timeouts, mobile network switches, or abrupt host or device failures never deliver a FIN packet to the server. Without a heartbeat, the server holds those socket objects open indefinitely, consuming memory and file descriptors for clients that will never respond again. Enough zombie connections degrade throughput for active users and can prevent new connections from being accepted. ISO 25010 reliability requires that the system detect and recover from failed connections rather than accumulate them silently.
Why this severity: High because zombie connections from abruptly-closed clients accumulate over time, degrading server throughput and exhausting connection capacity.
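One way to picture the heartbeat check is a periodic sweep over last-seen timestamps. The sketch below assumes the server records `lastPongAt` whenever a client answers a ping; `TrackedConnection`, `findStaleConnections`, and the 30-second timeout are hypothetical names and values chosen for illustration.

```typescript
interface TrackedConnection {
  id: string;
  lastPongAt: number; // ms since epoch, updated whenever the client answers a ping
}

// Hypothetical timeout; it should exceed the ping interval by a comfortable margin.
const HEARTBEAT_TIMEOUT_MS = 30000;

/** Returns the ids of connections that have not answered within the timeout. */
function findStaleConnections(
  connections: readonly TrackedConnection[],
  now: number,
  timeoutMs: number = HEARTBEAT_TIMEOUT_MS,
): string[] {
  const stale: string[] = [];
  for (const connection of connections) {
    if (now - connection.lastPongAt > timeoutMs) stale.push(connection.id);
  }
  return stale;
}
```

A timer on the server would run this sweep at a fixed interval and terminate every socket whose id is returned.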
community-realtime.connection-management.heartbeat-timeout
When a server restarts or drops a connection, every client attempts to reconnect. If all clients retry at the same fixed interval (say, 5 seconds), they form a synchronized stampede that hits the server simultaneously on each retry cycle. This thundering-herd effect can prevent the server from recovering under its own load. Jitter randomizes reconnect timing so the server receives a spread of reconnect requests it can absorb incrementally. ISO 25010 reliability includes resistance to self-inflicted load spikes of exactly this type.
Why this severity: High because synchronized fixed-interval retries across many clients produce a thundering-herd stampede that can prevent server recovery after an outage.
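As a rough sketch of the idea, the delay below doubles with each attempt up to a cap and is then randomized across the whole window (often called "full jitter"). The function name, the 1-second base, and the 30-second cap are assumptions for illustration; the `random` parameter exists only so the behavior can be made deterministic in tests.

```typescript
/**
 * Exponential backoff with full jitter.
 * attempt is 0 for the first retry, 1 for the second, and so on.
 */
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 1000, // hypothetical base delay
  capMs: number = 30000, // hypothetical upper bound
  random: () => number = Math.random,
): number {
  // The window doubles on each attempt but never exceeds the cap.
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Picking uniformly in [0, ceiling) spreads clients across the whole window.
  return Math.floor(random() * ceiling);
}
```

A client would wait `reconnectDelayMs(attempt)` milliseconds before each reconnect and reset `attempt` to zero once a connection succeeds.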
community-realtime.connection-management.exponential-backoff-jitter
A handshake-only authorization model is susceptible to time-of-check/time-of-use attacks: a user whose channel access is revoked after connecting can continue subscribing to restricted channels until the connection drops. CWE-285 (Improper Authorization) and OWASP A01 (Broken Access Control) both describe this failure: authorization decisions must be re-evaluated at the point of access, not cached from an earlier check. In a community app, this means a banned user or a user whose subscription lapses can still join private channels by issuing subscribe events on an existing connection.
Why this severity: Critical because a user whose access was revoked post-handshake can subscribe to restricted channels until the TCP connection drops, bypassing OWASP A01 access controls.
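The sketch below shows one possible shape for checking access on every subscribe instead of only at handshake. `ChannelSubscriptions` and the injected `AccessCheck` function are hypothetical; a real check would typically be an asynchronous lookup, kept synchronous here to stay short.

```typescript
// Hypothetical permission check supplied by the application.
type AccessCheck = (userId: string, channelId: string) => boolean;

class ChannelSubscriptions {
  private readonly joined = new Set<string>();
  private readonly userId: string;
  private readonly canAccess: AccessCheck;

  constructor(userId: string, canAccess: AccessCheck) {
    this.userId = userId;
    this.canAccess = canAccess;
  }

  /** Evaluates access at subscribe time, not from the handshake result. */
  subscribe(channelId: string): boolean {
    if (!this.canAccess(this.userId, channelId)) return false;
    this.joined.add(channelId);
    return true;
  }

  /** Call after a ban or permission change; returns the channels that were evicted. */
  revalidate(): string[] {
    const removed: string[] = [];
    this.joined.forEach((channelId) => {
      if (!this.canAccess(this.userId, channelId)) removed.push(channelId);
    });
    for (const channelId of removed) this.joined.delete(channelId);
    return removed;
  }
}
```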
community-realtime.connection-management.channel-auth-revalidation
Without monotonic sequence numbers, clients have no way to detect that messages were dropped during a network disruption. A user rejoining after a brief outage will see the conversation with silent gaps and no indication that messages are missing. CWE-354 (Improper Validation of Integrity Check Value) covers this class of missing-integrity-signal failure. In a community app, undetected gaps corrupt conversation context, cause replies to appear to reference messages that were never received, and erode user trust in the platform's reliability.
Why this severity: High because clients cannot detect dropped messages during network outages, silently corrupting conversation state and breaking reply threading.
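A client-side gap check might look like the following sketch. It assumes the server stamps each message in a channel with a sequence number that increases by one; `SequenceTracker` and its return shape are illustrative names, not part of any particular library.

```typescript
interface MissingRange {
  from: number;
  to: number;
}

class SequenceTracker {
  private lastSeq: number;

  constructor(initialSeq: number = 0) {
    this.lastSeq = initialSeq;
  }

  /**
   * Returns the inclusive range of missing sequence numbers when a gap is
   * detected, or null when the message is in order, a duplicate, or a backfill.
   */
  observe(seq: number): MissingRange | null {
    if (seq <= this.lastSeq) return null; // already seen, or a late backfill
    const expected = this.lastSeq + 1;
    this.lastSeq = seq;
    return seq > expected ? { from: expected, to: seq - 1 } : null;
  }
}
```

When `observe` returns a range, the client can request those messages from the server and insert them by sequence number.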
community-realtime.message-delivery.monotonic-sequence-numbers
Real-time transports retry delivery on network errors. Without deduplication, a message sent before a disconnect and resent after reconnect renders twice in the UI: the user sees duplicate chat lines, duplicate notifications, and potentially double-processed side effects. CWE-694 (Use of Multiple Resources with Duplicate Identifier) captures this integrity failure. The fix is straightforward but frequently omitted in AI-generated WebSocket code that copies the happy path without handling retry semantics.
Why this severity: High because network retries cause duplicate message renders and side effects without client-side deduplication, directly corrupting the conversation view.
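A minimal deduplication sketch, assuming every message carries a stable unique id that is preserved across retries. The `MessageDeduplicator` class and its default capacity of 1000 are assumptions; the bound exists so the set of remembered ids does not grow without limit.

```typescript
class MessageDeduplicator {
  private readonly seen = new Set<string>();
  private readonly order: string[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  /** Returns true the first time an id is seen and false for repeats. */
  accept(messageId: string): boolean {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    this.order.push(messageId);
    // Forget the oldest id once the window is full.
    if (this.order.length > this.capacity) {
      const oldest = this.order.shift();
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }
}
```

The client would render a message and run its side effects only when `accept` returns true.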
community-realtime.message-delivery.client-deduplication
In-process broadcast with `io.to(channel).emit()` only reaches clients connected to the same server instance. The moment you deploy two instances, behind a load balancer or via auto-scaling, users on different instances stop receiving each other's messages. This is a latent failure: the app works perfectly in development and single-instance staging, then silently breaks in production the first time the process count exceeds one. ISO 25010 reliability requires the system to function correctly under expected deployment configurations, including horizontal scaling.
Why this severity: High because in-process-only broadcast silently partitions users across server instances, causing message loss that is invisible in single-instance testing.
community-realtime.message-delivery.pubsub-cross-instance
A user who loses connectivity for 30 seconds and reconnects finds the conversation with a gap: every message sent during their outage is gone with no indication it was missed. This is especially damaging in moderation workflows, incident channels, or any context where missing a message has operational consequences. ISO 25010 reliability specifically covers data persistence across failure events; dropping messages on disconnect is a reliability failure regardless of how brief the outage.
Why this severity: Medium because messages sent to disconnected users are permanently lost without a server-side queue, creating silent data gaps on reconnect.
community-realtime.message-delivery.offline-queue-on-reconnect
Without a server-side payload cap, any client can send multi-megabyte WebSocket frames. Processing an oversized frame allocates that memory per connection, and if multiple clients do this simultaneously, the server runs out of heap before the OS-level connection limit is reached. CWE-770 (Allocation of Resources Without Limits or Throttling) and CWE-400 (Uncontrolled Resource Consumption) both apply. This is a trivial denial-of-service vector in community platforms where registration is open and any user can author a message.
Why this severity: Medium because oversized frames cause unbounded memory allocation per connection, enabling a low-effort denial-of-service by any authenticated user.
community-realtime.message-delivery.max-payload-enforced
Presence derived from client-controlled heartbeats can be forged: a disconnected or banned user can keep pinging to appear online, and a connected user can simply stop sending heartbeats to appear offline. CWE-287 (Improper Authentication) and OWASP A07 identify any system that trusts client-reported state for security decisions as broken. In a community platform, forged presence undermines moderation dashboards, violates user expectations about who can see them, and breaks features like "last seen" timestamps that users rely on to make trust decisions.
Why this severity: Critical because client-controlled presence signals can be forged indefinitely, allowing banned users to appear online and violating OWASP A07 authentication integrity.
community-realtime.presence.server-computed-presence
A typing indicator fired on every keystroke with no debounce generates roughly one WebSocket event per 100ms per active user. For a community with 100 simultaneous typists, that is 1,000 events per second, before any real messages are sent. Without server-side expiry, users who close their browser mid-compose are shown as typing indefinitely. Both failure modes inflate bandwidth, CPU cost, and connection noise for zero user-visible benefit, violating ISO 25010 performance-efficiency requirements.
Why this severity: Medium because undebounced typing events generate excessive message volume that inflates server load and costs in proportion to the active user count.
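Both halves of the fix can be sketched together. Note that the client side below is a leading-edge throttle (at most one event per interval), a close relative of debounce that is swapped in here because it needs no timers; the server side expires typing state by timestamp. `TypingThrottle`, `activeTypists`, and the 3-second and 5-second values are hypothetical.

```typescript
/** Client side: allow at most one typing event per interval. */
class TypingThrottle {
  private lastSentAt = -Infinity;
  private readonly intervalMs: number;

  constructor(intervalMs: number = 3000) {
    this.intervalMs = intervalMs;
  }

  /** Call on every keystroke; emit the typing event only when this returns true. */
  shouldSend(now: number): boolean {
    if (now - this.lastSentAt < this.intervalMs) return false;
    this.lastSentAt = now;
    return true;
  }
}

/** Server side: a user counts as typing only if their last event is recent. */
function activeTypists(
  lastTypingAt: Map<string, number>,
  now: number,
  ttlMs: number = 5000,
): string[] {
  const active: string[] = [];
  lastTypingAt.forEach((at, userId) => {
    if (now - at <= ttlMs) active.push(userId);
  });
  return active;
}
```

With a 3-second interval, a user typing continuously sends roughly one event every 3 seconds instead of ten per second.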
community-realtime.presence.typing-indicator-debounce
Inferring read status from message delivery lies to users about what their conversation partners have actually seen. A message marked read the instant it lands on a device claims acknowledgment the recipient never gave, which breaks trust in the product, corrupts engagement analytics built on receipt events, and creates evidentiary problems in regulated workflows (harassment reports, support SLAs, legal discovery) where read receipts are treated as proof the user saw the content. Data-integrity failures here cascade: downstream features like unread counts, notification suppression, and retention funnels all key off a signal that does not represent reality.
Why this severity: Medium because the integrity breach misleads users and distorts analytics but does not by itself expose credentials or enable account takeover.
community-realtime.presence.read-receipts-explicit
Presence payloads are broadcast to every connected client in a channel. Embedding IP addresses, user-agent strings, device names, or session tokens in those payloads exposes that data to all channel members, not just the server. This violates GDPR Article 5(1)(c) (data minimisation) and CWE-359 (Exposure of Private Personal Information to an Unauthorized Actor). A chat participant should not be able to infer another user's device, IP range, or session identifier simply by opening the developer console.
Why this severity: Medium because presence payloads are broadcast to all channel members, and embedding IP or device fields exposes PII to every connected peer in violation of GDPR Art. 5(1)(c).
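One way to keep private fields out of broadcasts is to build the outgoing payload from an explicit allowlist instead of spreading the internal record. The two interfaces and the `toPublicPresence` function below are hypothetical shapes chosen for illustration.

```typescript
type PresenceStatus = "online" | "away" | "offline";

// Hypothetical server-side record; it may hold fields peers must never see.
interface InternalPresence {
  userId: string;
  status: PresenceStatus;
  lastSeenAt: number;
  ipAddress: string;
  userAgent: string;
  sessionToken: string;
}

// The only shape that is ever broadcast to channel members.
interface PublicPresence {
  userId: string;
  status: PresenceStatus;
  lastSeenAt: number;
}

/** Copies allowlisted fields one by one so new internal fields never leak by default. */
function toPublicPresence(presence: InternalPresence): PublicPresence {
  return {
    userId: presence.userId,
    status: presence.status,
    lastSeenAt: presence.lastSeenAt,
  };
}
```

Copying named fields is a deliberate choice over deleting sensitive keys: a field added to the internal record later stays private unless someone adds it to the allowlist.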
community-realtime.presence.presence-data-privacy
Per-message database queries to determine channel membership cap message throughput at the rate the database can answer those queries. At modest scale (50 messages per second across 10 channels) that is 50 synchronous permission queries per second in the hot path, turning a real-time feature into a slow polling system. ISO 25010 performance-efficiency and scalability require that read-hot paths use caching rather than per-operation round trips to persistent storage.
Why this severity: Medium because per-message permission queries add a synchronous database round-trip for every broadcast, collapsing throughput at moderate message rates.
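A short-lived membership cache is one way to take the database out of the per-message path. In this sketch, `MembershipCache`, the injected `load` function, and the 30-second TTL are assumptions; the loader is synchronous only to keep the example compact.

```typescript
interface CacheEntry {
  members: Set<string>;
  expiresAt: number;
}

class MembershipCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly load: (channelId: string) => string[];
  private readonly ttlMs: number;

  // load is a hypothetical function that queries the database for member ids.
  constructor(load: (channelId: string) => string[], ttlMs: number = 30000) {
    this.load = load;
    this.ttlMs = ttlMs;
  }

  /** Returns cached members, hitting the loader only on a miss or after expiry. */
  getMembers(channelId: string, now: number): Set<string> {
    const hit = this.entries.get(channelId);
    if (hit !== undefined && hit.expiresAt > now) return hit.members;
    const members = new Set<string>(this.load(channelId));
    this.entries.set(channelId, { members, expiresAt: now + this.ttlMs });
    return members;
  }

  /** Call when membership changes (join, leave, ban) so revocations apply at once. */
  invalidate(channelId: string): void {
    this.entries.delete(channelId);
  }
}
```

Calling `invalidate` on every membership change matters: a TTL alone would let a removed user keep receiving messages until the entry expires.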
community-realtime.presence.fanout-cache-not-per-message
When the realtime socket drops and the UI keeps rendering as if everything is fine, users type messages into a dead channel, assume their posts sent, and discover hours later that nothing reached the server. That silent failure mode destroys trust in the product, generates duplicate support tickets, and corrupts conversation ordering once reconnection flushes a backlog of stale sends. A visible connection indicator is the primary user-experience affordance that distinguishes a working realtime feature from one the user has to guess at, and its absence is the top cause of reported bugs in chat and collaboration tools.
Why this severity: Low because the defect is a UX gap with no security impact, though it directly drives support load and silent message loss.
community-realtime.realtime-ux.connection-state-ui
A reply delivered before its parent message renders as an orphaned response with no visible context. Users see `Re: [message not found]` or a floating reply with no thread anchor, which destroys comprehension and trust in the threading system. CWE-362 (Concurrent Execution Using Shared Resource with Improper Synchronization) applies when two related writes (parent then child) can arrive out of order due to race conditions in asynchronous handlers. Threading is a key value-add for community platforms; broken ordering negates that value.
Why this severity: Low because parent-before-reply ordering failures corrupt thread readability, but they surface only under concurrent send conditions rather than on every message.
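A client can enforce parent-before-reply ordering by holding a reply until its parent has been delivered. The sketch below is one possible approach under the assumption that each reply carries a `parentId`; `CausalBuffer` and `ChatMessage` are hypothetical names, and a production version would also need to bound or expire the waiting list in case a parent never arrives.

```typescript
interface ChatMessage {
  id: string;
  parentId?: string; // set only on replies
}

class CausalBuffer {
  private readonly delivered = new Set<string>();
  private readonly waiting = new Map<string, ChatMessage[]>();

  /** Returns the messages that are now safe to render, parents before replies. */
  receive(message: ChatMessage): ChatMessage[] {
    const parentId = message.parentId;
    if (parentId !== undefined && !this.delivered.has(parentId)) {
      // Parent not seen yet: hold the reply until it arrives.
      const queue = this.waiting.get(parentId) ?? [];
      queue.push(message);
      this.waiting.set(parentId, queue);
      return [];
    }
    const ready: ChatMessage[] = [];
    const pending: ChatMessage[] = [message];
    while (pending.length > 0) {
      const next = pending.shift();
      if (next === undefined) break;
      this.delivered.add(next.id);
      ready.push(next);
      // Release any replies that were waiting on this message.
      const children = this.waiting.get(next.id);
      if (children !== undefined) {
        this.waiting.delete(next.id);
        pending.push(...children);
      }
    }
    return ready;
  }
}
```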
community-realtime.realtime-ux.causal-ordering
A user who types a message while their phone switches from WiFi to cellular loses the message silently: the send fires against a disconnected socket and the error is swallowed. No error message, no retry prompt, no indication the message was lost. This erodes trust in the platform's basic promise of delivery. ISO 25010 reliability covers graceful degradation under transient connectivity loss, and offline-aware apps are now a baseline expectation for communication tools.
Why this severity: Low because silently dropped outbound messages occur only during transient disconnects, but they produce irreversible data loss with no user-visible signal.
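A minimal outbound queue, sketched under the assumption that the client can observe connection state changes. `OutboundQueue` and the injected `send` function are hypothetical; pairing each queued payload with a stable message id would let the server discard any duplicates produced by a flush.

```typescript
class OutboundQueue {
  private readonly pending: string[] = [];
  private connected = false;
  private readonly send: (payload: string) => void;

  // send is a hypothetical function that writes to the live socket.
  constructor(send: (payload: string) => void) {
    this.send = send;
  }

  /** Sends immediately when connected; otherwise holds the payload for later. */
  enqueue(payload: string): void {
    if (this.connected) this.send(payload);
    else this.pending.push(payload);
  }

  /** Call from the socket's open and close handlers; flushes in order on reconnect. */
  setConnected(connected: boolean): void {
    this.connected = connected;
    while (this.connected && this.pending.length > 0) {
      const next = this.pending.shift();
      if (next !== undefined) this.send(next);
    }
  }

  /** Number of messages still waiting, useful for a "sending..." indicator. */
  get pendingCount(): number {
    return this.pending.length;
  }
}
```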
community-realtime.realtime-ux.offline-send-queue
Storing presence, channel subscriptions, or message history exclusively in process memory means that restarting a single server instance silently resets that state for all users connected to it. In production, auto-scaling and rolling deploys trigger instance restarts routinely: users are dropped from channels, presence shows them offline, and in-flight message queues are lost. ISO 25010 reliability requires that state survival not be contingent on any individual process's uptime.
Why this severity: Low because in-process-only state loss occurs on restart or scale events — a failure mode that is invisible in development but routine in production deployments.
community-realtime.realtime-ux.horizontal-scaling-ready
Without exported metrics, you cannot answer the most basic operational questions: how many connections are active right now, is message throughput degrading, is the outbound queue backing up? Invisible failures (a Redis adapter that silently stops publishing, a connection leak growing at 10 sockets per hour) go undetected until they cause an outage. ISO 25010 reliability includes the ability to monitor system health; a real-time service with no observability is operationally blind.
Why this severity: Low because missing metrics delay detection of connection leaks and queue backlogs, turning slow-burn failures into surprise outages rather than paged alerts.
community-realtime.realtime-ux.observability-metrics