
Category taxonomy uses controlled vocabulary, not freeform text

ab-001025 · directory-listing-schema.content-completeness.category-taxonomy
Severity: high · active

Why it matters

Free-text categories produce Restaurant, restaurant, Restaraunt, and Food as four distinct buckets for the same concept. Your browse pages fragment, faceted filters break, sitemaps balloon with near-duplicate URLs that compete with each other in search results (keyword cannibalization), and analytics reports become impossible to trust. Once inconsistent data lands in the database, deduplication is a manual SQL cleanup job that costs far more than enforcing a controlled vocabulary at submission time. Code that emits Schema.org structured data for the category also depends on stable values.

Severity rationale

High because free-text taxonomies fragment browse pages, break filters, and create SEO cannibalization that worsens with scale.

Remediation

Replace the free-text input with a select backed by a canonical enum or category table, and validate the submitted value against that list on the server. Store categories as a foreign-key reference or a constrained string column so the database rejects rows with values outside the vocabulary. Define the enum in src/lib/taxonomy/categories.ts and import it in both the form and the API handler.

import { z } from 'zod'
// Single source of truth for the category vocabulary
export const CATEGORIES = ['restaurant', 'cafe', 'retail', 'services'] as const
export const CategorySchema = z.enum(CATEGORIES)

Seed a category table with the same values, then add a CHECK constraint or a foreign key on the category column.
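
The server-side check described above can be sketched as follows. This is a minimal, dependency-free illustration: the `isCategory` and `parseCategory` helpers and the trim/lowercase normalization are assumptions made for the example, not part of the rule. In a project that already uses zod, `CategorySchema.safeParse` would fill the same role.

```typescript
// Canonical vocabulary, mirroring the CATEGORIES constant shown in the remediation.
const CATEGORIES = ['restaurant', 'cafe', 'retail', 'services'] as const
type Category = (typeof CATEGORIES)[number]

// Type guard: true only for values in the controlled vocabulary.
function isCategory(value: unknown): value is Category {
  return typeof value === 'string' && CATEGORIES.some((c) => c === value)
}

// Hypothetical helper for an API handler: normalizes case and surrounding
// whitespace, then rejects anything that is not in the list.
function parseCategory(input: unknown): Category | null {
  if (typeof input !== 'string') return null
  const normalized = input.trim().toLowerCase()
  return isCategory(normalized) ? normalized : null
}

parseCategory(' Restaurant ') // 'restaurant'
parseCategory('Restaraunt')   // null
```

Normalizing at the boundary is a design choice; a stricter handler could skip the normalization and reject anything that is not an exact match, since a select should only ever submit canonical values.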

Detection

  • ID: directory-listing-schema.content-completeness.category-taxonomy

  • Severity: high

  • What to look for: List all category values used across listings, then check the category field in the submission form or schema: is it a dropdown/select backed by a predefined list, or a free-form text input? Sample the stored data for category values that look like typos or duplicates.

  • Pass criteria: Categories are selected from a controlled vocabulary (dropdown, select, or predefined enum), not entered as free-form text. 100% of stored category values must come from the defined taxonomy or enumeration. Report: "X unique categories found, all Y from a defined taxonomy."

  • Fail criteria: Categories are stored as free-form text, allowing inconsistent, misspelled, or duplicate categories.

  • Skip (N/A) when: Never — taxonomy consistency is essential for directory browsing and SEO.

  • Cross-reference: The multi-value-arrays check verifies that multi-category listings use proper array formatting.

  • Detail on fail: Example: "Category is a free-form text input. Found 'Restaurant', 'restaurant', 'Food'"

  • Remediation: Use a select/dropdown with predefined categories:

    <select name="category" required>
      <option value="">Select a category</option>
      <option value="restaurant">Restaurant</option>
      <option value="cafe">Café</option>
    </select>
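
The data-sampling step in the detection list can be sketched as a small audit function. This is an illustration under stated assumptions: the `TAXONOMY` list, the `AuditReport` shape, and the `auditCategories` helper are hypothetical names, and the input array stands in for category values sampled from the listings store. The pass message follows the report format given in the pass criteria; the fail message wording is an assumption.

```typescript
// Assumed canonical vocabulary; a real project would import it from the shared taxonomy module.
const TAXONOMY: readonly string[] = ['restaurant', 'cafe', 'retail', 'services']

interface AuditReport {
  pass: boolean
  unique: string[]
  outOfTaxonomy: string[]
  message: string
}

// Hypothetical audit helper: checks sampled category values against the taxonomy.
function auditCategories(values: string[]): AuditReport {
  // De-duplicate exact strings only, so 'Restaurant' and 'restaurant' stay distinct.
  const unique = values.filter((v, i) => values.indexOf(v) === i)
  const outOfTaxonomy = unique.filter((v) => TAXONOMY.indexOf(v) === -1)
  const pass = outOfTaxonomy.length === 0
  const quoted = outOfTaxonomy.map((v) => "'" + v + "'").join(', ')
  const message = pass
    ? `${unique.length} unique categories found, all ${unique.length} from a defined taxonomy.`
    : `Category values outside the taxonomy: ${quoted}`
  return { pass, unique, outOfTaxonomy, message }
}

const report = auditCategories(['Restaurant', 'restaurant', 'Food', 'cafe'])
console.log(report.message)
// prints "Category values outside the taxonomy: 'Restaurant', 'Food'"
```

The comparison is deliberately case-sensitive: a capitalized 'Restaurant' in stored data is itself evidence that values are not coming from the controlled list.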
    
