The OpenAI Batch API offers a 50% cost discount and a separate, higher pool of rate limits for requests that can tolerate a 24-hour completion window. Non-interactive tasks — nightly content tagging, bulk summarization, product description generation, weekly report creation — gain nothing from real-time standard API calls yet pay the full rate. Sending these jobs one by one through the synchronous endpoint forgoes half the available cost savings and burns synchronous rate-limit capacity that user-facing features need. ISO 25010 performance efficiency requires that resource consumption be calibrated to actual operational constraints.
Info because missing Batch API usage wastes an available cost discount on background tasks, but causes no failures and does not affect real-time user experience.
Migrate background AI jobs to the OpenAI Batch API by uploading a JSONL request file. The 50% discount applies automatically, and batch jobs draw on a separate quota, so they do not consume standard rate limits.
// src/jobs/bulk-summarize.ts
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

// One request object per article; custom_id maps each result back to its source
const batchRequests = articlesToSummarize.map(article => ({
  custom_id: `article-${article.id}`,
  method: "POST",
  url: "/v1/chat/completions",
  body: {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: `Summarize: ${article.content}` }],
    max_tokens: 200,
  },
}));

// Write newline-delimited JSON, upload with purpose "batch", then submit the batch
const filePath = "/tmp/batch-requests.jsonl";
fs.writeFileSync(filePath, batchRequests.map(r => JSON.stringify(r)).join("\n"));
const file = await openai.files.create({ file: fs.createReadStream(filePath), purpose: "batch" });
await openai.batches.create({
  input_file_id: file.id,
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
});
Verify by confirming that cron jobs and queue workers route through openai.batches.create rather than chat.completions.create.
ID: ai-token-optimization.caching-cost.batch-processing
Severity: info
What to look for: Look for background jobs, cron tasks, or queue workers that process AI requests in bulk. Check for integration with the OpenAI Batch API (openai.batches.create), or background processing frameworks (BullMQ, Inngest, trigger.dev, Vercel cron) that aggregate non-interactive AI tasks. Also look for patterns where multiple items are processed one-by-one in a for-loop with synchronous await calls to the AI API. Count all instances found and enumerate each.
Pass criteria: Non-interactive, time-tolerant AI tasks (e.g., nightly summarization, bulk content tagging, report generation) use batch APIs or queued processing, taking advantage of batch pricing discounts (50% discount on OpenAI Batch API).
Fail criteria: Background jobs process AI items one-by-one using standard synchronous API calls, missing available batch pricing. Or all AI processing is synchronous regardless of whether user-facing latency is actually required.
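A minimal sketch of the failing pattern this criterion describes (the function name, the structural client type, and the article shape are illustrative assumptions, stubbed so the per-item loop is visible):

```typescript
// ANTI-PATTERN: each item is a separate full-price synchronous call,
// serialized by await — no batch discount, and every call consumes
// interactive rate-limit capacity.
type ChatClient = {
  chat: {
    completions: {
      create: (req: object) => Promise<{ choices: { message: { content: string | null } }[] }>;
    };
  };
};

async function summarizeOneByOne(
  openai: ChatClient,
  articles: { id: number; content: string }[],
): Promise<string[]> {
  const summaries: string[] = [];
  for (const article of articles) {
    // One round-trip per article at the standard (undiscounted) rate
    const res = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: `Summarize: ${article.content}` }],
      max_tokens: 200,
    });
    summaries.push(res.choices[0].message.content ?? "");
  }
  return summaries;
}
```

When a reviewer finds this shape in a cron job or queue worker, the fix is to emit the same request bodies into a JSONL file for the Batch API instead, as shown in the remediation.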
Skip (N/A) when: All AI features are genuinely real-time and user-facing with no background processing or bulk operations. Signal: No cron jobs, no queue workers, no batch processing files, and all AI calls are in response to direct user requests with no background jobs.
Detail on fail: "Bulk AI operations use synchronous standard API — 50% batch discount not utilized"
Remediation: The OpenAI Batch API offers a 50% cost discount and higher rate limits for requests that can tolerate up to 24-hour completion windows — ideal for content processing, tagging, or summarization jobs.
// src/jobs/bulk-summarize.ts
import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

// Build the JSONL batch input: one request object per article
const batchRequests = articlesToSummarize.map(article => ({
  custom_id: `article-${article.id}`,
  method: "POST",
  url: "/v1/chat/completions",
  body: {
    model: "gpt-4o-mini",
    messages: [
      { role: "user", content: `Summarize: ${article.content}` },
    ],
    max_tokens: 200,
  },
}));

const filePath = path.join("/tmp", "batch-requests.jsonl");
fs.writeFileSync(filePath, batchRequests.map(r => JSON.stringify(r)).join("\n"));

// Upload the file, then submit the batch with a 24-hour completion window
const file = await openai.files.create({ file: fs.createReadStream(filePath), purpose: "batch" });
const batch = await openai.batches.create({
  input_file_id: file.id,
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
});
Verify by checking that background AI jobs route through the Batch API rather than standard completions.
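Batches complete asynchronously, so the job must also collect its output. A hedged sketch of polling for completion and parsing the result file (the poll interval, the structural client type, and the helper name `awaitBatchResults` are assumptions, not part of the rule):

```typescript
// Poll the batch until it reaches a terminal state, then download and
// parse the JSONL output (one result object per line, keyed by custom_id).
type BatchClient = {
  batches: { retrieve: (id: string) => Promise<{ status: string; output_file_id?: string | null }> };
  files: { content: (id: string) => Promise<{ text: () => Promise<string> }> };
};

async function awaitBatchResults(openai: BatchClient, batchId: string, pollMs = 60_000) {
  for (;;) {
    const b = await openai.batches.retrieve(batchId);
    if (b.status === "completed" && b.output_file_id) {
      const raw = await (await openai.files.content(b.output_file_id)).text();
      return raw.split("\n").filter(Boolean).map(line => JSON.parse(line));
    }
    if (["failed", "expired", "cancelled"].includes(b.status)) {
      throw new Error(`Batch ${batchId} ended with status ${b.status}`);
    }
    await new Promise(r => setTimeout(r, pollMs)); // still validating or in_progress
  }
}
```

Each parsed line carries the `custom_id` set at submission time, which is how results are matched back to their source articles.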