Rate Limits

The input caps MuBit enforces, how it behaves under load, and how the SDK backs off.

ℹ️Note

The SDK-facing data and control runtime does not currently enforce per-route request quotas and does not return X-RateLimit-* or Retry-After headers. The limits that apply today are the input-size caps and the overload behavior below. Quota headers may be added later — don't build clients that depend on them yet.

Input caps (enforced today)

| Limit | Where it applies | Over the limit |
|---|---|---|
| 1000 items per request | control.ingest, control.batch_insert | 400 (InvalidArgument) |
| 1000 results (limit parameter) | list_run_history, list_projects, list_skills | Silently clamped to 1000 |
| llm_override.timeout_ms within [1000, 600000] ms | prompt / skill optimize, query overrides | Clamped into range |
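Clamping is silent, so a client can end up with a different timeout or page size than it asked for. The sketch below mirrors the documented bounds on the client side. You can use it to compute the value the server will actually apply, or to reject an out-of-range timeout before sending it. The constant and function names are illustrative and are not part of the SDK.

```python
# Documented bounds for llm_override.timeout_ms (inclusive), in milliseconds.
TIMEOUT_MS_MIN = 1_000
TIMEOUT_MS_MAX = 600_000
# Documented cap on the limit of list_run_history / list_projects / list_skills.
LIST_LIMIT_MAX = 1_000


def effective_timeout_ms(requested: int) -> int:
    """Return the timeout the server will apply after clamping into range."""
    return max(TIMEOUT_MS_MIN, min(requested, TIMEOUT_MS_MAX))


def effective_list_limit(requested: int) -> int:
    """Return the most results a list call can return for this limit."""
    return min(requested, LIST_LIMIT_MAX)


def check_timeout_ms(requested: int) -> int:
    """Fail loudly on the client instead of letting the server clamp silently."""
    if requested != effective_timeout_ms(requested):
        raise ValueError(
            f"timeout_ms={requested} is outside [{TIMEOUT_MS_MIN}, {TIMEOUT_MS_MAX}]"
        )
    return requested
```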

Chunk bulk writes into ≤1000-item batches and paginate large lists.
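Here is a minimal sketch of the chunking. It assumes you pass in your own function that sends one batch, for example a thin wrapper around control.ingest. The helpers chunked and ingest_all, and the ingest_batch parameter, are hypothetical. They are not SDK functions.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

MAX_BATCH = 1000  # documented per-request cap for control.ingest / control.batch_insert


def chunked(items: Iterable[T], size: int = MAX_BATCH) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if not 1 <= size <= MAX_BATCH:
        raise ValueError(f"size must be in [1, {MAX_BATCH}]")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def ingest_all(items: Iterable[T], ingest_batch: Callable[[List[T]], object]) -> int:
    """Send every item through `ingest_batch` (hypothetical wrapper) and count them."""
    sent = 0
    for batch in chunked(items):
        ingest_batch(batch)
        sent += len(batch)
    return sent
```

With 2,500 items this makes three requests of 1000, 1000, and 500 items. No request exceeds the cap, so none is rejected with 400.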

Overload behavior

When a backend dependency is saturated, calls fail with a retryable status rather than a quota error:

  • 429 (ResourceExhausted) — an upstream dependency (e.g. an LLM provider used by reflect/query) is throttling. MuBit applies its own outbound rate limiting to those providers.
  • 503 (Unavailable / FailedPrecondition) — a backend is temporarily unavailable.

Both are safe to retry with backoff. The SDK already retries them — there is no Retry-After header to honor, so use the SDK's exponential backoff. See Retries.
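If you issue calls outside the SDK, you can apply the same policy yourself. This minimal sketch uses capped exponential backoff with full jitter and retries only 429 and 503. ApiError, its status attribute, and the delay defaults are hypothetical stand-ins. They are not the SDK's actual classes or values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# 429 = ResourceExhausted, 503 = Unavailable / FailedPrecondition (per the list above).
RETRYABLE_STATUSES = {429, 503}


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying 429/503 with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
                raise  # not retryable, or out of attempts
            # No Retry-After header exists, so the delay depends only on the attempt number.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Full jitter spreads the retries from many clients across the whole delay window. Without it, clients would hit a recovering dependency at the same moment.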

ℹ️Note

Per-tenant request limits do apply to the platform / instance-management API (the console control plane), configured by the operator via MUBIT_PLATFORM_RATE_LIMIT_REQUESTS_PER_MINUTE. That governs instance CRUD and admin traffic — not your remember / recall / query data calls.
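If you script instance CRUD against the platform API, you can pace requests on the client side so a bulk job stays under the per-tenant limit. This is a minimal sketch that assumes you know the value the operator set for MUBIT_PLATFORM_RATE_LIMIT_REQUESTS_PER_MINUTE. MinuteRateLimiter is a hypothetical helper. The clock and sleep are injectable only to make it testable. This page does not say how the platform responds once the limit is exceeded.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Sliding-window pacer: at most `per_minute` calls in any 60-second span."""

    def __init__(
        self,
        per_minute: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        if per_minute < 1:
            raise ValueError("per_minute must be >= 1")
        self.per_minute = per_minute
        self.clock = clock
        self.sleep = sleep
        self.sent: Deque[float] = deque()  # timestamps of recent calls, oldest first

    def acquire(self) -> None:
        """Block until another call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the 60-second window.
            while self.sent and now - self.sent[0] >= 60.0:
                self.sent.popleft()
            if len(self.sent) < self.per_minute:
                self.sent.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(60.0 - (now - self.sent[0]))
```

Call limiter.acquire() immediately before each platform API request.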

Reducing pressure

  • Batch writes with control.ingest (≤1000 items) instead of N synchronous remember() calls.
  • Cache get_context() within a single LLM turn — the same context is usually safe to reuse across tool calls in that turn.
  • Bound result sizes with recall(limit=N) to cut downstream token spend.
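Per-turn caching can be as simple as a dictionary that lives for one turn. This minimal sketch assumes you key entries on whatever arguments you pass to get_context(). TurnCache and its methods are hypothetical helpers, not SDK classes.

```python
from typing import Any, Callable, Dict, Hashable, TypeVar

T = TypeVar("T")


class TurnCache:
    """Memoizes context fetches for the duration of one LLM turn."""

    def __init__(self) -> None:
        self._entries: Dict[Hashable, Any] = {}

    def get(self, key: Hashable, fetch: Callable[[], T]) -> T:
        """Return the cached value for `key`, calling `fetch` only on a miss."""
        if key not in self._entries:
            self._entries[key] = fetch()
        return self._entries[key]

    def end_turn(self) -> None:
        """Discard all entries so the next turn fetches fresh context."""
        self._entries.clear()
```

Inside each tool handler, wrap your existing get_context() call as cache.get(key, lambda: ...). Call end_turn() once the turn finishes so stale context never carries into the next one.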

See also

  • Errors — status codes and what each means
  • Retries — backoff and idempotency