Retries and Idempotency
When to retry MuBit calls, how to make writes safe to repeat, and what the SDK does for you automatically.
What the SDK retries automatically
By default the SDK retries transient failures with exponential backoff and jitter:
- mubit.ServerError: any 5xx (500/503) from the server.
- mubit.TransportError whose .code is transient: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED, INTERNAL, CANCELLED, CONNECTION_ERROR, or TIMEOUT (includes network-level failures before any response arrived).
It does not retry AuthError (401), ValidationError (400/409), or UnsupportedFeatureError. Those are caller errors: the same request fails the same way until the call is fixed, so retrying only adds latency and load.
Retries are tuned process-wide through environment variables (there is no per-call or per-client RetryPolicy object):
| Env var | Default | Meaning |
|---|---|---|
| MUBIT_RETRY_ATTEMPTS | 3 | Total attempts including the first (min 1). |
| MUBIT_RETRY_BASE_MS | 200 | Base delay in ms. |
| MUBIT_RETRY_CAP_MS | 5000 | Maximum delay per retry, in ms. |
| MUBIT_RETRY_JITTER | 0.2 | ± jitter fraction (0.0 disables jitter). |
Backoff is exponential (factor 2) off MUBIT_RETRY_BASE_MS, capped at MUBIT_RETRY_CAP_MS.
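To make the schedule concrete, here is a minimal sketch of the arithmetic described above. It is not the SDK's implementation: the helper names `nominal_delays_ms` and `apply_jitter` are hypothetical, and it assumes the first retry waits exactly the base delay and that the cap is applied before jitter, neither of which this page confirms.

```python
import random

def nominal_delays_ms(attempts=3, base_ms=200, cap_ms=5000):
    """Un-jittered delay before each retry; `attempts` counts the first call too."""
    retries = max(attempts, 1) - 1
    # Assumption: the first retry waits base_ms, then the delay doubles up to the cap.
    return [min(base_ms * 2 ** n, cap_ms) for n in range(retries)]

def apply_jitter(delay_ms, jitter=0.2, rng=random):
    """Scale a delay by a random factor in [1 - jitter, 1 + jitter]."""
    return delay_ms * (1 + rng.uniform(-jitter, jitter))

print(nominal_delays_ms())                         # defaults: [200, 400]
print(nominal_delays_ms(attempts=5, base_ms=500))  # [500, 1000, 2000, 4000]
print(nominal_delays_ms(attempts=8, base_ms=500))  # cap kicks in: [500, 1000, 2000, 4000, 5000, 5000, 5000]
```

Under these assumptions the defaults give at most two retries and roughly 0.6 s of added wait, so a higher MUBIT_RETRY_ATTEMPTS mostly matters for outages that last longer than a second.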
MUBIT_RETRY_ATTEMPTS=5
MUBIT_RETRY_BASE_MS=500
Idempotency keys
remember() (and the underlying control.ingest) carries an idempotency key, so a repeated write returns the existing entry instead of creating a duplicate. If you don't pass one, the key defaults to the item id (item_id, else an auto-generated remember-<timestamp>). The auto-generated fallback is timestamp-based and will generally differ between calls, so pin the key explicitly to dedupe across retries from a queue worker:
client.remember(
    session_id=run_id, agent_id="support-agent",
    content="…",
    intent="fact",
    idempotency_key=f"ticket-{ticket_id}-fact-1",
)
record_outcome also accepts an idempotency_key on the wire, so a retried outcome write reinforces once rather than double-counting. Other writes are naturally idempotent by their own ids (e.g. archive keys on the block id, register_agent on the agent id).
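A pinned key only dedupes if every redelivery of the same message produces the same string. One way to get that, sketched below, is to derive the key from stable fields of the job rather than from anything generated at call time. The helper `stable_idempotency_key` and its key layout are hypothetical, and any length or character limits MuBit places on keys are not covered here.

```python
import hashlib

def stable_idempotency_key(ticket_id, intent, content):
    """Build a key that is identical for every redelivery of the same message."""
    # Hash the content so two different facts on one ticket get distinct keys.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
    return f"ticket-{ticket_id}-{intent}-{digest}"

key_first = stable_idempotency_key(42, "fact", "Customer prefers email.")
key_retry = stable_idempotency_key(42, "fact", "Customer prefers email.")
print(key_first == key_retry)  # True: same inputs, same key
```

The result would then be passed as idempotency_key to client.remember(...). Avoid mixing in timestamps, random ids, or attempt counters, since those change between deliveries and defeat the dedupe.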
When to retry yourself
The SDK's built-in retries cover most cases. Add your own outer retry loop with a longer budget (for queue workers or batch jobs) only when you need one:
- Retry transient TransportError and ServerError.
- Don't retry AuthError, ValidationError/AlreadyExistsError, or UnsupportedFeatureError; fix the call instead.
Recommended pattern
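import random
import time

from mubit import ServerError, TransportError

# Transport codes worth retrying; any other code is re-raised immediately.
_TRANSIENT = {"UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED",
              "ABORTED", "INTERNAL", "CANCELLED", "CONNECTION_ERROR", "TIMEOUT"}

def with_retry(fn, max_attempts=4, base_ms=300):
    """Call fn(), retrying transient failures with jittered exponential backoff."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServerError as e:
            last_error = e  # 5xx from the server: retry
        except TransportError as e:
            if getattr(e, "code", None) not in _TRANSIENT:
                raise
            last_error = e
        if attempt < max_attempts - 1:  # no point sleeping after the final attempt
            # base_ms * 2^attempt with ±20% jitter, converted to seconds
            time.sleep(base_ms * (2 ** attempt) * (0.8 + random.random() * 0.4) / 1000)
    # Chain the last failure so the real cause is visible in the traceback.
    raise RuntimeError("retries exhausted") from last_error

# Usage: with_retry(lambda: client.remember(..., idempotency_key=key))
See also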
- Errors — status codes and the SDK exception taxonomy
- Rate limits — input caps and overload behavior