Why Cereby Supports Multiple AI Models (and How to Choose One)
The Problem: A Two-Model Default Created Product and Reliability Limits
Our initial model strategy relied on a narrow default set. This simplified early operations, but it introduced structural limits as usage diversified:
- vendor concentration increased outage and rate-limit risk
- task diversity was forced through too few latency/cost profiles
- users had limited control over model quality vs. price tradeoffs
- experimentation across model families was constrained
The outcome was acceptable for baseline chat, but suboptimal for a learning product with variable task complexity.
The Architecture Shift: Multi-Provider Model Routing
We moved to a gateway-based model architecture with a curated model allowlist across multiple providers. This gives us a single integration surface while preserving provider diversity at runtime.
At the product layer, we classify models into cost tiers (low, medium, high) and expose those tiers in the selector so users can make explicit tradeoff decisions.
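The combination of a curated allowlist and explicit cost tiers can be sketched as a small registry. This is an illustrative sketch only; the model identifiers, providers, and tier assignments below are hypothetical placeholders, not Cereby's actual catalog.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass(frozen=True)
class ModelEntry:
    model_id: str   # gateway identifier, e.g. "provider/model-name"
    provider: str
    tier: Tier

# Curated allowlist: the gateway only routes to models listed here.
# Entries are hypothetical examples for illustration.
ALLOWLIST = [
    ModelEntry("vendor-a/fast-1", "vendor-a", Tier.LOW),
    ModelEntry("vendor-b/standard-1", "vendor-b", Tier.MEDIUM),
    ModelEntry("vendor-a/advanced-1", "vendor-a", Tier.HIGH),
]

def models_in_tier(tier: Tier) -> list[str]:
    """Return the allowlisted model ids for a given cost tier."""
    return [m.model_id for m in ALLOWLIST if m.tier == tier]
```

Keeping the allowlist as data rather than code is what enables the "operational agility" goal below: curation changes are edits to a table, not architecture rewrites.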
Design Goals
The migration targeted four concrete goals:
- Resilience — reduce dependence on a single provider.
- Task-model fit — support different models for different workloads.
- Cost transparency — expose a clear mapping between model tier and usage cost.
- Operational agility — allow model curation updates without architecture rewrites.
What Changed for Users
The selector now supports a broader model catalog with two key UX constraints:
- each model includes concise performance/cost context
- access to premium tiers is explicit and entitlement-aware
This prevents hidden upgrades and improves trust in cost behavior.
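The "explicit and entitlement-aware" constraint can be illustrated with a minimal gate: premium tiers require a named entitlement, so a user is never silently routed to a costlier model. The entitlement name below is a hypothetical placeholder.

```python
# Tiers that require an explicit plan entitlement (illustrative).
PREMIUM_TIERS = {"high"}

def can_select(tier: str, user_entitlements: set[str]) -> bool:
    """Allow premium-tier selection only when the user holds the entitlement."""
    if tier in PREMIUM_TIERS:
        return "premium_models" in user_entitlements
    return True
```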
Choosing the Right Tier (Practical Heuristic)
Use the low tier for short, repetitive, or exploratory tasks where latency and cost efficiency dominate. Use the medium or high tiers for nuanced reasoning, long-form transformation, and high-stakes outputs where quality variance has meaningful user impact.
Coin consumption is still driven by token usage; tiers provide guidance, not fixed per-message billing.
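Since billing follows token usage rather than a flat per-message price, a cost estimate is just tokens scaled by a per-tier rate. The rates below are hypothetical, chosen only to show the shape of the calculation.

```python
# Hypothetical coin rates per 1,000 tokens by tier (illustrative values).
COIN_RATE_PER_1K_TOKENS = {"low": 1, "medium": 4, "high": 12}

def estimate_coins(tier: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate coin cost: total tokens scaled by the tier's per-1K rate."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * COIN_RATE_PER_1K_TOKENS[tier]
```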
Important Scope Note
Not every AI feature follows the interactive model selector. Some subsystems pin model choices to preserve deterministic behavior, throughput targets, or domain-specific quality requirements.
Outcome
The multi-model transition improved three things simultaneously: product resilience, user control over cost/quality tradeoffs, and operational flexibility for future model updates. Instead of forcing all requests through a single quality/cost profile, Cereby now matches model capability to task requirements in a transparent way.
Visual Summary
```mermaid
flowchart LR
A[Request Type] --> B{Routing Policy}
B -->|Low latency tasks| C[Fast Model]
B -->|Balanced tasks| D[Standard Model]
B -->|Complex reasoning| E[Advanced Model]
C --> F[Unified Response Interface]
D --> F
E --> F
```
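The routing policy in the diagram reduces to a lookup from request classification to model class. A minimal sketch, with assumed classification labels and a default to the balanced path for unrecognized request types:

```python
def route(request_type: str) -> str:
    """Map a request classification to a model class, mirroring the diagram."""
    policy = {
        "low_latency": "fast_model",
        "balanced": "standard_model",
        "complex_reasoning": "advanced_model",
    }
    # Unclassified requests fall back to the balanced/standard path.
    return policy.get(request_type, "standard_model")
```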