How Cereby’s Four-Layer Context Makes AI Feel Continuous
The Goal: One Coherent Context Every Time
When you talk to Cereby AI, you expect it to remember what you told it, what you’ve been working on lately, and what’s going on in this chat—without mixing things up or forgetting the thread. To do that, we give the model a single, predictable context on every request, built in a fixed order. We call this the four-layer context model.
The Four Layers (In Order)
Context is assembled in a fixed order so the model always sees information from "most stable" to "most immediate." The first four items below are the layers themselves; the last two are the live conversation:
- System instructions — Cereby’s core behavior and rules (how to help with studying, creation, questions, etc.).
- Session metadata — A short, one-time snapshot of this request: device (desktop or mobile), browser, OS, your tier, coin balance, your local time and current date, and your location (when provided). We don’t store this; we only use it so the model can adapt to your environment—for example, resolving “tomorrow” or “next Tuesday” using your date, or tailoring answers to your time zone and locale.
- User memory — What Cereby “remembers” about you: preferences and facts you’ve shared (either by saying “remember this” or that it picked up over time). We inject only the memories that are relevant to your current message, within a token budget, so the model gets signal without noise.
- Recent conversation summaries — Short, user-side summaries of your past chats (e.g. “Mar 8: Building a load balancer in Go – asked about connection pooling”). We send the last several sessions so the model has continuity across days without re-reading every old message.
- Current session messages — The full transcript of the chat you’re in right now.
- Your current message — What you just sent.
So on every turn, the model sees: who the assistant is → this request’s environment → what we know about you → what you’ve been up to lately → this conversation → this message.
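The fixed-order assembly described above can be sketched in a few lines. This is an illustrative toy, not Cereby's actual implementation; the field and function names are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ContextLayers:
    system_instructions: str
    session_metadata: str
    user_memory: list = field(default_factory=list)
    recent_summaries: list = field(default_factory=list)
    session_messages: list = field(default_factory=list)
    current_message: str = ""

def assemble_context(layers: ContextLayers) -> list:
    """Flatten the blocks into one prompt, most stable first."""
    return (
        [layers.system_instructions, layers.session_metadata]
        + layers.user_memory
        + layers.recent_summaries
        + layers.session_messages
        + [layers.current_message]
    )
```

Because the order is hard-coded rather than decided per request, every turn produces the same "shape" of prompt.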
Why Order Matters
Putting context in a fixed order does two things:
- Consistency — The model always gets the same “shape” of context. That makes behavior more predictable and easier to tune.
- Sensible trimming — If we ever need to cut context to fit a token budget, we trim from the “oldest” or least essential side first (e.g. oldest messages in the current chat, then older recent summaries), and keep system instructions, session metadata, and the most relevant memory and summaries. So the most important layers stay intact.
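The trimming rule in the second bullet might look like this in miniature. A hedged sketch: tokens are approximated by whitespace word counts, and the "protected" blocks stand in for system instructions, session metadata, and the top memories.

```python
def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer
    return len(text.split())

def trim_to_budget(protected, summaries, messages, budget):
    """Trim least-essential context first; never touch protected blocks."""
    fixed = sum(approx_tokens(t) for t in protected)
    summaries, messages = list(summaries), list(messages)
    total = lambda: fixed + sum(map(approx_tokens, summaries + messages))
    while messages and total() > budget:   # 1) drop oldest chat messages first
        messages.pop(0)
    while summaries and total() > budget:  # 2) then the oldest summaries
        summaries.pop(0)
    return summaries, messages
```

Note that the protected blocks are never candidates for removal, so a shrinking budget degrades continuity gradually instead of breaking core behavior.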
What This Means for You
- Personalization — Cereby uses your stored memories (Layer 3) and recent summaries (Layer 4) so it can refer back to past topics and preferences without you repeating yourself.
- Continuity — Even if you switch between chats or come back days later, the model has a compact view of what you’ve been working on, so it can stay on topic and suggest relevant next steps.
- Control — You can turn off memory or use a temporary chat when you don’t want this session to be remembered; in those cases we skip user memory and recent summaries but still send session metadata so the model knows your environment.
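The control behavior in the last bullet amounts to a simple gate on the personalization layers. The flag names below are hypothetical, chosen only for the sketch:

```python
def layers_for_request(memory_enabled, temporary_chat, metadata, memories, summaries):
    personalize = memory_enabled and not temporary_chat
    return {
        "session_metadata": metadata,                        # always sent
        "user_memory": memories if personalize else [],      # skipped when off
        "recent_summaries": summaries if personalize else [],
    }
```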
Under the Hood (Without Getting Too Technical)
- Session metadata is built from your request (e.g. User-Agent, timezone, local time, and location when your client sends them) and your account (tier, coins). We use your local time and current date so the model can resolve relative dates like “tomorrow” or “next week,” and we include location when available. This block is computed per request and never stored.
- User memory is stored in Cereby’s memory layer and retrieved by relevance to your current message (using embeddings and a token cap), not by dumping everything you’ve ever said.
- Recent conversation summaries are short, user-focused lines (what you talked about) stored per chat session and updated when you start or continue a conversation. We send the last several of these so the model has cross-session continuity without sending full histories.
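The relative-date resolution that the session-metadata layer enables can be sketched like this. A simplified example: real handling would account for time zones, locale, and far more phrasings.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_relative(phrase: str, local_today: date):
    """Resolve phrases like 'tomorrow' or 'next tuesday' against the
    user's local date from the session-metadata layer."""
    phrase = phrase.lower()
    if phrase == "today":
        return local_today
    if phrase == "tomorrow":
        return local_today + timedelta(days=1)
    if phrase.startswith("next ") and phrase[5:] in WEEKDAYS:
        target = WEEKDAYS.index(phrase[5:])
        days_ahead = (target - local_today.weekday()) % 7 or 7
        return local_today + timedelta(days=days_ahead)
    return None  # anything richer is left to the model
```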
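The relevance-plus-budget retrieval described for user memory might look like this in miniature. Toy 2-D vectors stand in for real embeddings, and the token estimate is a crude word count:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def select_memories(query_vec, memories, token_budget):
    """memories: (text, embedding) pairs. Rank by similarity to the
    current message, then include greedily under the token cap."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    picked, used = [], 0
    for text, _vec in ranked:
        cost = len(text.split())  # crude token estimate
        if used + cost <= token_budget:
            picked.append(text)
            used += cost
    return picked
```

Ranking first and budgeting second is what gives the model "signal without noise": the most relevant memories always win the budget.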
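And the summary layer can be as simple as picking the newest few per-session lines. A hypothetical shape; the tuple layout is illustrative:

```python
from datetime import date

def recent_summary_lines(sessions, limit=5):
    """sessions: (date, summary) pairs, one per past chat session.
    Returns the newest `limit` as short lines, newest first."""
    newest = sorted(sessions, key=lambda s: s[0], reverse=True)[:limit]
    return [f"{d.strftime('%b')} {d.day}: {text}" for d, text in newest]
```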
Together, these four layers, plus the current transcript and your latest message, give Cereby a clear, ordered picture of who you are, where you are, what you’ve been doing, and what you’re asking now, so every reply can be both consistent and personalized.
Visual Summary
```mermaid
flowchart TD
    A[System Instructions] --> B[Session Metadata]
    B --> C[User Memory]
    C --> D[Recent Conversation Summaries]
    D --> E[Current Session Messages]
    E --> F[Current User Message]
    F --> G[Model Response]
```