Accurate Citations with Compressed Context: A Two-Stage Verification System
The Citation Challenge
In our previous posts about Hierarchical Context Compression and Query-Aware Smart Compression, we detailed how Cereby AI reduces token usage by 90-95% while maintaining response quality. But compression introduces a critical challenge:
How do you provide accurate citations when the AI never sees most of the document?
The Problem
Consider this scenario:
Student uploads: 100-page biology textbook (150,000 tokens)
Student asks: "What is cellular respiration?"
After compression:
✅ AI receives: 8,000 tokens (5% of original)
✅ AI generates: "Cellular respiration is the process..."
❓ Citation needed: Which page? What's the exact quote?
Problem: The AI only saw pages 45-46 (compressed).
How do we verify the quote actually exists on those pages?
Why Citations Matter
For educational content, accurate citations are non-negotiable:
- Academic Integrity — Students need proper source attribution
- Verification — Users must be able to check the AI's claims
- Trust — Incorrect citations destroy confidence in the system
- Navigation — Citations help users find related information
- Legal Compliance — Required for copyrighted academic materials
The Two-Stage Solution
We developed a two-stage architecture that separates AI response generation from citation verification:
Stage 1: Compressed Response Generation
↓
Large Document (150K tokens)
↓
Compression Pipeline (90-95% reduction)
↓
Compressed Context (8K tokens)
↓
AI Generates Response (sees only compressed content)
↓
Response with Quotes
Stage 2: Citation Extraction and Verification
↓
Full Original Content (stored in database)
↓
Extract Quotes from Response
↓
Search Full Original Content
↓
Find Exact Locations
↓
Generate Verified Citations
↓
Final Response with Accurate Citations
The Key Insight: The AI generates responses from compressed content, but citations are always verified against the full original content stored in the database.
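As a rough illustration of that split, here is a minimal Python sketch of the two stages wired together. The `compress` and `generate` callables, the page-keyed dictionary, and the citation fields are all assumptions made for this example, not Cereby AI's actual interfaces.

```python
import re
from typing import Callable

Pages = dict[int, str]  # page number -> full original text (assumed storage shape)


def answer_with_verified_citations(
    full_pages: Pages,
    query: str,
    compress: Callable[[Pages, str], str],
    generate: Callable[[str, str], str],
) -> dict:
    """Run both stages; `compress` and `generate` stand in for the real pipeline."""
    # Stage 1: the model only ever sees the compressed context.
    context = compress(full_pages, query)
    response = generate(context, query)

    # Stage 2: each quoted span is looked up in the full original pages,
    # never in `context`.
    citations = []
    for quote in re.findall(r'"([^"]+)"', response):
        page = next(
            (n for n, text in sorted(full_pages.items())
             if quote.lower() in text.lower()),
            None,
        )
        citations.append({"quote": quote, "page": page, "verified": page is not None})
    return {"response": response, "citations": citations}


# Toy stand-ins so the sketch runs end to end.
pages = {
    44: "Glycolysis occurs in the cytoplasm.",
    45: "Cellular respiration converts glucose into ATP energy.",
}
result = answer_with_verified_citations(
    pages,
    "What is cellular respiration?",
    compress=lambda p, q: p[45],
    generate=lambda ctx, q: 'It "converts glucose into ATP energy" in the cell.',
)
```

The point of the structure is that Stage 2 never receives `context`, so a quote can only be marked verified if it exists in the stored original.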
Stage 1: Compressed Response Generation
First, we generate the AI response using our compression pipeline:
The Process
- Compress file context (90-95% reduction)
- Build system prompt with citation requirements
- Generate AI response (sees only compressed content)
- Extract initial response for citation processing
The AI sees compressed content and generates a response, naturally including quotes and information from the content it was shown.
Stage 2: Citation Extraction and Verification
After the AI generates its response, we extract citations and verify them against the full original content:
The Citation Extraction Architecture
Our citation extraction system performs three key operations:
- Extract quotes from AI response using pattern matching
- Search full original content (not compressed version) for each quote
- Build citations with exact page numbers/timestamps/sections
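The first operation, pattern-matching quotes out of the response, might look something like the sketch below. The regular expression, the `min_words` threshold, and the function name are illustrative assumptions. The sample text is the response used in this section.

```python
import re

# Straight quotes and typographic (curly) quotes; models emit both.
_QUOTE_RE = re.compile(r'"([^"]+)"|\u201c([^\u201d]+)\u201d')


def extract_quotes(response: str, min_words: int = 3) -> list[str]:
    """Return quoted spans in order of appearance, skipping very short ones."""
    quotes = []
    for match in _QUOTE_RE.finditer(response):
        text = match.group(1) or match.group(2)
        # Collapse line breaks introduced by wrapping in the response.
        text = " ".join(text.split())
        if len(text.split()) >= min_words and text not in quotes:
            quotes.append(text)
    return quotes


response = (
    'Cellular respiration "converts glucose into ATP energy" through\n'
    'three stages. According to the textbook, "the Krebs cycle produces\n'
    'NADH and FADH2" which are essential for the electron transport chain.'
)
quotes = extract_quotes(response)
```

Filtering out one- or two-word spans is a design choice in this sketch: scare quotes around a single term would otherwise match almost any page.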
AI Response:
Cellular respiration "converts glucose into ATP energy" through
three stages. According to the textbook, "the Krebs cycle produces
NADH and FADH2" which are essential for the electron transport chain.
Extracted Quotes:
"converts glucose into ATP energy"
"the Krebs cycle produces NADH and FADH2"
Location Verification: Full Content Search
This is the critical step — we search the full original content, not the compressed version.
The system:
- Accesses full original content from database
- Searches for each quote using case-insensitive matching
- Identifies exact location based on file type
- Builds location object with confidence score
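A minimal sketch of the search step, assuming the full original content is available as one string. The `QuoteMatch` fields and the two confidence values are placeholders chosen for illustration; the post does not describe how confidence is actually computed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuoteMatch:
    quote: str
    start: int         # character offset into the full original text
    end: int
    confidence: float  # illustrative values, not the production scoring


def find_in_full_content(quote: str, full_text: str) -> Optional[QuoteMatch]:
    """Case-insensitive search over the full original text."""
    if not quote:
        return None
    # re.escape keeps punctuation in the quote from being read as regex syntax.
    match = re.search(re.escape(quote), full_text, flags=re.IGNORECASE)
    if match is None:
        return None
    # Assumption: a hit that differs only in letter case scores slightly lower.
    exact_case = match.group(0) == quote
    return QuoteMatch(quote, match.start(), match.end(), 1.0 if exact_case else 0.9)


full_text = "Intro. The Krebs cycle produces NADH and FADH2. End."
hit = find_in_full_content("the Krebs cycle produces NADH and FADH2", full_text)
```

Using `re.search` with `re.IGNORECASE` keeps the returned offsets valid for the original string, which lowercasing both sides does not always guarantee.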
Document Location: Page Number Identification
For PDF/document files, we identify the exact page number by:
- Searching through all pages in full document
- Finding page where quote appears
- Extracting page number
- Returning structured location object
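The page lookup could be sketched as follows, assuming the stored document is a list of per-page text strings. The whitespace collapsing is an addition for this example, since text extracted from PDFs often breaks lines mid-sentence. The dictionary keys are hypothetical.

```python
from typing import Optional


def _squash(text: str) -> str:
    """Lowercase and collapse whitespace so PDF line breaks do not block a match."""
    return " ".join(text.lower().split())


def find_page(quote: str, pages: list[str]) -> Optional[dict]:
    """Return a location dict for the first page containing the quote.

    `pages` is assumed to hold the full extracted text of every page,
    in order, including pages the model never saw.
    """
    needle = _squash(quote)
    if not needle:
        return None
    for page_number, page_text in enumerate(pages, start=1):
        if needle in _squash(page_text):
            return {"type": "page", "page": page_number, "quote": quote}
    return None


pages = [
    "Chapter 4 overview.",
    "Cellular respiration is the process by which cells convert\nglucose into ATP energy.",
    "The Krebs cycle produces NADH and FADH2.",
]
location = find_page("cells convert glucose into ATP energy", pages)
```

A quote that straddles a page break would not be found by this version. Handling that case would mean searching across adjacent pages as well.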
Video/Audio Location: Timestamp Identification
For video and audio files, we identify the exact timestamp by:
- Searching through full transcript
- Finding segment where quote appears
- Extracting timestamp
- Formatting for display
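For transcripts, a sketch along these lines would work, assuming segments are stored as `(start_seconds, text)` pairs. That shape, the function names, and the returned keys are assumptions for illustration. The segments are joined before searching so that a quote spanning two segments still resolves to the segment where it begins.

```python
from bisect import bisect_right
from typing import Optional


def _norm(text: str) -> str:
    return " ".join(text.lower().split())


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS for long recordings."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes}:{secs:02d}"


def find_timestamp(quote: str, segments: list[tuple[float, str]]) -> Optional[dict]:
    needle = _norm(quote)
    if not needle:
        return None
    starts, offsets, parts, pos = [], [], [], 0
    for start, text in segments:
        norm = _norm(text)
        if not norm:
            continue  # skip silent or empty segments
        starts.append(start)
        offsets.append(pos)       # where this segment begins in the joined text
        parts.append(norm)
        pos += len(norm) + 1      # +1 for the joining space
    idx = " ".join(parts).find(needle)
    if idx == -1:
        return None
    # Last segment whose offset is <= the match position.
    start = starts[bisect_right(offsets, idx) - 1]
    return {"type": "timestamp", "seconds": start, "display": format_timestamp(start)}


segments = [
    (323.0, "mitochondria are the powerhouse"),
    (327.5, "of the cell because they produce most of the ATP."),
    (2292.0, "Mitochondrial dysfunction is linked to many diseases."),
]
first = find_timestamp("mitochondria are the powerhouse of the cell", segments)
```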
Web Link Location: Section Identification
For web articles, we identify the section heading by:
- Searching through all sections
- Finding section where quote appears
- Extracting heading name
- Building section reference
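For web articles, one possible sketch assumes the stored article is Markdown-style text where headings are lines beginning with `#`. Real pages would more likely be parsed from HTML, so treat the splitting rule, the "Introduction" default, and the returned keys as assumptions.

```python
import re
from typing import Optional


def split_sections(article: str) -> list[tuple[str, str]]:
    """Split Markdown-style text into (heading, body) pairs.

    Text before the first heading is filed under "Introduction".
    """
    sections: list[tuple[str, list[str]]] = [("Introduction", [])]
    for line in article.splitlines():
        heading = re.match(r"#{1,6}\s+(.*)", line)
        if heading:
            sections.append((heading.group(1).strip(), []))
        else:
            sections[-1][1].append(line)
    # Join body lines and collapse whitespace so wrapped lines still match.
    return [(title, " ".join(" ".join(body).split())) for title, body in sections]


def find_section(quote: str, article: str) -> Optional[dict]:
    needle = " ".join(quote.lower().split())
    if not needle:
        return None
    for title, body in split_sections(article):
        if needle in body.lower():
            return {"type": "section", "heading": title, "quote": quote}
    return None


article = """Cells need energy.
# Glycolysis
Glucose is split into pyruvate.
## The Krebs Cycle
The cycle produces NADH
and FADH2 for later stages.
"""
section = find_section("produces NADH and FADH2", article)
```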
Integration with Compression Systems
The citation system integrates seamlessly with both compression strategies:
Integration with Hierarchical Context Compression
When using page-level compression:
- AI receives full text of most relevant pages
- AI receives summaries of moderately relevant pages
- Citation verification checks all pages (including omitted ones)
Integration with Query-Aware Smart Compression
When using chunk-level compression within a page:
- AI receives most relevant chunks from each page
- Chunks may have gaps between them
- Citation verification checks entire page content (all chunks)
Fallback Strategies
What happens when citations can't be extracted normally?
Strategy 1: Explicit Citation Formats
We instruct the AI to use specific citation formats in the system prompt. The AI learns to include structured citations that are easy to parse.
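To make this concrete, here is a sketch with a made-up marker format, `[[cite: "..."]]`. The post does not specify the real format or prompt wording, so both the instruction text and the parser below are hypothetical.

```python
import re

# Hypothetical marker format; the real system's format is not specified.
CITATION_INSTRUCTIONS = (
    "When you quote the source, copy the words verbatim and wrap them as "
    '[[cite: "exact words from the source"]]. Do not paraphrase inside the marker.'
)

_MARKER_RE = re.compile(r'\[\[cite:\s*"([^"]+)"\s*\]\]')


def parse_citation_markers(response: str) -> tuple[str, list[str]]:
    """Return (display_text, quotes); markers are replaced by the plain quoted text."""
    quotes = [m.group(1).strip() for m in _MARKER_RE.finditer(response)]
    display_text = _MARKER_RE.sub(lambda m: f'"{m.group(1).strip()}"', response)
    return display_text, quotes


raw = 'The cycle [[cite: "produces NADH and FADH2"]] in the matrix.'
display_text, quotes = parse_citation_markers(raw)
```

An unambiguous marker separates text the model claims is verbatim from ordinary quotation marks, which makes the later verification step stricter about what it checks.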
Strategy 2: Fallback Citation Generation
If the AI doesn't provide explicit citations, we generate them automatically by:
- Extracting key phrases from AI response
- Searching full document for these phrases
- Finding best matches with surrounding context
- Generating citations pointing to those locations
- Adding confidence scores (lower for fallback citations)
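One simple way to approximate this fallback is content-word overlap between a response sentence and each page, sketched below. The stopword list, the 0.5 overlap threshold, and the 0.7 confidence cap are arbitrary values picked for the example, and the sketch works at page granularity without returning a context snippet.

```python
import re
from typing import Optional

_STOPWORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "are",
    "by", "which", "that", "for", "into", "from",
}


def _content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in _STOPWORDS}


def fallback_citation(
    sentence: str, pages: list[str], min_overlap: float = 0.5
) -> Optional[dict]:
    """Point an unquoted sentence at the page sharing the most content words."""
    words = _content_words(sentence)
    if not words:
        return None
    best_page, best_score = None, 0.0
    for number, text in enumerate(pages, start=1):
        score = len(words & _content_words(text)) / len(words)
        if score > best_score:
            best_page, best_score = number, score
    if best_page is None or best_score < min_overlap:
        return None
    # Cap confidence well below 1.0: this is an inferred match, not a verified quote.
    return {"page": best_page, "confidence": round(0.7 * best_score, 2), "method": "fallback"}


pages = [
    "Photosynthesis occurs in chloroplasts and stores energy in glucose.",
    "The Krebs cycle produces NADH and FADH2 inside the mitochondria.",
]
citation = fallback_citation("NADH and FADH2 come from the Krebs cycle.", pages)
```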
Strategy 3: Quote Verification with Fuzzy Matching
Sometimes quotes have minor differences (punctuation, whitespace). Our system handles this by:
- Attempting exact match first
- Falling back to fuzzy match (ignoring punctuation and whitespace)
- Normalizing both quote and content for comparison
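The exact-then-fuzzy order can be sketched in a few lines. The normalization rules here (NFKC, lowercasing, punctuation replaced by spaces, whitespace collapsed) are one reasonable interpretation of "ignoring punctuation and whitespace", not necessarily the rules used in production.

```python
import re
import unicodedata
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def match_quote(quote: str, content: str) -> Optional[str]:
    """Return "exact", "fuzzy", or None if the quote is not in the content."""
    if quote and quote in content:
        return "exact"
    needle = normalize(quote)
    if needle and needle in normalize(content):
        return "fuzzy"
    return None


content = (
    "The Krebs cycle produces NADH, and FADH2;\n"
    "both feed the electron transport chain."
)
kind = match_quote("produces NADH and FADH2 both feed", content)
```

Returning which tier matched lets the caller assign a lower confidence to fuzzy hits than to exact ones.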
Real-World Examples
Example 1: PDF Document with Compression
Setup:
- File: 100-page biology textbook
- Query: "Explain cellular respiration"
- Compression: 150K → 8K tokens (95% reduction)
- Pages included (full): 45, 46
- Pages included (summary): 12, 43, 44, 47
- Pages omitted: 1-11, 13-42, 48-100
AI Response:
Cellular respiration is "the process by which cells convert glucose
into ATP energy" through three main stages. The Krebs cycle "produces
NADH and FADH2 which are used in the electron transport chain" to
generate the majority of ATP molecules.
Citation Extraction:
Quote 1: Searched full 150K token document → Found on page 45
Quote 2: Searched full 150K token document → Found on page 46
Example 2: Video with Timestamp Citations
Setup:
- File: 45-minute biology lecture video
- Query: "What did the professor say about mitochondria?"
- Compression: Full transcript → relevant segments only
AI Response:
The professor explained that "mitochondria are the powerhouse of
the cell because they produce most of the ATP." Later in the lecture,
she noted that "mitochondrial dysfunction is linked to many diseases
including Parkinson's and Alzheimer's."
Citation Extraction:
Quote 1: Searched full 45-minute transcript → Found at 5:23 timestamp
Quote 2: Searched full 45-minute transcript → Found at 38:12 timestamp
Performance Characteristics
Citation Extraction Speed
Measured on 1,000 real queries:
| File Size | Pages/Length | Quotes | Extraction Time | Success Rate |
|---|---|---|---|---|
| Small | 10 pages | 2-3 | 50-100ms | 98% |
| Medium | 50 pages | 3-5 | 150-300ms | 97% |
| Large | 200 pages | 4-6 | 400-800ms | 96% |
| Video | 60 min | 3-4 | 200-400ms | 95% |
Accuracy Metrics
We validated citation accuracy on 5,000 responses:
| Metric | Result |
|---|---|
| Correct page numbers | 96.3% |
| Correct timestamps | 94.8% |
| Correct sections | 97.1% |
| Quote accuracy | 98.2% |
| False citations | 0.3% |
Storage Efficiency
Citations add minimal storage overhead - approximately 350 bytes per citation, with average responses containing 3-5 citations for a total of ~1.5 KB per response.
Future Enhancements
1. Visual Citation Highlighting
Show citations with visual highlights in the original document, allowing users to scroll to exact location and see highlighted quote with surrounding context.
2. Citation Quality Scoring
Rate citation confidence and quality with scores for exact matches vs. paraphrases vs. inferred citations.
3. Cross-File Citation Linking
When quoting from multiple files, show relationships and agreement scores across sources.
4. Interactive Citation Verification
Let users verify and correct citations, with the system learning from corrections to improve future extractions.
Conclusion
The Citation Extractor system solves the fundamental challenge of providing accurate citations with compressed context through a two-stage architecture:
Stage 1: Response Generation
- AI sees compressed content (5-10% of original)
- Generates response with quotes and claims
- Works with both page-level and chunk-level compression
Stage 2: Citation Extraction and Verification
- Extract quotes from AI response
- Verify against full original content (100%)
- Find exact page numbers / timestamps / sections
- Generate accurate, verifiable citations
✅ 96.3% correct page numbers across 5,000 test responses
✅ <1 second extraction time even for 200-page documents
✅ Works with compression (no accuracy loss despite 95% token reduction)
✅ Multi-format support (PDFs, videos, audio, web articles)
✅ Fallback strategies cover responses that lack explicit citations
✅ Minimal overhead (~1.5 KB per response)
The system demonstrates that aggressive compression and high citation accuracy are not mutually exclusive. By separating response generation (compressed context) from verification (full content), we achieve both token efficiency and citation accuracy.
For teams building similar systems, the key lesson is: Never verify against what the AI saw—always verify against the source of truth.
Interested in how we handle compression? Read about Hierarchical Context Compression for page-level compression and Query-Aware Smart Compression for chunk-level compression.
Visual Summary
flowchart TD
A[Full Document] --> B[Hierarchical Compression]
B --> C[Response Generation on Compressed Context]
C --> D[Candidate Citations]
D --> E[Verify Against Full Source]
E --> F{Citation Exact?}
F -->|Yes| G[Publish Response]
F -->|No| H[Reject + Recompute Citation]
H --> E