Building the Cereby Humanizer: A Rule-Based System That Fights AI Detectors
How we engineered a deterministic text-transformation pipeline that takes AI-generated writing and reshapes it to read more like a human wrote it, plus where the approach breaks down and what we're doing about it.
The problem every AI-assisted writer runs into
AI detection tools are everywhere now. Students run their writing through detectors before submitting. Teachers paste assignments into scoring tools. If you used an AI assistant to help draft or brainstorm, your text might still light up as "AI-generated" even when the ideas are genuinely yours.
We needed a humanizer. Not a "spin the words and hope" tool, but something that actually understands why detectors flag text and systematically neutralizes those signals. What we built is a two-pass pipeline: a neural paraphraser rewrites for semantic variation, then a rule-based engine applies 12 targeted transforms driven by detector feedback. After each pass the pipeline re-scores; if the result is still above 50%, it runs another round, up to three in total.
The headline result: on our benchmark sample, the pipeline dropped a 68% AI-detection score to 51%, a 17-point reduction, while keeping the text grammatically correct and semantically faithful. The honest caveat is that 51% is still above the 40% target we set, and some detector signals (discourse markers in particular) barely moved. This post covers the architecture, what each piece does, why some signals are stubborn, and our plan for closing the gap.
What detectors actually measure
We reverse-engineered signals from two detector architectures. Heuristic detectors expose explicit metrics: burstiness (AI text has suspiciously uniform sentence lengths), transition word density ("Furthermore," "Moreover," "In conclusion"), formality ("utilize" vs. "use," "demonstrate" vs. "show"), lexical diversity, discourse markers, and human-likeness (contractions, hedges, informal texture). Neural detectors output only a probability, learned implicitly from millions of human-vs-AI examples.
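To make the heuristic side concrete, here is a minimal sketch of how two of those signals can be computed. This is one common formulation, not any detector's actual code, and the transition list is trimmed for illustration:

```python
import re
import statistics

# Trimmed illustration; real detectors use much larger lexicons.
TRANSITIONS = ("furthermore", "moreover", "in conclusion", "however", "therefore")

def sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths; human text scores higher."""
    lengths = [len(s.split()) for s in sentences(text)]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

def transition_density(text: str) -> float:
    """Share of sentences that open with a stock transition word."""
    sents = sentences(text)
    hits = sum(1 for s in sents if s.lower().startswith(TRANSITIONS))
    return hits / max(len(sents), 1)
```

A paragraph of uniform 18-word sentences drives burstiness toward zero, which is exactly the signature the injection transform described below attacks.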
The naive approach, "just rewrite everything to sound casual," fails immediately. Academic writing needs to stay academic. The humanizer has to reduce detector signals without destroying the register.
The shape we landed on
Every humanize request flows through a fixed pipeline. Semantic rewriting and signal reduction are different problems, so we separate them into two passes.
The neural pass is broad: it restructures sentences and swaps vocabulary at a level regex cannot reach, but it does not target specific signals and might lower formality in one paragraph while raising transition density in another. The rule-based engine is surgical: it knows exactly which words trigger each metric. Together, the neural pass breaks surface patterns, then rules clean up whatever the detector is still flagging.
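In code, one round reduces to a short composition. A minimal sketch, assuming a `detector` that returns a score plus per-signal reasons; `paraphrase` is a stand-in for the neural endpoint, and `apply_rules` is sketched in the transforms section below:

```python
def paraphrase(text: str) -> str:
    """Stand-in for the neural paraphrase endpoint."""
    return text  # the real call goes over the wire

def humanize_round(text: str, detector, strengthen: bool = False) -> str:
    """One round: broad neural rewrite, then surgical rule-based cleanup."""
    draft = paraphrase(text)          # neural pass: restructure and reword
    report = detector.score(draft)    # score plus per-signal reasons
    return apply_rules(draft, set(report.reasons), strengthen)
```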
How the transforms work
The three rule-based transforms that carry the most weight:
Burstiness injection splits sentences over 22 words at conjunctions or relative clauses and adds short "punch" sentences (2 to 5 words). AI text has suspiciously uniform lengths; the injected variation restores the mix humans actually produce.
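A condensed sketch of the splitting half, reusing the `sentences` helper above. The punch-sentence insertion and the more careful clause-boundary checks are omitted:

```python
import re

MAX_WORDS = 22
# Conjunctions we are willing to split at; the real list is longer.
SPLIT_AT = re.compile(r",\s+(and|but|so|because)\s+", re.IGNORECASE)

def inject_burstiness(text: str) -> str:
    """Split overlong sentences at a conjunction to vary sentence length."""
    out = []
    for sentence in sentences(text):
        m = SPLIT_AT.search(sentence)
        if len(sentence.split()) <= MAX_WORDS or not m:
            out.append(sentence)  # short enough, or no safe split point
            continue
        head = sentence[:m.start()] + "."
        tail = m.group(1).capitalize() + " " + sentence[m.end():]
        out.extend([head, tail])  # only the first split point is used
    return " ".join(out)
```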
Discourse marker stripping removes "First," "Second," "In conclusion," "To summarize," and in strengthen mode also "Similarly," "Ultimately," "Essentially." These are the scaffolding words that make text read like an outline.
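The stripping itself is a sentence-boundary regex plus a re-capitalization fix-up, roughly like this sketch (marker lists trimmed to the examples above):

```python
import re

BASE_MARKERS = ["First", "Second", "In conclusion", "To summarize"]
STRENGTHEN_MARKERS = ["Similarly", "Ultimately", "Essentially"]

def strip_markers(text: str, strengthen: bool = False) -> str:
    """Drop sentence-initial discourse markers and re-capitalize what follows."""
    markers = BASE_MARKERS + (STRENGTHEN_MARKERS if strengthen else [])
    alternation = "|".join(re.escape(m) for m in markers)
    pattern = re.compile(
        rf"(^|[.!?]\s+)(?:{alternation}),\s+(\w)", re.MULTILINE
    )
    return pattern.sub(lambda m: m.group(1) + m.group(2).upper(), text)
```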
Contraction injection turns "do not" into "don't" and "it is" into "it's," and in strengthen mode goes as far as "gonna" and "wanna." The step skips text inside double quotes to preserve quotations. We learned this the hard way when Round 3 showed the pipeline mangling "Something there is that doesn't love a wall."
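A sketch of the quote-skipping mechanic, which is the part that bit us. The rule table is trimmed to the examples above, and the strengthen mappings ("going to" and "want to" as the sources for "gonna" and "wanna") are our illustration:

```python
import re

CONTRACTIONS = {r"\bdo not\b": "don't", r"\bit is\b": "it's"}
STRENGTHEN = {r"\bgoing to\b": "gonna", r"\bwant to\b": "wanna"}

def contract(text: str, strengthen: bool = False) -> str:
    """Apply contractions everywhere except inside double-quoted spans."""
    rules = {**CONTRACTIONS, **(STRENGTHEN if strengthen else {})}
    # The capturing group keeps quoted spans in the split output;
    # they land at odd indices and are left untouched.
    chunks = re.split(r'("[^"]*")', text)
    for i, chunk in enumerate(chunks):
        if i % 2 == 0:
            for pat, repl in rules.items():
                chunk = re.sub(pat, repl, chunk)
            chunks[i] = chunk
    return "".join(chunks)
```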
The remaining nine steps (transition reduction, formality swaps, informal asides, sentence restructuring, personality injection, rhetorical questions, conversational markers, emotional tone, and synonym substitution) operate on the same signal-driven logic: each runs only when its detector reason is present, and each caps its insertions to avoid over-editing.
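The dispatch is just a guarded chain. A sketch reusing the functions above; the reason-code strings are illustrative, not our production names:

```python
def apply_rules(text: str, reasons: set[str], strengthen: bool = False) -> str:
    """Run only the transforms whose detector signal actually fired."""
    if "burstiness" in reasons:
        text = inject_burstiness(text)
    if "discourse_markers" in reasons:
        text = strip_markers(text, strengthen)
    if "human_likeness" in reasons:
        text = contract(text, strengthen)
    # ...the remaining nine steps follow the same guarded pattern,
    # each with its own insertion cap.
    return text
```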
If four or more strengthen flags fire, the pipeline recurses with a second-pass flag to catch signals that survived the first sweep. The API layer also re-scores after the rule-based pass and runs another full round if the score is still above 50%, up to three rounds in total.
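Putting the loop together, roughly; `strengthen_flags` is a stand-in for whatever flag list the detector report carries:

```python
MAX_ROUNDS = 3
TARGET = 50  # detector score, percent

def humanize(text: str, detector) -> str:
    """Full pipeline: repeat rounds until the score clears the target."""
    strengthen = False
    for _ in range(MAX_ROUNDS):
        text = humanize_round(text, detector, strengthen)
        report = detector.score(text)
        if report.score <= TARGET:
            break
        # Four or more strengthen flags means the gentle pass was not
        # enough; later rounds run the rules in strengthen mode.
        strengthen = len(report.strengthen_flags) >= 4
    return text
```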
Where the rules hit a wall
The discourse marker ceiling
Across every round of testing, the discourse marker score stayed at 1.00 (the maximum). Our transforms strip sentence-initial markers, but the detector appears to measure discourse density at a level our rules do not fully reach. Mid-sentence markers, paragraph-level structure, and argument scaffolding patterns all contribute. Regex-based stripping can only go so far without risking coherence.
Human-likeness that goes the wrong way
In Round 2, human-likeness actually dropped (0.16 to 0.11) after humanization. Adding contractions and informal markers to formal academic text made it feel inconsistent rather than human; the detector picked up on the register mismatch. We added context-aware contraction limits: when both the human-likeness and informality signals are flagged, the humanizer skips contraction strengthening for formal text. Formal-but-AI turns out to be better than formal-sprinkled-with-gonna.
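The guard itself is a single set comparison (signal names illustrative, as before):

```python
def skip_contraction_strengthening(reasons: set[str]) -> bool:
    """True when both register signals fire together: the input is formal
    text, and strengthened contractions would read as a mismatch."""
    return {"human_likeness", "informality"} <= reasons
```

Inside `apply_rules`, the contraction step then runs in gentle mode whenever this guard returns True.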
The neural detector gives no reasons
Our production detector returns only "Classifier: likely AI-generated" with no granular signals. First probe results showed a delta of 0: the humanizer moved the score from 99 to 99. The same transforms that beat explicit-metric detectors do not move a learned classifier because the feature space is different.
Results
| Metric | Before | After (Round 5) | Delta |
|---|---|---|---|
| AI detection score | 68% | 51% | -17 pts |
| Human-likeness | 0.16 | 0.19 | +0.03 |
| Discourse markers | 1.00 | 1.00 | No change |
| Sentence entropy | Low | 0.71 | Improved |
| Grammar correctness | Correct | Correct | Preserved |
What's next
The rule-based system has reached a plateau. Three things are next.
Fine-tune a humanizer model on detector-filtered pairs with the detector as a reward signal. Hand-written rules cannot beat a neural classifier that learned patterns from millions of examples; we need a model that can reason about text the way the detector does.
Wire through the strength and personality parameters the paraphrase endpoint already accepts but currently ignores. Right now the neural pass runs at fixed aggressiveness; turning those knobs would let us match intensity to the input register.
Add per-step score deltas in production. The "strengthen everything" fallback is a blunt instrument. Ablation data would tell us which of the 12 transforms actually moves the neural classifier and which are just adding latency.
