How AI Detectors Work: Perplexity, Burstiness, Classifiers, and Watermarking Explained
Learn how AI detectors really work. This long-form guide breaks down perplexity, burstiness, statistical analysis, classifier models, and watermarking (SynthID), what detectors look for, and why they often fail.
AI detectors are not magic. They're just math, and the math is getting overwhelmed. They work by looking for statistical patterns that AI models tend to leave behind, but as AI gets better at sounding human, these patterns are getting harder and harder to spot.
Quick answers
What are AI detectors? They are software tools that analyze text, images, or other media to estimate the probability that it was generated by an AI. They are not lie detectors or plagiarism checkers; they are pattern-matching systems.
How do AI detectors work for text? Most use a combination of statistical analysis and machine learning. They measure things like text predictability (perplexity), sentence length variation (burstiness), and other stylistic features to see if they match known patterns of AI writing.
What is perplexity in AI detection? Perplexity measures how predictable a sequence of words is to a language model. AI-generated text often uses very common, high-probability words, resulting in low perplexity. Human writing is usually more surprising and less predictable, giving it a higher perplexity score.
What is burstiness in AI detection? Burstiness refers to the rhythm and flow of writing. Humans tend to write with a mix of long, complex sentences and short, punchy ones. This variation is high burstiness. AI models, especially older ones, often produce text with uniform sentence lengths, which is low burstiness.
How accurate are AI detectors? Honestly, not very. Their accuracy is inconsistent and drops significantly with short texts, human-edited AI content, or text from newer AI models. OpenAI famously shut down its own detector in 2023, citing its "low rate of accuracy."
Why do detectors flag human writing as AI? This is called a false positive, and it's a huge problem. It happens because clean, concise, or formulaic human writing (like technical manuals, some academic papers, or simple SEO articles) can statistically resemble AI-generated text. Non-native English speakers are also flagged at a higher rate.
What is AI watermarking? Watermarking is a technique where the AI model intentionally embeds a hidden, invisible signal into its output. For images, Google's SynthID subtly alters pixels. For text, it might involve a secret pattern of word choices. A corresponding detector can then look for this specific signal. It's more reliable but requires the AI creator to participate.
Can you make AI text undetectable? Yes, pretty easily. Lightly editing the text, changing sentence structures, or using an AI humanizer tool like ours can usually bypass most detectors. The goal of these tools isn't just evasion; it's to restore the human voice and style that AI strips away.
Are AI detectors and plagiarism checkers the same? No. A plagiarism checker looks for copied content by comparing a document against a massive database of existing text. An AI detector looks at the *style and statistical properties* of the writing itself to guess its origin.
Should I use an AI detector to discipline a student or fire a writer? Absolutely not. Given their high error rates, using an AI detection score as the sole piece of evidence is irresponsible and risky. MIT and other institutions strongly advise against it.
How Different AI Detection Methods Stack Up
Here’s a breakdown of the main techniques detectors use. Most modern tools, like GPTZero or Originality.ai, use a mix of these, but they all have their breaking points.
| Method / Approach | Primary Techniques Used | Strengths & When It Works Best | Weaknesses & Common Failure Modes |
|---|---|---|---|
| Statistical Analysis | Measures perplexity (predictability) and burstiness (sentence variation). | Good at catching raw, unedited output from older models like GPT-3. The signals are easy to calculate and understand. | Easily fooled. A little human editing or a simple prompt like "write with varied sentence lengths" can defeat it. Fails on newer, more sophisticated models. |
| Classifier Models | A machine learning model is trained on millions of examples of human vs. AI text. It learns to recognize complex patterns beyond just perplexity. | More robust than simple statistics. Can identify subtle stylistic tics of specific AI models it was trained on. This is the core of most modern detectors. | The model is a "black box," so you don't know *why* it made a decision. It's only as good as its training data and quickly becomes outdated as new LLMs are released. Prone to bias. |
| Stylometry | Analyzes the author's "fingerprint" by measuring features like vocabulary richness, punctuation habits, use of specific function words, and sentence complexity. | Can be effective for identifying authorship in general, not just AI. It looks at a wider set of features than basic statistical methods. | Requires a large amount of text to be reliable. Useless on short snippets. An AI can be prompted to mimic a specific style, confusing the analysis. |
| Watermarking | The AI model embeds a hidden, cryptographic signal into the output (e.g., a specific pattern of word choices or invisible pixel changes). | Very high accuracy if the watermark is present and the detector knows the "key." Robust against many types of editing. | Almost never used in practice for text. Requires the AI company (like OpenAI or Google) to build it in, which they mostly haven't. Useless for detecting content from any model that isn't watermarked. |
| Semantic Analysis | Looks at the meaning and logical flow of the text. Tries to spot content that is grammatically correct but logically hollow or nonsensical. | A more intelligent approach that tries to "understand" the text. Can sometimes catch "AI nonsense" that other methods miss. | Extremely difficult to implement reliably. What one person considers "logically hollow," another might see as creative or abstract. Highly subjective. |
What AI Detectors Are (and What They Are Not)
Let's get one thing straight.
AI detectors are not truth machines. They are not lie detectors. They are not definitive proof of anything.
Think of an AI detector like a weather forecast for a single picnic. It might say there's a 90% chance of sunshine. But you could still get rained on. It might say there's a 70% chance of a thunderstorm, and you end up with a beautiful, clear day. The forecast is just a probability based on patterns. It can be wrong. And when it's wrong, your picnic is ruined.
That's an AI detector. It takes a piece of text and runs it through a mathematical model that says, "Based on the patterns I was trained on, this text has an X% probability of being generated by an AI."
That's it. It's a guess. An educated guess, maybe, but still a guess.
Here's the problem. People, especially people in positions of power like teachers or managers, often treat that percentage as gospel. They see "98% AI-generated" and think it means "98% confirmed cheating." This is a fundamental misunderstanding of the technology. According to a 2023 report in the *Washington Post*, these tools have led to students being falsely accused of cheating, with some facing serious academic consequences based on faulty evidence.
So, an AI detector is:
- A pattern-matching tool.
- A probabilistic forecaster.
- A signal that *might* warrant further investigation.
An AI detector is not:
- Proof of academic dishonesty.
- A reliable way to screen job candidates.
- A fair or unbiased judge of writing.
Treating it like one is, frankly, irresponsible.
Core Signals: Perplexity, Burstiness, and Statistical Patterns
Early AI detectors, and many that still exist today, were built on two core ideas: perplexity and burstiness. They sound complicated, but the concepts are actually pretty simple.
Perplexity: The "I Knew You'd Say That" Metric
Perplexity is just a fancy word for predictability.
Imagine you're reading this sentence: "The cat sat on the ___."
What word comes next? Probably "mat." Or "floor." Or "couch." A language model has seen this pattern millions of times. For the model, the word "mat" is extremely low-perplexity. It's completely expected.
But what if the sentence was: "The cat sat on the existential dread"?
That's a bit more surprising. It's a higher-perplexity choice. Humans do this all the time. We make weird connections, use metaphors, and try to be clever. Our writing is full of little surprises.
AI models are trained to predict the next word, and unless they are prompted or configured otherwise, they lean toward the most probable choice. This is especially true of older models. In that sense they are perplexity-minimizing machines: left to their defaults, they drift toward the most generic, expected, statistically average text their training data supports.
So, when an AI detector analyzes a piece of text, it's essentially asking its own internal language model: "How surprising is this text?" If the text is full of boring, predictable word choices, the detector flags it as low-perplexity. And low perplexity is a big red flag for AI generation.
The problem? Good human writing can sometimes be very clear and simple. "The user clicks the button. A dialog box appears." That's low-perplexity, but it's also just good technical writing. It's not AI. The detector doesn't know the difference.
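If you want to see the math, here is a minimal Python sketch of the perplexity calculation. It assumes you already have the probability a language model assigned to each token. The two probability lists are made-up illustrative numbers, not output from any real model or detector, and real tools add their own scaling and thresholds on top.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical per-token probabilities (illustrative only).
predictable = [0.9, 0.8, 0.85, 0.9]   # every word is an expected choice
surprising = [0.9, 0.8, 0.05, 0.01]   # ends on two very unlikely words

print("predictable:", perplexity(predictable))
print("surprising: ", perplexity(surprising))
```

The predictable list produces the lower score. A detector built on this signal does little more than compare that number against a cutoff, which is exactly why clean technical writing gets caught in the net.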
Burstiness: The Rhythm of Writing
Burstiness is all about variation.
Read a paragraph written by a person. You'll probably see a mix of sentence lengths. A long, winding sentence full of clauses might be followed by a short, sharp one. Like this. It creates a certain rhythm. A flow. That's high burstiness.
Now, look at a lot of raw AI output. You'll often see something different. The sentences are all roughly the same length. They follow a similar structure. Clause, clause, period. Clause, clause, period. It reads like a robot wrote it, because a robot did. This is low burstiness. It's too smooth, too uniform, too... perfect.
AI detectors measure this variation. They calculate the standard deviation of sentence lengths and other structural features. If the variation is too low, it's another check in the "probably AI" column.
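Here is a rough Python sketch of that measurement. The sentence splitter is deliberately naive (it just splits on ., !, and ?), and the function names are our own for illustration. Real detectors use better tokenizers and look at more than sentence length.

```python
import re
import statistics


def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each non-empty sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Population standard deviation of sentence lengths, measured in words."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # one sentence has no variation to measure
    return statistics.pstdev(lengths)


uniform = ("The tool checks the text. The model scores the words. "
           "The user reads the result.")
varied = ("It failed. Nobody on the team could explain why the detector "
          "flagged a hand-written lab report as machine output. Odd.")

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), burstiness(varied))
```

The uniform sample has three five-word sentences, so its score is zero. The varied sample mixes 2, 17, and 1 words, so it scores much higher.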
This was a very effective signal against early models like GPT-2 and even GPT-3. But it's becoming less reliable. Why? Because you can just tell the AI to fix it.
A prompt like, "Rewrite this paragraph, but use a mix of long and short sentences. Make it sound more human and less robotic," is often enough to completely fool a burstiness-based detector. Modern models like GPT-4 and Claude 3 are already trained to do this by default. They've learned that humans like burstiness, so they've learned to fake it.
So while perplexity and burstiness were the foundation of AI detection, they are now just two signals among many. And they're the easiest ones to trick.
Under the Hood: Classifier Models and Embeddings
Modern detectors have moved beyond simple statistics. The real engine behind tools like GPTZero, Originality.ai, and Turnitin is a classifier model.
This is where the machine learning comes in.
Here's the process, simplified:
- Get a Ton of Data: First, you need a massive dataset. You collect millions of articles, essays, and blog posts written by humans. Then, you use a bunch of different AI models (GPT-3, GPT-4, Claude, etc.) to generate millions more. You now have two giant piles of text: "Human" and "AI."
- Turn Words into Numbers (Embeddings): A computer can't read words. It needs numbers. So, you use a special kind of model to convert every piece of text into a list of numbers called a "vector embedding." This isn't just a simple code; the embedding captures the *semantic meaning* of the text. For example, the embeddings for "king" and "queen" will be mathematically related in the vector space. This is the same core tech that powers the LLMs themselves.
- Train the Classifier: Now you take your labeled embeddings (this one is "Human," that one is "AI") and feed them into a classifier model. The model's job is to learn the mathematical differences between the "Human" embeddings and the "AI" embeddings. It might learn that AI embeddings tend to cluster in a certain region of the vector space, or that they have a certain mathematical property that human ones don't. It learns thousands of these subtle patterns, far more complex than just perplexity or burstiness.
- Make a Prediction: Once the classifier is trained, you can give it a new, unlabeled piece of text. It converts the text to an embedding, runs it through its learned patterns, and spits out a probability: "I am 87% confident that the patterns in this embedding match the 'AI' patterns I was trained on."
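The four steps above can be sketched in a few lines of Python. To be clear about the assumptions: the two-number "embeddings" and their labels are invented toy data, and a hand-rolled logistic regression stands in for whatever classifier a commercial detector actually uses. Real embeddings have hundreds or thousands of dimensions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(vectors, labels, lr=0.5, epochs=500):
    """Fit weights and a bias with plain gradient descent on log loss."""
    dim = len(vectors[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_ai_probability(model, x):
    """Return the model's estimated probability that vector x is 'AI'."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy 2-number "embeddings" (hypothetical). Label 1 = AI, 0 = human.
training_vectors = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
                    [0.2, 0.9], [0.1, 0.8], [0.15, 0.7]]
training_labels = [1, 1, 1, 0, 0, 0]

model = train_logistic(training_vectors, training_labels)
print(predict_ai_probability(model, [0.88, 0.12]))  # near the AI cluster
print(predict_ai_probability(model, [0.12, 0.85]))  # near the human cluster
```

Notice what the output is: a probability, not a verdict. Everything the model "knows" comes from those six training rows, which is the whole weakness in miniature.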
This is a much more powerful approach. It's why modern detectors can sometimes catch AI text even when the sentence length is varied. The classifier is looking at deeper, more abstract properties of the language.
But it has a huge, glaring weakness.
The training data is everything. A classifier is only good at identifying the specific AI models it was trained on. When OpenAI releases GPT-5 next year, every single AI detector on the market will instantly become less accurate. They haven't been trained on its output. They don't know its stylistic quirks.
It's a constant cat-and-mouse game. The AI generators get better, and the detectors have to scramble to get new data and retrain their models. They are always, by definition, one step behind.
Advanced Techniques: Stylometry and Semantic Analysis
Beyond classifiers, some systems try to get even smarter by incorporating principles from linguistics and a field called stylometry.
Stylometry: The Writer's Fingerprint
Stylometry is the statistical analysis of literary style. It has been used for well over a century to figure out who wrote anonymous texts. For example, in the 1960s the statisticians Frederick Mosteller and David Wallace used stylometry to argue that the disputed Federalist Papers were written by James Madison and not Alexander Hamilton, based on the two men's different preferences for words like "whilst" versus "while."
AI detectors can use a simplified version of this. They create a "stylistic fingerprint" of a piece of text by measuring dozens of features:
- Vocabulary Richness: How many unique words are used compared to the total number of words? AI models often have a slightly repetitive vocabulary.
- Punctuation Patterns: Does the author use a lot of commas? Semicolons? How often do they use parentheses?
- Function Word Usage: How often does the text use common words like "the," "of," "it," and "is"? The frequency of these words is a surprisingly stable indicator of authorship.
- Sentence Complexity: What's the average number of clauses per sentence? How are they connected?
The detector compares the fingerprint of the submitted text to the typical fingerprints of humans and AIs. If your text has the punctuation habits and vocabulary diversity of a typical GPT-4 output, it gets flagged.
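A stripped-down version of that fingerprint might look like the Python sketch below. The feature names and the four-word function list are our own simplifications, and average sentence length is used as a crude stand-in for clause counting, which would need a real parser.

```python
import re

# Tiny illustrative set; real stylometry uses hundreds of function words.
FUNCTION_WORDS = {"the", "of", "it", "is"}


def stylometric_fingerprint(text):
    """Return a few simple style features for a piece of text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total = len(words)
    if total == 0:
        raise ValueError("text contains no words")
    n_sentences = max(len(sentences), 1)
    return {
        # Vocabulary richness: unique words divided by total words.
        "type_token_ratio": len(set(words)) / total,
        # Punctuation habit: commas per sentence.
        "commas_per_sentence": text.count(",") / n_sentences,
        # Share of words that come from the function-word list.
        "function_word_rate": sum(w in FUNCTION_WORDS for w in words) / total,
        # Crude proxy for sentence complexity.
        "avg_sentence_length": total / n_sentences,
    }


sample = "The cat sat on the mat. It is the cat of the house, and it is happy."
for name, value in stylometric_fingerprint(sample).items():
    print(f"{name}: {value:.2f}")
```

On a two-sentence sample these numbers mean very little, which is the point made in the table earlier: stylometry needs a lot of text before the fingerprint is stable.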
Semantic Analysis: Does This Even Make Sense?
This is the holy grail of detection, and it's also the hardest to do. Semantic analysis tries to go beyond style and look at the actual meaning.
An AI can write a paragraph that is grammatically perfect and stylistically plausible but is complete and utter nonsense. It might confidently state two contradictory facts in the same sentence. Or it might write a beautiful, flowing description of a historical event that never happened. This is often called "hallucination."
A semantic detector would try to:
- Fact-check claims against a knowledge base.
- Analyze logical consistency within the document.
- Look for signs of "shallow" reasoning, where the text just strings together related concepts without forming a coherent argument.
The problem is that building a system that can truly *understand* text is just as hard as building the AI that wrote it in the first place. This approach is more theoretical than practical for most available tools. They might do some basic checks, but they aren't performing deep logical analysis.
Watermarking and SynthID: A Totally Different Approach
So far, all the methods we've discussed are *post-hoc*. They analyze the final text and try to guess its origin. But what if you could build the detection signal right into the content from the start?
That's the idea behind watermarking.
The most famous example is Google DeepMind's SynthID, which works for images. When an AI model like Imagen generates a picture, SynthID can be used to subtly change the values of some of the pixels. The change is completely invisible to the human eye, but a special algorithm can scan the image and detect the hidden pattern. According to Google, this watermark is designed to survive things like compression, cropping, and color adjustments.
It's like a secret signature that proves the image came from a specific AI.
For text, the idea is similar but a bit more abstract. You can't change the "pixels" of a word. But you can influence the AI's word choices. A 2023 paper titled "A Watermark for Large Language Models" proposed a clever scheme for this.
Here's how it works in principle:
- Before generating a word, the LLM has a list of possible next tokens (words or parts of words) and their probabilities.
- Using a secret key, the watermarking algorithm splits this list of tokens into two groups: a "green list" and a "red list."
- The algorithm then slightly increases the probability of choosing tokens from the "green list."
- Over the course of a long text, this results in a statistically significant number of words being chosen from the green list.
To detect the watermark, you just need the secret key. You can then scan the text and see if the number of "green list" words is higher than what you'd expect by random chance. If it is, you can be very confident the text was generated by a watermarked model.
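Here is a toy Python sketch of the green-list idea, loosely modeled on the scheme described above and heavily simplified. The vocabulary, the key, the `bias` parameter, and the stand-in "model" that picks tokens at random are all hypothetical. A real implementation nudges the model's actual token probabilities rather than sampling from a made-up word list.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # hypothetical toy vocabulary
GREEN_FRACTION = 0.5                      # half the vocabulary is "green"


def green_list(prev_token, key):
    """Derive a pseudo-random green half of VOCAB from key + previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest()
    rng = random.Random(int(digest, 16))
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * GREEN_FRACTION)])


def generate(n_tokens, key=None, bias=0.5, seed=0):
    """Stand-in 'model': uniform random tokens, nudged toward green if keyed."""
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    while len(tokens) < n_tokens:
        if key is not None and rng.random() < bias:
            candidates = sorted(green_list(tokens[-1], key))
        else:
            candidates = VOCAB
        tokens.append(rng.choice(candidates))
    return tokens


def watermark_z_score(tokens, key):
    """z-score of the green-token count versus the no-watermark expectation."""
    scored = len(tokens) - 1  # the first token has no predecessor
    hits = sum(tok in green_list(prev, key)
               for prev, tok in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * scored
    std = math.sqrt(scored * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std


KEY = "demo-secret"  # hypothetical secret key
z_marked = watermark_z_score(generate(400, key=KEY), KEY)
z_plain = watermark_z_score(generate(400), KEY)
print(f"watermarked z = {z_marked:.1f}, plain z = {z_plain:.1f}")
```

The watermarked text should land many standard deviations above zero, while the plain text should hover near zero. Without the key, the green and red lists look like random noise, which is why only the key holder can run the check.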
Why this is powerful:
- It's incredibly accurate if the watermark is there.
- It can't be easily removed by light editing, because the statistical bias is spread throughout the entire text. A rewrite has to change a large share of the words before the signal fades.
Why it's not the solution (yet):
- Adoption: It only works if the AI companies (OpenAI, Google, Anthropic) agree to build it into their models. So far, they haven't done so for public-facing text models. They are worried it could hurt performance or be used for surveillance.
- It's Opt-In: It can't detect text from any open-source model or any company that chooses not to include a watermark.
- Brittleness: A clever adversary could potentially figure out the watermarking scheme and either remove it or even frame human text by adding a fake watermark.
Watermarking is a promising idea, but it's not a silver bullet. It's a closed-system solution in an open-system world.
Why AI Detectors Fail So Badly
If these tools are so sophisticated, why are they so unreliable? Why do they get it wrong all the time?
There are a few core reasons.
1. The False Positive Problem
This is the big one, especially in education. A false positive is when a detector claims human-written text is AI-generated. A study cited by MIT's Educational Technology Office highlighted the high error rates and the risk of instructors falsely accusing students.
It happens for several reasons:
- Formulaic Writing: If you're following a strict template for an essay (introduction, three body paragraphs, conclusion), your writing might lack the "burstiness" and "perplexity" the detector expects from a human.
- Non-Native Speakers: A 2023 study by Stanford researchers (Liang et al.) found that detectors are far more likely to flag text written by non-native English speakers. This is because they may use simpler sentence structures and vocabulary, which statistically resembles AI output. This is a massive fairness and equity issue.
- Technical and Scientific Writing: This type of writing is supposed to be clear, concise, and objective. It is, by design, often low-perplexity. Detectors can't tell the difference between a well-written lab report and a GPT-4 summary.
- Over-Editing: If you use tools like Grammarly to polish your writing to perfection, you might accidentally smooth out the very human "imperfections" that detectors look for.
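A little arithmetic shows why even a small false positive rate matters. The Python sketch below applies Bayes' rule to ask: of all the essays a detector flags, how many are actually AI-written? Every number in it is hypothetical and chosen only for illustration. None of them comes from a real detector or study.

```python
def probability_ai_given_flag(base_rate, true_positive_rate, false_positive_rate):
    """Bayes' rule: P(text is AI | detector flagged it)."""
    flagged_ai = base_rate * true_positive_rate
    flagged_human = (1 - base_rate) * false_positive_rate
    return flagged_ai / (flagged_ai + flagged_human)


# Hypothetical numbers, for illustration only.
posterior = probability_ai_given_flag(
    base_rate=0.10,            # assume 10% of submissions are AI-written
    true_positive_rate=0.90,   # assume the detector catches 90% of those
    false_positive_rate=0.05,  # assume it wrongly flags 5% of human work
)
print(f"{posterior:.0%} of flagged essays are actually AI-written")
```

Under these assumptions, only about two thirds of flagged essays are AI-written. The other third are students who did nothing wrong, even though the detector in this example looks accurate on paper.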
2. The Ease of Evasion (The False Negative Problem)
It is trivially easy to beat most AI detectors. All you have to do is break the patterns they're looking for.
- Simple Editing: Take the raw AI output and spend five minutes rephrasing a few sentences. Change the word order. Combine a short sentence with a long one. This is often enough to drop the AI score from 99% to under 10%.
- Sophisticated Prompts: You can just ask the AI to write in a way that evades detection. "Write about the history of the Roman Empire, but use a highly variable sentence structure, include some unusual vocabulary, and write in a slightly informal, personal tone." The AI is more than capable of doing this.
- AI Humanizers: This is what our tool, humangpt.io, is designed for. A humanizer is an AI model specifically trained to rewrite AI text in a more human-like style. It systematically increases the perplexity and burstiness, tweaks the vocabulary, and introduces the kinds of stylistic quirks that classifiers are trained to see as "human." It automates the evasion process.
3. The Moving Goalposts
As mentioned before, detectors are always playing catch-up. A detector trained on GPT-3.5 struggles to identify text from GPT-4. A detector trained on GPT-4 will be even less effective against GPT-5 or Claude 4. The generator models are evolving at a blistering pace, and the detector models can't keep up. Their training data is always out of date.
4. The "Mushy Middle" of Hybrid Work
The biggest conceptual failure of AI detectors is that they treat writing as a binary: either 100% human or 100% AI.
That's not how people work anymore.
A student might use ChatGPT to generate an outline, write the first draft themselves, use an AI to rephrase a clunky paragraph, and then edit the whole thing by hand. Is that "AI-written"? Is it "human-written"?
It's both.
No current detector can handle this "mushy middle." They might flag the whole thing as AI, or they might miss it completely. They can't provide a nuanced analysis, because their entire model is based on a false dichotomy. They are the wrong tool for the job of evaluating modern writing processes.
How We Test and Form Our Opinions
We don't just read the marketing copy. Our views on AI detection come from a few places.
First, we read the academic papers. We look at the research from universities like Stanford, Maryland, and MIT that actually tests these models under controlled conditions. This gives us a baseline understanding of their theoretical limits.
Second, we test them constantly. We have subscriptions to all the major detectors like Originality.ai, GPTZero, and Copyleaks. We feed them everything: raw output from GPT-4, Claude, and Llama; text that has been lightly edited by a human; text that has been run through our own humanizer; and, most importantly, text written entirely by our own team. This hands-on experience shows us how they perform in the real world, not just a lab.
Finally, we listen to users. We hear from students who have been falsely accused, freelancers who have been penalized by clients over a bogus AI score, and writers who are just trying to use new tools without getting in trouble. Their stories inform our understanding of the real-world impact of this flawed technology.
A Cheat Sheet: When to Trust an AI Detector (or Not)
Given all the problems, is there ever a time to use an AI detector? Maybe. But you have to use it as a very specific tool with a very specific mindset.
For Teachers and Educators
- When to use it: As a preliminary, private signal that a student's writing style has changed dramatically from their previous work.
- How to use it: Never, ever use the score as proof. If a paper gets a high AI score, use it as a reason to talk to the student. Ask them about their writing process. Ask them to explain a specific paragraph in their own words. Check the document's version history. The detector's score is, at best, a conversation starter. It is not an accusation.
- When NOT to use it: As a final verdict or as the sole basis for an academic misconduct case. The risk of a false positive is too high, and the consequences are too severe.
For SEOs and Content Managers
- When to use it: As a quick, first-pass filter to catch lazy, zero-effort content. If a freelancer submits an article that scores 100% AI, it's a good sign they just copied and pasted it from ChatGPT without even reading it.
- How to use it: As one data point among many. Your own editorial judgment is far more important. Does the article meet the brief? Is it accurate? Is it well-written? Does it have a unique voice? If the answer to those questions is yes, then the AI score is mostly irrelevant. Google doesn't care about AI scores; it cares about helpful, high-quality content.
- When NOT to use it: As a strict pass/fail metric for payment. Many good, human-written SEO articles are formulaic and might trigger a false positive. Penalizing writers for this is unfair and counterproductive.
For Writers and Students
- When to use it: To check your own work if you're worried about false accusations. If you've used AI for brainstorming or editing, running it through a detector can give you a sense of how it might be perceived.
- How to use it: If your score is high, don't panic. It doesn't mean you cheated. It just means your writing fits a certain statistical pattern. Use it as a prompt to revise your work. Add more of your own voice. Break up sentence structures. Add a personal anecdote. Or, use a tool like an AI humanizer to help automate this stylistic polishing.
- When NOT to use it: As a measure of your writing quality. A low AI score does not mean your writing is good. A high AI score does not mean your writing is bad. It's a technical metric, not an editorial one.
Frequently Asked Questions
How do AI detectors actually work under the hood? They use machine learning models trained on huge datasets of human and AI text. They turn your writing into numerical representations (embeddings) and a classifier model predicts the probability of it being AI-written based on learned statistical patterns like perplexity (predictability) and burstiness (sentence variation).
What are perplexity and burstiness again? Perplexity is how surprising your word choices are. AI text is often very predictable (low perplexity). Burstiness is the variation in your sentence lengths. Human writing tends to have more rhythm and mix long and short sentences (high burstiness), while AI text can be very uniform (low burstiness).
Why did my human-written essay get flagged as AI? It's likely a false positive. This happens a lot with clear, concise, or structured writing. If your essay follows a standard five-paragraph format and uses simple language, its statistical profile can look a lot like AI-generated text to a detector that's just looking at patterns. It's a flaw in the detector, not in your writing.
Can AI detectors reliably spot ChatGPT content? Not really. They are best at catching raw, unedited text from models they've been specifically trained on. As soon as you edit the text, or when a new model like GPT-5 comes out, their accuracy plummets. There is no detector that can guarantee detection of all AI-generated content.
What's the deal with AI watermarking? Is it being used? Watermarking embeds a hidden signal directly into the AI's output. Google's SynthID does this for images. For text, it would involve a secret pattern of word choices. It's a powerful idea, but it's not widely used for public text generation models because the companies are hesitant to implement it. So, you can't rely on it.
What specific features do detectors look for in my text? They look at a whole range of things: predictability of words, variation in sentence length, punctuation habits, vocabulary diversity, use of stock transition phrases (such as "In conclusion"), and the overall mathematical "shape" of the text's semantic embedding.
Why are schools like MIT saying AI detectors don't work? Because the stakes are too high in education. A false accusation of cheating can ruin a student's career. Given that detectors have known high error rates, especially for non-native speakers, MIT and other academic institutions have concluded that it's irresponsible to use them as evidence for disciplinary action.
So what are the biggest limitations of these tools? The main limits are: high rates of false positives and false negatives; they are always outdated because new LLMs come out faster than they can be retrained; they can't handle text that is a mix of human and AI writing; and their decision-making process is often a "black box" that can't be explained or audited for bias.
What We'll Never Tell You (Because Other Companies Won't)
Here's the part that most companies in the AI detection or AI humanizing space won't say out loud.
We will never build a perfect AI detector. Nobody will. The fundamental problem is that we are chasing a moving target that is actively trying to look just like us. As AI models get better, they will become statistically indistinguishable from human writing. It's an arms race where the advantage always goes to the generator, not the detector. Anyone who sells you a detector with "99.9% accuracy" is selling you snake oil.
Tools like ours, AI humanizers, are part of the problem for detectors. We are actively creating technology that makes detection harder. We're open about this. Our goal is to give the human user control over their final text, to help them infuse their own voice back into AI-assisted drafts. A side effect of this process is that it breaks the statistical patterns detectors rely on.
This whole game is a bit of a circus. Generators get better, so detectors get better, so humanizers get better, and so on. It will never end.
Because of this, you should never, ever use an AI detection score to make a life-altering decision about someone else. Don't fire a freelancer, fail a student, or reject a candidate based on a probabilistic score from an opaque algorithm. It's not evidence. It's barely a hint.
AI detectors are a flawed solution to a complex problem. They can be a useful signal in a low-stakes workflow, but they are not the arbiters of truth.
Don't believe us? You can test it for yourself. Take some text from ChatGPT, run it through a detector to get a score, and then run it through our humanizer. Check the score again.
You can try the humangpt.io humanizer for free. See what happens.