Every term, defined.
In plain English.

AI humanization, AI detection, and the writing-craft terms behind both. One paragraph for the answer, a deeper paragraph for the context, and a link to where the term shows up around the rest of the site.

AI detector

Software that scores a passage of text for likelihood of being machine-generated.

An AI detector reads a piece of text and outputs a probability that the passage was written by a large language model rather than a human. The major detectors as of 2026 are GPTZero, Turnitin AI, Originality.ai, Copyleaks, ZeroGPT, Sapling, Winston, and Crossplag. They differ in classifier design, threshold strictness, and which models they were trained against, but most lean on the same two underlying signals: perplexity and burstiness.

AI humanizer

A tool that rewrites AI-generated text so it reads as if a human wrote it.

An AI humanizer accepts a passage written by a language model and produces an output that preserves the meaning while changing the surface signals detectors look for. Quality humanizers vary perplexity (vocabulary unpredictability) and burstiness (sentence-length variance), strip the AI vocabulary cluster (leverage, transformative, comprehensive, etc.), and match the register of the input. Bad humanizers are dressed-up paraphrasers and tend to fail on the harder detectors.

Bypass rate

The percentage of humanized passages that pass an AI detector as human.

Industry shorthand for the success rate of a humanizer against a specific detector. HumanGPT publishes weekly per-detector bypass rates: roughly 99.6% across the seven major detectors as of mid-2026. Bypass rates fall when a detector ships a model update; we patch and the rate climbs back. There is no humanizer that hits 100% on every passage on every detector simultaneously; anyone claiming so is either lying or has not tested at scale.

Burstiness

The variance in sentence length and complexity across a passage.

Human prose mixes short sentences and long ones; AI prose defaults to a steady middle. GPTZero specifically scores burstiness on a 0-100 scale and treats low burstiness as a strong AI signal. A humanizer increases burstiness by writing fragments next to compound-complex sentences, varying paragraph length, and avoiding the AI tendency to produce three medium-length sentences in a row.
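
To make the idea concrete, here is a minimal sketch that treats burstiness as the spread of sentence lengths relative to their mean. This is an illustrative proxy only: the function names are made up for this example, and it is not GPTZero's scoring formula or its 0-100 scale.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list[int]:
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Standard deviation of sentence length divided by the mean length.

    Assumption: a simple stand-in for 'variance in sentence length',
    not any detector's actual metric.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


steady = "The cat sat on the mat. The dog lay on the rug. The bird sat on the branch."
varied = ("Stop. The cat, who had been watching the door all morning, "
          "finally sat down on the mat. Quiet again.")

print(burstiness(steady))  # three six-word sentences, so the spread is 0.0
print(burstiness(varied))  # a fragment next to a long sentence scores higher
```

The steady passage is the "three medium-length sentences in a row" pattern; the varied one mixes a fragment with a long sentence, which is what pushes the score up.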

Citation freeze

Locking citations and references so a humanizer does not alter them.

Academic and research writing depends on citations being exactly correct. HumanGPT's Frozen Keywords feature lets you mark author names, dates, page numbers, and direct quotes as untouchable, and the rewriter passes them through unchanged. Without this, a generic humanizer can corrupt a Smith (2019) into a Smyth (2019), an error that is easy to miss and invalidates the citation.

Copyleaks

An AI detector trained on billions of pages of human and AI text.

Copyleaks runs a proprietary neural classifier and looks at the whole document at once rather than just sentence-level perplexity. Common in publishing, content agencies, and corporate plagiarism workflows. HumanGPT tests against Copyleaks weekly with a current 99.5% bypass rate.

Crossplag

An AI detector with cross-language semantic analysis.

Crossplag combines AI detection with traditional plagiarism analysis and adds a cross-language layer that flags AI translation patterns. Particularly aggressive on academic and formal text. HumanGPT's multi-pass detector loop checks for cross-language artifacts and rewrites at the sentence-shape level rather than swapping synonyms, because synonym swapping is the pattern Crossplag flags hardest.

Em-dash tell

Frequent use of em-dashes (—) is a strong signal of AI authorship.

Large language models trained on books over-use em-dashes for parenthetical phrasing because their training data was disproportionately dense with that style. Human writers in 2026 rarely use em-dashes; commas, periods, and parentheses are more common. HumanGPT's prompt forbids em-dashes entirely, and the API post-processes output to strip them as a final pass.

Frozen keywords

User-specified terms that pass through a humanizer unchanged.

A list of words or phrases the humanizer must not alter. Used for product names, target SEO keywords, brand taglines, citations, technical jargon, and proper nouns. In the HumanGPT v4 web UI, the per-paste freeze field is not exposed; the REST API accepts a `freeze` array parameter on Pro plans. The closest web equivalent, which preserves your tone end-to-end, is the Brand Voice training feature.
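
One common way to get pass-through behavior is to mask the frozen terms before rewriting and restore them afterwards. The sketch below shows that general technique under stated assumptions: `rewrite_with_freeze`, the placeholder format, and the `rewriter` callable are all hypothetical, and nothing here describes how HumanGPT's `freeze` parameter is implemented.

```python
from typing import Callable


def rewrite_with_freeze(text: str, freeze: list[str],
                        rewriter: Callable[[str], str]) -> str:
    """Mask frozen terms, run the rewriter, then restore the terms verbatim.

    Assumption: the rewriter leaves placeholders such as [[FROZEN0]] untouched.
    """
    masked = text
    placeholders: dict[str, str] = {}
    # Longest terms first, so "Smith (2019)" is masked before a shorter "Smith".
    for i, term in enumerate(sorted(freeze, key=len, reverse=True)):
        token = f"[[FROZEN{i}]]"
        if term in masked:
            masked = masked.replace(term, token)
            placeholders[token] = term
    rewritten = rewriter(masked)
    for token, term in placeholders.items():
        rewritten = rewritten.replace(token, term)
    return rewritten


# Toy rewriter for demonstration: it changes everything it can see.
result = rewrite_with_freeze("Smith (2019) shows growth.", ["Smith (2019)"], str.upper)
print(result)  # Smith (2019) SHOWS GROWTH.
```

The toy rewriter upper-cases the whole passage, yet the citation comes back exactly as written because the rewriter only ever saw the placeholder.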

Generative engine optimization (GEO)

Optimizing content to be cited and surfaced by AI answer engines like ChatGPT and Perplexity.

The 2025-2026 successor to traditional SEO. Where SEO targets Google's blue links, GEO targets the citations LLMs make when answering user queries. Tactics overlap (clear definitions, structured data, authoritative sourcing) but emphasize concise factual blocks LLMs can quote verbatim. This glossary itself is a GEO play.

GPTZero

The most-cited AI detector, used by educators since 2023.

GPTZero scores perplexity and burstiness, plus a deeper neural classifier trained on millions of human and AI samples. It outputs a per-sentence and document-level probability, and is the detector students see most often in academic settings. HumanGPT's bypass rate against GPTZero sits around 99.7% as of May 2026.

Hedge words

Phrases like 'probably,' 'in many cases,' 'in my experience.' Humans use them; AI defaults skip them.

Confident, unhedged prose is an AI signal in informal contexts. Real people qualify their statements, especially in conversational writing. A humanizer adds hedges where the register allows (casual, marketing, cover letter, story, general) and avoids them where it doesn't (academic, legal, formal report).

Human score

A detector's reported probability that a passage was written by a human.

On a 0-100 scale: 100 = unmistakably human, 0 = unmistakably AI. HumanGPT-validated samples target 95+ across all seven major detectors. The label varies by detector ('Human Score,' 'Probability Human,' 'Originality Score'), but each one reports the same kind of quantity: the detector's estimate that a human wrote the passage.

Large language model (LLM)

A neural network trained to predict text, large enough to do so well across most domains.

ChatGPT, Claude, Gemini, Llama, Mistral, DeepSeek, and dozens of others. LLMs share the same fundamental tendency: to choose statistically likely next words, which produces low-perplexity prose that detectors flag.

Originality.ai

An AI detector specifically marketed to publishers and content agencies.

Originality runs a strict binary classifier with a hard percentage cutoff (anything over ~60% is flagged AI). It is the detector most freelance content clients run before paying invoices. HumanGPT's Originality bypass rate is 99.4%, tested weekly against fresh Originality model updates.

Paraphraser

A tool that swaps words for synonyms; weaker and more detectable than a humanizer.

Paraphrasers like Quillbot's spinner mode swap individual words for synonyms (utilize → use, demonstrate → show). This kind of edit doesn't change the underlying perplexity or burstiness pattern, so detectors usually still flag the result. A humanizer goes further: restructuring sentences, varying length, removing AI vocabulary clusters, and matching register.

Perplexity

A measure of how predictable a sequence of words is to a language model.

Lower perplexity means the next word was easy to predict; higher perplexity means it was a surprise. AI text has consistently low perplexity because the model picks statistically optimal words. Human text has wider perplexity variance because writers think about tone, rhythm, and personal preference. Most major detectors use perplexity as a primary signal.
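
The standard definition is the exponential of the average negative log-probability per token. The sketch below computes it from a list of per-token probabilities; the probability values are invented for illustration, and in practice they would come from whatever scoring model a detector uses.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """exp of the mean negative log-probability assigned to each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical probabilities a scoring model assigned to each actual next token.
predictable = [0.9, 0.8, 0.95, 0.85]   # every word was easy to guess
surprising = [0.2, 0.05, 0.4, 0.1]     # several words were a surprise

print(perplexity(predictable))  # close to 1: low perplexity
print(perplexity(surprising))   # much higher
```

A useful sanity check: if every token had probability 0.5, the perplexity is 2, as if the model were choosing between two equally likely words at each step.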

Reading level

A scale of text complexity, typically tied to a grade or education level.

HumanGPT supports four reading levels: High School, University, Doctorate, and Journalist. The setting controls vocabulary range, sentence length, and density. Set to High School and outputs avoid SAT vocabulary; set to Doctorate and the rewrite uses the full technical range where appropriate.

Register

The level of formality and the social context a piece of writing is built for.

Academic register is formal and disciplined. Reddit register is casual and personal. Cover-letter register is professional but warm. A humanizer must match the input's register: rewriting an academic essay into a Reddit post is not 'making it more human,' it's destroying the document. HumanGPT's Brand Voice training (Pro) lets you save 2-3 samples of your own writing once, and every rewrite then routes through that register fingerprint.

Sapling

An AI detector with a per-sentence classifier and high false-positive rate.

Sapling flags sentences individually, which makes it good at finding inserted AI paragraphs but prone to flagging dry human writing as AI. HumanGPT handles Sapling by chunking and processing sentence-level rhythm separately.

Speakable schema

Schema.org markup that flags content as appropriate for voice assistants.

When a section of a page is marked with Speakable schema, voice assistants and AI engines can read it aloud as the answer to a query. Pairing concise definitions with Speakable markup is a high-leverage GEO move; AI engines surface those passages disproportionately in voice responses.

Turnitin AI

Turnitin's built-in AI detector, embedded in academic submission workflows.

Turnitin runs an AI score on every submission alongside the traditional plagiarism check. Threshold is conservative; flagged work goes to the instructor for review. HumanGPT's bypass rate against Turnitin is ~99.5%, with a small dip after Turnitin model updates that we typically patch within a week.

Watermark

A statistical fingerprint embedded in AI output to identify it as machine-generated.

Some labs (Google, OpenAI experimentally) have proposed watermarking AI output by biasing the token distribution in detectable ways. As of mid-2026, no production model ships a robust watermark by default, and the watermark schemes proposed so far are easily disrupted by even basic rewriting. HumanGPT-style humanization fully eliminates them.

Winston AI

An AI detector with a tight threshold and strong burstiness sensitivity.

Winston flags content quickly when burstiness is low or sentence structure repeats. It is widely used in plagiarism-aware publishing. HumanGPT's Winston bypass rate is around 99.3% as of May 2026.

ZeroGPT

A free, popular AI detector used heavily by students and casual checkers.

ZeroGPT is fast and free, which makes it the detector most commonly run by curious individuals (including students checking their own work). Particularly sensitive to formulaic openings and closings ('In conclusion,' 'In today's...'). HumanGPT's ZeroGPT bypass rate sits around 99.6%.

Answer engine optimization (AEO)

Optimizing pages to be the source AI assistants quote when answering user questions.

AEO is the practice of structuring content so that ChatGPT, Perplexity, Claude, and Gemini cite your page as the source of an answer. Tactics include leading every section with a 40-60 word definitional answer, using FAQ schema, embedding named statistics with dates, and making every H2 read as a quotable claim. AEO and GEO overlap heavily but AEO is more focused on the assistant's answer surface.

Attention mechanism

The transformer architecture component that lets a model weigh which prior tokens matter for the next prediction.

Popularized by the 2017 paper 'Attention Is All You Need' from a team of Google researchers (Ashish Vaswani is the first listed author), which built an entire architecture around an attention mechanism that earlier neural translation research had introduced. Attention lets a transformer model assign different importance to each prior token when generating the next one. It is the core architectural innovation behind every modern LLM, including ChatGPT, Claude, and Gemini, and the reason these models can hold long context windows in mind without recurrence.

BERT

Google's bidirectional transformer that powered the previous generation of AI text classifiers.

BERT (Bidirectional Encoder Representations from Transformers), released by Google in 2018, was the model behind many early AI detectors before GPT-style LLMs took over both writing and detection. Many academic AI-classification papers from 2020-2023 fine-tuned BERT or its variants (RoBERTa, DistilBERT) on labeled human-vs-AI corpora.

Binary classification

A classifier that outputs one of two labels: in this case, AI or human.

Most AI detectors are binary classifiers that score a passage's probability of belonging to the AI class versus the human class. The output is a single probability between 0 and 1, then thresholded (typically 0.5 or 0.6) into a hard label. Binary classification is the simplest framing of the problem but it loses nuance; mixed human-edited AI text falls in a gray zone these classifiers handle poorly.

BLEU score

A metric for measuring how close a generated text is to a reference text.

BLEU (Bilingual Evaluation Understudy) compares n-gram overlap between machine output and a reference. Originally built for machine translation evaluation, sometimes used in AI detection research to compare generated and ground-truth samples. BLEU above 0.6 typically indicates very similar texts; below 0.2 indicates substantial differences.

BPE encoding

Byte-pair encoding: the tokenization scheme most modern LLMs use to break text into chunks.

BPE splits text into subword tokens based on frequency in the training corpus. Common words become a single token; rare words split into multiple. The choice of tokenizer affects how a model thinks about text, including how it handles non-English languages (often less efficiently) and rare technical jargon. ChatGPT and Claude both use BPE-derived tokenizers.
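
A toy version of the training step shows where the frequency-based splits come from: count adjacent symbol pairs, merge the most frequent pair into one symbol, and repeat. This is a simplified character-level sketch with hypothetical function names; production tokenizers work on bytes and add pre-tokenization rules that are omitted here.

```python
from collections import Counter
from typing import Optional

Word = tuple[str, ...]


def most_frequent_pair(words: dict[Word, int]) -> Optional[tuple[str, str]]:
    """Return the adjacent symbol pair with the highest total count, if any."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words: dict[Word, int], pair: tuple[str, str]) -> dict[Word, int]:
    """Rewrite every word so each occurrence of the pair becomes one symbol."""
    merged: dict[Word, int] = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_merges(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to num_merges merge rules from a list of words."""
    words: dict[Word, int] = dict(Counter(tuple(w) for w in corpus))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges


print(learn_merges(["aab", "aab", "aac"], 2))  # [('a', 'a'), ('aa', 'b')]
```

Frequent sequences get merged early and end up as single tokens, while rare strings stay split into several pieces, which is the behavior described above.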

Chain-of-thought (CoT)

A prompting technique that asks the model to reason step by step before answering.

Adding 'think step by step' or providing intermediate reasoning examples improves LLM performance on math, logic, and complex tasks. CoT is relevant to detection because reasoning-style outputs read differently from the model's default helpful tone, and may produce slightly different perplexity signatures.

Classifier

A machine-learning model that assigns input data to one of a fixed set of labels.

AI detectors are classifiers. Spam filters are classifiers. Image-recognition systems are classifiers. The category being predicted (AI vs human, spam vs ham, cat vs dog) defines the classifier. Most modern AI detectors are deep neural classifiers, often built on RoBERTa or BERT backbones, fine-tuned on a labeled corpus of human and AI text samples.

Confidence threshold

The probability cutoff at which a classifier flips from one label to the other.

An AI detector might output 0.73 (probability of AI) on a passage. Whether that gets labeled 'AI' depends on the detector's confidence threshold. GPTZero typically uses 0.5; Originality.ai uses around 0.6. Lowering the threshold catches more AI but flags more humans (more false positives). Raising it does the opposite. Threshold choice is a policy decision, not a technical one.
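
The trade-off is easy to see with a few numbers. In this sketch the scores for human-written passages are invented, and `label` and `false_positive_rate` are names chosen for the example rather than any detector's API.

```python
def label(p_ai: float, threshold: float) -> str:
    """Turn a probability of AI into a hard label at the given cutoff."""
    return "AI" if p_ai >= threshold else "human"


def false_positive_rate(human_scores: list[float], threshold: float) -> float:
    """Share of genuinely human passages that the cutoff labels as AI."""
    flagged = sum(1 for p in human_scores if p >= threshold)
    return flagged / len(human_scores)


# Hypothetical detector outputs for five passages that humans actually wrote.
human_scores = [0.10, 0.30, 0.55, 0.62, 0.20]

print(label(0.73, 0.5), label(0.73, 0.6))      # AI AI
print(false_positive_rate(human_scores, 0.5))  # 0.4
print(false_positive_rate(human_scores, 0.6))  # 0.2
```

The 0.73 passage is labeled AI at either cutoff, but moving the threshold from 0.5 to 0.6 halves the false positives in this toy sample. The classifier is identical in both cases; only the policy changed.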

Context window

The maximum amount of text an LLM can read and reason over in a single call.

GPT-4 Turbo: 128k tokens. Claude 3.5 Sonnet: 200k. Gemini 1.5 Pro: up to 2M. The context window determines how much input you can paste before older content is dropped or summarized. For humanizer use, context windows beyond 8k tokens are typically unnecessary; longer documents are processed in chunks anyway to maintain consistency.

Cosine similarity

A measure of how close two text embeddings are in semantic space.

Cosine similarity ranges from -1 (vectors pointing in opposite directions) to 1 (vectors pointing in the same direction), with 0 meaning unrelated. In AI detection and humanization work, it's used to verify that a humanized rewrite preserves the original meaning. HumanGPT's pipeline checks cosine similarity between input and output and rejects rewrites where meaning has drifted too far.
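
The calculation itself is short: the dot product of the two vectors divided by the product of their lengths. The sketch below uses tiny hand-made vectors in place of real embeddings, and the 0.85 cutoff in `meaning_preserved` is a made-up value for illustration, not HumanGPT's setting.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def meaning_preserved(original: list[float], rewrite: list[float],
                      min_similarity: float = 0.85) -> bool:
    """Quality gate sketch; the 0.85 default is a hypothetical threshold."""
    return cosine_similarity(original, rewrite) >= min_similarity


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite: -1.0
```

Note that the first pair scores 1.0 even though the vectors have different lengths: cosine similarity compares direction only.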

Detector consensus

Combining the verdicts of multiple AI detectors into a single, more reliable judgment.

Single detectors disagree wildly; consensus methods average or vote across detectors to produce a more stable signal. HumanGPT uses pessimistic consensus internally: any classifier saying AI flips the verdict. This is harder to beat than any single detector and produces output that holds up across the entire detector landscape.

E-E-A-T

Google's quality framework for ranking content: Experience, Expertise, Authoritativeness, Trustworthiness.

Introduced in the December 2022 update to Google's Search Quality Rater Guidelines, E-E-A-T (originally E-A-T, with Experience added) defines what makes a webpage worth ranking. Author bios, named sources, original first-party data, and dated references all signal E-E-A-T. The framework matters for traditional SEO and increasingly for AI citation: assistants weight E-E-A-T-rich pages more.

Edward Tian

Princeton student who built and shipped GPTZero in January 2023.

Tian, then a senior at Princeton, built GPTZero over a winter break and released it on January 2, 2023. The tool became the first AI detector with public traction and remains one of the most-used in academic settings. Tian later raised seed funding and grew GPTZero into a company with educator-focused dashboards. The origin story is among the most-cited founder narratives in AI detection.

Embedding

A numeric vector representation of a word, sentence, or document.

Embeddings turn text into a high-dimensional vector (typically 384, 768, or 1536 dimensions) where semantic similarity correlates with vector distance. Embeddings power search, retrieval, similarity comparison, and many detector pipelines. OpenAI's text-embedding-3-small and text-embedding-3-large are the current commercial standard.

Enhanced mode

HumanGPT's premium humanization tier with extra rewrite passes for the toughest detector cases.

Available on Pro and Founder plans. Enhanced mode runs additional refinement passes after the initial rewrite, with stricter perplexity targets and stronger burstiness variance. Used for content that has to clear Originality.ai or Turnitin's strict mode. Output is slightly slower but bypass rates are typically 10-15 points higher than standard mode.

F1 score

A combined accuracy metric that balances precision and recall.

F1 = 2 * (precision * recall) / (precision + recall). It rewards detectors that catch real AI without flagging too many humans. Reported AI-detector F1 scores in independent academic tests typically range from 0.6 to 0.85; vendor-claimed scores are usually higher. HumanGPT's internal evaluation uses F1 alongside per-class false positive and false negative rates.
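
A short worked example, using invented counts, shows how the three numbers relate. Here a true positive is AI text flagged as AI, a false positive is human text flagged as AI, and a false negative is AI text that slipped through.

```python
def precision(tp: int, fp: int) -> float:
    """Of everything flagged as AI, the share that really was AI."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Of all the real AI text, the share the detector caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical test: 80 AI passages caught, 20 humans wrongly flagged, 40 AI passages missed.
tp, fp, fn = 80, 20, 40
print(round(precision(tp, fp), 3))  # 0.8
print(round(recall(tp, fn), 3))     # 0.667
print(round(f1(tp, fp, fn), 3))     # 0.727
```

Because F1 is a harmonic mean, it sits closer to the weaker of the two numbers, so a detector cannot buy a high F1 with recall alone.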

False negative

When a detector says 'human' on text that is actually AI-generated.

From the detector's perspective, a false negative is a miss. From the user's perspective, a false negative on humanized AI is the goal. False negative rates on humanized AI in 2026 across the seven major detectors range from 56% (Originality.ai missed) to 78% (ZeroGPT missed), per HumanGPT's May 2026 500-sample test.

False positive

When a detector says 'AI' on text that a human actually wrote.

The most-contested failure mode of AI detectors. False positive rates in independent testing range from 6% (Turnitin AI on native English) to 47% (ZeroGPT on non-native English). Non-native English speakers, formal academic writers, and STEM students all fall into the false-positive zone disproportionately. The Stanford Zou study from 2023 first quantified this gap at population scale.

Few-shot prompting

Giving an LLM a small number of input/output examples in the prompt to teach it a task.

Few-shot prompts include 2-5 examples of the desired output format before the actual user request. Significantly improves consistency on structured output tasks (JSON, classification, voice mimicry). HumanGPT uses few-shot prompts internally to lock voice profiles to specific cadences and vocabulary ranges.

Fine-tuning

Adapting a pretrained LLM to a specific task or domain by training on additional data.

Fine-tuning takes a base model (Llama, Mistral, GPT-4) and trains it further on a domain-specific dataset (legal documents, medical text, your brand voice). The result preserves general knowledge while specializing behavior. AI detectors are typically fine-tuned classifiers built on RoBERTa or BERT backbones. Custom voice profiles in HumanGPT are achieved through prompt-level instruction rather than fine-tuning, which keeps the engine flexible.

Google March 2024 core update

The Google search update that wiped out sites publishing low-quality AI content at scale.

Rolled out March 5, 2024, this core update was paired with a major spam-policy update and explicitly targeted scaled content abuse. Sites that had been pumping out hundreds of AI-generated articles per day lost 70-90% of their traffic in the following weeks. The update did not penalize AI as a tool; it penalized the use of AI to publish unhelpful content for ranking purposes.

Hallucination

When an LLM generates plausible-sounding text that is factually wrong.

LLMs hallucinate citations (50%+ of ChatGPT-generated academic citations are entirely fabricated), historical facts, statistics, and quotes. The mechanism is statistical: the model picks the most plausible next words, which often happen to be true but do not have to be. Hallucination is a major reason professors catch AI-generated essays: a fake citation is easy to verify and instantly damning.

Helpful Content System

Google's algorithm layer that demotes pages written for search engines rather than humans.

First rolled out August 2022 and integrated into Google's core ranking systems by March 2024. The system identifies pages that lack original information, demonstrate no first-hand experience, are produced at scale to game search rankings, or fail to satisfy the user's underlying intent. AI-generated thin content is one of the most common targets, but the system penalizes content quality, not the tool used to make it.

Instruction tuning

Training a base LLM to follow natural-language instructions rather than just continuing text.

Instruction-tuned models (Claude, ChatGPT, Gemini) follow commands like 'summarize this' or 'write me a poem' because they were trained on instruction-response pairs after the base pretraining phase. Pre-instruction-tuned base models (raw Llama base) just continue text and are much harder to use directly. Most consumer-facing LLMs in 2026 are instruction-tuned.

Jon Gillham

Founder of Originality.ai, launched in 2022.

Gillham launched Originality.ai in 2022 specifically targeting publishers, content agencies, and freelance content workflows. The tool became the strictest of the consumer AI detectors and remains the hardest to bypass cleanly. Gillham has been a vocal industry voice on detector accuracy and limitations, including publishing periodic accuracy benchmarks against his own product.

Key takeaways

A short bulleted summary at the top of an article that AI assistants extract preferentially.

Three- to five-bullet summaries near the top of an article get quoted by ChatGPT, Perplexity, and Claude at much higher rates than buried claims. The format mimics how AI engines structure their own answers. HumanGPT articles include a Key takeaways block right after the lead paragraph as a deliberate AEO move.

llms.txt

A standard file at site root that tells AI crawlers and assistants what your site is about.

Proposed by Jeremy Howard in September 2024, llms.txt is the AI-era equivalent of robots.txt. It lives at /llms.txt, contains a markdown-format summary of the site, key pages, key facts, and policy. AI assistants (Anthropic, Perplexity, OpenAI) preferentially ingest llms.txt content when surfacing the site in answers. HumanGPT publishes both /llms.txt and /llms-full.txt with embedded full-content sections.

Logit

The raw, unnormalized output score a neural network produces before softmax conversion.

When an LLM picks the next token, it first produces a logit for every possible token in the vocabulary, then applies softmax to convert logits into probabilities. Detector classifiers also produce logits internally. For a binary classifier, thresholding at logit > 0 is mathematically equivalent to thresholding at probability > 0.5; different papers and tools simply word it differently.
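
For a binary classifier the logit is usually converted to a probability with the sigmoid function, which maps a logit of exactly 0 to a probability of exactly 0.5. The sketch below, which assumes that single-logit setup, checks that the two ways of stating a threshold give the same answer.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw binary-classifier logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


for z in [-3.0, -0.1, 0.1, 3.0]:
    by_logit = z > 0
    by_probability = sigmoid(z) > 0.5
    print(z, round(sigmoid(z), 3), by_logit, by_probability)

print(sigmoid(0.0))  # 0.5, the boundary in both framings
```

Every row prints the same True or False in the last two columns, so a paper that reports a logit cutoff and a tool that reports a probability cutoff can be describing the same decision rule.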

MLA AI guidance 2024

The Modern Language Association's 2024 guidance against using AI detector scores as standalone evidence.

Published in 2024, the MLA guidance, co-signed by major writing-program directors, advised faculty against treating AI detector scores as sufficient proof of AI use in academic integrity cases. The guidance cited high false positive rates, particularly on non-native English writing, and recommended process-based assessment instead. Several universities cited it when turning off detector-based enforcement.

Multi-pass loop

HumanGPT's core architecture: rewrite, detect, rewrite again until detector consensus says human.

Single-pass humanization gets caught by strict detectors. HumanGPT's loop runs the rewrite, scores the output through pessimistic detector consensus, and re-rewrites with feedback if any detector flags. Free tier runs one pass; Pro runs up to two; Founder runs three with Gemini 2.5 Pro on the final pass. The loop converges in 1-3 iterations on most inputs.
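
The control flow described above can be sketched as a bounded loop. Everything in this example is an assumption made for illustration: `multi_pass`, the `rewrite` and `detect` callables, and the toy stand-ins are hypothetical, not HumanGPT's code.

```python
from typing import Callable


def multi_pass(text: str,
               rewrite: Callable[[str, list[str]], str],
               detect: Callable[[str], list[str]],
               max_passes: int) -> tuple[str, int, bool]:
    """Rewrite, score, and re-rewrite until nothing flags or passes run out.

    rewrite(text, feedback) returns a new draft; detect(text) returns the
    names of the detectors that still flag it. Both are hypothetical.
    Returns (final text, passes used, whether the final draft cleared).
    """
    current, feedback, passes = text, [], 0
    for passes in range(1, max_passes + 1):
        current = rewrite(current, feedback)
        feedback = detect(current)
        if not feedback:
            return current, passes, True
    return current, passes, False


# Toy stand-ins: the "detector" flags the word "delve", and the "rewriter"
# fixes one occurrence per pass.
toy_rewrite = lambda t, fb: t.replace("delve", "dig", 1)
toy_detect = lambda t: ["toy-detector"] if "delve" in t else []

print(multi_pass("delve delve", toy_rewrite, toy_detect, max_passes=3))
# ('dig dig', 2, True)
print(multi_pass("delve delve", toy_rewrite, toy_detect, max_passes=1))
# ('dig delve', 1, False)
```

The `max_passes` argument plays the role of the per-plan pass limit: with a budget of one pass the toy input is still flagged, while a larger budget lets the loop converge on the second pass.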

N-gram

A consecutive sequence of N words; the most basic statistical signal AI detectors use.

Bigrams (2 words), trigrams (3 words), and longer sequences. AI text has statistically distinctive n-gram distributions: 'in conclusion', 'it is important to note', 'multifaceted approach'. Detectors look at how heavily a passage relies on the AI-typical n-gram cluster. Stripping these clusters is one of the fastest ways to drop a detector score by 10-20 points.
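
Extracting n-grams takes only a few lines. The sketch below builds them from lower-cased words and counts how often the three example phrases from this entry appear; the phrase set and function names are illustrative, and real detectors use far larger statistical tables.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive words, lower-cased."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# The example phrases named in this entry; a real list would be much longer.
AI_TYPICAL = {"in conclusion", "it is important to note", "multifaceted approach"}


def ai_phrase_hits(text: str, phrases: set[str] = AI_TYPICAL) -> Counter:
    """Count occurrences of each listed phrase among the text's n-grams."""
    hits: Counter = Counter()
    for phrase in phrases:
        hits[phrase] = ngrams(text, len(phrase.split())).count(phrase)
    return hits


sample = "In conclusion, it is important to note that results vary."
print(ngrams("In conclusion, it works", 2))
# ['in conclusion', 'conclusion it', 'it works']
print(ai_phrase_hits(sample))
```

On the sample sentence, two of the three phrases register one hit each, which is the kind of cluster a rewrite would target first.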

OpenAI text classifier (deprecated)

OpenAI's own AI text detector, released January 2023 and shut down July 2023 due to low accuracy.

OpenAI built and released its own AI Text Classifier on January 31, 2023, billed as a tool to help users identify AI-generated text. It was discontinued just six months later, on July 20, 2023, with OpenAI citing 'low rate of accuracy.' The shutdown was a striking signal about the difficulty of the detection problem: even the company that built the AI could not reliably detect its own output.

Pessimistic consensus

A detector consensus rule where any single 'AI' verdict flips the overall judgment to AI.

HumanGPT's internal verdict logic. We run several signals in parallel (perplexity stat, RoBERTa classifier, secondary LLM judgment) and any one of them saying 'AI' flips our verdict. This is intentionally harder to beat than a simple average. Output that clears pessimistic consensus tends to clear all of the seven major external detectors as well.
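
The difference between the pessimistic rule and a simple average shows up whenever the signals disagree. In this sketch the signal names follow the entry above, while the scores, the shared 0.5 cutoff, and the function names are assumptions made for illustration.

```python
def pessimistic_consensus(ai_probs: dict[str, float], threshold: float = 0.5) -> str:
    """Label the passage AI if any single signal reaches the threshold."""
    return "AI" if any(p >= threshold for p in ai_probs.values()) else "human"


def average_consensus(ai_probs: dict[str, float], threshold: float = 0.5) -> str:
    """Label the passage AI only if the mean of all signals reaches the threshold."""
    mean = sum(ai_probs.values()) / len(ai_probs)
    return "AI" if mean >= threshold else "human"


# Hypothetical scores: two signals are relaxed, one is suspicious.
signals = {"perplexity_stat": 0.2, "roberta_classifier": 0.7, "llm_judge": 0.3}

print(average_consensus(signals))      # human (the mean is about 0.4)
print(pessimistic_consensus(signals))  # AI (one signal is at 0.7)
```

Averaging lets two relaxed signals outvote one suspicious signal. The pessimistic rule does not, which is why it is the harder bar to clear.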

Precision

Of all the cases a detector flagged as AI, how many actually were AI.

Precision = true positives / (true positives + false positives). High precision means the detector rarely flags innocent humans. Originality.ai trades a slightly lower recall for higher precision; ZeroGPT trades higher recall for lower precision. For academic integrity use, precision matters more than recall, because falsely accusing innocents is the costlier error.

Prompt engineering

The craft of structuring an LLM input to produce reliable, useful, format-correct output.

Specific techniques include role assignment ('Act as a professor'), few-shot examples, structured output instructions ('Return strict JSON'), banned-word lists, and explicit constraints. Prompt engineering can reduce AI-detector flag rates by 40-60% before any humanization pass; a well-engineered prompt produces less robotic output in the first place.

Recall

Of all the AI text in a corpus, how much the detector caught.

Recall = true positives / (true positives + false negatives). High recall means the detector misses very little real AI. ZeroGPT has high recall but low precision; Originality.ai has slightly lower recall but higher precision. Detector marketing leans on recall numbers because they sound impressive in isolation. Precision is what protects users from false accusations.

RLHF

Reinforcement Learning from Human Feedback: the post-training step that makes LLMs feel polite and helpful.

RLHF takes a pretrained, instruction-tuned LLM and further trains it on human preferences. Trainers rank multiple model outputs; the model learns to produce outputs humans prefer. RLHF is why ChatGPT sounds friendly and avoids most harmful content. It also produces a particular RLHF voice (over-helpful, slightly hedged, Reddit-thread-like) that is itself a detector signal.

RoBERTa classifier

Facebook's improved BERT variant, the most common backbone for modern AI detectors.

RoBERTa (Robustly Optimized BERT Pretraining Approach) is a 2019 improvement on BERT from Facebook AI Research. Many production AI detectors are RoBERTa fine-tuned on labeled human-vs-AI corpora. HumanGPT uses an open-source transformer-based classifier as one signal in its internal consensus, alongside perplexity statistics and LLM-based judgments.

Schema.org

A vocabulary of structured-data tags that helps search engines and AI assistants understand page content.

Co-developed by Google, Microsoft, Yahoo, and Yandex starting in 2011. Schema types relevant to HumanGPT include Article, BlogPosting, FAQPage, HowTo, Person, Organization, DefinedTerm, and SoftwareApplication. AI assistants weight schema-marked content higher because the structure makes it unambiguous. Every page on humangpt.io ships with at least three schema blocks.

Semantic similarity

How close two pieces of text are in meaning, independent of exact wording.

Measured via embedding distance (typically cosine similarity). A humanizer that scores high semantic similarity to its input is preserving meaning; one that scores low has changed what the text says. HumanGPT's pipeline checks semantic similarity between input and output as a quality gate, rejecting rewrites that drift too far.

Softmax

The math function that converts raw model scores into a probability distribution.

Softmax takes a vector of logits and outputs a vector of probabilities that sum to 1. Used at the output layer of most neural classifiers and at every token-prediction step inside an LLM. Temperature controls how 'peaky' the softmax distribution is: low temperature concentrates probability on the top choice, high temperature spreads it across alternatives.
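
A minimal implementation makes the temperature effect visible. The logits below are arbitrary example values; dividing them by the temperature before exponentiating is the usual way temperature is applied.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert logits to probabilities, scaling by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
```

At temperature 0.5 the top choice takes most of the probability mass; at 2.0 the three options move closer together. In every case the outputs still sum to 1.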

Stanford 2023 detector study

The peer-reviewed Stanford study that documented systematic detector bias against non-native English speakers.

Published in 2023 by James Zou and colleagues at Stanford, the study tested seven popular AI detectors on essays written by non-native English speakers and showed that more than half of the human-written essays from this group were falsely flagged as AI. The study became the most-cited evidence in academic integrity cases involving non-native students and contributed directly to several universities turning off detector-based enforcement.

Statistical detector

An AI detector that scores text using only statistical features (perplexity, burstiness, n-gram).

Distinguished from neural-classifier detectors. Statistical detectors are simpler, faster, and easier to bypass. They are often used in free or low-cost tools (early ZeroGPT, basic GPTZero). Modern detectors typically combine statistical features with a neural classifier on top to improve accuracy.

System prompt

The hidden instruction that defines an LLM's persona and constraints before the user message arrives.

System prompts are typically baked in by the application developer. ChatGPT's system prompt makes it act as a helpful assistant; HumanGPT's system prompt instructs the model to humanize input while preserving meaning, varying burstiness, and forbidding banned-word clusters. Users do not see the system prompt but it determines almost everything about how the model behaves.

Temperature

An LLM sampling parameter that controls how predictable or creative the output is.

Temperature 0 makes decoding effectively deterministic: the model picks the most likely next token every time. At temperature 1 the model samples from its unmodified probability distribution. Temperatures of 1.5 and above flatten that distribution and produce creative, sometimes nonsensical, output. HumanGPT uses temperature 0.8-0.95 on rewrite passes to preserve voice variance, and lower (0.3-0.5) on final-pass quality checks.
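
A rough sketch of how temperature enters token sampling. The function name and the toy logits are hypothetical; real inference stacks do this over the full vocabulary, usually on a GPU.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Return the index of the chosen token for a given temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    weights = [math.exp(z - peak) for z in scaled]  # unnormalized softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

With temperature 0 the same index comes back on every call; with higher values the lower-scoring tokens are picked more often.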

Tokenizer

The component that breaks raw text into the chunks an LLM actually processes.

Most LLMs do not operate on words but on subword tokens. The tokenizer (typically BPE-derived) splits input into a sequence of token IDs, and the model predicts the next token ID. Tokenizer choice affects everything downstream: how efficiently the model handles non-English languages, how it sees rare technical terms, and how cost (per-token API pricing) scales for different inputs.
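
The core of BPE training is a loop that repeatedly merges the most frequent adjacent pair of symbols. The toy version below is a sketch under simplifying assumptions: it starts from single characters, treats each word in the corpus as occurring once, and ignores byte-level handling and special tokens that production tokenizers add.

```python
from collections import Counter


def most_frequent_pair(words):
    """Most common adjacent symbol pair across all words, or None if there are no pairs."""
    pairs = Counter()
    for symbols in words:
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += 1
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged


def learn_merges(corpus, num_merges):
    """Run num_merges rounds of pair merging; return the merges and the segmented words."""
    words = [list(word) for word in corpus]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words
```

After two merges on the corpus ["low", "lower", "lowest"], the shared stem has become a single token while the rarer endings are still split into characters, which is why rare technical terms cost more tokens than common words.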

Top-p sampling

Nucleus sampling: sampling only from the smallest set of top-ranked tokens whose cumulative probability reaches p.

Top-p (typically 0.9 or 0.95) trims the long tail of unlikely tokens but lets the model choose freely from the high-probability cluster. Combined with temperature, it controls how creative and diverse the output is. HumanGPT uses top-p around 0.92 on rewrite passes to allow vocabulary variance without producing outright weird tokens.
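
A minimal sketch of the top-p filter itself, assuming the input is already a list of probabilities indexed by token ID. The function name and the example distribution are made up for illustration.

```python
def top_p_filter(probs, p=0.92):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches p, then renormalize. Returns {token_index: probability}."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

For the distribution [0.5, 0.3, 0.15, 0.05] and p=0.9, the first three tokens are kept and the 0.05 tail is dropped. The number of surviving tokens changes with the shape of the distribution, which is the main difference from a fixed-size cutoff.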

Top-k sampling

Picking the next token only from the k most likely candidates.

Top-k (typically 40-100) is an alternative to top-p that restricts sampling to a fixed number of candidates. Less common in modern practice; top-p has mostly replaced it in production LLMs. Both control output diversity and are usually paired with temperature.
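
For comparison, a sketch of the top-k filter under the same assumption that the input is a list of probabilities indexed by token ID. The function name is hypothetical.

```python
def top_k_filter(probs, k=40):
    """Keep the k most likely tokens and renormalize. Returns {token_index: probability}."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = ranked[:k]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

Unlike a cumulative-probability cutoff, k stays fixed no matter how confident the model is, so a very peaked distribution still passes k candidates through.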

Transformer architecture

The neural network design behind nearly every modern LLM, introduced by Google researchers in 2017.

The 2017 paper 'Attention Is All You Need' introduced the transformer, replacing the recurrent neural networks (LSTM, GRU) that had dominated NLP. Instead of reading a sequence one step at a time, transformers process all tokens in parallel and use self-attention to weigh the relationships between them. Nearly every LLM in production in 2026, including GPT-4, Claude, Gemini, and Llama, is a transformer derivative.
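
The attention step can be sketched in a few lines of plain Python. This is a simplified, single-head version of scaled dot-product attention with no learned projection matrices and no masking, so it illustrates the weighting idea rather than a full transformer layer.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors,
    weighted by softmax(query . key / sqrt(d_k))."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs
```

Each query is handled independently of the others, which is what allows all positions to be computed in parallel.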

Vanderbilt detector turnoff

Vanderbilt University's August 2023 decision to disable Turnitin's AI detection feature.

Vanderbilt was one of the first major US universities to publicly turn off Turnitin's AI detection, citing concerns about reliability and false positives. The announcement, from the Vanderbilt provost's office, became a touchstone in faculty conversations about whether to use detector scores at all. The University of Texas at Austin, Northwestern, and several Cal State campuses followed in subsequent months.

Brand Voice training

A Pro-tier feature where a user pastes 2-3 samples of their own writing, and every future humanization is routed through that voice fingerprint.

Brand Voice training (Pro plan, on the /account dashboard) lets a user save samples of their own writing once. The tool extracts register, contractions, em-dash habits, banned words, and signature phrases, then applies that fingerprint to every Pro humanization. The output preserves the input's meaning while reading in the user's own voice instead of a generic 'humanized' tone. Train once, reuse forever.

Watermarking

Embedding a statistical fingerprint in AI output so it can be identified later.

Proposed by researchers at the University of Maryland (Kirchenbauer et al., 2023) and others. Watermarking biases the model's token distribution toward a specific pseudorandom subset of the vocabulary, leaving a pattern that a detector holding the same key can test for. The promise: reliable detection with very low false-positive rates. The reality (so far): published watermarking schemes are substantially weakened by a rewriting pass through a humanizer or even another LLM, which is one reason text watermarking is still far from universal in 2026.
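
The detection side of a green-list watermark can be sketched as a simple statistical test. This is a simplification made for illustration: the published scheme seeds a random generator from the previous token and partitions the whole vocabulary, whereas this sketch substitutes a keyed hash of each (previous token, token) pair. The key, the gamma value, and the function names are all hypothetical.

```python
import hashlib
import math


def is_green(prev_token, token, gamma=0.5, key="demo-key"):
    """Pseudorandomly assign a token to the 'green' subset, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < gamma


def watermark_z_score(tokens, gamma=0.5, key="demo-key"):
    """How many standard deviations the green-token count sits above chance."""
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(a, b, gamma, key) for a, b in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

Unwatermarked text should land near a z-score of 0, while text generated with a strong bias toward green tokens scores far above it. A rewrite that replaces many tokens pulls the green count back toward chance, which is how paraphrasing weakens the signal.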

Zero-shot prompting

Asking an LLM to do a task with no examples, just an instruction.

'Translate this to French.' 'Summarize this article.' 'Rewrite this in a friendly tone.' Modern instruction-tuned LLMs handle zero-shot tasks well, which is why most consumer ChatGPT use is zero-shot. For specialized tasks (voice mimicry, niche format compliance, structured JSON output), few-shot prompting still outperforms zero-shot.
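
The difference between the two styles is easy to see in how the prompt string is assembled. The sketch below uses a hypothetical Input/Output layout; real applications choose their own format.

```python
def build_prompt(instruction, text, examples=()):
    """Zero-shot when examples is empty; few-shot when worked examples are supplied."""
    parts = [instruction]
    for source, target in examples:
        parts.append(f"Input: {source}\nOutput: {target}")
    parts.append(f"Input: {text}\nOutput:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Translate this to French.", "Good morning")
few_shot = build_prompt(
    "Translate this to French.",
    "Good morning",
    examples=[("Hello", "Bonjour"), ("Thank you", "Merci")],
)
```

The zero-shot prompt contains only the instruction and the input; the few-shot prompt adds worked pairs ahead of it so the model can infer the expected format and tone.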