Essay, July 2, 2026

The AI writing tells you can't edit out: what a study of 61,608 stories found

Writers using AI have learned to hunt the surface tells: the em dashes, the word “delve,” the tidy little triplets. A new University of Maryland and Google DeepMind study of 61,608 stories found the machine is visible somewhere deeper, in the shape of the story itself. And editing the prose barely touches it.

Quick answer
AI-generated fiction can be identified 93% of the time from narrative structure alone, with every stylistic cue stripped out. The tells are structural: themes explained outright, single-track plots with no subplots, emotion rendered as body sensations, quiet over-resolved endings. Editing the prose afterwards barely moves detection (95.5% to 93.9%). The tells are decisions, not words, so they only disappear when the writer makes the decisions. That is the workflow bookmoth is built around: your plan and your voice, with the model drafting inside both.

There is a genre of advice thread that appears daily wherever writers use AI: how to humanise the output. Kill the em dashes. Ban “delve” and “tapestry.” Break up the rule-of-three sentences. Run it through a rewriter. The advice works, narrowly. Surface style is the easiest thing to change, and the newest models are already scrubbing their own tics. GPT’s latest release cut its em dash habit; the tells writers police hardest are the ones evaporating on their own.

Which makes the new research uncomfortable. In April 2026, a team at the University of Maryland and Google DeepMind released StoryScope, the largest study yet of what AI fiction is actually like underneath the sentences. Their finding, in one line: you can take every stylistic cue away, and AI-generated stories are still identifiable 93.2% of the time, because the machine writes a different kind of story.

What did the study actually test?

The researchers compared 10,272 human-written stories against stories written from the same premises by five current AI models, then deliberately blinded their analysis to style.

The models were Claude Sonnet 4.6, GPT-5.4, Gemini 3 Flash, DeepSeek V3.2, and Kimi K2.5: 61,608 stories in total, averaging around 4,800 words. Instead of reading the prose, the pipeline converted every story into a structured map of its narrative decisions across ten dimensions: plot, character agency, temporal structure, setting, how information is revealed, and so on. Sentence rhythm, word choice, and figurative language were withheld on purpose.
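
To make “a structured map of narrative decisions” concrete, here is a minimal Python sketch of what one style-blind record could look like. The class name, the five field names, and the `to_vector` helper are assumptions invented for this illustration, loosely based on the tells discussed in this essay; they are not StoryScope's actual schema, which the essay does not reproduce.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class NarrativeFeatures:
    """Hypothetical style-blind record of one story's narrative decisions."""
    theme_stated_outright: bool      # does the narrator explain the theme?
    has_subplot: bool                # plot: more than one track?
    nonlinear_timeline: bool         # temporal structure: flashbacks, time jumps
    ambiguous_ending: bool           # is the ending left unresolved?
    emotion_as_body_sensation: bool  # feelings rendered physically, not named


def to_vector(features: NarrativeFeatures) -> list[int]:
    """Flatten the record into 0/1 values a classifier could consume."""
    return [int(value) for value in astuple(features)]


# A made-up story that matches several of the tells described in this essay.
story = NarrativeFeatures(
    theme_stated_outright=True,
    has_subplot=False,
    nonlinear_timeline=False,
    ambiguous_ending=False,
    emotion_as_body_sensation=True,
)
print(to_vector(story))  # [1, 0, 0, 0, 1]
```

Nothing in the record touches sentence rhythm or word choice. That is the point of the blinding: a classifier fed only vectors like this one cannot lean on em dashes.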

On narrative features alone, a classifier told human from machine with a macro-F1 of 93.2%, nearly matching detectors that read the raw text. It could also tell which model wrote a story 68.4% of the time from structure alone. Five models, one shared region of story-space, each with its own fingerprint inside it.
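
A note on the metric, since “93.2% of the time” is shorthand. Macro-F1 averages the per-class F1 scores without weighting by class size, which matters when AI stories outnumber human ones roughly five to one. The sketch below shows the standard definition in plain Python; the toy labels are invented for illustration and have nothing to do with the study's data.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so the smaller class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy data: 2 human stories, 8 AI stories, one mistake in each direction.
y_true = ["human", "human"] + ["ai"] * 8
y_pred = ["human", "ai"] + ["ai"] * 7 + ["human"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # 0.8
print(macro_f1(y_true, y_pred))  # 0.6875
```

In the toy case accuracy looks comfortable at 0.8 while macro-F1 is 0.6875, because the classifier is right about only half of the human stories. A high macro-F1 therefore means the classifier is doing well on both classes, not just the larger one.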

The timing matters. In March 2026, Hachette pulled a commercially published horror novel after it was flagged as roughly 78% AI-generated, a first for a major publisher. One detection firm found nearly 20% of a 14,000-novel sample of self-published Amazon books was largely AI-written, up 41% year over year. Readers, agents, and publishers are all learning to smell this. The question of what exactly they are smelling now has data.

What are the tells of AI fiction?

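The study distilled 30 core features that separate human stories from AI stories across all five models. They read like a craft workshop’s list of notes: the narrator states the theme outright (77% of AI stories versus 52% of human ones), the plot runs on a single track with no subplots (79% versus 57%), emotion arrives as body sensation (81% versus 38%) and is rarely just named (8% versus 29%), and the protagonist is seldom morally ambivalent (38% versus 59%).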

If you have read an AI draft of your own chapter and felt a competent stranger at the wheel, this list is probably why. It is the same flattening I wrote about in “AI doesn’t just change your style, it overwrites your voice”, one level up: not your sentences this time, your storytelling.

Does every AI model write fiction the same way?

No, and this is the study’s most quotable finding. Each model has a house style distinct enough that a classifier can attribute a story to its author-model two times out of three from structure alone.

The researchers’ own headings are hard to improve on. Claude keeps it cool: the most distinctive of the five, defined by restraint. Its event intensity escalates less than any other source, it takes a reverent, continuist approach to literary tradition (62% of its stories, versus 39 to 56% elsewhere), favours epilogues, avoids dream sequences, and prefers a quiet ending to an avalanche. GPT likes to gossip: rumour and hearsay drive the plot in 64% of its stories, casts are ensemble-heavy at human levels, events are framed in retrospect from years later, and it subverts expectations more than any other model. Gemini writes the tidiest endings, the longest denouements, and the bleakest worlds: 88% of its settings were tagged bleak and oppressive. DeepSeek front-loads context that other sources withhold. Kimi has the fewest quirks of all, which is its own kind of tell: it sits at the generic centre of the AI distribution.

Two things follow. First, if you draft with one model exclusively, your manuscript inherits its house style at the structural level, not just the sentence level. Second, none of them escape the cluster. In the study’s feature space, the distance from the human centre to the AI cluster is 1.6 times the distance between any two models. The models are more like each other than any of them is like a person.
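
For readers who want to see how a ratio like that can be computed, here is a minimal sketch. It assumes one reasonable reading of the comparison: the distance from the human centroid to the centroid of the AI cluster, divided by the mean pairwise distance between model centroids. The model names and 2-D coordinates are toy values invented for illustration; the study's feature space has far more dimensions, and its exact distance definition may differ.

```python
from itertools import combinations
from math import dist
from statistics import mean


def centroid(points):
    """Component-wise mean of a collection of equal-length vectors."""
    return [mean(axis) for axis in zip(*points)]


# Toy centroids (hypothetical): one human centre, four model centres.
human_centre = [0.0, 0.0]
model_centres = {
    "model_a": [4.0, 6.5],
    "model_b": [8.0, 6.5],
    "model_c": [4.0, 9.5],
    "model_d": [8.0, 9.5],
}

ai_centre = centroid(model_centres.values())
human_to_ai = dist(human_centre, ai_centre)

# Average distance over every unordered pair of models.
between_models = mean(
    dist(a, b) for a, b in combinations(model_centres.values(), 2)
)

print(human_to_ai / between_models)  # 2.5 for this toy data
```

A ratio above 1 means the models sit closer to one another than their shared cluster sits to the human centre, which is the geometric version of the sentence above.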

Why can’t you edit the tells out?

Because editing operates on sentences, and the tells live in decisions. The researchers tested exactly this, and the result should end the “humanizer” industry’s claims about fiction.

They took AI-generated stories and rewrote them with a span-level editing framework built from professional writers’ edits of AI text. The framework targets seven categories of artifact, including cliché, redundant exposition, and purple prose: the surface layer writers are told to police. Detection on the edited stories dropped from 95.5% to 93.9%. Roughly a point and a half. The narrative classifier barely noticed, because nothing it measures had changed. The story still explained its theme, still ran on one track, still resolved through the protagonist’s neat internal acceptance.
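
The size of that drop is easy to check. The short sketch below only does arithmetic on the two figures quoted above; the variable names are mine.

```python
before, after = 95.5, 93.9  # detection scores quoted in this essay, in percent

absolute_drop = before - after          # in percentage points
relative_drop = absolute_drop / before  # as a fraction of the original score

print(f"{absolute_drop:.1f} points")    # 1.6 points
print(f"{relative_drop:.1%} relative")  # 1.7% relative
```

Either way you read it, the edit pass removed less than a fiftieth of the detector's signal.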

To remove structural tells you have to restructure: add the subplot, break the timeline, cut the moral, unresolve the ending. At which point you are not cleaning up an AI story. You are writing a story, using the machine for the labour in between. The study quietly proves that the only reliable “humanizer” is a human making the narrative choices.

What does this mean if you write with AI?

It means the important question about any AI writing workflow is: who is making the story decisions? Everything measured in this paper is a decision. Whoever makes them is the author the classifier detects.

Note what the study measured: models handed a premise and asked to write the whole story. Autonomous AI fiction. The prompt-and-pray workflow. That is the thing readers are learning to identify, and that publishers are starting to pull from shelves.

The alternative workflow inverts it. The writer decides the structure: which chapter holds which turn, what stays unresolved, which thread runs underneath, what the reader is trusted to infer. The model drafts inside those decisions, and a voice constraint built from the writer’s own prose governs the sentence level. Then the structure being measured is yours, because it came from you.

This is the architecture bookmoth is built on, and this research is why I keep insisting the distinction is not marketing. Your story plan is the spine: chapters and scenes draft to your beats, not to the model’s preferred single track. Your voice profile is a binding constraint, derived from your prose, applied to every generation. And the drafting rules push directly against the reflexes in this paper’s list: trust the reader’s memory, do not moralise the theme, do not render every feeling as a tightening chest. The tells the study found at the sentence level, we were already fighting. The tells it found at the story level are the reason the writer, not the model, owns the plan.

The honest summary: the em dashes were never the problem. They were the visible edge of a deeper pattern, and the deeper pattern is that a model left to tell a story tells its own, in its own house style, more like every other model than like any person who ever wrote. If your name is going on the cover, the decisions need to be yours. The machine can carry water. It cannot be trusted with the architecture, and now there are 61,608 stories’ worth of evidence saying so. If you want to see what your own patterns look like from the outside, start with how to find your writing voice, or let the voice tool read a few pages of your prose.

Common questions about AI writing tells

How can you tell if a story was written by AI?
As of 2026, the most reliable signals are structural, not stylistic. A University of Maryland and Google DeepMind study of 61,608 stories (StoryScope) identified AI-generated fiction 93.2% of the time using narrative features alone: the narrator states the theme outright (77% of AI stories vs 52% of human ones), plots run on a single tidy track with no subplots (79% vs 57%), emotion is rendered as body sensations rather than named feelings (81% vs 38%), and endings resolve through the protagonist’s neat internal acceptance. Surface tells like em dashes and words like “delve” still exist but are increasingly removed by newer models, while the structural tells persist.
Can you edit AI writing so it doesn’t look AI-generated?
Not by editing the prose. The StoryScope researchers took AI-generated stories and rewrote them with a span-level editing framework built from professional writers’ edits, removing cliché, redundant exposition, and purple prose. Detection barely moved: 95.5% before editing, 93.9% after. The classifier is not reading sentences, it is reading narrative decisions such as causal linearity, thematic over-explanation, and sensory over-description. Removing those requires structurally rewriting the story, at which point the human is making the narrative decisions, which is the actual fix.
Do all AI models write fiction the same way?
No. Each model has a measurable house style. In the StoryScope study, Claude was the most distinctive: restrained, with the flattest event escalation, a reverence for literary tradition, epilogues, and quiet endings. GPT builds plots on gossip and rumour (64% of its stories) with ensemble casts and retrospective framing. Gemini writes the tidiest endings and the bleakest settings (88% tagged bleak). DeepSeek front-loads crucial context. Kimi has the fewest quirks, sitting at the generic centre of the AI distribution. But all five cluster together in narrative space, well away from human authors: the models are more like each other than any of them is like a person.
What makes human-written stories different from AI stories?
Human stories are messier and rarer. They use more subplots, time jumps, flashbacks, and nonlinear structure, are more comfortable with ambiguous endings, and feature morally ambivalent protagonists more often (59% vs 38%). Humans name real books, authors, and places at nearly double the AI rate, and are far more willing to simply state a feeling (29% vs 8%) instead of always rendering it as a tightening chest. Statistically, human stories occupy a wider, rarer region of narrative space: 24.7% of human stories fall in the rarest 10% of all stories in the study, versus 7.1% of AI stories.
How do you use AI for a novel without the AI tells?
Keep the narrative decisions human and constrain the sentence level to your own voice. The tells the research found are choices: what gets explained versus trusted to the reader, whether a subplot exists, how an ending resolves. If those choices come from the writer’s own story plan and the model drafts inside them, the structure being measured is human. That is the architecture bookmoth uses: the writer’s chapter and scene plan is the spine, a voice profile built from the writer’s own prose governs the sentence level, and drafting rules push against the known tells, like moralising the theme and over-writing body sensations.

A note from the person behind bookmoth
The machine can carry water. The architecture should be yours.
bookmoth was built on the bet this paper just confirmed: what makes writing yours is not a style setting, it is the decisions. In bookmoth your story plan is the structure the AI drafts inside, and your voice, derived from your own prose, is a binding constraint on every sentence. One purchase, your own API key, your words on your machine. If you want to know what your voice actually looks like on paper first, the voice tool will read a few pages and show you.