Voice preservation in AI writing is finally becoming a measurable craft problem rather than a vibes argument. A recent ACL paper from researchers at the University of Maryland gave the field the first proper methodology for measuring how close AI output actually sits to a writer’s own voice, and the findings are useful in both directions: they tell us where the popular workflow falls short, and they point at what kind of tool can do better.
The study took 81 writers, gave each one an AI-generated draft, and asked them to edit until it sounded authentically like themselves. Then it measured how close the edited result was to the writer’s own real, unedited writing on the same topic.
The editing moved the prose toward the writer’s voice, but it didn’t get all the way there. The edited result sat in a middle zone closer to AI than to the writer’s own writing. The interesting follow-on finding: writers rated their edits as sounding fully like them, even though the numbers said they hadn’t quite gotten there. It’s a useful piece of evidence about how voice works at a layer below conscious attention.
For anyone using AI to write anything voiced, this is good information. The “generate with AI, then edit until it sounds like me” approach does real work, but the research shows it has a ceiling. The encouraging news is that a different kind of AI writing tool, one that holds your voice as a rule the AI follows from the start rather than something to edit toward afterwards, can clear that ceiling.
bookmoth is built around that approach, and we’re using the new methodology to measure and improve it. More on that below. The paper itself deserves its own treatment first.
What did the ACL paper actually find about editing AI writing?
A team at the University of Maryland gave 81 writers AI-generated drafts and asked them to edit those drafts to sound like themselves. Using a way of measuring writing style that works like a mathematical fingerprint, they tracked how close the edited prose got to the writer’s actual voice. The result: closer than the raw AI, but still measurably further from the writer’s voice than what the writer would produce on their own.
The setup was clean. Each participant wrote a piece in their own voice first. They received an AI-generated draft on the same topic. They edited the AI draft until they felt it sounded like themselves. Then the researchers compared the edited draft with both the raw AI draft and the writer’s own piece using style similarity scores, a method that picks up on word choice, sentence rhythm, paragraph shape, and similar patterns.
The edited prose sat between the raw AI and the writer’s own writing, but closer to the AI side. The AI’s flattening effect on voice wasn’t fully reversible through editing.
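To make “sat between, but closer to the AI side” concrete, here is a minimal sketch in Python. It assumes each text has already been reduced to a list of style numbers (a style vector), and it places the edited draft on the line running from the AI draft to the writer’s own piece. The function name and the example vectors are hypothetical, and this is a simple projection for illustration, not the scoring method the paper uses.

```python
def relative_position(ai: list[float], own: list[float], edited: list[float]) -> float:
    """Place `edited` on the AI-to-own axis: 0.0 is the AI draft, 1.0 is the writer.

    Hypothetical illustration only; the paper's own similarity scoring is not reproduced here.
    """
    # Direction from the AI draft to the writer's own piece.
    axis = [o - a for a, o in zip(ai, own)]
    # Where the edited draft sits relative to the AI draft.
    offset = [e - a for a, e in zip(ai, edited)]
    axis_len_sq = sum(x * x for x in axis)
    if axis_len_sq == 0:
        raise ValueError("AI and own vectors coincide, so there is no axis to measure along")
    # Scalar projection of the offset onto the axis, as a fraction of the axis length.
    return sum(x * y for x, y in zip(axis, offset)) / axis_len_sq


# Made-up style vectors, purely to show the arithmetic.
position = relative_position(ai=[0.0, 0.0], own=[2.0, 0.0], edited=[0.5, 1.0])
print(position)  # 0.25: a quarter of the way from the AI draft to the writer
```

Any value below 0.5 means the edited draft is still nearer the AI end of the axis than the writer’s end, which is the shape of the result described above.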
There was also a separate finding about variety. The edited prose showed less stylistic variation than the writers’ own unedited work. Even when each writer tried hard to put themselves back into the prose, the variation across different writers shrank. The AI’s tendency to smooth everyone toward the same default partly survived the editing pass.
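One rough way to picture “the variation across different writers shrank” is to measure how far apart writers sit from each other in style space, before and after. The sketch below assumes each writer is represented by one style vector and uses the average pairwise Euclidean distance as the spread. Both the function name and that choice of metric are assumptions for illustration; the paper’s actual variation measure may differ.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(vectors: list[list[float]]) -> float:
    """Average Euclidean distance between every pair of style vectors.

    A hypothetical spread measure: larger means writers differ more from each other.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors to measure spread")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Made-up vectors for three writers, purely illustrative.
own_writing = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
edited_ai = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

# A smaller spread for the edited set would correspond to the shrinking variety described above.
print(mean_pairwise_distance(own_writing) > mean_pairwise_distance(edited_ai))
```

In practice the features would need to be put on a common scale first, otherwise one large-valued feature dominates the distance.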
This is a controlled study with proper methodology, presented at ACL 2026 (one of the top conferences in the field), with the code and data published openly on GitHub.
[Source: Baumler, Bao, Nghiem, Yang, Carpuat, Daumé III. arXiv:2604.24444]
Why can't writers tell their AI-edited text still sounds like AI?
Because the AI’s flattening mostly happens at a layer below what writers consciously notice when they edit. Writers fix the obvious AI tells (clunky transitions, generic adjectives, awkward phrasings). They can’t easily see the deeper patterns (sentence rhythm, word frequency, paragraph shape) that actually make their voice theirs.
This is the most uncomfortable finding in the paper, and the one most worth sitting with.
Voice in writing lives at two layers. The surface layer is the one you can see when you read a sentence: specific words, awkward phrasings, generic adjectives, the kind of mistakes that scream “AI-ish” on sight. Writers can fix all of those in an editing pass. Most writers do.
The deep layer is the one you can’t easily see: your average sentence length, how often you use contractions, how your paragraphs open and close, the rhythm of how your sentences run. These patterns form a kind of writing fingerprint. They’re real, they’re measurable, they’re what makes your prose recognisably yours. (It’s the same fingerprint Claude used to identify journalist Kelsey Piper from 125 unpublished words a few weeks back.) They’re also mostly below the layer most writers can inspect on a sentence-by-sentence read.
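For a feel of what the deep layer looks like as numbers, here is a small Python sketch that computes three of the patterns mentioned above: average sentence length, how much sentence length varies (a crude stand-in for rhythm), and how often contractions appear. The function name and the feature set are assumptions for illustration. Real stylometric systems use far richer features, and this is not the fingerprint used in the paper.

```python
import re
import statistics

# Words may contain one straight or curly apostrophe (can't, it’s).
WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")


def style_fingerprint(text: str) -> dict[str, float]:
    """Toy deep-layer features for a piece of prose (hypothetical feature set)."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = WORD.findall(text)
    if not words:
        raise ValueError("text contains no words")
    # Words per sentence, skipping fragments that contain no words at all.
    lengths = [n for n in (len(WORD.findall(s)) for s in sentences) if n]
    # Apostrophe words are a rough proxy for contractions; this also counts possessives.
    apostrophe_words = [w for w in words if "'" in w or "’" in w]
    return {
        "mean_sentence_len": statistics.mean(lengths),
        "sentence_len_spread": statistics.pstdev(lengths),
        "contraction_rate": len(apostrophe_words) / len(words),
    }


print(style_fingerprint("I can't go. It is late and we are tired."))
```

None of these numbers is visible when you read one sentence at a time, which is the point: they only show up when you count across the whole piece.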
So when you edit AI output, you fix the surface layer brilliantly. The deep layer drifts toward the AI’s default and stays there. The result reads fine to you. The metrics show your voice still got partially eaten.
This is also why the “I just edit it until it sounds like me” approach feels like it works. The editing pass IS doing real work. You feel satisfied with the result. The voice loss is happening at a layer you can’t see.
Does the "generate then edit" AI writing workflow actually work?
It works partly. The edited prose is closer to your voice than the raw AI output. But it’s still measurably closer to AI than to your own real writing. If you care about voice across a long project, the generate-then-edit approach leaks voice every time you use it, and it leaks more than you can see.
The honest answer is “yes, but.” Editing AI output is better than not editing it. Writers who edit their AI drafts are doing more to keep their voice than writers who just accept the raw output. The paper confirms that.
What the paper also confirms: the editing approach doesn’t fully preserve voice. Some of the voice loss is baked into how AI writing works, and editing can’t reach all of it. What’s left is a hybrid, closer to your voice than the raw AI output was, but still detectably hybrid and still nearer the AI side than your own writing.
For short work, like a single paragraph or a one-off social post or a quick email, the partial recovery is probably enough. For long work where your voice is the brand, the partial recovery compounds. Each chapter loses a little, each book loses more, and by the third book you’ve drafted this way, the voice you started with isn’t the voice you have anymore. You can’t tell from the inside.
This is also why the approach is so popular. It feels productive: you’re doing real work fixing real problems, and the output meets your own quality bar. Because the loss sits below what you can inspect, nothing in the writing process flags it. Nobody is doing anything wrong; the approach is just structurally incomplete.
How can you actually make AI writing sound like you?
The reliable way to make AI writing sound like you isn’t editing the output afterwards. It’s holding your voice in place from the start, by using an AI tool that treats your writing samples as a rule the AI has to follow rather than a description it tries to imitate. The Baumler paper essentially measures why the editing approach falls short; the alternative is to never let the flattening happen in the first place.
There are two practical approaches to making AI writing sound more like you.
The first is generate-then-edit, the popular approach. You generate with whatever AI tool, then edit until the prose feels like yours. The Baumler paper measured this and found it recovers some of your voice but doesn’t get all the way there. The recovery is partial, and you can’t tell from the inside.
The second is hold-the-voice-from-the-start, the bookmoth approach. Your writing samples get analysed once and turned into a profile of how you write. That profile gets applied as a rule the AI has to follow on every chunk of output, not as a description the AI can drift away from. The AI never gets to default back to its own voice because the rule sits in a layer the AI’s defaults can’t override.
The difference matters because editing-after leaks. Holding-from-the-start doesn’t have to recover anything because the flattening doesn’t fully happen in the first place.
This isn’t a claim that any tool perfectly preserves voice in every situation. Voice preservation across a whole novel is an ongoing engineering problem and no tool is perfect at it. But the approach matters more than the model, and the approach that lines up with what the Baumler paper actually measured is the one that holds voice from the start rather than tries to recover it after.
If you’re using Claude, ChatGPT, Sudowrite, NovelCrafter, or any other prompt-based tool with a generate-then-edit habit, the paper says you’re losing more voice than you can see. The honest move is either to accept the loss (for short writing where it doesn’t matter much) or to use a tool built around holding voice from the start (for long writing where it does).
What's the best AI writing tool that keeps your voice intact?
The best AI writing tool for keeping your voice intact is one built around holding voice from the start, not editing it back in afterwards. bookmoth is built to do this for novelists and long-form writers. Most other tools (Sudowrite, NovelCrafter, base Claude, ChatGPT) work the generate-then-edit way the Baumler paper just complicated.
This is the practical takeaway. If editing AI output doesn’t fully recover voice and writers can’t tell from the inside, the tools that encourage the generate-then-edit habit are leaking voice from their users without anyone noticing. Most current AI writing tools work this way.
bookmoth was built around the alternative. It reads your existing writing once, builds a profile of how you write across multiple dimensions (sentence rhythm, structural habits, the words you reach for, how your dialogue sounds), and applies that profile as a rule the AI has to follow on every chunk of output. The flattening the Baumler paper measured doesn’t get a chance to fully happen because the AI is held to your voice as it’s writing, not after.
This doesn’t mean bookmoth produces perfect prose with no editing. The AI still makes mistakes, the rule still has edge cases, and human editing is still part of any serious writing process. The difference is what the editing is doing. With bookmoth, you’re editing for content, plot, and craft, not trying to recover voice that’s already gone. The voice is in the output already.
For novelists and long-form non-fiction writers who care about voice across a whole project, this is the alternative approach the Baumler paper essentially points toward without naming.
How does bookmoth keep getting better at preserving your voice?
Voice preservation is the thing bookmoth has been built around from day one. The Baumler methodology is the kind of measurement framework we’ve been working toward, and it’s now embedded in how we build and refine voice profiles. The next three months of the roadmap, from now through early September, are shaped by it.
Here’s what makes the current moment exciting for tools in this category. For most of the time AI writing tools have existed, voice preservation has been a craft argument: builders made claims, users either felt convinced or didn’t, and there was no shared way to settle whether a tool was actually doing what it promised. That’s started to change. The Baumler team and others are publishing methodologies that let writer-tool brands test their own work with the same rigour an academic team would. We’ve been waiting for exactly this.
bookmoth is using these methods. Every voice profile bookmoth builds is becoming measurable: how close does the prose bookmoth produces sit to your own real, unedited writing? Which patterns hold strongest, which drift slightly, where does the profile need to be sharper? The shift is from “we believe this works” to “we can show you the dimensions where it works and where it doesn’t, and we can iterate on the latter.”
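As a rough picture of what “which patterns hold strongest, which drift” could look like in practice, the Python sketch below compares two sets of style measurements dimension by dimension and ranks them by how far the produced prose has moved from the writer’s own. The function name, the dimension names, and the numbers are all hypothetical, and this is an illustration of the idea rather than a description of how bookmoth works internally.

```python
def drift_report(own: dict[str, float], produced: dict[str, float]) -> list[tuple[str, float]]:
    """Relative drift per shared dimension, largest drift first (hypothetical sketch)."""
    report = []
    for name in own.keys() & produced.keys():
        baseline = own[name]
        # Use relative difference; fall back to absolute difference when the baseline is zero.
        scale = abs(baseline) if baseline else 1.0
        report.append((name, abs(produced[name] - baseline) / scale))
    # Sort by drift descending, then by name so ties come out in a stable order.
    return sorted(report, key=lambda item: (-item[1], item[0]))


# Made-up measurements for illustration.
own_profile = {"mean_sentence_len": 20.0, "contraction_rate": 0.10}
produced_prose = {"mean_sentence_len": 22.0, "contraction_rate": 0.05}

for dimension, drift in drift_report(own_profile, produced_prose):
    print(f"{dimension}: {drift:.0%} drift")
```

With these example numbers, contraction rate comes out as the dimension that has drifted most, which is the kind of signal that tells you where a profile needs sharpening.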
This is where bookmoth is going between now and September. The methodology gives us a feedback loop. The feedback loop gives us a measurable target. The measurable target gives us something to ship against. We’ll share what we find publicly.
If you’ve been frustrated by AI writing tools making claims with no way to check them, that frustration is in good company. The good news is that the methods to check are now public, the research community is contributing them, and bookmoth is built for exactly this kind of evidence-led iteration.