Over two years ago I wrote a blog post about AI. Specifically, about Large Language Models that have been trained on pirated novels, and the resulting class action lawsuit, Bartz v. Anthropic.
Since writing that post my views have changed.
Part of my argument can be summed up as: If AI isn't stealing verbatim, and ideas cannot be copyrighted, where is the copyright infringement?
I still mostly agree with that statement. Apparently, so did the court. Here's an AI summary:
Bartz v. Anthropic (3:24-cv-05417, N.D. Cal.) was a class action copyright infringement lawsuit filed in August 2024 by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against AI company Anthropic PBC. The plaintiffs alleged that Anthropic used pirated copies of their books, obtained from "shadow libraries" like LibGen and PiLiMi, to train its Claude large language models (LLMs) without permission. The case tested the application of fair use doctrine under Section 107 of the Copyright Act to AI training data.
U.S. District Judge William Alsup granted partial summary judgment in favor of Anthropic on fair use grounds for certain activities, while denying it for others.
- Fair Use for LLM Training: The court ruled that using copyrighted books to train LLMs was "spectacularly" transformative and constituted fair use, as the process analyzes statistical relationships in text to enable generation of new, original content without reproducing the originals. This was analogized to human learning, emphasizing that copyright does not protect ideas, methods, or concepts. No market harm was found, as there was no evidence of infringing outputs from Claude mimicking or reproducing the plaintiffs' works.
- Fair Use for Digitization of Purchased Books: Anthropic's scanning of lawfully purchased print books (involving destructive processes like removing bindings) to create internal digital copies was deemed fair use, as it was a transformative format shift for storage and searchability without distribution or increasing the number of copies. The court found no market harm, distinguishing it from unauthorized duplication.
- No Fair Use for Pirated Copies: However, the acquisition and retention of pirated digital copies to build a central library was not fair use and constituted infringement, even if later used for training. This was non-transformative, displaced the market for authorized sales, and was unnecessary given lawful alternatives. Judge Alsup expressed skepticism that subsequent fair use could justify initial piracy, noting lawful acquisition as essential.
But Judge Alsup said that training AI on pirated books is not fair use when there are lawful alternatives for acquisition other than stealing.
Currently there is the option for authors to join this class action and receive $3000 for each work of theirs that was pirated and used.
As I understand this, it's compensation for the stealing, but not for the actual use of copyrighted IPs, or what that use entails.
This reminds me of the Napster days, where record companies sued kids who downloaded songs for free and the court ordered them to pay vast sums of money for each infringement.
Punitive damages for stealing.
But what's happening here is something very different.
What every LLM has done by training on the IP of fiction writers is incorporate those millions of words into their own programming.
This isn't about using my work to generate transformative ideas.
This is about using my work to create a program that can generate transformative ideas.
Big difference there.
LLMs aren't basing works on my IP.
LLMs are able to make money because my IP helped to build them.
Just as cinnamon is an ingredient used to make cinnamon rolls, intellectual property was used as an ingredient to improve AI.
You can't take the cinnamon out of the roll after it has been baked.
While the roll may be a transformative work that uses cinnamon, the recipe cannot exist without the cinnamon.
And if the cinnamon is stolen, every time a roll is sold, the original owner of the cinnamon should be paid.
You cannot have AI in its current state without accounting for the IP it stole to get to its current state. You cannot get that genie back in the bottle.
This isn't about punitive damages for stealing a work.
This is about a trillion dollar industry that cannot exist without having trained on the IPs this industry has stolen.
No cinnamon, no cinnamon rolls to sell.
No training on IP, no AI to sell.
In fact, whether the IP was stolen almost seems inconsequential. Even legally paying $7.99 for a paperback, scanning it, and then training an LLM on it goes beyond fair use, for a few reasons.
First, because of scale. I learned to write mysteries by reading and imitating (without stealing from) writers like Robert B. Parker, Ed McBain, and John D. MacDonald. But I am only one person, and one person can only read so many books.
LLMs have devoured millions of books, and have taken what they learned to make money.
This cannot be covered by the current legal definitions of copyright and fair use. New definitions must be created.
Second, because of speed. I have a limit of how quickly I can read and write. AI can read and write millions of words in a fraction of the time. A machine with an unlimited appetite for devouring and learning from IP is unprecedented, and fair use cannot apply. It's simply not fair at all.
Again, current legal definitions must be refined. As technology improves, laws must keep up.
Third, because of how AI learns, it needs massive datasets to ingest and process in order to spot patterns. Once patterns are learned, LLMs are able to predict what comes next in a sequence.
Books are narratives that creatively answer the "what comes next" question. AI needs cogent, modern, professional books to learn from. Books that have been expertly written, vetted, edited, composed.
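For readers curious about that "what comes next" mechanic, here is a toy sketch of the idea. This is not how any real LLM is built; actual models learn statistical patterns over billions of tokens with far more sophisticated math. But the core move, counting what follows what and predicting the most likely next word, looks like this at a trivially small scale:

```python
from collections import defaultdict, Counter

def train(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model, word):
    """Return the most frequently observed next word, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Train on a single (made-up) sentence instead of millions of books.
model = train("the detective opened the door and the detective drew his gun")
print(predict_next(model, "the"))  # prints "detective"
```

Scale that counting up from one sentence to millions of professionally written books, and you get a machine that can continue any sequence of words plausibly. That is exactly why the training data matters so much.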
Books are the cinnamon in the cinnamon roll.
AI didn't create its own version or facsimile of cinnamon. It isn't taking the "idea" of cinnamon. It outright stole the cinnamon, and continues to use that stolen cinnamon, and cannot be separated from the stolen cinnamon.
LLMs have stolen more than 200 of my books, and they've learned from them. They've gotten so good at imitating my writing style that fans will soon be able to have their favorite AI create books that are identical to mine in tone and quality.
Because my IP has been absorbed by AI, every question asked by a paying user gets an answer that has a tiny bit of me in it; my style, my jokes, my tone.
This isn't analogous to downloading an mp3 file of a Metallica song without paying. Stealing a song from Metallica doesn't allow the thief to instantly write a song that sounds exactly like Metallica.
This is more like The Stepford Wives, and writers are non-consensually training their own replacements.
It's the stealing of brand secrets. It's the outright theft of the individually crafted blueprints of how thoughts and ideas are uniquely crafted. It's stealing our voices, and copying how our brains work, and being able to replicate not only our ideas but the unique expression and implementation of those ideas, without permission.
Permission is the key here. If you want to make a Virtual J.A. Konrath, surely that should only be done with my permission? If Hollywood can't steal Crispin Glover's face, why should LLMs be able to steal my life's work?
These billion dollar companies are able to make billions because they illegally downloaded and absorbed millions of books and continually use that information to perform tasks. They cannot start over from scratch without this stolen data. My IP is now part of every LLM's programming that trained on my books. My trade secrets have been stolen.
Should writers be compensated? Absolutely.
If I were to license my IP--and all of the work and knowledge and skill that went into creating a body of work that constitutes several million published words--I would ask for consulting and usage fees much higher than $3k per book.
You want my land to build your empire? Pay me for the land. You can't build without my permission and then offer me pennies. Especially when your empire is worth billions.
So I don't agree with Judge Alsup that this is a copyright infringement only because of piracy.
This is a copyright infringement because my life's work--my brand--was taken by you without my permission and used to build your empire. Even if the LLMs each bought one copy of every book I've written, the infringement remains. Reading one of my books, whether you buy it or not, does not mean you have the right to use it any way you want to.
This reminds me of the Henrietta Lacks lawsuit. As summed up by AI:
Henrietta Lacks was an African American woman who died of cervical cancer in 1951 at Johns Hopkins Hospital in Baltimore, Maryland. During her treatment, doctors took samples of her tumor cells without her knowledge or consent, a common practice at the time, especially for Black patients under segregation-era policies. These cells, later named HeLa (from her name), were the first human cells to reproduce indefinitely in a lab, revolutionizing medical research. They've contributed to breakthroughs like the polio vaccine, cancer treatments, gene mapping, and COVID-19 vaccines, generating billions in profits for biotech companies. However, Lacks' family was unaware of this until the 1970s and received no compensation or recognition for decades.
Henrietta did not give informed consent for the use of her cells. The family sued. A settlement was reached.
The permission LLMs need to legally learn from an IP encompasses more than buying a copy.
Even if my books make up only a 0.000006 share of the overall training data (an AI estimate), that still results in an untold number of answers for paying users that benefit from my work, when I gave no permission. I did not enter into a contract with these LLMs for this specific kind of use. I do not receive royalties every time my work is used by LLMs.
This goes beyond current US Copyright Law.
The recent Bartz v. Anthropic settlement addressed claims related to works under registered copyright. However, this leaves a vast number of creators unprotected: hundreds of thousands of self-published works that were never formally registered but were nonetheless pirated and incorporated into training datasets such as LibGen, Books3, and the Pirate Library Mirror.
These datasets have been utilized by virtually every major LLM, fueling the growth of multi-billion dollar AI empires at the expense--and without permission from--original creators.
In my own case, I have authored over 200 works that have been pirated and used in AI training, yet only a dozen or so were registered for copyright. This situation highlights a critical barrier: the requirement to overturn the Supreme Court's decision in Fourth Estate Public Benefit Corp. v. Wall-Street.com, LLC (2019), which mandates copyright registration as a prerequisite for filing infringement lawsuits.
I contend that this ruling is unconstitutional.
Copyright protection is granted automatically upon creation, yet the registration process imposes lengthy and costly government hurdles that effectively deny access to justice. Fiction IP ownership can be readily proven through ISBNs, ASINs, and other verifiable records. I can demonstrate authorship of my books, their piracy, and their unauthorized use by LLMs. Denying the ability to litigate without registration amounts to the U.S. government restricting citizens' rights, creating unequal protection based on one's ability to pay fees or jump through bureaucratic hoops.
In essence, I own intellectual property but am barred from defending it legally.
The Fourth Estate decision violates multiple constitutional provisions, including:
- The Intellectual Property Clause (Article I, Section 8, Clause 8), by undermining the promotion of science and useful arts;
- The Petition Clause of the First Amendment, by restricting the right to seek redress;
- Due Process under the Fifth Amendment, by denying fair legal recourse;
- Equal Protection under the Fifth Amendment, by discriminating against those unable to afford registration;
- The Ninth Amendment, on numerous grounds related to unenumerated rights;
- The Seventh Amendment, by limiting access to civil jury trials;
- Article III, Section 2, by constraining judicial power over cases and controversies.
As a fiction writer whose sole income derives from my creative works, I am directly harmed by Big Tech's theft, yet I am unable to sue due to this archaic registration mandate. This case represents an opportunity to champion the rights of indie authors nationwide, potentially reshaping copyright law in the AI age.
But it gets worse for me.
Trade secrets require no copyright registration at all and are enforced through confidentiality and misappropriation laws, and LLMs have stolen and can reproduce my trade secrets.
Ask an AI to write a book in the style of J.A. Konrath (me) and it can, using a process that relies on unauthorized use of protected material at multiple stages. Modern generative AI models, such as those for text generation, are trained on massive datasets that often include books like mine scraped from pirate sources without permission or licensing.
This goes beyond ideas and plots and settings and characters. These LLMs have stolen--and can reproduce on demand--my style, my tone, my humor, my pacing; everything that makes my books unique to me and to my readers.
This training involves reproducing and analyzing my works to extract patterns. All LLMs train on pattern recognition and mimicry. By relying on systematic, unauthorized data ingestion at massive scale, this greatly differs from human imitation, where a writer might legally read my books and create original content inspired by my approach.
This isn't analogous to being fired from an assembly line and replaced by a robot.
I am now forced to compete unfairly, in violation of Section 5 of the Federal Trade Commission Act, with any user who asks AI to create a work in my style. It's like stealing my secret recipe and publishing it on the Internet. If a reader asks an AI to write a J.A. Konrath book, AI can do it as well as I can. Why should readers buy my books? Why buy the cow when LLMs are giving out free milk?
I used AI several times in writing this blog post. I also use AI to proofread my books, and AI is helping me translate my books into different languages.
LLMs are putting proofreaders and translators out of business. But these LLMs didn't steal from, or learn from, proofreaders or translators. They are doing the same job, and learned from the same set of rules.
But LLMs could not write like J.A. Konrath or think like J.A. Konrath without stealing from J.A. Konrath because they had no meeting of the minds with J.A. Konrath on what they can incorporate from J.A. Konrath into their models.
I currently pay $400 a year for AI.
Should AI be paying me royalties every time someone asks it a question and it answers using a bit of something it stole from me?
I think so.
Should LLMs who have infringed on the IP of writers pay those writers a settlement for involuntarily helping program those LLMs?
I think so.
Should Fourth Estate Public Benefit Corp. v. Wall-Street.com, LLC be challenged in court?
I think so.
Should US Copyright Law be changed to incorporate new technology?
I think so.
Buying my books does not grant permission to build a company founded on my books.
AI estimates that the Top 20 LLMs have already generated 6 trillion responses.
Out of those 6,000,000,000,000 responses so far, if those LLMs trained on my 200+ books (using the AI estimate that my books make up a 0.000006 share of the overall training data), that means as many as 36,000,000 of those responses involved learning from my work.
As of November 2025, my IP has perhaps assisted LLMs in crafting 36 million answers.
If I received 1/10th of a cent for each of these answers attributed to my IP, I'd be owed $36,000 by LLMs. So far. Plus a royalty situation going forward.
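For transparency, here is the back-of-the-envelope arithmetic. Both the 6 trillion responses and the 0.000006 figure are AI estimates, not audited numbers, and the 0.000006 is treated as a fraction (six millionths) of the training data, not a percent:

```python
total_responses = 6_000_000_000_000  # estimated responses from the top 20 LLMs
my_share = 0.000006                  # estimated fraction of training data from my books
rate_per_answer = 0.001              # 1/10th of a cent per attributed answer

attributed = total_responses * my_share   # responses that drew on my work
owed = attributed * rate_per_answer       # hypothetical back-payment in dollars

print(f"{attributed:,.0f} responses -> ${owed:,.0f} owed")
```

Change any of the three inputs and the figure moves with it; the point is not the exact dollar amount but that even a microscopic per-answer share of a trillion-scale business adds up.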
These numbers are staggering. And we're still in the early days of AI and LLMs.
It is not my fault you stole my land to build your empire.
Should every writer of every IP used by LLMs in their training be compensated in this manner?
I think so.
What do you think?
