AI - AN AUTHOR'S THOUGHTS AFTER THE ATLANTIC MAGAZINE BROKE THE META HEIST STORY. PART 1.

A while ago on here, I posted about how nine of my books had their copyright infringed, along with another 180K titles, when all of these books hosted at a pirate website were scraped for inclusion in the Books3 dataset to train the LLMs of the big tech companies. In the recent Meta heist of seven million books from the LibGen pirate website, all of my books, including foreign editions, were included on that site. So it's reasonable to assume that the copyright of my life's work has been infringed twice within two years by tech companies training their large language models for AI.

What a way to celebrate my 30th year as a professional writer.

If you look into Alex Reisner's investigation, published in The Atlantic magazine, it's clear that employees of the tech companies knew that what they were doing was illegal. Deception preceded theft and has followed theft.

Anyway, I may rage, but I also always look for solutions. I've spoken to a licensing company (Created by Humans) and set up an account in order to state that no book authored by me is available for any exploitation by AI. I must now input 216 titles (numerous editions of each book), one by one, and set up exclusions on two types of AI usage. This will only be applicable if legitimate AI companies, in the future, want to license my books and approach me, or a third party that represents my work. It's like pissing into a strong wind, however, because of the precedent that tech companies, in collusion with pirate sites, just help themselves anyway. Also, everything that I have written has already been scraped. I will, however, piss into the wind, because I need to do something.

Also, everything that has been scraped and used to train AI, apparently, can't be "unlearned" by the technology. How convenient. Also, from what I can determine, AI detection software isn't foolproof. How convenient. Because if you were reckless enough to release such a virus, you'd create diagnostic tests and a vaccine first, in case the consequences were hideous?

But, if you intended to pull off the biggest theft of culture in the history of mankind, in order to exploit it for your own gain, you'd hope that what you've done can't be undone; and to avoid litigation until the end of the universe, you'd want what you've stolen to be laundered into derivatives in just such a way as to make it impossible to accurately detect the usage of the stolen source material. I see a pattern emerging.

This all follows years of me sending takedown messages to pirated book websites with mixed results - if they're Russian like LibGen (now there's a fucking surprise), forget it. I want to write books, and publish them, and not spend my time and capacity drafting and sending out takedown notices, and filling in endless fields at licensing websites, and reading scores of articles on just how far I have been held upside down by my ankles and shaken hard, before having my rights desecrated.

Now, the British government, in all of its wisdom, and in its desperation to dig itself out of a financial hole and not get left behind, suggests that training AI on copyrighted works should be, more or less, fair usage. This is the "Data (Use and Access) Bill". And I'm going to guess that those sympathetic to the tech industry advised them on its creation.

According to the proposals in the bill, the only way that I can prevent my work being used to train AI would be for me to "opt out" all of my books, in every single edition, in every single territory, rather than "opt in". So each and every ISBN generated for each edition of all of my books (hence the 216 cited above) will have to be separately "opted out" - yet what about short stories in multi-author collections? What if you missed an old eBook edition from 10 years ago? It's just not workable or practical. An opt-in arrangement, however, would be: unless the author states that their work can be used to train AI, tech companies and users of AI are forbidden to touch the work. But that's not in favour of the tech companies, is it? And everything has already been scraped and can't be "unlearned", so all of this opting out is only applicable to future works. How many times must I be forced by AI to piss into strong headwinds?

This bill also suggests that the derivatives shat out the other end can be copyright protected. It beggars belief. So, you'd be sanctioning copyright infringement of all existing human-created books, but would legally protect the derivatives resulting from this theft?

Mercifully, the House of Lords has asked for amendments - "forcing AI crawlers to observe UK copyright law, reveal their identities and purpose, and let creatives know if their copyrighted works have been scraped." You can read about it in Graham Lovelace's article.

To my dismay, the only factor included in these discussions appears to be money, revenue, transfer of wealth, etc. Something far more valuable is at stake: our ability as a species to think abstractly, to make sense of ourselves, the world, and our place in time, and to preserve the most important facets of storytelling that carry the wisdom of the ages through every successive generation of our species. And let's not disregard the neurological, psychological, and social impact of making people stupid and making truth irrelevant through tech. I get a sense that everything is at stake.

So, I have also taken part in the lengthy 'Consultation on Copyright and Artificial Intelligence' to explain just how unfair this legislation is, and what the dire consequences will be for writers and for human-created culture. The idea that culture is fodder for AI muck sprayers, and only has the worth that tech companies assign to it, is just too staggering to comprehend.

If this post disappears, or my SM accounts are suddenly placed in hiatus, I won't have deactivated myself. Just so you know.
