Hm, do they actually distribute these LLM models under the GPL? Because they certainly are "derivative works" of GPL'd code by any definition...
See, that's not clear to me at all. Certainly, an AI model that's trained on all the code that's publicly readable on GitHub will have ingested LOTS of GPL'ed code, but also lots of BSD-licensed code, lots of truly open source code, and everything in between. And ingesting code is not in and of itself a breach of copyright: I can go to the library and read all the novels written in the last 100 years, and nobody can complain.
What I can't do is publish a derivative work. For example, I cannot retell the story of Ulysses in the Martian language without getting a license from the heirs of James Joyce. I can't use half of Glasperlenspiel when I write my own biography without a license from the heirs of Hermann Hesse. But what I can do is use two sentences from each of hundreds of novels to write my own work, because each quotation is short enough to be exempt from copyright protection. Note that every single sentence in my biography is copyrighted by others, yet I have not borrowed enough from any single novel to give its owners a valid claim against me.
The same happens with music. I cannot write an adaptation of Sleigh Ride for 19 contrabassoons without the heirs of Leroy Anderson getting seriously mad at me (and a lot of other people, but those don't have a legal claim). But I can use an F# in my new composition, even though it also occurs in Sleigh Ride. And I can even quote the little four-bar main theme of Sleigh Ride. For a fun example of a Christmas tune that quotes dozens of pieces of copyrighted music, listen to "Minor Alterations" by David Lovrien.
Suppose the library sends out a newsletter about "new releases" saying that it just got a great new book by Hemingway about bullfights and Spanish wine, and a new book by García Márquez about a village that has been cut off from civilization for 100 years, and then synthesizes the two into a coherent description of the human condition (ha ha). Is that newsletter a derivative work? No, legally it isn't.
So where does source code produced by AI autocompletion fall on that spectrum? I have no idea, and I'm not a lawyer (but I've dealt with open source for ~30 years, and have been yelled at by lawyers quite a few times).
There needs to be a NO-AI-USE-WITHOUT-PERMISSION file in every repo whose owner doesn't want its code fed to the sandworm of AI.
If you don't want your code to be ingested into AI, you probably need to say that explicitly in the license you attach to it. If you just create such a file but otherwise declare the code to be under a standard license such as GPL or BSD, then everyone probably has the right to read the code, which is what AI does. As a matter of fact, if you don't want people and computers to read your code, why do you publish it? That's like an author getting mad that a copy of their bestseller ended up in the city library on the "new releases" shelf. It's logically inconsistent.
And to be 100% clear: I'm not sure that AI-generated code is ethically acceptable. But if it isn't, I'm not sure how current copyright law can fix that.