GitHub Copilot now available for free

Now automatically integrated into VS Code, all of you have access to 2,000 code completions and 50 chat messages per month, simply by signing in with your personal GitHub account, or by creating a new one. And just last week, we passed the mark of 150M developers on GitHub. 🎉

Copilot Free gives you the choice between Anthropic's Claude 3.5 Sonnet and OpenAI's GPT-4o model. You can ask a coding question, have it explain existing code, or have it find a bug. You can execute edits across multiple files. And you can access Copilot's third-party agents or build your own extension.


 
Question: by using it, do they enter a contract with you that is enforceable? So you can sue them for product defects? Inquiring landsharks want to know.
 
So, this is how MS get to use the massive repository of GPL code they have had at their disposal since the Github acquisition - and avoids any GPL infringement lawsuits...
 
So, this is how MS get to use the massive repository of GPL code they have had at their disposal since the Github acquisition - and avoids any GPL infringement lawsuits...
I would not be too sure about that lawsuit avoidance. All the FSF needs is a verbatim copy of a fragment, those 6 lines. And I'd bet they are already ordering some fine-toothed combs.
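Finding that kind of verbatim fragment is mechanically simple, by the way. A minimal sketch (the function name and the 6-line threshold are my own choices, not anything the FSF actually uses) that flags runs of identical lines shared between a generated file and a licensed source file:

```python
# Hypothetical sketch: flag verbatim runs of >= 6 identical lines shared
# between two pieces of code. Thresholds and names are made up for illustration.
import difflib

def verbatim_runs(generated: str, original: str, min_lines: int = 6):
    gen = generated.splitlines()
    orig = original.splitlines()
    # Compare line-by-line; autojunk=False so common lines are not discarded
    sm = difflib.SequenceMatcher(a=gen, b=orig, autojunk=False)
    return [gen[m.a:m.a + m.size]
            for m in sm.get_matching_blocks()
            if m.size >= min_lines]

a = "\n".join(f"line {i}" for i in range(10))
b = "\n".join(["x", *[f"line {i}" for i in range(3, 10)], "y"])
print(verbatim_runs(a, b))  # one run of 7 shared lines
```

Real plagiarism scanners normalize whitespace, identifiers, and comments first, so exact line matching like this is only the crudest version of the comb.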
 
So, this is how MS get to use the massive repository of GPL code they have had at their disposal since the Github acquisition
Microsoft had access to it even before they bought Github: Most of it was publicly visible. They could read it before the acquisition just as well. The only difference is the closed source stuff that's stored there; whether Microsoft can read it to train their AI model is a question for the contract that the owner of the code entered in.

I would not be too sure about that lawsuit avoidance. All the FSF needs is a verbatim copy of a fragment, those 6 lines. And I'd bet they are already ordering some fine-toothed combs.
How "fair use" in copyright interacts with AI is territory we don't know yet. For example, fair use allows me to say "As RalphBSz awoke one morning from uneasy dreams, he found himself transformed in bed into a gigantic insect". While that is nearly a literal quotation from Kafka, I can use short quotations. In the same sense, I can use short snippets of music (typically less than 30 seconds), so I could play some Schoenberg while the insect wakes up. And I could continue the little story with lots of short snippets from James Joyce, Heinrich Böll, and Solzhenitsyn, all copyrighted works.

Now the question is: Can an AI use a similarly short snippet of code, when combined with other short snippets? As far as I know, neither legislators nor courts have given us guidance on that question, so we don't know.

It was a dark and stormy night by the way ...
 
It was a dark and stormy night by the way ...
And the little boggies hid from the hog riders, while Frito's blade glowed blue as in the presence of lawyers. (Bored of the Rings)

A judge might conclude that if a few lines of code are enough for one big corporation to go after another, then that is the scale that applies. Thanks, IBM/SCO/...
 
Hm, do they actually distribute these LLM models under the GPL? Because they certainly are "derivative works" of GPL'd code by any definition...
See, that's not clear to me at all. Certainly, an AI model that's trained on all the code that's publicly readable at GitHub will have ingested LOTS of GPL'ed code, but also lots of BSD-licensed code, lots of truly open source code, and everything in between. And ingesting code is in and of itself not a breach of copyright: I can go to the library and read all the novels written in the last 100 years, and nobody can complain.

What I can't do is publish a derivative work. So for example, I cannot retell the story of Ulysses in the Martian language without getting a license from the heirs of James Joyce. I can't use half of Glasperlenspiel when I write my own biography without a license from the heirs of Hermann Hesse. But what I can do: use two sentences from each of hundreds of novels to write my own work, because each of the quotations is short enough to be exempt from copyright protection. Note that every single sentence in my biography is copyrighted by others, yet I have not plagiarized any single novel so much that they have a valid claim against me.

The same happens with music. I cannot write an adaptation of Sleigh Ride for 19 contrabassoons without the heirs of Leroy Anderson getting seriously mad at me (and a lot of other people, but those don't have a legal claim). But I can use an F# in my new composition, even though it also occurs in Sleigh Ride. And I can even quote the little 4-bar main theme of Sleigh Ride. For a fun example of a Christmas tune that quotes dozens of pieces of copyrighted music, listen to "Minor Alterations" by David Lovrien.

If the library sends out a newsletter about "new releases", and in there it says that they just got a great new book by Hemingway, which deals with bull fights and Spanish wine, and a new book by Garcia Lorca about a village that has been disconnected from civilization for 100 years, and synthesizes the two into a coherent description of the human condition (ha ha), is that a derivative work? No, legally it isn't.

So where does AI-generated autocompletion source code fall on that spectrum? I have no idea, and I'm not a lawyer (but I've dealt with open source for ~30 years, and been yelled at by lawyers quite a few times).

There needs to be a NO-AI-USE-WITHOUT-PERMISSION file in every repo that doesn't want to feed it to the sandworm of AI.
If you don't want your code to be ingested into AI, you probably need to say that explicitly in the license you attach to it. Because if you just create such a file, but otherwise say that the code is under a standard license such as GPL or BSD, then probably everyone has the right to read it, which is what AI does. As a matter of fact, if you don't want people and computers to read your code, why do you publish it? That's like an author getting mad that a copy of their bestseller ended up in the city library on the "new releases" shelf. It's logically inconsistent.

And to be 100% clear: I'm not sure that AI generated code is ethically acceptable. But if it isn't, I'm not sure how the current copyright legal system can fix that.
 
And to be 100% clear: I'm not sure that AI generated code is ethically acceptable. But if it isn't, I'm not sure how the current copyright legal system can fix that.
I have a counter-example to that right here from these forums (oh, and let's see if an AI is even capable of stuff like counter-examples, irony, humor, social context, and more):

Starting at post #504. I linked to a Bugzilla ticket that contained some useful information that solved a problem for me. If you look at the Bugzilla ticket, you'll discover that the patch was ChatGPT-generated. Do I care that it was ChatGPT-generated? Not really; I just want my problem solved!
 
Seriously, fuck them and fuck that. Remixing other people's intellectual property without even giving any credit is available "for free" now. Ah well.

I never was a violent person, but I'd love to see their server farm engulfed in flames. Thieves and/or imbeciles.
I used it for some minor things, but manual operation is still poor: why place files one by one instead of uploading a whole directory? Also, it's getting bloated with all this over-the-top statistics information.
What would be a good alternative?
 
If you don't want your code to be ingested into AI, you probably need to say that explicitly in the license you attach to it. Because if you just create such a file, but otherwise say that the code is under a standard license such as GPL or BSD, then probably everyone has the right to read it, which is what AI does. As a matter of fact, if you don't want people and computers to read your code, why do you publish it? That's like an author getting mad that a copy of their bestseller ended up in the city library on the "new releases" shelf. It's logically inconsistent.
As a copyright holder you have a right to control what use your code is put to. You will have to modify your license accordingly. Whether this stance stands up in the courts is hard to predict! What is legal is not always sensible or protective of individuals' rights. As for the specially named file, that was more in the style of robots.txt.
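For the web-crawling side, that honor-system opt-out already exists. GPTBot and CCBot are crawler user agents documented by OpenAI and Common Crawl respectively; a site can ask them to stay away, though compliance is entirely voluntary, and none of this affects code already hosted on GitHub under GitHub's own terms. A minimal robots.txt at the site root might look like:

```
# robots.txt: advisory only; crawlers may ignore it
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

A repository-level NO-AI file would presumably work the same way: a request, not an enforceable barrier, which is exactly why the license text matters more.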
 
See, that's not clear to me at all. Certainly, an AI model that's trained on all the code that's publicly readable at GitHub will have ingested LOTS of GPL'ed code, but also lots of BSD-licensed code, lots of truly open source code, and everything in between. And ingesting code is in and of itself not a breach of copyright: I can go to the library and read all the novels written in the last 100 years, and nobody can complain.
The AI is ready to "publish" transformed plagiarized text or code any time you ask! This is a new threat that the courts and legislators will have to sort out.
 
What I can't do is to publish a derivative work.
How I see it: they can read all code they want, but the trouble begins when copilot starts to answer questions with things learned. Then it is actively creating a derived work.
 
See, that's not clear to me at all. Certainly, an AI model that's trained on all the code that's publicly readable at GitHub will have ingested LOTS of GPL'ed code, but also lots of BSD-licensed code, lots of truly open source code, and everything in between. And ingesting code is in and of itself not a breach of copyright: I can go to the library and read all the novels written in the last 100 years, and nobody can complain.

Ingesting is not, but they store it in a lossy, compressed way in their trained model. Given the extent of GPL'd source code, you can be sure it shaped the trained model considerably, which in my view clearly makes it a derivative work. This is a licensing issue, which has a much broader scope than just copyright.
 
trouble begins when copilot starts to answer questions with things learned
Yep, just like a student is supposed to be able to quote Plato in a debate, to the point, after reading a book of Plato quotes... 😁

... just like the rest of us humans... 🤷‍♂️

Or is code supposed to be classified information? And if it is, why is that code up on GitHub in the first place? That would be rather antithetical to the very idea of Open Source, IMHO...
 
How I see it: they can read all code they want, but the trouble begins when copilot starts to answer questions with things learned. Then it is actively creating a derived work.
I go to the university library, and read all the textbooks on organic chemistry. I work all the homework problems in the textbooks. I go to the professor, and pass the exam, and get a diploma. I take this knowledge and go to work in a chemistry lab. A colleague asks me how to create a new kind of glue, strong enough to hold little yellow pieces of paper, but weak enough that they can be easily peeled back off. I answer the question, given the knowledge from these textbooks about bond strength, tearing of molecules and van-der-Waals forces, quasi-crystals, viscosity, and thixotropic properties. Did I create a derived work from the textbooks?

No, current law holds that I learned something. However, if I write a patent application about my new glue, and in there I literally quote half a page from Mr. Bayer's textbook of organic glues, then I have violated copyright.
Ingesting is not, but they store it in a lossy and compressed way in their trained model. Given the extent of GPL'd source code you can be sure that it shaped the trained model considerably. Which clearly makes it a derivative work in my POV.
See organic chemistry example above. The model is lossy and compressed, yet complete and well connected, just like my memory of organic chemistry. As long as the model doesn't literally regurgitate whole sections, it has ingested knowledge.
This is a licensing issue, which has a much broader scope than just copyright.
But what license can you write that says: You can use this code, but you can't read it, and if you read it you must forget it, and retain nothing more than the general concepts? Because that is sort of what a student (or the AI) does.

All in all, the use of AI to ingest a whole corpus, and then store and use just loose connections between the pieces, is something that probably eludes current law. This will have to be sorted out.

Yep, just like a student is supposed to be able to quote Plato in a debate,
I prefer Dionysius: I drink, therefore I am.

PS: That quote was not AI-generated, but using natural stupidity. And I don't know much about organic chemistry, nor about history, biology, or the French I took, but I know that I love youuuuuu ...
 
Or is code supposed to be classified information? And if it is, why is that code up on GitHub in the first place? That would be rather antithetical to the very idea of Open Source, IMHO...
I think people like their work (their work/ideas/creations) to be at least credited, and for some code there is the licensing where they release their code but want it to be "viral": "you can use this code, but please make sure your use of it (including your code) is also shared".

So it's not about secrecy or openness but people taking someone else's work (whether it's code or art or whatever) and using it uncredited or not in the spirit that it was released.

Some people don't have a problem with that, other people do.
 
You can use this code, but you can't read it, and if you read it you must forget it, and retain nothing more than the general concepts
Kind of like the Game Boy and Casio digital watches from the 1980s? Before Arduino and Pi boards, it was next to impossible to take electronics apart and learn how to reprogram them... you could only have a plastic/metal gadget with physical buttons, and what amounts to a black box when it comes to learning how the heck that circuitry knows when to light up this line or that one to show the correct time. Is that how code is supposed to be used, as a black box that you can't take apart?
 