Project-wide LLM policy #3959
Conversation
| * Neurodivergent authors tend to replicate the "terseness" of many LLMs, and often show up as false positives in LLM detection
| * Kenyan authors, many of whom helped filter the data for LLMs, often show up as false positives in LLM detection
FWIW, while I've tried to cite everything across the RFC, this section was added after the first draft and I knew both of these had citations, but had a lot of trouble finding explicit articles about it. So, if you happen to find sources for these, I'd be happy to update the RFC to include them.
When reading that section, I thought back to this article: https://marcusolang.substack.com/p/im-kenyan-i-dont-write-like-chatgpt.
(To be clear, I do think this blog post is sufficient evidence for the second point, although I'm mostly leaving this open to remind myself to look for other sources too.)
| 1. If the LLM usage is *trivial*, it is completely ignored by the policy and always allowed. Generally, this means that changes made by LLMs are indistinguishable from those made by humans, where the LLM didn't have any creative input into the change.
| 2. If the LLM usage is *slop*, it is considered spam and moderated accordingly. Generally, this means submitting changes made by LLMs with minimal human intervention.
| 3. *Nontrivial* LLM usage must be *disclosed*, ideally in as detailed a manner as possible. This may necessitate additional tooling to notify new contributors about the policy and explain how disclosure works.
In PRs I've submitted to rust and cargo, I usually include a single sentence at the end, of the form:
No AI tooling of any kind was used during the creation of this PR.
I think including an AI disclosure prompt/template in the GitHub PR template is a simple way to cover off disclosure while allowing the author the freedom to go into as much or as little detail as they feel is appropriate.
Yeah, having a section on a PR template just to fill in LLM disclosure is a good idea, and that's one of the multiple options that could be used for tooling.
This is actually the one difference between the summary for the RFC and the one proposed for inclusion in the actual policy: for the RFC, the tooling is an important point since it doesn't exist yet, but for the final policy, the goal is to already have that.
I mostly leave the details out of the RFC since they're almost certainly going to be done via experimentation, but this is a pretty easy one to start with.
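To illustrate what that could look like, here is a rough, hypothetical sketch of a disclosure section in a repository's `.github/PULL_REQUEST_TEMPLATE.md`; the wording and checkbox options are placeholders, not part of the proposal:

```markdown
## LLM disclosure

<!-- Per the project's LLM policy, nontrivial LLM usage must be disclosed. -->

- [ ] No LLM or other generative AI tooling was used for this change.
- [ ] LLM tooling was used only trivially (no creative input into the change).
- [ ] LLM tooling was used nontrivially; details are provided below.

<!-- If the last box is checked, describe which tools were used, for which
     parts of the change, and (ideally) the prompts involved. -->
```

A checkbox-style prompt like this keeps trivial disclosures cheap while leaving room for as much detail as the author wants to give.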
| Several people were attempting to find a way to properly obtain licenses for copyrighted material before proceeding. Then, suddenly, the CEO of the company demonstrated his desire to "move this stuff forward," and people just started doing it without permission. Even if the employees responsible for creating the model said "using pirated material should be beyond our ethical threshold," the CEO decided to ignore those concerns. Even though Meta's LLM is not a coding model, their case is not particularly unusual in the industry.
| And, it's worth mentioning that the "worst case" scenario of xAI, brought up earlier, *is* supported by GitHub Copilot, showing that at least all of the "good actors" in the AI space are willing to work with all the bad actors on equal footing. This example indicates that Meta's case is likely to be the norm.
| And, it's worth mentioning that the "worst case" scenario of xAI, brought up earlier, *is* supported by GitHub Copilot, showing that at least all of the "good actors" in the AI space are willing to work with all the bad actors on equal footing. This example indicates that Meta's case is likely to be the norm.
| And, it's worth mentioning that the "worst case" scenario of xAI, brought up earlier, *is* supported by GitHub Copilot, showing that at least many of the "good actors" in the AI space are willing to work with many of the bad actors on equal footing. This example indicates that Meta's case is likely to be the norm.
I think there are likely a few exceptions, so "all" is overbroad.
I think the second "all" / "many of" is actually redundant here, but yes, I can reword this.
So, I am unresolving this and heavily considering at least strengthening the wording here given: https://www.cnbc.com/2026/05/06/anthropic-spacex-data-center-capacity.html
Now that Anthropic is explicitly using the data centre in question, I think it's unreasonable to not point out how actually bad this case is, even if I do concede it's not literally everybody.
| While there are arguments to be made about some models having lower power consumption, it ultimately doesn't matter if they fundamentally require operation based upon brute force. As hopefully any programmer even vaguely educated on complexity knows, brute force is *the worst* way to solve any problem, and should only ever be used as a last resort. LLMs put brute force front and center as the best option.
| With code, we have methods to bound operation: for example, a famous case is sorting algorithms based upon quicksort (whose worst case is quadratic) which fall back to an explicitly worst-case-optimal method like heapsort or merge sort once a recursion budget is exceeded. Rust uses such algorithms in its standard library.
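To make the fallback pattern above concrete, here is a minimal, hypothetical Rust sketch of an introsort-style sort: quicksort runs with a recursion-depth budget, and once that budget is exhausted it switches to heapsort, which has an O(n log n) worst case. This is only an illustration of the bounding idea, not the standard library's actual implementation.

```rust
/// Sort a slice using depth-limited quicksort, falling back to heapsort
/// once the recursion budget is exhausted (the introsort pattern).
fn sort_bounded<T: Ord + Copy>(data: &mut [T]) {
    // Allow roughly 2 * log2(n) levels of quicksort recursion.
    let depth_limit = 2 * (usize::BITS - data.len().leading_zeros()) as usize;
    introsort(data, depth_limit);
}

fn introsort<T: Ord + Copy>(data: &mut [T], depth: usize) {
    if data.len() <= 1 {
        return;
    }
    if depth == 0 {
        // Quicksort is degenerating (or the input is adversarial):
        // switch to a method with a guaranteed O(n log n) worst case.
        heapsort(data);
        return;
    }
    let pivot = partition(data);
    let (left, right) = data.split_at_mut(pivot);
    introsort(left, depth - 1);
    introsort(&mut right[1..], depth - 1);
}

/// Lomuto partition around the last element; returns the pivot's final index.
fn partition<T: Ord + Copy>(data: &mut [T]) -> usize {
    let pivot = data[data.len() - 1];
    let mut store = 0;
    for i in 0..data.len() - 1 {
        if data[i] <= pivot {
            data.swap(i, store);
            store += 1;
        }
    }
    data.swap(store, data.len() - 1);
    store
}

/// Textbook heapsort: build a max-heap, then repeatedly move the max to the end.
fn heapsort<T: Ord>(data: &mut [T]) {
    let n = data.len();
    for start in (0..n / 2).rev() {
        sift_down(data, start, n);
    }
    for end in (1..n).rev() {
        data.swap(0, end);
        sift_down(data, 0, end);
    }
}

fn sift_down<T: Ord>(data: &mut [T], mut root: usize, end: usize) {
    loop {
        let mut child = 2 * root + 1;
        if child >= end {
            return;
        }
        if child + 1 < end && data[child] < data[child + 1] {
            child += 1;
        }
        if data[root] >= data[child] {
            return;
        }
        data.swap(root, child);
        root = child;
    }
}

fn main() {
    let mut v = vec![5, 3, 8, 1, 9, 2, 7, 4, 6];
    sort_bounded(&mut v);
    assert_eq!(v, vec![1, 2, 3, 4, 5, 6, 7, 8, 9]);
}
```

The point is only that the potentially quadratic path is given a hard budget, after which a method with a known worst case takes over.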
The argument in this section could be clarified a bit.
You seem to be saying that the idea of an unbounded loop where the model picks some action (writing code, running a tool), receives the result, and uses it to inform future actions, is inherently "brute force" and a "feedback loop" and bad.
But that can't be true, because that also describes how humans work.
Of course, LLMs are considerably worse at it than humans. And I can think of ways in which those terms apply uniquely to LLMs. "Brute force" applies uniquely to LLMs in that one strategy LLMs use is to make up for a lack of intelligence with sheer persistence. "Feedback loop" applies uniquely to LLMs in that LLMs are much more vulnerable than humans to getting stuck with a bad assumption and spending a long time barking up the wrong tree. Or perhaps more generally in that LLM performance tends to fall off drastically past certain complexity levels, which could be analogized to an audio feedback loop amplifying noise.
However, you don't really explain that. As written, the argument could apply just as well to humans. And I could respond: sure, it would be nice if there were a way to solve arbitrary programming problems with a bounded amount of work, but that just doesn't exist.
Hmm, I feel like I did technically cover this point, but I do think you have a point here. I'm mostly replying to point out that I do want to consider what you've written, but am going to think about it a bit to figure out how to properly articulate that.
Also, for some additional context: this piece is probably the closest to being directly inspired by a massive essay I've been writing for years that goes more in-depth on this. There, I spend a lot more time explaining the specifics of complexity theory, incompleteness proofs, and other things to make the point, but I haven't fully written that and am not going to write it fully here. Still, I feel like there's some nugget of information that could be gleaned without going past the surface level.
| 2. If the LLM usage is *slop*, it is considered spam and moderated accordingly. Generally, this means submitting changes made by LLMs with minimal human intervention.
| 3. *Nontrivial* LLM usage must be *disclosed*, ideally in as detailed a manner as possible. This may necessitate additional tooling to notify new contributors about the policy and explain how disclosure works.
| 4. If a contributor does not fully understand the code they submit, their contribution may be rejected for that reason alone. Note that such usage is not always considered *slop*, and is considered separately. (For example, they may understand a large portion, but not all of it, which shows that they still put in a lot of effort.)
| 5. If a user is found to be repeatedly lying about LLM usage (using LLMs without disclosing that usage), this is a COC violation that will be moderated accordingly.
This all looks great! "Honesty over purity" is a more friendly way of saying that non-disclosed nontrivial LLM use is bad faith.
I'd suggest rewording 5 so that a single offence can trigger moderation. You could just remove the word "repeatedly", but then the sentence may be stronger than you'd like, since some lack of clarity exists on the first offence. "Repeatedly or dangerously" maybe? Or some rephrasing?
Anyway, I do think you want to leave yourself room to act strongly on a first offence. You may be confident that 6 and 7 cover those cases, but I think you're not doing the vibers any favours by giving the impression of a free pass here.
You should probably change "lying" to "hiding" too, because someone might think that saying nothing is not lying. There can be explicit questions in PR templates etc. that make "hiding" more clearly equivalent to "lying", but this policy seems more general, so "hiding" seems clearer here.
So, I'm not 100% sure if you've read the whole thing, since I do go a bit more in depth at the start of the motivation to point out what "honesty over purity" means; nontrivial use is allowed (less purity) with disclosure (honesty). The point is that this is a compromise over a stricter policy, where honesty is seen as more important than preventing all (nontrivial) LLM usage.
With regard to enforcement, item 5 already leaves it largely up to the moderation team, and I would rather that they have broad authority to determine whether something is intentional or not. The main method of determining usage where things might be ambiguous is by asking people directly, and then you're kind of forced to say something about it; at that point, if you lie, that's just lying, not hiding.
Right now there are already a lot of moderation details I kept out of the policy for simplicity: for example, the moderation team tends to prefer bans for slop PRs because bans can be rescinded; the only change is they have to talk to the mods about it before future contributions. For stuff like what to do when lying, I'd rather the mods who are already pretty lenient (intentionally) make the determination whether someone was being malicious or not.
Plus, COC enforcement is also inherently asymmetrical because it's more heavily weighted against team members (who should know better), so we kind of get the desired enforcement built in: team members who lie are treated much more seriously and might not get a second chance, whereas for ordinary contributors who do it, we're more likely to just reject PRs unless it becomes a problem.
| [Rust Project Perspectives]: https://rust-lang.github.io/perspectives-on-llms/index.html
| [xAI Memphis]: https://time.com/7308925/elon-musk-memphis-ai-data-center/
| Due to the extreme rift between the various arguments for and against LLMs, this section has the potential to be discussed in a generally uncritical way. Please *do not* attempt to refute or reinforce these arguments in the RFC discussion. As usual, constructive revision of wording and addition of sources is encouraged and helpful, but nonconstructive critique is unhelpful. We strongly encourage you to read the full RFC before commenting on these sections.
I think this is a reasonable request if these points are meant to illustrate a spectrum of the potential concerns people have, but at the moment several of these are written as statements of fact, many of which are objectively false. The caveats spelled out above make it seem like the RFC accepts the facts presented but doesn't necessarily endorse the ethical conclusion, whereas I believe the intent is simply to relay what has been said without making any statement about its truth.
Maybe these could be put in block quotes or re-worded to make it clear that these are opinions that have been expressed.
Some examples:
To one extent or another, Large Language Models will include data that was not taken with permission, i.e. stolen.
Whether or not the project (or the law...) considers this stealing seems like it's a point the RFC is not intending to settle one way or the other.
The lack of thorough manual review for data leads to a number of issues in the output that will only become more difficult to fix as models increase in size.
I don't think there's enough evidence to settle this either way.
Additionally, it's worth clarifying that LLMs are fundamentally interpolary systems, not extrapolary systems,
There is no definition for "interpolary" vs "extrapolary" which an LLM fails but a human passes. Can LLMs recognise patterns and extrapolate beyond the data-set? This has been proven many times. Can LLMs make accurate predictions about something outside of the data-set when no evidence of the pattern exists within the data-set? No, that would be magic even for a human.
So, to respond point-by-point:
at the moment several of these are written as statements of fact, many of which are objectively false. The caveats spelled out above make it seem like the RFC accepts the facts presented but doesn't necessarily endorse the ethical conclusion, whereas I believe the intent is simply to relay what has been said without making any statement about its truth.
If you have explicit examples of opinions stated as fact, please feel free to point them out. Otherwise, this feedback is unhelpful.
To one extent or another, Large Language Models will include data that was not taken with permission, i.e. stolen.
Whether or not the project (or the law...) considers this stealing seems like it's a point the RFC is not intending to settle one way or the other.
Taking something without permission is stealing, and not all stealing is considered bad to the point of a crime. For example, taking a pen from someone's desk can be called "stealing" but that doesn't mean you deserve to be arrested for it.
The lack of thorough manual review for data leads to a number of issues in the output that will only become more difficult to fix as models increase in size.
I don't think there's enough evidence to settle this either way.
These models have training sets that are effectively the size of the entire internet. Do you think that someone personally read through every piece of training data? Do you think that even a small army of underpaid Kenyans read it, and that this would constitute "thorough review"? This is not a claim that requires evidence; the scale of these models, if it were even a fraction of what's claimed, prevents you from thoroughly reviewing all the data. And, since the main mechanism for improving models is adding more data, this only gets worse with time.
Additionally, it's worth clarifying that LLMs are fundamentally interpolary systems, not extrapolary systems,
There is no definition for "interpolary" vs "extrapolary" which an LLM fails but a human passes. Can LLMs recognise patterns and extrapolate beyond the data-set? This has been proven many times. Can LLMs make accurate predictions about something outside of the data-set when no evidence of the pattern exists within the data-set? No, that would be magic even for a human.
This isn't the point being made here. The point being made is that without logical deduction, LLMs are much more constrained by their input data than humans are. This is also a quick summary to reference a later point in the section, which includes an explicit example of what this entails.
I'd be happy to reword this to be more clear, but I'm not sure whether this is actually unclear or if you just dislike this point.
If you have explicit examples of opinions stated as fact, please feel free to point them out. Otherwise, this feedback is unhelpful.
I gave some examples? I didn't try to be exhaustive because I didn't think the discussion was meant to be about whether or not they were true, since you said:
Please do not attempt to refute or reinforce these arguments in the RFC discussion.
Yet seemingly you do want to argue these points?
Taking something without permission is stealing, and not all stealing is considered bad to the point of a crime. For example, taking a pen from someone's desk can be called "stealing" but that doesn't mean you deserve to be arrested for it.
Either these statements are part of the "RFC proper" and should be open to discussion, or they are not, in which case there's no point discussing them and they shouldn't be presented as such.
Since you do apparently want to discuss them though, here goes:
This is not a claim that requires evidence; the scale of these models, if it were even a fraction of what's claimed, prevents you from thoroughly reviewing all the data. And, since the main mechanism for improving models is adding more data, this only gets worse with time.
It is already the case that the vast majority of data does not go through any kind of manual review process at all, so this is irrelevant. You'd have to somehow evidence that the increase in data doesn't make up for the percentage-wise decrease in manual reviews when it comes to LLM quality, and there's no way to answer that at the moment.
This isn't the point being made here. The point being made is that without logical deduction, LLMs are much more constrained by their input data than humans are.
a) That's not what "interpolary" means, so if that's the argument you're trying to make I don't understand why you would use those words.
b) Humans are notoriously bad at logical deduction unless they resort to formal methods. There's a reason we invented systems like propositional logic. LLMs have access to the same tools. I'm sure there's a quantitative difference in ability here (for now) but I don't believe anyone can say with certainty whether there is a qualitative difference. If you can point to a paper on this I would be very interested to read it.
If you're trying to argue that LLMs empirically seem to be worse at certain things... just say that? It weakens the argument to be overly (and incorrectly) specific about why something is the case, when all that matters is that it appears to be the case.
I'd be happy to reword this to be more clear, but I'm not sure whether this is actually unclear or if you just dislike this point.
To be honest I think the guide and reference sections of this RFC are fine. It's a good compromise that I hope will ensure a high quality of contributions going forward.
The rest of the RFC reads like an essay on your personal beliefs on LLMs, some of which I think are substantiated, some not. I think including them as anything other than "concerns that have been raised" only harms the ability to come to a consensus.
If you have explicit examples of opinions stated as fact, please feel free to point them out. Otherwise, this feedback is unhelpful.
I gave some examples? I didn't try to be exhaustive because I didn't think the discussion was meant to be about whether or not they were true
To be clear, I… genuinely did not even think these were examples here. Because to me, these read entirely as reasonable deductions to make based upon the limited things we know about LLMs.
Please do not attempt to refute or reinforce these arguments in the RFC discussion.
Yet seemingly you do want to argue these points?
I agree this is a bit confusing; what I meant to say here is that I don't want to accept feedback of the form "hey, here's another example of ____" or "I disagree that LLMs are unethical; this argument is unsatisfactory" since I don't think they're going to be very productive. But these points, to me, are just reasonable statements being deduced, and I'm responding to them because I want to understand your viewpoint here.
It is already the case that the vast majority of data does not go through any kind of manual review process at all, so this is irrelevant. You'd have to somehow evidence that the increase in data doesn't make up for the percentage-wise decrease in manual reviews when it comes to LLM quality, and there's no way to answer that at the moment.
So, to me, it feels like a given that model creators are just going to keep adding more data to their models, and that they will be less and less able to thoroughly review all that data as a result, and the more debatable point is whether the thorough review is necessary. What it felt like you were saying is that more evidence is needed to prove that thorough review is not happening when like… no, this is just not possible; most of it is not reviewed at all, even.
That said, I don't try to point this out as fact; it's specifically referencing a later point where I argue that bias in the models is a side effect of the lack of review. This definitely is a logical leap but I think it's a fair one.
The rest of the RFC reads like an essay on your personal beliefs on LLMs, some of which I think are substantiated, some not. I think it only harms the ability to come to a consensus to include them as anything other than "concerns that have been raised".
An RFC itself is an essay that reflects one's opinions. If proposals were an obvious choice that didn't involve any opinions we wouldn't have the RFC process.
I tried my best to make sure the arguments here were as clear as possible and motivate using evidence instead of opinions wherever possible. There is an inherent bias in the motivation because, well, as I said, a policy proposal is an inherently biased position; I'm biased toward the proposal being accepted. I'm intentionally overloading on arguments because I feel like a lot of people haven't been listening to any arguments, but if some arguments are confusing, I do want to try and clarify them.
As for the last bit on "interpolary" versus "extrapolary": I agree that the choice of words is probably bad, and the section it's leading into also needs some work, as demonstrated by this thread: #3959 (comment)
Like that thread, I'm also going to take some time to think about the best way to reword it.
To respond to this, though:
b) Humans are notoriously bad at logical deduction unless they resort to formal methods. There's a reason we invented systems like propositional logic. LLMs have access to the same tools. I'm sure there's a quantitative difference in ability here (for now) but I don't believe anyone can say with certainty whether there is a qualitative difference. If you can point to a paper on this I would be very interested to read it.
Just to make sure we're clear, what I think you're proposing is the idea of an LLM running some code to perform logical deduction to perform the equivalent of what a human "doing deduction manually" might be. We could extend the problem also to performing basic arithmetic as well.
As far as I'm aware, no model successfully does this correctly, and doing so does not avoid any issues with bias based upon symbolic information (i.e. GSM-Symbolic still fails), but my information could be out of date.
Just to make sure we're clear, what I think you're proposing is the idea of an LLM running some code to perform logical deduction to perform the equivalent of what a human "doing deduction manually" might be. We could extend the problem also to performing basic arithmetic as well.
As far as I'm aware, no model successfully does this correctly, and doing so does not avoid any issues with bias based upon symbolic information (i.e. GSM-Symbolic still fails), but my information could be out of date.
Well, if you look at AI contributions to Erdős problems…
Many of the "AI standalone" solutions involve the LLM writing formal proofs in Lean, which sounds pretty close to what you said. I doubt this technique would be very applicable to Rust though.
Among the proofs that did not use Lean, many are marked incorrect, suggesting that even in mathematics (which LLMs have shown to be relatively good at), hallucinations are still an issue. Human mathematicians make mistakes too, but probably not as many, and/or they're better at catching them on review.
Still, the fact that LLMs have moved up from grade-school word problems (as seen in the GSM-Symbolic paper) to unsolved mathematical problems tends to put those issues in perspective. They are much better at logical reasoning than they used to be.
(Personally I have similar "losing the will to live speedrun" feelings about LLMs as mentioned in the RFC, but not when I see them write bad code. I have those feelings when I see them write good code. I agree that programming is an incredibly fun task, but it is also a useful task, and I fear that it will cease to be so.)
| The main issue is forming a policy with regard to all other forms of LLM usage. A large portion of the team have serious concerns regarding LLM usage, but there are also several team members who feel they would be excluded by a complete ban on LLM usage. There's also a pragmatic issue with enforcing any limit on LLM usage, where some LLM usage is simply impossible to detect and is effectively identical to human-authored changes. Similarly, there exist many accessibility tools, like speech-to-text and text-to-speech, that are invaluable to those who need them *and* generally use LLMs to do so. Any potential policy should ensure that we allow accessibility tools and focus on LLM usage that creates a potential burden for maintainers, rather than focusing on LLM usage to achieve an "untainted" code base. (This is the sacrifice of "purity" described in the summary.)
| Ultimately, the goal is to avoid a situation where users are encouraged to be dishonest about LLM usage, since this creates a situation where everyone is uncomfortable. Many LLM users, including team members, have indicated that they might simply continue using LLMs and avoid disclosure for fear of repercussions, and this is a very uncomfortable position to be in. It means that LLM users are encouraged to be dishonest about their actions, and it means that maintainers are forced to accuse users of LLM usage whenever they're suspicious, which can be hostile. This is combined on top of the mention of *trivial* LLM usage, as mentioned before: if we don't distinguish usage that actually affects the end result, people stop caring and we stop knowing whether the result is affected, which makes reviewing contributions difficult.
Slightly nitpicky but I think the opening here would read better and clearer without the double negatives:
Ultimately, the goal is to avoid a situation where users are encouraged to be dishonest about LLM usage, since this creates a situation where everyone is uncomfortable.
might become:
Ultimately, the goal is to encourage users to be honest about LLM usage, since this promotes an environment of collaboration in good faith.
This rephrasing would also reduce word count (slightly) and the good faith phrasing would tie in with this wording later in the paragraph too:
... LLM users are encouraged to be dishonest about their actions, and it means that maintainers are forced to accuse users of LLM usage whenever they're suspicious ...
I like this proposal; if you hadn't given the example I probably would have struggled to word it, but when I get to it I'll update that line and make sure that there aren't any others that should be updated too.
Even if we ultimately intend the double negative, wording it more positively is good IMHO.
I love the detailed write-up on the concerns and effectiveness of GenAI, well done! <3
I'll add my own views to the discussion.
The focus on honesty is something I can respect, but unfortunately being honest about doing something unethical does not make it okay. It should not excuse the act. If I steal your bag of chips, and then state I stole it, it does not mean I get away with it. That's just an admission of guilt.
It would be a very bad look for the Rust project if it knowingly accepted contributions authored by GenAI, and would be very sad. The Rust project has from the start made ethical stances, such as focusing on being inclusive. This should not be different.
I would also explicitly add that the policy covers GenAI images, video, and everything else. The rules for code and other GenAI should not be different. In my views, we should be in solidarity with other affected professions in standing against it.
As a community, we should not look the other way when faced with the massive ethical issues and injustice with GenAI. We should not accept it. No-one should be able to proudly say they've used it for a contribution and then get their contribution accepted.
As for usage of GenAI in research, prototyping, search and similar cases, where the output is not part of the contribution, I feel it should be discouraged, but we can't realistically do more than that. With authorship, usage of GenAI should be banned completely.
Accessibility-wise, I am fine with a carve-out, as long as it is not too broad. The situation here is unfortunate; if all the money wasted on GenAI was instead spent on accessibility then we would have way better accessibility and it would be ethical. Sadly, that is not the world we live in, so we need to make do with what we have.
Also, I believe the policy should be a minimum baseline. Projects should be able to make it stricter, but not weaker.
If I steal your bag of chips, and then state I stole it, it does not mean I get away with it. That's just an admission of guilt.
Nothing in this RFC states that you get away from it or not.
If I steal your bag of chips, and then state I stole it, it does not mean I get away with it. That's just an admission of guilt.
Nothing in this RFC states that you get away from it or not.
I don't know what you mean by this. It states that disclosed nontrivial GenAI use is okay (and, in the trivial case, even undisclosed use).
FWIW, part of the goal of this comment:
To keep things focused on policy, there are two broad categories of comments we'd like to request you avoid:
- Simply stating your viewpoint on LLMs, even if you provide reasons. While these arguments can be useful for the RFC, they are better worded as explicit suggestions to specific areas of the RFC, rather than as just general comments.
Is mostly to avoid situations where things devolve into a discussion on whether ethics ultimately make usage irredeemable or not, since I don't think that it's going to go particularly well on the RFC. For what it's worth, I agree with you, although reiterating what I mentioned on fedi, the goal is to:
- Push the bar in a direction that focuses on accountability (if you do use an LLM, be honest about it and don't make the code worse for it)
- Put the ethical arguments on the table so that they can be used later to potentially strengthen the policy to reduce LLM usage
Right now, I'm only aiming for a low bar mostly because I feel like it's most likely to be accepted, and right now there's kind of a lot of open questions that are causing headaches for reviewers. Having a policy in place will make it easier to change the policy, etc.
Also, FWIW, I do explicitly outline in the policy that usage is not encouraged, only allowed for pragmatic reasons.
To respond to a few other points of feedback:
I would also explicitly add that the policy covers GenAI images, video, and everything else. The rules for code and other GenAI should not be different. In my views, we should be in solidarity with other affected professions in standing against it.
So, right now the policy does clarify that both text and code are included, although I didn't mention images, video, and audio since it's not particularly relevant here. However, I do think that this is kind of a weak argument and it's good to be specific.
Accessibility-wise, I am fine with a carve-out, as long as it is not too broad.
This is kind of what I aimed to go for with the trivial usage definition; the goal is that if the end result was just accessibility, or something that otherwise is indistinguishable, we probably shouldn't be wasting enforcement bandwidth on trying to litigate that. Additionally, it pushes people to understand what they're really using LLMs for, which hopefully will lead to people understanding their effects better.
Wikipedia's policy is one mentioned in the prior art that stands out pretty well: they allow LLMs for copy-editing but my understanding is that this is mostly pragmatic for enforcement reasons, and they can't really preserve a tone that condones LLM usage when things are publicly editable. Things are different here since you can't just edit others' text, but it makes sense to include in a policy for now when the status quo is people can just use LLMs and… everyone just feels uncomfortable about it because they don't know where the line for enforcement is.
FWIW, part of the goal of this comment:
To keep things focused on policy, there are two broad categories of comments we'd like to request you avoid:
- Simply stating your viewpoint on LLMs, even if you provide reasons. While these arguments can be useful for the RFC, they are better worded as explicit suggestions to specific areas of the RFC, rather than as just general comments.
Is mostly to avoid situations where things devolve into a discussion on whether ethics ultimately make usage irredeemable or not, since I don't think that it's going to go particularly well on the RFC. For what it's worth, I agree with you, although reiterating what I mentioned on fedi, the goal is to:
1. Push the bar in a direction that focuses on accountability (if you do use an LLM, be honest about it and don't make the code worse for it)
2. Put the ethical arguments on the table so that they can be used later to potentially strengthen the policy to reduce LLM usage

Right now, I'm only aiming for a low bar mostly because I feel like it's most likely to be accepted, and right now there's kind of a lot of open questions that are causing headaches for reviewers. Having a policy in place will make it easier to change the policy, etc.
Also, FWIW, I do explicitly outline in the policy that usage is not encouraged, only allowed for pragmatic reasons.
Oh, I apologize, my mind completely skipped over that.
I see what you mean, though I fear that once a policy is adopted, there will be much less of a drive to amend it to make it better.
To respond to a few other points of feedback:
I would also explicitly add that the policy covers GenAI images, video, and everything else. The rules for code and other GenAI should not be different. In my views, we should be in solidarity with other affected professions in standing against it.
So, right now the policy does clarify that both text and code are included, although I didn't mention images, video, and audio since it's not particularly relevant here. However, I do think that this is kind of a weak argument and it's good to be specific.
The reason I feel so strongly about this is that all of these uses of GenAI have the same drawbacks, so it's much more effective to organize against it as a bigger group, rather than only as developers. Though I guess this is less about policy and more about making a coherent movement against all of GenAI.
Kind of like a nod towards other professions that we are on the same side.
This is more so a nice-to-have for me, rather than a requirement, though. So it's fine to drop it.
Accessibility-wise, I am fine with a carve-out, as long as it is not too broad.
This is kind of what I aimed to go for with the trivial usage definition; the goal is that if the end result was just accessibility, or something that otherwise is indistinguishable, we probably shouldn't be wasting enforcement bandwidth on trying to litigate that. Additionally, it pushes people to understand what they're really using LLMs for, which hopefully will lead to people understanding their effects better.
Wikipedia's policy is one mentioned in the prior art that stands out pretty well: they allow LLMs for copy-editing but my understanding is that this is mostly pragmatic for enforcement reasons, and they can't really preserve a tone that condones LLM usage when things are publicly editable. Things are different here since you can't just edit others' text, but it makes sense to include in a policy for now when the status quo is people can just use LLMs and… everyone just feels uncomfortable about it because they don't know where the line for enforcement is.
I guess I'll concede on the trivial usage case and not make a problem out of it, as long as the scope of that stays very limited. Banning this might be more effort than it's worth, possibly. It would be nice to discourage its use for non-accessibility cases here, though.
Non-trivial use is going to be the most harmful, and I would rather completely ban that, rather than just require disclosure.
The RFC provides a lower bound of what is disallowed on non-trivial use. It requires disclosure; it doesn't say disclosed non-trivial use is always OK.
Yes, and I think it should be expanded to ban all non-trivial use.
[Three replies were marked as off-topic and collapsed.]
Hi, this is the moderation team of the Rust project: this isn't the right place for a broad / general discussion about technology; I've collapsed the last 3 replies for this reason and ask you all not to continue that line of discussion.
As a general remark on this review thread: this RFC isn't even the right place for overly broad discussions about the topic of "LLM use in Rust"; we should stay focused on exchanges that do in some way still relate to the RFC at hand.
We're already having broader discussions on that matter in private channels for rust-lang members, and (even with that limitation on who can participate) it's still so much volume that it's becoming hard to follow; so I unfortunately also can't offer any alternative place where everyone else can participate in broader and/or longer discussions on that topic. [We're also still investigating ways to make as much of this as possible accessible for others to at least read.]
| 1. If the LLM usage is *trivial*, it is completely ignored by the policy and always allowed. Generally, this means that changes made by LLMs are indistinguishable from those made by humans, where the LLM didn't have any creative input into the change.
| 2. If the LLM usage is *slop*, it is considered spam and moderated accordingly. Generally, this means submitting changes made by LLMs with minimal human intervention.
| 3. *Nontrivial* LLM usage must be *disclosed*, ideally in as detailed a manner as possible. This may necessitate additional tooling to notify new contributors about the policy and explain how disclosure works.
In my opinion, it's best to ban it, rather than require disclosure. A bad thing is still a bad thing, regardless of whether you disclose it or not.
FWIW, I am considering how this could be strengthened, although I do want to try and find a way for people to be honest regardless, because under full bans there are going to be people who just decide to be dishonest, and we don't want them clogging up the moderation queue.
This is also the benefit of the trivial usage clause, because it pushes people to say "I used an AI but I had full creative control" rather than simply not disclosing and requiring extra back and forth. Although there are genuine points to consider on whether all this matters, because unfortunately everything is just guesswork right now, since we have no idea how people will act after a policy is in place.
People may lie about it, just like they also could be writing the contribution on company time and using company equipment, without the company stating the copyright is given to the author. This case is way harder to detect, but we still cannot accept it.
I believe people lying about it should not be a reason not to enact a ban on non-trivial use. To me, the project should try to uphold the highest ethical standards practically possible.
GenAI code may slip through, and that is of course not ideal, but at least the project is striving to be ethical.
I don't believe ethics should come into this at all. What's next, you ban people from contributing who eat meat?
The code of conduct exists because it is important to be inclusive and maintain good interpersonal relationships for the people working on Rust, in order for the project to be successful. It is explicitly not an attempt to push your own agenda on others.
I don't believe ethics should come into this at all. What's next, you ban people from contributing who eat meat?
The code of conduct exists because it is important to be inclusive and maintain good interpersonal relationships for the people working on Rust, in order for the project to be successful. It is explicitly not an attempt to push your own agenda on others.
This policy has nothing to do with judging what people do in their personal lives. It's not preventing you from using GenAI in your personal projects. Nor is it restricting you in what you can do in your own forks of Rust projects. It is also not my goal to dictate what you do in your personal life, as I do not think any policy should be doing that.
It's also not my goal to be doing value judgements on people, and I do not think a policy should be doing that, either. Just outline what is acceptable for the project and its users. I am not interested in witch hunts and such, either.
If you do not care about the ethical issues with GenAI, or you disagree that there are ethical issues, that is also your choice and I am not here to tell you what to believe.
(If this is veering too much into an off-topic direction, feel free to tell me to hide this comment, and I will hide it.)
I believe people lying about it should not be a reason not to enact a ban on non-trivial use.
[…] What's next, you ban people from contributing who eat meat?
I’d like everyone to pay attention to the different uses of “ban”. In prior discussions I’ve already noticed that it’s an easy thing to gloss over when using this verb. @clarfonthey did a good job avoiding ambiguities IMO (and there’s not a single instance of “ban” in the RFC Summary section)… but in this comment thread it’s coming up again and (I can’t read minds, but) this might have led to some degree of miscommunication.
The commentary from @lumi-me-not could/should perhaps more closely relate to the already included point no. 6 in the RFC text, stating:
- Teams are allowed to form their own policies regarding nontrivial LLM usage, although as long as users are honest, follow the COC otherwise, and respect boundaries, the worst that will happen is the rejection of a contribution.
If the idea to be discussed is to raise the baseline from the minimum – possibly up to the maximum – and perhaps even not to allow teams some (or any) freedom in allowing fewer restrictions – within this frame of possibilities, “enacting a ban on non-trivial use” would still just mean “documenting the practice that certain kinds of PRs will generally be rejected”. It wouldn’t have to mean that any such person will be “banned” in a moderation / GitHub-blocking kind of sense.
[Moreover “rejected” doesn’t even have to mean that all problematic PRs are necessarily being closed; small but non-trivial use could still be reduced due to review/feedback.]
Of course, that distinction of “blocking people” or “rejecting contributions” is completely impossible to make in the “people who eat meat” analogy, which is why I think it’s a rather bad analogy.
If you do not care about the ethical issues with GenAI, or you disagree that there are ethical issues, that is also your choice and I am not here to tell you what to believe.
While you say you are not trying to tell individuals what to believe, the proposed policy still asks the project itself to adopt a particular (subjective) ethical stance and evaluate contributions through that lens, rather than primarily on practical or technical grounds.
This inherently alienates anyone who doesn't share the same beliefs, even when those beliefs are perfectly compatible with the success and harmony of the project.
I think this reflects a deeper disagreement about the role of the project itself: whether it should function primarily as a technical collaboration with minimal ideological filtering, or as a community that uses its position and influence as a successful project to steer contributors toward particular ethical positions through its contribution policies.
Preface
A lot of discussion has occurred in private about the topic of LLM policy, and while some of that context has been included in the prior art, most of it is intentionally omitted here.
To keep things focused on policy, there are two broad categories of comments we'd like to request you avoid:
In general, defer to the code of conduct.
This RFC is long, and most of it is the sections surrounding and justifying the actual policy, rather than the policy itself. You are both welcome to and encouraged to skip around using the outline feature on GitHub. (In the rendered view, this is the bulleted list button on the top-right of the file view.)
While I have received a lot of feedback that the RFC is perhaps too long, a lot of the sections that might be considered for removal constitute important arguments that have been mentioned. In terms of discussion on length alone, I would appreciate it if these arguments were directed at potentially simplifying the text of the policy itself, rather than just removing sections of motivation. Similarly, if you feel a particular argument could be expanded or added, feel free to mention that as well.
In general, I know that this RFC is going to be exhausting, and the discussion before it has been too. Removing details is not really going to help, although I'd be happy to accept any feedback on revising the text of these sections to be shorter without losing any meaning.
Important
Since RFCs involve many conversations at once that can be difficult to follow, please use review comment threads on the text changes instead of direct comments on the RFC.
If you don't have a particular section of the RFC to comment on, you can click on the "Comment on this file" button on the top-right corner of the diff, to the right of the "Viewed" checkbox. This will create a separate thread even if others have commented on the file too.
Existing policies
Right now, there are a few concurrent policy decisions:
- rust-lang/rust: Add an LLM policy for rust-lang/rust (rust-forge#1040)

This RFC intends to supersede the two existing policy RFCs, but it intentionally does not supersede the current policy scoped to rust-lang/rust; that policy is free to be merged before this one, as its original goal was to put a policy in place before an RFC was accepted. In fact, even after an RFC is accepted, that one can still be merged, since updating the policies everywhere takes time, and getting a policy out immediately is still a net benefit. Once an RFC is accepted, things can be adjusted for consistency.
Summary
This RFC proposes a strict policy regarding generative Artificial Intelligence (AI) models, specifically Large Language Models (LLMs), and their use within the rust-lang organization.
It proposes an "honesty over purity" policy where maintainers are given broad authority to decide what amount of LLM-generated code is acceptable, while avoiding repercussions for those who do use LLMs and are honest about it. This can be summarized in the following checklist with terms that will be defined throughout the RFC:
In terms of additional tooling for disclosure, this RFC encourages the creation of a bot that automatically replies to contributions from new users informing them of the LLM policy and what constitutes sufficient disclosure. As mentioned, in general, going into as much detail as possible (e.g. prompts used, etc.) is preferred, but not always required. The RFC leaves the exact details of such implementation unspecified and up for revision later.
Rendered