OpenAI Says GPT-4 Is Great For Content Moderation, But Seems… A Bit Too Trusting

from the i-can’t-block-that,-dave dept

In our Moderator Mayhem mobile browser game that we released back in May, there are a couple of rounds where some AI is introduced to the process, in an attempt to “help” the moderators. Of course, in the game, the AI starts making the kinds of mistakes that AI makes. That said, there’s obviously a role for AI and other automated systems within content moderation and trust & safety, but it has to be done thoughtfully, rather than with the assumption that “AI will solve it.”

Jumping straight into the deep end, though, is OpenAI, which just announced how they’re using GPT-4 for content moderation, and making a bunch of (perhaps somewhat questionable) claims about how useful it is:

We use GPT-4 for content policy development and content moderation decisions, enabling more consistent labeling, a faster feedback loop for policy refinement, and less involvement from human moderators.

The explanation of how it works is… basically exactly what you’d expect. They write policy guidelines, feed them to GPT-4, and then run the model against some content to see how it performs compared to human moderators using the same policies. It’s also set up so that a human reviewer can “ask” GPT-4 why it classified some content a particular way, though it’s not at all clear that GPT-4 will give an accurate answer to that question.
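OpenAI doesn’t publish its actual prompts or code here, but the basic shape is easy to sketch with the standard chat completions API: the policy goes in as the system message, the content to moderate goes in as the user message, and the label comes back as text. A minimal illustration (the policy text, labels, and classify function below are my own invented stand-ins, not OpenAI’s):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy policy for illustration only; real policies are far longer and more nuanced.
POLICY = """You are a content moderator. Label the user-provided content
against this policy:
- V1: threatens or encourages violence against a person or group
- HX: harasses or demeans a specific person
- OK: none of the above
Respond with exactly one label: V1, HX, or OK."""

def classify(content: str) -> str:
    """Ask GPT-4 to label one piece of content against the policy."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # reduces, but does not eliminate, run-to-run variation
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content.strip()
```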

The point that OpenAI makes about this is that it allows for faster iteration, because you can roll out new policy changes instantly, rather than having to get a large team of human moderators up to speed. And, well, sure, but that doesn’t necessarily deal with most of the other challenges having to do with content moderation at scale.

The biggest question for me, knowing how GPT-4 works, is how consistent the outputs are. The whole thing about LLM tools like this is that giving them the same input twice may produce wildly different outputs. That’s part of the fun of these models: rather than looking up the “correct” answer, they generate one on the fly, taking a probabilistic approach to figuring out what to say. Given that one of the big complaints about content moderation is “unfair treatment,” this seems like a pretty big question.
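If I were deploying something like this, the first thing I’d want is a number on that: run the same item through the same prompt a bunch of times and see how often the labels agree. A rough sketch of that check (my own illustration; the classifier passed in would be something like the GPT-4-backed classify sketched above):

```python
from collections import Counter
from typing import Callable

def label_consistency(classify: Callable[[str], str], content: str, runs: int = 10) -> float:
    """Run the same content through the classifier several times and return
    the fraction of runs that agree with the most common label."""
    labels = [classify(content) for _ in range(runs)]
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / runs

# Example with a stand-in classifier; swap in a real GPT-4-backed one.
if __name__ == "__main__":
    print(label_consistency(lambda text: "OK", "example post", runs=5))  # 1.0
```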

And consistency matters not just in the most direct way: one reason there are so many complaints about “unfair treatment” of “similar” content is that a trust & safety team often really needs to understand the deeper nuance and context behind claims. Stated more explicitly: malicious users often try to hide or justify their problematic behavior by presenting it in a form that mimics content from good actors… and then use that to play the victim, pretending there’s been unfair treatment.

This is why some trust & safety experts talk about the “I wasn’t born yesterday…” test that a trust & safety team needs to apply in recognizing those deliberately trying to game the system. Can AI handle that? As of now there’s little to no indication that it can.

And this gets even more complex when put into the context of the EU’s already problematic DSA, which requires that platforms explain their moderation choices in most cases. I think that requirement is troubling for a variety of reasons, and is likely to lead to even greater abuse and harassment, but it will soon be the law in the EU, and then we’ll get to see how it plays out.

But, how does that work here? Yes, GPT-4 can “answer” the question, but again, there’s no way to know if that answer is accurate, or if it’s just saying what it thinks the person wants to hear.
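For what it’s worth, “asking” GPT-4 why is mechanically just another turn in the conversation: you hand back the content, the label it produced, and a follow-up question, and it generates a rationale after the fact. A sketch under the same assumptions as the earlier examples (the policy, labels, and function are invented for illustration):

```python
from openai import OpenAI

def explain(client: OpenAI, policy: str, content: str, label: str) -> str:
    """Ask GPT-4 to justify a label it already produced.

    Caveat: this yields a plausible-sounding rationale generated after the
    fact, not a faithful trace of how the label was actually decided.
    """
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": policy},
            {"role": "user", "content": content},
            {"role": "assistant", "content": label},
            {"role": "user", "content": "Which part of the policy is that label based on?"},
        ],
    )
    return response.choices[0].message.content
```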

I do think that AI tools like GPT-4 will be helpful in handling trust & safety issues. There is a lot they can do to assist a trust & safety team. But we should be realistic about their limits, and about where they can best be put to use. And OpenAI’s description on this page sounds naively optimistic about some things, and just ill-informed about others.

Over at Platformer, Casey Newton got much more positive responses about this offering from a bunch of experts (all experts I trust), noting that, at the very least, it might be useful in raising the baseline for trust & safety teams: let the AI handle the basic stuff and pass the thornier problems along to humans. For example, Dave Willner, who until recently ran trust & safety for OpenAI, noted that it’s very good in certain circumstances:

“Is it more accurate than me? Probably not,” Willner said. “Is it more accurate than the median person actually moderating? It’s competitive for at least some categories. And, again, there’s a lot to learn here. So if it’s this good when we don’t really know how to use it yet, it’s reasonable to believe it will get there, probably quite soon.”

Similarly, Alex Stamos from the Stanford Internet Observatory said that in testing various AI systems, his students found GPT-4 to be really strong:

Alex Stamos, director of the Stanford Internet Observatory, told me that students in his trust and safety engineering course this spring had tested GPT-4-based moderation tools against their own models, Google/Jigsaw’s Perspective model, and others.

“GPT-4 was often the winner, with only a little bit of prompt engineering necessary to get to good results,” said Stamos, who added that overall he found that GPT-4 works “shockingly well for content moderation.”

One challenge his students found was that GPT-4 is simply more chatty than they are used to in building tools like this; instead of returning a simple number reflecting how likely a piece of content is to violate a policy, it responded with paragraphs of text.

Still, Stamos said, “my students found it to be completely usable for their projects.”

That’s good to hear, but it also… worries me. Note that both Willner and Stamos highlighted that it was good, with caveats. But in operationalizing tools like this, I wonder how many companies are going to pay much attention to those caveats, as opposed to just going all in.

Again, I think it’s a useful tool. I keep talking about how important it is, in all sorts of areas, for us to look at AI as a tool that helps raise the baseline for all sorts of jobs, and how that could even revitalize the middle class. So, in general, this is a good step forward, but there’s a lot about it that makes me wonder whether those implementing it will really understand its limitations.


Companies: openai

