How to Stop ChatGPT from Going Off the Rails

When WIRED asked me to cover this week’s newsletter, my first instinct was to ask ChatGPT—OpenAI’s viral chatbot—to see what it came up with. It’s what I’ve been doing with emails, recipes, and LinkedIn posts all week. Productivity is way down, but sassy limericks about Elon Musk are up 1000 percent.

I asked the bot to write a column about itself in the style of Steven Levy, but the results weren’t great. ChatGPT served up generic commentary about the promise and pitfalls of AI, but didn’t really capture Steven’s voice or say anything new. As I wrote last week, it was fluent, but not entirely convincing. But it did get me thinking: Would I have gotten away with it? And what systems could catch people using AI for things they really shouldn’t, whether that’s work emails or college essays?

To find out, I spoke to Sandra Wachter, a professor of technology and regulation at the Oxford Internet Institute who speaks eloquently about how to build transparency and accountability into algorithms. I asked her what that might look like for a system like ChatGPT.

Amit Katwala: ChatGPT can pen everything from classical poetry to bland marketing copy, but one big talking point this week has been whether it could help students cheat. Do you think you could tell if one of your students had used it to write a paper?

Sandra Wachter: This will start to be a cat-and-mouse game. The tech is maybe not yet good enough to fool me as a person who teaches law, but it may be good enough to convince somebody who is not in that area. I wonder if technology will get better over time to where it can trick me too. We might need technical tools to make sure that what we’re seeing is created by a human being, the same way we have tools for deepfakes and detecting edited photos.

Amit Katwala: That seems inherently harder to do for text than it would be for deepfaked imagery, because there are fewer artifacts and telltale signs. Any reliable solution may need to be built by the company that’s generating the text in the first place.

Sandra Wachter: You do need to have buy-in from whoever is creating that tool. But if I’m offering services to students I might not be the type of company that is going to submit to that. And there might be a situation where even if you do put watermarks on, they’re removable. Very tech-savvy groups will probably find a way. But there is an actual tech tool [built with OpenAI’s input] that allows you to detect whether output is artificially created.

Amit Katwala: What would a version of ChatGPT that had been designed with harm reduction in mind look like?

Sandra Wachter: A couple of things. First, I would really argue that whoever is creating those tools put watermarks in place. And maybe the EU’s proposed AI Act can help, because it deals with transparency around bots, saying you should always be aware when something isn’t real. But companies might not want to do that, and maybe the watermarks can be removed. So then it’s about fostering research into independent tools that look at AI output. And in education, we have to be more creative about how we assess students and how we write papers: What kind of questions can we ask that are less easily fakeable? It has to be a combination of tech and human oversight that helps us curb the disruption.