Hacking Generative AI for Fun and Profit


You hardly need ChatGPT to generate a list of reasons why generative artificial intelligence is often less than awesome. Algorithms are often fed creative work without permission, they harbor nasty biases, and training them requires huge amounts of energy and water. These are all serious issues.

Putting all that aside for a moment, though, it is remarkable how powerful generative AI can be for prototyping potentially useful new tools.

I got to witness this firsthand by visiting Sundai Club, a generative AI hackathon that takes place one Sunday each month near the MIT campus. A few months ago, the group kindly agreed to let me sit in and chose to spend that session exploring tools that might be useful to journalists. The club is backed by a Cambridge nonprofit called Æthos that promotes socially responsible use of AI.

The Sundai Club crew includes students from MIT and Harvard, a few professional developers and product managers, and even one person who works for the military. Each event starts with a brainstorm of possible projects that the group then whittles down to a final option that they actually try to build.

Notable pitches from the journalism hackathon included using multimodal language models to track political posts on TikTok, to auto-generate freedom of information requests and appeals, and to summarize video clips of court hearings to help with local news coverage.

In the end, the group decided to build a tool that would help reporters covering AI identify potentially interesting papers posted to the Arxiv, a popular server for research paper preprints. It’s likely my presence swayed them here, given that I mentioned at the meeting that scouring the Arxiv for interesting research was a high priority for me.

After settling on a goal, coders on the team used the OpenAI API to create embeddings of Arxiv AI papers. An embedding is a mathematical representation of text and its meaning, expressed as a list of numbers, so that pieces of text with similar meanings end up close together. This made it possible to find papers relevant to a particular term, by comparing how close their embeddings are to the term's embedding, and to explore relationships between different areas of research.
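To make the idea of "finding papers relevant to a term" concrete, here is a minimal sketch of embedding-based search, assuming each paper and the search term have already been turned into vectors (for example by an embedding model such as those OpenAI offers). The vectors, titles, and the helper names `cosine_similarity` and `rank_papers` are hypothetical illustrations, not code from AI News Hound; real embeddings have hundreds or thousands of dimensions rather than three.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_papers(query_vec, papers, top_k=3):
    """Return (title, score) pairs for the papers most similar to the query vector."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in papers.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for what an embedding model would return.
papers = {
    "Paper A: tool-using language agents": [0.9, 0.1, 0.0],
    "Paper B: protein structure prediction": [0.0, 0.2, 0.9],
    "Paper C: multi-agent planning benchmarks": [0.7, 0.3, 0.1],
}
query = [0.8, 0.2, 0.0]  # hypothetical stand-in for the embedding of "AI agents"

for title, score in rank_papers(query, papers, top_k=2):
    print(f"{score:.3f}  {title}")
```

Cosine similarity is a common choice here because it compares the direction of two vectors rather than their length; with these toy vectors the two agent-related papers rank above the biology paper.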

Using embeddings of Reddit threads as well as a Google News search, the coders created a visualization that shows research papers alongside Reddit discussions and relevant news reports.

The resulting prototype, called AI News Hound, is rough-and-ready, but it shows how large language models can help mine information in interesting new ways. Here’s a screenshot of the tool being used to search for the term “AI agents.” The two green squares closest to the news article and Reddit clusters represent research papers that could potentially be included in an article on efforts to build AI agents.
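Spotting the papers "closest to" the news and Reddit clusters is a visual judgment in the screenshot, but the same idea can be made explicit. The sketch below is a hypothetical illustration, not how AI News Hound necessarily works: it assumes every item already has a 2D plot position (the article does not say how the tool projects its embeddings down to two dimensions) and ranks papers by their distance from the center of the news and Reddit points. All names and coordinates are made up.

```python
import math


def centroid(points):
    """Average position of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))


def papers_near_discussion(papers, news, reddit, top_k=2):
    """Rank papers by distance to the center of the news and Reddit points."""
    center = centroid(news + reddit)
    ranked = sorted(papers.items(), key=lambda item: math.dist(item[1], center))
    return [title for title, _ in ranked[:top_k]]


# Hypothetical 2D plot coordinates for each item.
papers = {
    "paper-1": (1.0, 1.2),
    "paper-2": (5.0, 4.8),
    "paper-3": (1.5, 0.8),
    "paper-4": (-3.0, 2.0),
}
news = [(1.2, 1.0), (0.8, 1.1)]
reddit = [(1.4, 0.9), (1.0, 1.3)]

print(papers_near_discussion(papers, news, reddit))
```

Measuring distance from a single center is the simplest option; a real tool might instead compare against each cluster separately or work in the original high-dimensional embedding space.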

Screenshot courtesy of Sundai Club.