Apple researchers have published a new paper on an artificial intelligence (AI) model that they claim is capable of understanding contextual language. The yet-to-be-peer-reviewed research paper also says that the large language model (LLM) can operate entirely on-device without consuming much computational power. The description makes the AI model seem well suited to the role of a smartphone assistant, and it could upgrade Siri, the tech giant’s native voice assistant. Last month, Apple published another paper, about a multimodal AI model dubbed MM1.
The research paper is currently a pre-print, published on arXiv, an open-access online repository of scholarly papers. The AI model is named ReALM, short for Reference Resolution As Language Modeling. The paper highlights that the model’s primary focus is to perform and complete tasks prompted in contextual language, which is closer to how humans naturally speak. For instance, as per the paper’s claims, the model can understand a user who says, “Take me to the one that’s second from the bottom”, as illustrated in the sketch below.
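To make the idea concrete, here is a minimal Python sketch of how reference resolution can be framed as a language-modeling task: candidate entities are serialized into a numbered text list and the model is asked which one the user means. The entity list, prompt wording, and expected answer are illustrative assumptions, not the paper’s actual prompt format.

```python
# Illustrative sketch: framing reference resolution as a text prompt.
# The entities, prompt template, and expected answer are hypothetical;
# the paper does not publish its exact prompt format.

def build_prompt(user_request: str, entities: list[str]) -> str:
    """Serialize candidate entities into a numbered list the model can cite."""
    lines = [f"{i}. {e}" for i, e in enumerate(entities, start=1)]
    return (
        "Candidate entities (top to bottom):\n"
        + "\n".join(lines)
        + f"\n\nUser request: {user_request}\n"
        + "Which entity number does the user mean?"
    )

entities = [
    "Pharmacy - 555-0134",
    "Hardware store - 555-0172",
    "Coffee shop - 555-0199",   # second from the bottom
    "Dry cleaner - 555-0118",
]
prompt = build_prompt("Take me to the one that's second from the bottom", entities)
print(prompt)  # an LLM fine-tuned for this task would be expected to answer "3"
```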
ReALM is designed to perform tasks on a smart device, and the things a user can refer to are divided into three categories — on-screen entities, conversational entities, and background entities. Based on the examples shared in the paper, on-screen entities are items currently visible on the device’s display, conversational entities are items drawn from what the user has previously requested in the dialogue, and background entities are processes running in the background, such as a song playing in an app.
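A rough sketch of how those three categories might be represented in code follows; the class names, fields, and example entries are assumptions made for illustration, not types from the paper.

```python
# Hypothetical data model for the three entity categories the paper
# describes; names and example entries are illustrative, not from the paper.
from dataclasses import dataclass
from enum import Enum

class EntityKind(Enum):
    ON_SCREEN = "on-screen"            # visible on the device display
    CONVERSATIONAL = "conversational"  # mentioned earlier in the dialogue
    BACKGROUND = "background"          # running but not visible, e.g. audio

@dataclass
class Entity:
    kind: EntityKind
    description: str

context = [
    Entity(EntityKind.ON_SCREEN, "Phone number 555-0134 near the bottom of the page"),
    Entity(EntityKind.CONVERSATIONAL, "Contact 'Mom' from the previous request"),
    Entity(EntityKind.BACKGROUND, "Song 'Example Track' playing in a music app"),
]
for e in context:
    print(f"[{e.kind.value}] {e.description}")
```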
What is interesting about this AI model is that, despite taking on the complex job of understanding, processing, and acting on contextual prompts, the paper claims it does not require large amounts of computational power, “making ReaLM an ideal choice for a practical reference resolution system that can exist on-device without compromising on performance.” It achieves this by using significantly fewer parameters than larger LLMs such as GPT-3.5 and GPT-4.
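Some back-of-the-envelope arithmetic shows why parameter count matters so much on-device: the memory needed for a model’s weights is roughly parameters times bytes per weight. The parameter counts below are illustrative assumptions (GPT-3’s publicly stated 175 billion, and a hypothetical small on-device model), not figures quoted from the paper.

```python
# Rough memory-footprint estimate: parameters x bytes per weight.
# The parameter counts are illustrative assumptions, not figures
# from the ReALM paper.
BYTES_FP16 = 2  # bytes per half-precision weight

for name, params in [("hypothetical small on-device model", 250e6),
                     ("GPT-3-scale model (175B parameters)", 175e9)]:
    gb = params * BYTES_FP16 / 1024**3
    print(f"{name}: ~{gb:,.1f} GB of weights at fp16")
```

At that scale, a model with a few hundred million parameters fits comfortably in a smartphone’s memory, while a 175-billion-parameter model plainly does not.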
The paper also goes on to claim that, despite working in such a restricted environment, the AI model demonstrated “substantially” better performance than OpenAI’s GPT-3.5 and GPT-4. It elaborates that the model scored better than GPT-3.5 on text-only benchmarks, and outperformed GPT-4 on domain-specific user utterances.
While the paper is promising, it has not yet been peer-reviewed, so the validity of its claims remains uncertain. But if the paper holds up under review, that might push Apple to develop the model commercially and even use it to make Siri smarter.