
The race to bring generative AI to mobile devices




May 15, 2023 | Article by Richard Waters


Tech companies seek processing power in handsets to reduce computing costs and improve speed of AI chatbots




The race is on to bring the technology behind ChatGPT to the smartphone in your pocket. And to judge from the surprising speed at which the technology is advancing, the latest moves in artificial intelligence could transform mobile communications and computing far faster than seemed likely just months ago. 


As tech companies rush to embed generative AI into their software and services, they face significantly higher computing costs. The concern has weighed in particular on Google, with Wall Street analysts warning that the company’s profit margins could be squeezed if internet search users come to expect AI-generated content in standard search results. 


Running generative AI on mobile handsets, rather than through the cloud on servers operated by big tech groups, could answer one of the biggest economic questions raised by the latest tech fad. 


Google said last week that it had managed to run a version of PaLM 2, its latest large language model, on a Samsung Galaxy handset. Though it did not publicly demonstrate the scaled-down model, called Gecko, the move is the latest sign that a form of AI that has required computing resources only found in a data centre is quickly starting to find its way into many more places. 


The shift could make services such as chatbots far cheaper for companies to run and pave the way for more transformative applications using generative AI.


"You need to make the AI hybrid, [running in both] the data centre and locally, otherwise it will cost too much money," Cristiano Amon, chief executive of mobile chip company Qualcomm, told the Financial Times. Tapping into the unused processing power on mobile handsets was the best way to spread the cost, he said.


When the launch of ChatGPT late last year brought generative AI to widespread attention, the prospect of bringing it to handsets seemed distant. Besides training the so-called large language models behind such services, the work of inferencing — or running the models to produce results — is also computationally demanding. Handsets lack the memory to hold large models like the one behind ChatGPT, as well as the processing power required to run them.


The potential pay-offs, though, are significant. Generating a response to a query on a device, rather than waiting for a remote data centre to produce a result, could reduce the latency, or delay, of using an application.


When a user’s personal data is used to refine the generative responses, keeping all the processing on a handset could also enhance privacy. More than anything, generative AI could make it easier to carry out common activities on a smartphone, particularly tasks that involve producing text. "You could embed [the AI] in every office application: you get an email, it suggests a response," said Amon. "You are going to need the ability to run those things locally as well as on the data centre."


Rapid advances in some of the underlying models have changed the equation.


The biggest and most advanced, such as Google’s PaLM 2 and OpenAI’s GPT-4, have hogged the headlines. But an explosion of smaller models has made some of the same capabilities available in less technically demanding ways. These have benefited in part from new techniques for tuning language models based on a more careful curation of the data sets they are trained on, reducing the amount of information they need to hold. 


According to Arvind Krishna, chief executive of IBM, most companies that look to use generative AI in their own services will get much of what they need by combining a number of these smaller models. Speaking last week as IBM announced a technology platform to help its customers tap into generative AI, he said that many would opt to use open-source models, where the code was more transparent and could be adapted, in part because it would be easier to fine-tune the technology using their own data.


Some of the smaller models have already demonstrated surprising capabilities.


They include LLaMA, an open-source language model released by Meta that is claimed to match many of the capabilities of the largest systems. LLaMA comes in various sizes, the smallest of which has only 7bn parameters, far fewer than the 175bn of GPT-3, the breakthrough language model OpenAI released in 2020; the number of parameters in GPT-4, released this year, has not been disclosed. A research model based on LLaMA and developed at Stanford University has already been shown running on one of Google’s Pixel 6 handsets.
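To put those parameter counts in perspective, here is a back-of-the-envelope sketch of the memory needed just to store a model's weights. Only the parameter counts come from the article; the numeric precisions (16-bit floating point and 4-bit integer) are illustrative assumptions about common deployment formats.

```python
# Rough weight-storage arithmetic: parameter counts are from the article;
# the numeric precisions (fp16, int4) are illustrative assumptions.
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate gigabytes needed just to hold the model weights."""
    return num_params * bytes_per_param / 1e9

for name, params in [("LLaMA 7bn", 7e9), ("GPT-3 175bn", 175e9)]:
    for precision, nbytes in [("fp16", 2.0), ("int4", 0.5)]:
        print(f"{name} @ {precision}: {weight_memory_gb(params, nbytes):6.1f} GB")

# LLaMA 7bn   @ fp16:  14.0 GB   LLaMA 7bn   @ int4:  3.5 GB  <- phone territory
# GPT-3 175bn @ fp16: 350.0 GB   GPT-3 175bn @ int4: 87.5 GB  <- data centre only
```

On these assumptions, even a heavily compressed 175bn-parameter model stays far beyond a handset's memory, which is why the smaller open-source models matter for mobile.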


Beyond their far smaller size, the open-source nature of models such as this has also made it easier for researchers and developers to adapt them for different computing environments. Qualcomm earlier this year showed off what it claimed was the first Android handset running the Stable Diffusion image-generation model, which has about 1bn parameters. The chipmaker had "quantised", or cut down, the model to run it more easily on a handset without losing any of its accuracy, said Ziad Asghar, a senior vice-president at Qualcomm.
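The article does not describe Qualcomm's toolchain, but the general technique is standard. As a minimal sketch of the idea, here is post-training dynamic quantization using PyTorch's public API on an arbitrary stand-in model; Qualcomm's actual process on its own hardware will differ.

```python
# Minimal post-training quantization sketch (PyTorch's standard API, shown
# as an analogue; Qualcomm used its own toolchain, not this code).
import torch
import torch.nn as nn

# Arbitrary stand-in for one block of a much larger generative model.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# Replace the Linear layers' fp32 weights with 8-bit integer versions;
# activations are quantized on the fly at inference time ("dynamic").
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

fp32_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
print(f"fp32 weights: {fp32_mb:.1f} MB")  # ~33.6 MB
# The quantized copy stores packed int8 weights, roughly a quarter the size,
# usually at little or no measurable cost in output quality.
print(quantized)  # the Linear layers now appear as DynamicQuantizedLinear
```

The design trade-off is the one Asghar describes: fewer bits per weight means a smaller memory footprint and cheaper arithmetic, at the risk of some loss of fidelity that careful calibration aims to avoid.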




Zoubin Ghahramani, a vice-president at Google DeepMind, the internet company's AI research arm, said its Gecko mobile model could process 16 tokens per second, a measure based on the number of short text units large language models work with. Most large models use one to two tokens per word generated, suggesting that Gecko might produce about eight to 16 words per second on a handset, potentially fast enough to suggest text messages or short email responses.
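That throughput estimate is easy to sanity-check. In the quick calculation below, only the 16 tokens per second figure comes from the article; the one-to-two tokens per word range is the rule of thumb cited above.

```python
# Sanity check of the words-per-second estimate. The 16 tokens/sec figure is
# from the article; one to two tokens per word is the stated rule of thumb.
tokens_per_second = 16
tokens_per_word_range = (1.0, 2.0)

low = tokens_per_second / tokens_per_word_range[1]   # 2 tokens/word -> 8 words/s
high = tokens_per_second / tokens_per_word_range[0]  # 1 token/word -> 16 words/s
print(f"~{low:.0f} to {high:.0f} words per second")  # ~8 to 16 words per second
```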


The particular requirements of mobile handsets meant that attention was likely to shift quickly to so-called multimodal models that can work with a range of image, text and other inputs, said Qualcomm's Asghar. Mobile applications were likely to lean heavily on speech and images, he added, rather than the text-heavy applications more common on a personal computer.


The surprising speed with which generative AI is starting to move to smartphones is set to increase the attention on Apple, which has so far stood apart from the speculative frenzy around the technology.


Well-known flaws in generative AI, such as the tendency of large models to "hallucinate", or respond with fabricated information, meant Apple was unlikely to embed the technology into the iPhone's operating system for some time, said Ben Bajarin, an analyst at Creative Strategies. Instead, he predicted that the company would look for ways to make it easier for app developers to start experimenting with the technology in their own services. "This is the posture you'll see from Microsoft and Google as well: they'll all want to give developers the tools to go and compete [with generative AI applications]," Bajarin said.


With Apple's Worldwide Developers Conference set to begin on June 5, preceded by Microsoft's own event for developers, called Build, the fight for developer attention is about to get intense. Generative AI may still be in its infancy, but the rush to get it into many more users' hands — and pockets — is already moving into overdrive.


