It might technically be Sagittarius season, but Gemini is taking over.
On December 6th, Google DeepMind announced the release of its natively multimodal AI model and competitor to GPT-4, Gemini.
In a blog post heralding Gemini’s arrival, Google DeepMind CEO and Co-Founder Demis Hassabis touted their latest development as the “most capable and general model we’ve ever built.” Stoking rumors that the product series could be a serious candidate for artificial general intelligence (AGI), the impact of Gemini is already being felt in the cutthroat AI business landscape.
But what about its impact on digital products in the year ahead? Here’s what you need to know—the good, the bad, and the potentially ugly.
What is Gemini?
Gemini is not just one product but three: a family of large multimodal models (LMMs) succeeding Google DeepMind’s LaMDA and PaLM 2 neural networks. The three products are:
- Gemini Ultra: The most powerful model, set to be released to developers in early 2024. Ultra promises the ability to “understand,” generate, and combine a wide range of inputs, including audio, video, text, code, and images, at speeds impressively close to real-time.
- Gemini Pro: The consumer-level model currently available to developers. Users in countries where Bard is available have been able to interact with Gemini Pro since it was announced on December 6th.
- Gemini Nano: The compact model designed to power mobile applications. Like Gemini Ultra, Nano has yet to be released, and Google has not outlined its performance specifications.
What is multimodal AI?
Multimodal AI is artificial intelligence capable of processing multiple data types by combining multiple processing algorithms, one per modality. For example, a text-to-image AI model like Midjourney is considered multimodal.
While many LMMs have already flooded the market, what sets Gemini apart from other large multimodal models is that it is designed to “understand” many inputs (text, video, audio, code, and images) in a way that mimics human comprehension and creativity.
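Conceptually, a multimodal model exposes one interface that routes each input modality through its own encoder into a shared representation, then reasons over the combined result. Here is a minimal Python sketch of that idea; the encoders and the fusion step are toy stand-ins for illustration, not Gemini's actual architecture:

```python
from dataclasses import dataclass
from typing import Union

# Illustrative sketch only: real multimodal models use learned neural
# encoders, not the length-based stand-ins below.

@dataclass
class TextInput:
    content: str

@dataclass
class ImageInput:
    pixels: bytes

@dataclass
class AudioInput:
    samples: bytes

ModalInput = Union[TextInput, ImageInput, AudioInput]

def encode(inp: ModalInput) -> list[float]:
    """Map any supported modality into one shared embedding space."""
    if isinstance(inp, TextInput):
        # Stand-in for a text encoder (e.g. a transformer).
        return [float(len(inp.content)), 0.0, 0.0]
    if isinstance(inp, ImageInput):
        # Stand-in for a vision encoder.
        return [0.0, float(len(inp.pixels)), 0.0]
    if isinstance(inp, AudioInput):
        # Stand-in for an audio encoder.
        return [0.0, 0.0, float(len(inp.samples))]
    raise TypeError(f"Unsupported modality: {type(inp).__name__}")

def respond(inputs: list[ModalInput]) -> list[float]:
    """A multimodal model reasons over all inputs jointly."""
    embeddings = [encode(i) for i in inputs]
    # Toy fusion step: element-wise sum of the shared-space vectors.
    return [sum(dim) for dim in zip(*embeddings)]
```

The key design point is the shared embedding space: because every modality lands in the same representation, a prompt mixing text, images, and audio can be handled by one reasoning step rather than by stitching together separate single-modality models.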
The Good: The potential of multimodal AI tech in 2024
Not even a full calendar year after text-to-text models like ChatGPT captured the public imagination, Gemini marks a new era in which multimodal functionality will soon become table stakes for AI-powered products.
“We've started thinking about things like real-time analysis of what you're visually seeing,” says Ken Hubbell, CEO of Soffos.ai. “So I see new products, like glasses that can take in a video feed and generate results in real-time.”
While this will give product teams a level of flexibility that was never possible before, it will also invariably lead to the early obsolescence of products built to extend platforms like GPT-4 with Gemini-like functionality, all of which are only a few months old themselves.
“(These products) will soon either be irrelevant or will need to be totally redesigned to incorporate all the new stuff that’s come in that they ‘hacked’ their way around before,” says Hubbell.
“(These products) will soon either be irrelevant or will need to be totally redesigned to incorporate the stuff that they ‘hacked’ their way around before.” —Ken Hubbell, CEO, Soffos.ai
While Hubbell admits that a lot of companies are going to take a bath on the investments they’ve poured into these now-obsolete workarounds, he says this is actually good news in disguise.
“When Alexa came out, a lot of people were finding ways of doing things that the Alexa product itself couldn't do—I was one of those people,” chuckles Hubbell, noting that the team at Amazon would quickly catch on to the features external developers were building and integrate them into the Alexa platform.
“It kind of screwed those of us who were developers, but on the other hand, it improved the back end so much that we were now able to focus on things that we didn't have to hack around—and we could actually make the final product that we really wanted.”
The Bad: Early reception of Gemini
While this new era of highly capable multimodal AI sounds revolutionary, early reception of Gemini Pro has been largely one of disappointment. During the week following the launch announcement, LinkedIn and X feeds everywhere filled up with disgruntled reviews from users who had played around with Gemini Pro.
The problem? Gemini seems to be…kind of dumb.
Articles on TechCrunch and from multiple Medium authors highlighted posts from social media users showing screenshots of Gemini failing to answer questions correctly.
But as Hubbell explains, the platform is doing exactly what anyone should have expected it to do.
“Raising an AI is actually a lot like raising a child,” says Hubbell, who (before you ask) is a father as well. He remarks that LMMs can only learn so much in the early phases of training, which take place in a closed environment with a very small sample of users. "Once it's released into the wild, that's where the true growth takes place."
“Once it's released into the wild, that's where the true growth takes place.” —Ken Hubbell, CEO, Soffos.ai
The Ugly: The question of ethics
In its release materials, Google has been very forthcoming with promises of having developed Gemini ‘responsibly.’
This promise has, reasonably, made a few critics squirm in their seats.
An article by ZDNET points out that Google made the choice to omit model cards for Gemini products, which outline details including the potentially harmful outcomes of a neural network. This is especially disconcerting given that a team at Google invented model cards in the first place.
It also raises the question: with a product trained on an inherently biased dataset, who decides what ‘responsibility’ looks like?
In an article published in May on the Mind Foundry blog, Frankie Garcia, Google DeepMind’s new Operational AI Ethics and Safety Manager and former Mind Foundry AI Governance Product Manager, explains what makes a machine learning model trustworthy.
“When decisions have a material impact on the lives of individuals and populations, the importance of model trustworthiness and responsibility cannot be overstated,” writes Garcia in the article co-authored by Professor Brent Mittelstadt of the Oxford Internet Institute.
The article asserts that there are three key areas of machine learning trustworthiness:
- Bias and fairness: Applying several groups of “fairness metrics” (which the authors admit often conflict with one another) to ensure the model behaves fairly.
- Interpretability and explainability: This describes the extent to which a human user can understand the logic and process the AI model has used to arrive at its output.
- Data drift and model fragility: This describes how real-world inputs, shaped by user behavior and other factors, can drift away from the patterns in the original training dataset and degrade the quality of the model’s outputs over time.
We can deduce that Google is keeping a close watch on these factors to ensure that the Gemini product family stays ‘in check.’ Whether we see a similar commitment from competitors vying for market share remains to be seen.
How does Gemini stack up to ChatGPT?
In terms of raw performance, Google has made some lofty promises for what developers can expect. Here are the performance benchmarks for Gemini Ultra and Gemini Pro versus ChatGPT, according to Google.
A new season, or a new era?
While the release of Gemini might just be the latest act in the ongoing saga of rapid AI innovation, this one feels different.
While Gemini’s brand of multimodal capability clearly presents a nearly unlimited opportunity for digital product development, its foremost contribution is the milestone in AI innovation it represents.
Will Gemini lead us to a true model of AGI before 2024 is out? It might be written in the stars.
Don’t forget to subscribe to our newsletter for more product management insights, guides, and resources created for you by industry leaders and experts.