Published on January 7, 2024, 5:29 pm
Google has recently unveiled its new generative AI platform called Gemini. While the platform shows promise in some aspects, it falls short in others. So, what exactly is Gemini? How can it be used? And how does it compare to other AI platforms?
Gemini is Google’s next-generation generative AI model family, developed by Google’s AI research labs, DeepMind and Google Research. It comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. One of the key features that sets Gemini apart from other models is that it is “natively multimodal”: it can work with various forms of data, such as text, audio, images, and video. The models have been trained on a wide range of data sources, including different languages, codebases, and multimedia content.
Unlike Google’s LaMDA language model, which is limited to working with text data only, Gemini models have the ability to understand and generate content beyond just text. However, their proficiency in understanding images, audio, and other modalities is still somewhat limited.
One aspect that has caused some confusion is Google’s branding strategy. Google did not initially make it clear that Gemini is separate from Bard. Bard serves as an interface for accessing certain Gemini models while Gemini itself refers to the family of models rather than an individual app or frontend experience. To draw a comparison to OpenAI’s products, Bard can be likened to ChatGPT (a conversational AI app) while Gemini corresponds to the underlying language model powering it.
Gemini models possess a range of capabilities such as transcribing speech, captioning images and videos, and generating artwork. However, most of these capabilities have not reached full product stage yet.
Google’s initial demonstrations of Gemini fell short of expectations when compared to their claims about the platform’s abilities. Some video demonstrations turned out to have been heavily edited or aspirational rather than showcasing actual capabilities. Despite this setback, there are plans for further development and expansion of Gemini’s features.
Gemini is available in different tiers. Gemini Ultra, the largest and most capable model in the family, is currently accessible only to a select set of customers across certain Google apps and services. It can be used for tasks such as solving physics problems step by step, identifying relevant scientific papers, and updating data charts with more recent information. Ultra also has some image generation capabilities, but this feature won’t be included in the productized version at launch.
Gemini Pro, on the other hand, is publicly available and has been launched in text-only form within Bard. Early impressions suggest that it performs better than OpenAI’s GPT-3.5 in reasoning and understanding complex chains of logic. However, like most large language models, it struggles with math problems involving multiple digits and may make errors in reasoning. Gemini Pro is also accessible via API in Google’s Vertex AI platform.
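As a rough sketch of what calling Gemini Pro through Vertex AI's REST interface might look like: the project ID and region below are placeholders, and the endpoint path and request shape are assumptions based on Vertex AI's publisher-model conventions rather than details from this article, so check the official documentation before relying on them.

```python
import json

# Hypothetical project and region values -- substitute your own.
PROJECT_ID = "my-gcp-project"
REGION = "us-central1"

# Vertex AI exposes publisher models under a path of roughly this shape
# (assumed here; verify against the current Vertex AI reference).
endpoint = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{REGION}/"
    f"publishers/google/models/gemini-pro:generateContent"
)

# A generateContent request wraps the prompt in a "contents" list,
# where each entry carries a role and one or more parts.
payload = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize the Gemini model family."}]}
    ]
}

# This JSON body would be POSTed to the endpoint with an OAuth bearer token.
body = json.dumps(payload)
```

In practice you would send `body` with an authenticated HTTP POST; Google also ships SDK wrappers for Vertex AI that hide this plumbing.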
Gemini Nano is a much smaller model in the family, designed to run directly on some smartphones rather than on a server. It powers features such as summarization in the Recorder app and Smart Reply in Gboard.
While Google claims that Gemini outperforms existing models on benchmarks, some early feedback suggests its performance is not significantly better than that of its counterparts from OpenAI.
Google plans to release Ultra later this year, allowing users to compare its capabilities directly against rival models. Pricing for Gemini Pro has also been announced: once the model exits its preview stage, output will cost $0.0025 per character.
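At a per-character rate, output cost scales linearly with response length. A quick back-of-the-envelope helper illustrates the arithmetic; the rate is the figure quoted above, and the function name is ours:

```python
# Quoted rate for Gemini Pro output, in USD per character.
OUTPUT_RATE_PER_CHAR = 0.0025

def estimate_output_cost(output_text: str) -> float:
    """Estimate the charge for a generated response at the quoted rate."""
    return len(output_text) * OUTPUT_RATE_PER_CHAR

# At this rate, a 400-character response works out to 400 * $0.0025 = $1.00.
print(estimate_output_cost("a" * 400))
```

Input tokens or characters may be billed separately; only the output rate is quoted here.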
Overall, Google’s new generative AI platform Gemini shows promise with its multimodal capabilities but still has room for improvement. As further developments are made and more use cases are explored, we will gain a clearer understanding of how it stacks up against other AI platforms in the market.