From Digital Age to Nano Age. WorldWide.

Tag: gemini

Robotic Automations

Google Gemini: Everything you need to know about the new generative AI platform | TechCrunch


Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps and services.

So what is Gemini? How can you use it? And how does it stack up to the competition?

To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features and news about Google’s plans for Gemini are released.

What is Gemini?

Gemini is Google’s long-promised, next-gen GenAI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the most performant Gemini model.
  • Gemini Pro, a “lite” Gemini model.
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models were trained to be “natively multimodal” — in other words, able to work with and use more than just words. They were pretrained and fine-tuned on a variety of audio, images and videos, a large set of codebases and text in different languages.

This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (e.g., essays, email drafts), but that isn’t the case with Gemini models.

What’s the difference between the Gemini apps and Gemini models?

Image Credits: Google

Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed — think of it as a client for Google’s GenAI.

Incidentally, the Gemini apps and models are also totally independent from Imagen 2, Google’s text-to-image model that’s available in some of the company’s dev tools and environments.

What can Gemini do?

Because the Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Some of these capabilities have reached the product stage yet (more on that later), and Google’s promising all of them — and more — at some point in the not-too-distant future.

Of course, it’s a bit hard to take the company at its word.

Google seriously underdelivered with the original Bard launch. And more recently it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to have been heavily doctored and was more or less aspirational.

Still, assuming Google is being more or less truthful with its claims, here’s what the different tiers of Gemini will be able to do once they reach their full potential:

Gemini Ultra

Google says that Gemini Ultra — thanks to its multimodality — can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers.

Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says — extracting information from those papers and “updating” a chart from one by generating the formulas necessary to re-create the chart with more recent data.

Gemini Ultra technically supports image generation, as alluded to earlier. But that capability hasn’t made its way into the productized version of the model yet — perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.

Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps — but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium Plan, priced at $20 per month.

The AI Premium Plan also connects Gemini to your wider Google Workspace account — think emails in Gmail, documents in Docs, presentations in Sheets and Google Meet recordings. That’s useful for, say, summarizing emails or having Gemini capture notes during a video call.

Gemini Pro

Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning and understanding capabilities.

An independent study by Carnegie Mellon and BerriAI researchers found that the initial version of Gemini Pro was indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that, like all large language models, this version of Gemini Pro particularly struggled with mathematics problems involving several digits, and users found examples of bad reasoning and obvious mistakes.

Google promised remedies, though — and the first arrived in the form of Gemini 1.5 Pro.

Designed to be a drop-in replacement, Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, perhaps most significantly in the amount of data that it can process. Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code — 35x the amount Gemini 1.0 Pro can handle. And — the model being multimodal — it’s not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of different languages, albeit slowly (e.g., searching for a scene in a one-hour video takes 30 seconds to a minute of processing).

Gemini 1.5 Pro entered public preview on Vertex AI in April.

An additional endpoint, Gemini Pro Vision, can process text and imagery — including photos and video — and output text along the lines of OpenAI’s GPT-4 with Vision model.

Using Gemini Pro in Vertex AI. Image Credits: Gemini

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.

In AI Studio, there’s workflows for creating structured chat prompts using Gemini Pro. Developers have access to both Gemini Pro and the Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output’s creative range and provide examples to give tone and style instructions — and also tune the safety settings.

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far, it powers a couple of features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations and other snippets. Users get these summaries even if they don’t have a signal or Wi-Fi connection available — and in a nod to privacy, no data leaves their phone in the process.

Gemini Nano is also in Gboard, Google’s keyboard app. There, it powers a feature called Smart Reply, which helps to suggest the next thing you’ll want to say when having a conversation in a messaging app. The feature initially only works with WhatsApp but will come to more apps over time, Google says.

And in the Google Messages app on supported devices, Nano enables Magic Compose, which can craft messages in styles like “excited,” “formal” and “lyrical.”

Is Gemini better than OpenAI’s GPT-4?

Google has several times touted Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says that Gemini 1.5 Pro, meanwhile, is more capable at tasks like summarizing content, brainstorming and writing than Gemini Ultra in some scenarios; presumably this will change with the release of the next Ultra model.

But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than OpenAI’s corresponding models. And — as mentioned earlier — some early impressions haven’t been great, with users and academics pointing out that the older version of Gemini Pro tends to get basic facts wrong, struggles with translations and gives poor coding suggestions.

How much does Gemini cost?

Gemini 1.5 Pro is free to use in the Gemini apps and, for now, AI Studio and Vertex AI.

Once Gemini 1.5 Pro exits preview in Vertex, however, the model will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Let’s assume a 500-word article contains 2,000 characters. Summarizing that article with Gemini 1.5 Pro would cost $5. Meanwhile, generating an article of a similar length would cost $0.1.

Ultra pricing has yet to be announced.

Where can you try Gemini?

Gemini Pro

The easiest place to experience Gemini Pro is in the Gemini apps. Pro and Ultra are answering queries in a range of languages.

Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for the time being and supports certain regions, including Europe, as well as features like chat functionality and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate prompts and Gemini-based chatbots and then get API keys to use them in their apps — or export the code to a more fully featured IDE.

Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is using Gemini models. Developers can perform “large-scale” changes across codebases, for example updating cross-file dependencies and reviewing large chunks of code.

Google’s brought Gemini models to its dev tools for Chrome and Firebase mobile dev platform, and its database creation and management tools. And it’s launched new security products underpinned by Gemini, like Gemini in Threat Intelligence, a component of Google’s Mandiant cybersecurity platform that can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.

Gemini Nano

Gemini Nano is on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24 — and will come to other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a sneak peek.

Is Gemini coming to the iPhone?

It might! Apple and Google are reportedly in talks to put Gemini to use for a number of features to be included in an upcoming iOS update later this year. Nothing’s definitive, as Apple is also reportedly in talks with OpenAI, and has been working on developing its own GenAI capabilities.

This post was originally published Feb. 16, 2024 and has since been updated to include new information about Gemini and Google’s plans for it.


Software Development in Sri Lanka

Robotic Automations

Watch: Google's Gemini Code Assist wants to use AI to help developers


Can AI eat the jobs of the developers who are busy building AI models? The short answer is no, but the longer answer is not yet settled. News this week that Google has a new AI-powered coding tool for developers, straight from the company’s Google Cloud Next 2024 event in Las Vegas, means that competitive pressures between major tech companies to build the best service to help coders write more code, more quickly is still heating up.

Microsoft’s GitHub Copilot service that has similar outlines has been steadily working toward enterprise adoption. Both companies want to eventually build developer-helping tech that can understand a company’s codebase, allowing it to offer up more tailored suggestions and tips.

Startups are in the fight as well, though they tend to focus more tailored solutions than the broader offerings from the largest tech companies; Pythagora, Tusk and Ellipsis from the most recent Y Combinator batch are working on app creation from user prompts, AI agents for bug-squashing and turning GitHub comments into code, respectively.

Everywhere you look, developers are building tools and services to help their own professional cohort.

Developers learning to code today won’t know a world in which they don’t have AI-powered coding helps. Call it the graphic calculator era for software builders. But the risk — or the worry, I suppose — is that in time the AI tools that are ingesting mountains of code to get smarter to help humans do more will eventually be able to do enough that fewer humans are needed to do the work of writing code for companies themselves. And if a company can spend less money and employ fewer people, it will; no job is safe, but some roles are just more difficult to replace at any given moment.

Thankfully, given the complexities of modern software services, ever-present tech debt and an infinite number of edge cases, what big tech and startups are busy building today seem to be very useful coding helps and not something ready to replace or even reduce the number of humans building them. For now. I wouldn’t take the other end of that bet on a multi-decade time frame.

And for those looking for an even deeper dive into what Google revealed this week, you can head here for our complete rundown, including details on exactly how Gemini Code Assist works, and Google’s in-depth developer walkthrough from Cloud Next 2024.


Software Development in Sri Lanka

Robotic Automations

Google Cloud Next 2024: Watch the keynote on Gemini AI, enterprise reveals right here | TechCrunch


It’s time for Google’s annual look up to the cloud, this time with a big dose of AI.

At 9 a.m. PT Tuesday, Google Cloud CEO Thomas Kurian kicked off the opening keynote for this year’s Google Cloud Next event, and you can watch the archive of their reveals above, or right here.

After this week we’ll know more about Google’s attempts to help the enterprise enter the age of AI. From a deeper dive into Gemini, the company’s AI-powered chatbot, to securing AI products and implementing generative AI into cloud applications, Google will continue to cover a wide range of topics.

We’re also keeping tabs on everything Google’s announcing at Cloud Next 2024, from Google Vids to Gemini Code Assist to Google Workspace updates.

And for those more interested in Google’s details and reveals for developers, their Developer Keynote started off at 11:30am PT Wednesday, and you can catch up on that full stream right here or via the embed below.


Software Development in Sri Lanka

Robotic Automations

Google's Gemini Pro 1.5 enters public preview on Vertex AI | TechCrunch


Gemini 1.5 Pro, Google’s most capable generative AI model, is now available in public preview on Vertex AI, Google’s enterprise-focused AI development platform. The company announced the news during its annual Cloud Next conference, which is taking place in Las Vegas this week.

Gemini 1.5 Pro launched in February, joining Google’s Gemini family of generative AI models. Undoubtedly its headlining feature is the amount of context that it can process: between 128,000 tokens to up to 1 million tokens, where “tokens” refers to subdivided bits of raw data (like the syllables “fan,” “tas” and “tic” in the word “fantastic”).

One million tokens is equivalent to around 700,000 words or around 30,000 lines of code. It’s about four times the amount of data that Anthropic’s flagship model, Claude 3, can take as input and about eight times as high as OpenAI’s GPT-4 Turbo max context.

A model’s context, or context window, refers to the initial set of data (e.g. text) the model considers before generating output (e.g. additional text). A simple question — “Who won the 2020 U.S. presidential election?” — can serve as context, as can a movie script, email, essay or e-book.

Models with small context windows tend to “forget” the content of even very recent conversations, leading them to veer off topic. This isn’t necessarily so with models with large contexts. And, as an added upside, large-context models can better grasp the narrative flow of data they take in, generate contextually richer responses and reduce the need for fine-tuning and factual grounding — hypothetically, at least.

So what specifically can one do with a 1 million-token context window? Lots of things, Google promises, like analyzing a code library, “reasoning across” lengthy documents and holding long conversations with a chatbot.

Because Gemini 1.5 Pro is multilingual — and multimodal in the sense that it’s able to understand images and videos and, as of Tuesday, audio streams in addition to text — the model can also analyze and compare content in media like TV shows, movies, radio broadcasts, conference call recordings and more across different languages. One million tokens translates to about an hour of video or around 11 hours of audio.

Thanks to its audio-processing capabilities, Gemini 1.5 Pro can generate transcriptions for video clips, as well, although the jury’s out on the quality of those transcriptions.

In a prerecorded demo earlier this year, Google showed Gemini 1.5 Pro searching the transcript of the Apollo 11 moon landing telecast (which comes to about 400 pages) for quotes containing jokes, and then finding a scene in movie footage that looked similar to a pencil sketch.

Google says that early users of Gemini 1.5 Pro — including United Wholesale Mortgage, TBS and Replit — are leveraging the large context window for tasks spanning mortgage underwriting; automating metadata tagging on media archives; and generating, explaining and transforming code.

Gemini 1.5 Pro doesn’t process a million tokens at the snap of a finger. In the aforementioned demos, each search took between 20 seconds and a minute to complete — far longer than the average ChatGPT query.

Google previously said that latency is an area of focus, though, and that it’s working to “optimize” Gemini 1.5 Pro as time goes on.

Of note, Gemini 1.5 Pro is slowly making its way to other parts of Google’s corporate product ecosystem, with the company announcing Tuesday that the model (in private preview) will power new features in Code Assist, Google’s generative AI coding assistance tool. Developers can now perform “large-scale” changes across codebases, Google says, for example updating cross-file dependencies and reviewing large chunks of code.




Software Development in Sri Lanka

Robotic Automations

Google injects generative AI into its cloud security tools | TechCrunch


At its annual Cloud Next conference in Las Vegas, Google on Tuesday introduced new cloud-based security products and services — in addition to updates to existing products and services — aimed at customers managing large, multi-tenant corporate networks.

Many of the announcements had to do with Gemini, Google’s flagship family of generative AI models.

For example, Google unveiled Gemini in Threat Intelligence, a new Gemini-powered component of the company’s Mandiant cybersecurity platform. Now in public preview, Gemini in Threat Intelligence can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise, as well as summarize open source intelligence reports from around the web.

“Gemini in Threat Intelligence now offers conversational search across Mandiant’s vast and growing repository of threat intelligence directly from frontline investigations,” Sunil Potti, GM of cloud security at Google, wrote in a blog post shared with TechCrunch. “Gemini will navigate users to the most relevant pages in the integrated platform for deeper investigation … Plus, [Google’s malware detection service] VirusTotal now automatically ingests OSINT reports, which Gemini summarizes directly in the platform.”

Elsewhere, Gemini can now assist with cybersecurity investigations in Chronicle, Google’s cybersecurity telemetry offering for cloud customers. Set to roll out by the end of the month, the new capability guides security analysts through their typical workflows, recommending actions based on the context of a security investigation, summarizing security event data and creating breach and exploit detection rules from a chatbot-like interface.

And in Security Command Center, Google’s enterprise cybersecurity and risk management suite, a new Gemini-driven feature lets security teams search for threats using natural language while providing summaries of misconfigurations, vulnerabilities and possible attack paths.

Rounding out the security updates were privileged access manager (in preview), a service that offers just-in-time, time-bound and approval-based access options designed to help mitigate risks tied to privileged access misuse. Google’s also rolling out principal access boundary (in preview, as well), which lets admins implement restrictions on network root-level users so that those users can only access authorized resources within a specifically defined boundary.

Lastly, Autokey (in preview) aims to simplify creating and managing customer encryption keys for high-security use cases, while Audit Manager (also in preview) provides tools for Google Cloud customers in regulated industries to generate proof of compliance for their workloads and cloud-hosted data.

“Generative AI offers tremendous potential to tip the balance in favor of defenders,” Potti wrote in the blog post. “And we continue to infuse AI-driven capabilities into our products.”

Google isn’t the only company attempting to productize generative AI–powered security tooling. Microsoft last year launched a set of services that leverage generative AI to correlate data on attacks while prioritizing cybersecurity incidents. Startups, including Aim Security, are also jumping into the fray, aiming to corner the nascent space.

But with generative AI’s tendency to make mistakes, it remains to be seen whether these tools have staying power.


Software Development in Sri Lanka

Robotic Automations

Google's Gemini comes to databases | TechCrunch


Google wants Gemini, its family of generative AI models, to power your app’s databases — in a sense.

At its annual Cloud Next conference in Las Vegas, Google announced the public preview of Gemini in Databases, a collection of features underpinned by Gemini to — as the company pitched it — “simplify all aspects of the database journey.” In less jargony language, Gemini in Databases is a bundle of AI-powered, developer-focused tools for Google Cloud customers who are creating, monitoring and migrating app databases.

One piece of Gemini in Databases is Database Studio, an editor for structured query language (SQL), the language used to store and process data in relational databases. Built into the Google Cloud console, Database Studio can generate, summarize and fix certain errors with SQL code, Google says, in addition to offering general SQL coding suggestions through a chatbot-like interface.

Joining Database Studio under the Gemini in Databases brand umbrella is AI-assisted migrations via Google’s existing Database Migration Service. Google’s Gemini models can convert database code and deliver explanations of those changes along with recommendations, according to Google.

Elsewhere, in Google’s new Database Center — yet another Gemini in Databases component — users can interact with databases using natural language and can manage a fleet of databases with tools to assess their availability, security and privacy compliance. And should something go wrong, those users can ask a Gemini-powered bot to offer troubleshooting tips.

“Gemini in Databases enables customer to easily generate SQL; additionally, they can now manage, optimize and govern entire fleets of databases from a single pane of glass; and finally, accelerate database migrations with AI-assisted code conversions,” Andi Gutmans, GM of databases at Google Cloud, wrote in a blog post shared with TechCrunch. “Imagine being able to ask questions like ‘Which of my production databases in east Asia had missing backups in the last 24 hours?’ or ‘How many PostgreSQL resources have a version higher than 11?’ and getting instant insights about your entire database fleet.”

That assumes, of course, that the Gemini models don’t make mistakes from time to time — which is no guarantee.

Regardless, Google’s forging ahead, bringing Gemini to Looker, its business intelligence tool, as well.

Launching in private preview, Gemini in Looker lets users “chat with their business data,” as Google describes it in a blog post. Integrated with Workspace, Google’s suite of enterprise productivity tools, Gemini in Looker spans features such as conversational analytics; report, visualization and formula generation; and automated Google Slide presentation generation. 

I’m curious to see if Gemini in Looker’s report and presentation generation work reliably well. Generative AI models don’t exactly have a reputation for accuracy, after all, which could lead to embarrassing, or even mission-critical, mistakes. We’ll find out as Cloud Next continues into the week with any luck.

Gemini in Databases could be perceived as a response of sorts to top rival Microsoft’s recently launched Copilot in Azure SQL Database, which brought generative AI to Microsoft’s existing fully managed cloud database service. Microsoft is looking to stay a step ahead in the budding AI-driven database race and has also worked to build generative AI with Azure Data Studio, the company’s set of enterprise data management and development tools.


Software Development in Sri Lanka

Robotic Automations

Google rolls out Gemini in Android Studio for coding assistance | TechCrunch


Google continues rolling out Gemini to different products as today the company announced that Android Studio’s bot is getting upgraded with Gemini Pro.

In May 2023, during the Google I/O developer event, the company introduced Studio Bot powered by the PaLM-2 foundation model. The company is rolling out Gemini in Android Studio in more than 180 countries for the Android Studio Jellyfish version.

In February, Google also updated the base model for the Bard chatbot from PaLM-2 to Gemini Pro.

Just like the Studio Bot, the new Gemini bot lives in the IDE (integrated development environment) and developers can ask coding-related questions.

The company said developers should notice improved answer quality in code completions, debugging, finding relevant resources and writing documentation.

Google said that for privacy reasons, users will have to log in and enable Gemini explicitly to use it. Plus, the chatbot’s responses are dependent mostly on the conversation history and context provided by the developer.

The company said users can easily access the Gemini API starter template through Android Studio to add generative AI-powered features to their apps.

Google is determined to compete with tools like GitHub Copilot in its different developer-facing products. Last year, it introduced the PaLM-2-based Codey assistant to answer queries about programming and Google Cloud Services.


Software Development in Sri Lanka

Robotic Automations

Google Gemini: Everything you need to know about the new generative AI platform | TechCrunch


Google’s trying to make waves with Gemini, a new generative AI platform that recently made its big debut. But while Gemini appears to be promising in a few aspects, it’s falling short in others. So what is Gemini? How can you use it? And how does it stack up to the competition?

To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models and features are released.

What is Gemini?

Gemini is Google’s long-promised, next-gen generative AI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the flagship Gemini model
  • Gemini Pro, a “lite” Gemini model
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro

All Gemini models were trained to be “natively multimodal” — in other words, able to work with and use more than just text. They were pre-trained and fine-tuned on a variety audio, images and videos, a large set of codebases, and text in different languages.

That sets Gemini apart from models such as Google’s own large language model LaMDA, which was only trained on text data. LaMDA can’t understand or generate anything other than text (e.g. essays, email drafts and so on) — but that isn’t the case with Gemini models. Their ability to understand images, audio and other modalities is still limited, but it’s better than nothing.

What’s the difference between Bard and Gemini?

Image Credits: Google

Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from Bard. Bard is simply an interface through which certain Gemini models can be accessed — think of it as an app or client for Gemini and other gen AI models. Gemini, on the other hand, is a family of models — not an app or frontend. There’s no standalone Gemini experience, nor will there likely ever be. If you were to compare to OpenAI’s products, Bard corresponds to ChatGPT, OpenAI’s popular conversational AI app, and Gemini corresponds to the language model that powers it, which in ChatGPT’s case is GPT-3.5 or 4.

Incidentally, Gemini is also totally independent from Imagen-2, a text-to-image model that may or may not fit into the company’s overall AI strategy. Don’t worry, you’re not the only one confused by this!

What can Gemini do?

Because the Gemini models are multimodal, they can in theory perform a range of tasks, from transcribing speech to captioning images and videos to generating artwork. Few of these capabilities have reached the product stage yet (more on that later), but Google’s promising all of them — and more — at some point in the not-too-distant future.

Of course, it’s a bit hard to take the company at its word.

Google seriously under-delivered with the original Bard launch. And more recently it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to have been heavily doctored and was more or less aspirational. Gemini is, to the tech giant’s credit, available in some form today — but a rather limited form.

Still, assuming Google is being more or less truthful with its claims, here’s what the different tiers of Gemini models will be able to do once they’re released:

Gemini Ultra

Few people have gotten their hands on Gemini Ultra, the “foundation” model on which the others are built, so far — just a “select set” of customers across a handful of Google apps and services. That won’t change until sometime later this year, when Google’s largest model launches more broadly. Most info about Ultra has come from Google-led product demos, so it’s best taken with a grain of salt.

Google says that Gemini Ultra can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers. Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says — extracting information from those papers and “updating” a chart from one by generating the formulas necessary to recreate the chart with more recent data.

Gemini Ultra technically supports image generation, as alluded to earlier. But that capability won’t make its way into the productized version of the model at launch, according to Google — perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively” without an intermediary step.

Gemini Pro

Unlike Gemini Ultra, Gemini Pro is available publicly today. But confusingly, its capabilities depend on where it’s used.

Google says that in Bard, where Gemini Pro launched first in text-only form, the model is an improvement over LaMDA in its reasoning, planning and understanding capabilities. An independent study by Carnegie Mellon and BerriAI researchers found that Gemini Pro is indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains.

But the study also found that, like all large language models, Gemini Pro particularly struggles with math problems involving several digits, and users have found plenty of examples of bad reasoning and mistakes. It made plenty of factual errors for simple queries like who won the latest Oscars. Google has promised improvements, but it’s not clear when they’ll arrive.

Gemini Pro is also available via API in Vertex AI, Google’s fully managed AI developer platform, which accepts text as input and generates text as output. An additional endpoint, Gemini Pro Vision, can process text and imagery — including photos and video — and output text along the lines of OpenAI’s GPT-4 with Vision model.

Using Gemini Pro in Vertex AI.

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.

Sometime in “early 2024,” Vertex customers will be able to tap Gemini Pro to power custom-built conversational voice and chat agents (i.e. chatbots). Gemini Pro will also become an option for driving search summarization, recommendation and answer generation features in Vertex AI, drawing on documents across modalities (e.g. PDFs, images) from different sources (e.g. OneDrive, Salesforce) to satisfy queries.

Image Credits: Gemini

In AI Studio, Google’s web-based tool for app and platform developers, there’s workflows for creating freeform, structured and chat prompts using Gemini Pro. Developers have access to both Gemini Pro and the Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output’s creative range and provide examples to give tone and style instructions — and also tune the safety settings.

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far it powers two features on the Pixel 8 Pro: Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations and other snippets. Users get these summaries even if they don’t have a signal or Wi-Fi connection available — and in a nod to privacy, no data leaves their phone in the process.

Gemini Nano is also in Gboard, Google’s keyboard app, as a developer preview. There, it powers a feature called Smart Reply, which helps to suggest the next thing you’ll want to say when having a conversation in a messaging app. The feature initially only works with WhatsApp, but will come to more apps in 2024, Google says.

Is Gemini better than OpenAI’s GPT-4?

There’s no way to know how the Gemini family really stacks up until Google releases Ultra later this year, but the company has claimed improvements on the state of the art — which is usually OpenAI’s GPT-4.

Google has several times touted Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says that Gemini Pro, meanwhile, is more capable at tasks like summarizing content, brainstorming and writing than GPT-3.5.

But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than OpenAI’s corresponding models. And — as mentioned earlier — some early impressions haven’t been great, with users and academics pointing out that Gemini Pro tends to get basic facts wrong, struggles with translations, and gives poor coding suggestions.

How much will Gemini cost?

Gemini Pro is free to use in Bard and, for now, AI Studio and Vertex AI.

Once Gemini Pro exits preview in Vertex, however, the model will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Let’s assume a 500-word article contains 2,000 characters. Summarizing that article with Gemini Pro would cost $5. Meanwhile, generating an article of a similar length would cost $0.1.

Where you can try Gemini?

Gemini Pro

The easiest place to experience Gemini Pro is in Bard. A fine-tuned version of Pro is answering text-based Bard queries in English in the U.S. right now, with additional languages and supported countries set to arrive down the line.

Gemini Pro is also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for the time being and supports 38 languages and regions including Europe, as well as features like chat functionality and filtering.

Elsewhere, Gemini Pro can be found in AI Studio. Using the service, developers can iterate prompts and Gemini-based chatbots and then get API keys to use them in their apps — or export the code to a more fully featured IDE.

Duet AI for Developers, Google’s suite of AI-powered assistance tools for code completion and generation, will start using a Gemini model in the coming weeks. And Google plans to bring Gemini models to dev tools for Chrome and its Firebase mobile dev platform around the same time, in early 2024.

Gemini Nano

Gemini Nano is on the Pixel 8 Pro — and will come to other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a sneak peek.

We’ll keep this post up to date with the latest developments.


Software Development in Sri Lanka

Back
WhatsApp
Messenger
Viber