From Digital Age to Nano Age. WorldWide.

Tag: generative


Google Gemini: Everything you need to know about the new generative AI platform | TechCrunch


Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps and services.

So what is Gemini? How can you use it? And how does it stack up to the competition?

To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features and news about Google’s plans for Gemini are released.

What is Gemini?

Gemini is Google’s long-promised, next-gen GenAI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the most performant Gemini model.
  • Gemini Pro, a “lite” Gemini model.
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models were trained to be “natively multimodal” — in other words, able to work with and use more than just words. They were pretrained and fine-tuned on a variety of audio, images and videos, a large set of codebases and text in different languages.

This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (e.g., essays, email drafts), but that isn’t the case with Gemini models.

What’s the difference between the Gemini apps and Gemini models?


Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed — think of it as a client for Google’s GenAI.

Incidentally, the Gemini apps and models are also totally independent from Imagen 2, Google’s text-to-image model that’s available in some of the company’s dev tools and environments.

What can Gemini do?

Because the Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Few of these capabilities have reached the product stage yet (more on that later), but Google’s promising all of them — and more — at some point in the not-too-distant future.

Of course, it’s a bit hard to take the company at its word.

Google seriously underdelivered with the original Bard launch. And more recently it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to have been heavily doctored and was more or less aspirational.

Still, assuming Google is being more or less truthful with its claims, here’s what the different tiers of Gemini will be able to do once they reach their full potential:

Gemini Ultra

Google says that Gemini Ultra — thanks to its multimodality — can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers.

Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says — extracting information from those papers and “updating” a chart from one by generating the formulas necessary to re-create the chart with more recent data.

Gemini Ultra technically supports image generation, as alluded to earlier. But that capability hasn’t made its way into the productized version of the model yet — perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.

Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps — but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium Plan, priced at $20 per month.

The AI Premium Plan also connects Gemini to your wider Google Workspace account — think emails in Gmail, documents in Docs, spreadsheets in Sheets and Google Meet recordings. That’s useful for, say, summarizing emails or having Gemini capture notes during a video call.

Gemini Pro

Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning and understanding capabilities.

An independent study by Carnegie Mellon and BerriAI researchers found that the initial version of Gemini Pro was indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that, like all large language models, this version of Gemini Pro particularly struggled with mathematics problems involving several digits, and users found examples of bad reasoning and obvious mistakes.

Google promised remedies, though — and the first arrived in the form of Gemini 1.5 Pro.

Designed to be a drop-in replacement, Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, perhaps most significantly in the amount of data that it can process. Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code — 35x the amount Gemini 1.0 Pro can handle. And — the model being multimodal — it’s not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of different languages, albeit slowly (e.g., searching for a scene in a one-hour video takes 30 seconds to a minute of processing).

Gemini 1.5 Pro entered public preview on Vertex AI in April.

An additional endpoint, Gemini Pro Vision, can process text and imagery — including photos and video — and output text along the lines of OpenAI’s GPT-4 with Vision model.

Using Gemini Pro in Vertex AI. Image Credits: Gemini

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.

In AI Studio, there are workflows for creating structured chat prompts using Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output’s creative range, provide examples to give tone and style instructions, and tune the safety settings.
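Concretely, that kind of control can also be exercised from code. Below is a minimal sketch using the google-generativeai Python SDK, assuming an API key created in AI Studio; the model identifier and the safety-setting names are assumptions, so check Google’s current documentation before relying on them.

    # Minimal sketch: calling Gemini Pro with the google-generativeai SDK while
    # adjusting temperature and safety settings. The model name, category and
    # threshold strings below are assumptions; verify against Google's docs.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # key created in AI Studio

    model = genai.GenerativeModel("gemini-pro")  # assumed model identifier

    response = model.generate_content(
        "Draft a friendly reminder email about tomorrow's team standup.",
        generation_config={
            "temperature": 0.4,        # lower values keep output more predictable
            "max_output_tokens": 256,
        },
        safety_settings={"HARASSMENT": "BLOCK_MEDIUM_AND_ABOVE"},  # assumed names
    )
    print(response.text)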

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far, it powers a couple of features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations and other snippets. Users get these summaries even if they don’t have a signal or Wi-Fi connection available — and in a nod to privacy, no data leaves their phone in the process.

Gemini Nano is also in Gboard, Google’s keyboard app. There, it powers a feature called Smart Reply, which helps to suggest the next thing you’ll want to say when having a conversation in a messaging app. The feature initially only works with WhatsApp but will come to more apps over time, Google says.

And in the Google Messages app on supported devices, Nano enables Magic Compose, which can craft messages in styles like “excited,” “formal” and “lyrical.”

Is Gemini better than OpenAI’s GPT-4?

Google has several times touted Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says that Gemini 1.5 Pro, meanwhile, is more capable at tasks like summarizing content, brainstorming and writing than Gemini Ultra in some scenarios; presumably this will change with the release of the next Ultra model.

But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than OpenAI’s corresponding models. And — as mentioned earlier — some early impressions haven’t been great, with users and academics pointing out that the older version of Gemini Pro tends to get basic facts wrong, struggles with translations and gives poor coding suggestions.

How much does Gemini cost?

Gemini 1.5 Pro is free to use in the Gemini apps and, for now, AI Studio and Vertex AI.

Once Gemini 1.5 Pro exits preview in Vertex, however, input to the model will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Let’s assume a 500-word article contains 2,000 characters. Summarizing that article with Gemini 1.5 Pro would cost $5. Meanwhile, generating an article of a similar length would cost $0.10.
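As a back-of-the-envelope check, those figures follow directly from the quoted per-character rates:

    # Quick sanity check of the per-character pricing quoted above
    # ($0.0025 per input character, $0.00005 per output character).
    INPUT_RATE = 0.0025    # dollars per input character
    OUTPUT_RATE = 0.00005  # dollars per output character

    article_chars = 2_000  # the assumed 500-word, 2,000-character article

    summarize_cost = article_chars * INPUT_RATE   # feeding the article in
    generate_cost = article_chars * OUTPUT_RATE   # producing a similar-length article

    print(f"Summarizing: ${summarize_cost:.2f}")  # $5.00
    print(f"Generating:  ${generate_cost:.2f}")   # $0.10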

Ultra pricing has yet to be announced.

Where can you try Gemini?

Gemini Pro

The easiest place to experience Gemini Pro is in the Gemini apps, where both Pro and Ultra answer queries in a range of languages.

Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for the time being and supports certain regions, including Europe, as well as features like chat functionality and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate prompts and Gemini-based chatbots and then get API keys to use them in their apps — or export the code to a more fully featured IDE.

Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is using Gemini models. Developers can perform “large-scale” changes across codebases, for example updating cross-file dependencies and reviewing large chunks of code.

Google’s brought Gemini models to its dev tools for Chrome, its Firebase mobile dev platform and its database creation and management tools. And it’s launched new security products underpinned by Gemini, like Gemini in Threat Intelligence, a component of Google’s Mandiant cybersecurity platform that can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.

Gemini Nano

Gemini Nano is on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24 — and will come to other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a sneak peek.

Is Gemini coming to the iPhone?

It might! Apple and Google are reportedly in talks to put Gemini to use for a number of features to be included in an upcoming iOS update later this year. Nothing’s definitive, as Apple is also reportedly in talks with OpenAI, and has been working on developing its own GenAI capabilities.

This post was originally published Feb. 16, 2024 and has since been updated to include new information about Gemini and Google’s plans for it.



NIST launches a new platform to assess generative AI | TechCrunch


The National Institute of Standards and Technology (NIST), the U.S. Commerce Department agency that develops and tests tech for the U.S. government, corporations and the broader public, today announced the launch of NIST GenAI, a new program spearheaded by NIST to assess generative AI technologies, including text- and image-generating AI.

A platform designed to evaluate various forms of generative AI tech, NIST GenAI will release benchmarks, help create “content authenticity” detection (i.e. deepfake-checking) systems and encourage the development of software to spot the source of fake or misleading information, explains NIST on its newly-launched NIST GenAI site and in a press release.

“The NIST GenAI program will issue a series of challenge problems designed to evaluate and measure the capabilities and limitations of generative AI technologies,” the press release reads. “These evaluations will be used to identify strategies to promote information integrity and guide the safe and responsible use of digital content.”

NIST GenAI’s first project is a pilot study to build systems that can reliably tell the difference between human-created and AI-generated media, starting with text. (While many services purport to detect deepfakes, studies — and our own testing — have shown them to be unreliable, particularly when it comes to text.) NIST GenAI is inviting teams from academia, industry and research labs to submit either “generators” — AI systems to generate content — or “discriminators” — systems that try to identify AI-generated content.

Generators in the study must generate summaries given a topic and a set of documents, while discriminators must detect whether a given summary is AI-written or not. To ensure fairness, NIST GenAI will provide the data necessary to train generators and discriminators; systems trained on publicly available data won’t be accepted, including but not limited to open models like Meta’s Llama 3.

Registration for the pilot will begin May 1, with the results scheduled to be published in February 2025.

NIST GenAI’s launch — and deepfake-focused study — comes as deepfakes grow exponentially.

According to data from Clarity, a deepfake detection firm, 900% more deepfakes have been created this year compared to the same time frame last year. It’s causing alarm, understandably. A recent poll from YouGov found that 85% of Americans said they were concerned about the spread of misleading deepfakes online.

The launch of NIST GenAI is a part of NIST’s response to President Joe Biden’s executive order on AI, which laid out rules requiring greater transparency from AI companies about how their models work and established a raft of new standards, including for labeling content generated by AI.

It’s also the first AI-related announcement from NIST after the appointment of Paul Christiano, a former OpenAI researcher, to the agency’s AI Safety Institute.

Christiano was a controversial choice for his “doomerist” views; he once predicted that “there’s a 50% chance AI development could end in [humanity’s destruction].” Critics — including scientists within NIST, reportedly — fear Christiano may encourage the AI Safety Institute to focus on “fantasy scenarios” rather than realistic, more immediate risks from AI.

NIST says that NIST GenAI will inform the AI Safety Institute’s work.



BigPanda launches generative AI tool designed specifically for ITOps | TechCrunch


IT operations personnel have a lot going on, and when an incident brings down a key system, time is always against them. Over the years, companies have looked for an edge in getting systems back up faster, with playbooks designed to find answers to common problems and postmortems to keep those problems from repeating. But not every problem is easily solved, and there is so much data and so many possible points of failure.

It’s actually a perfect problem for generative AI to solve, and AIOps startup BigPanda today announced a new generative AI tool called Biggy to help solve some of these issues faster. Biggy is designed to look across a wide variety of IT-related data to learn how the company operates, compare that baseline against the problem scenario and similar past scenarios, and suggest a solution.

BigPanda has been using AI since the early days of the company and deliberately designed two separate systems: one for the data layer and another for the AI. This in a way prepared them for this shift to generative AI based on large language models. “The AI engine before Gen AI was building a lot of other types of AI, but it was feeding off of the same data engine that will be feeding what we’re doing with Biggy, and what we’re doing with generative and conversational AI,” BigPanda CEO Assaf Resnick told TechCrunch.

Like most generative AI tools, this one makes a prompt box available where users can ask questions and interact with the bot. In this case, the underlying models have been trained on data inside the customer company, as well as on publicly available data on a particular piece of hardware or software, and are tuned to deal with the kinds of problems IT deals with on a regular basis.

“The out-of-the box LLMs have been trained on a huge amount of data, and they’re really good actually as generalists in all of the operational fields we look at — infrastructure, network, application development, everything there. And they actually know all the hardware very well,” Jason Walker, chief innovation officer at BigPanda, said. “So if you ask it about a certain HP blade server with this error code, it’s pretty good at putting that together, and we use that for a lot of the event traffic.” Of course, it has to be more than that or a human engineer could simply look this up in Google Search.

It combines this knowledge with what it is able to cull internally across a range of data types. “BigPanda ingests the customer’s operational and contextual data from observability, change, CMDB (the configuration management database) and topology along with historical data and human, institutional context — and normalizes the data into key-value pairs, or tags,” Walker said. That’s a lot of technical jargon, but basically it means it looks at system-level information, organizational data and human interactions to deliver a response that helps engineers solve the problem.

When a user enters a prompt, it looks across all the data to generate an answer that will hopefully point the engineers in the right direction to fix the problem. They acknowledge that it’s not always perfect because no generative AI is, but they let the user know when there is a lower degree of certainty that the answer is correct.

“For areas where we think we don’t have as much certainty, then we tell them that this is our best information, but a human should take a look at this,” Resnick said. For other areas where there is more certainty, they may introduce automation, working with a tool like Red Hat Ansible to solve the issue without human interaction, he said.

The data ingestion part isn’t always going to be trivial for customers, and this is a first step toward providing an AI assistant that can help IT get at the root of problems and solve them faster. No AI is foolproof, but having an interactive AI tool should be an improvement over current, more time-consuming manual approaches to IT systems troubleshooting.



Snowflake releases a flagship generative AI model of its own | TechCrunch


All-around, highly generalizable generative AI models were the name of the game once, and they arguably still are. But increasingly, as cloud vendors large and small join the generative AI fray, we’re seeing a new crop of models focused on the deepest-pocketed potential customers: the enterprise.

Case in point: Snowflake, the cloud computing company, today unveiled Arctic LLM, a generative AI model that’s described as “enterprise-grade.” Available under an Apache 2.0 license, Arctic LLM is optimized for “enterprise workloads,” including generating database code, Snowflake says, and is free for research and commercial use.

“I think this is going to be the foundation that’s going to let us — Snowflake — and our customers build enterprise-grade products and actually begin to realize the promise and value of AI,” CEO Sridhar Ramaswamy said in a press briefing. “You should think of this very much as our first, but big, step in the world of generative AI, with lots more to come.”

An enterprise model

My colleague Devin Coldewey recently wrote about how there’s no end in sight to the onslaught of generative AI models. I recommend you read his piece, but the gist is: Models are an easy way for vendors to drum up excitement for their R&D and they also serve as a funnel to their product ecosystems (e.g., model hosting, fine-tuning and so on).

Arctic LLM is no different. Snowflake’s flagship model in a family of generative AI models called Arctic, Arctic LLM — which took around three months, 1,000 GPUs and $2 million to train — arrives on the heels of Databricks’ DBRX, a generative AI model also marketed as optimized for the enterprise space.

Snowflake draws a direct comparison between Arctic LLM and DBRX in its press materials, saying Arctic LLM outperforms DBRX on the two tasks of coding (Snowflake didn’t specify which programming languages) and SQL generation. The company said Arctic LLM is also better at those tasks than Meta’s Llama 2 70B (but not the more recent Llama 3 70B) and Mistral’s Mixtral-8x7B.

Snowflake also claims that Arctic LLM achieves “leading performance” on a popular general language understanding benchmark, MMLU. I’ll note, though, that while MMLU purports to evaluate generative models’ ability to reason through logic problems, it includes tests that can be solved through rote memorization, so take that bullet point with a grain of salt.

“Arctic LLM addresses specific needs within the enterprise sector,” Baris Gultekin, head of AI at Snowflake, told TechCrunch in an interview, “diverging from generic AI applications like composing poetry to focus on enterprise-oriented challenges, such as developing SQL co-pilots and high-quality chatbots.”

Arctic LLM, like DBRX and Google’s top-performing generative model of the moment, Gemini 1.5 Pro, is a mixture of experts (MoE) architecture. MoE architectures basically break down data processing tasks into subtasks and then delegate them to smaller, specialized “expert” models. So, while Arctic LLM contains 480 billion parameters, it only activates 17 billion at a time — enough to drive the 128 separate expert models. (Parameters essentially define the skill of an AI model on a problem, like analyzing and generating text.)
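For readers unfamiliar with the approach, here is a toy sketch of the routing idea behind a mixture-of-experts layer; the expert count, dimensions and top-k gating below are illustrative stand-ins, not Arctic LLM’s actual design.

    # Toy illustration of mixture-of-experts routing: a gate picks a small number
    # of "expert" sub-networks per input, so only a fraction of the total
    # parameters does work at once. Sizes and top-k are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)

    NUM_EXPERTS, TOP_K, DIM = 8, 2, 16
    experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]  # tiny "experts"
    gate = rng.normal(size=(DIM, NUM_EXPERTS))                           # router weights

    def moe_layer(x: np.ndarray) -> np.ndarray:
        scores = x @ gate                                 # how relevant each expert is
        top = np.argsort(scores)[-TOP_K:]                 # activate only the top-k experts
        weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    out = moe_layer(rng.normal(size=DIM))
    print(out.shape)  # (16,): same output shape, but only 2 of 8 experts did any work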

Snowflake claims that this efficient design enabled it to train Arctic LLM on open public web data sets (including RefinedWeb, C4, RedPajama and StarCoder) at “roughly one-eighth the cost of similar models.”

Running everywhere

Snowflake is providing resources like coding templates and a list of training sources alongside Arctic LLM to guide users through the process of getting the model up and running and fine-tuning it for particular use cases. But, recognizing that those are likely to be costly and complex undertakings for most developers (fine-tuning or running Arctic LLM requires around eight GPUs), Snowflake’s also pledging to make Arctic LLM available across a range of hosts, including Hugging Face, Microsoft Azure, Together AI’s model-hosting service, and enterprise generative AI platform Lamini.

Here’s the rub, though: Arctic LLM will be available first on Cortex, Snowflake’s platform for building AI- and machine learning-powered apps and services. The company’s unsurprisingly pitching it as the preferred way to run Arctic LLM with “security,” “governance” and scalability.

“Our dream here is, within a year, to have an API that our customers can use so that business users can directly talk to data,” Ramaswamy said. “It would’ve been easy for us to say, ‘Oh, we’ll just wait for some open source model and we’ll use it.’ Instead, we’re making a foundational investment because we think [it’s] going to unlock more value for our customers.”

So I’m left wondering: Who’s Arctic LLM really for besides Snowflake customers?

In a landscape full of “open” generative models that can be fine-tuned for practically any purpose, Arctic LLM doesn’t stand out in any obvious way. Its architecture might bring efficiency gains over some of the other options out there. But I’m not convinced that they’ll be dramatic enough to sway enterprises away from the countless other well-known and -supported, business-friendly generative models (e.g. GPT-4).

There’s also a point in Arctic LLM’s disfavor to consider: its relatively small context.

In generative AI, context window refers to input data (e.g. text) that a model considers before generating output (e.g. more text). Models with small context windows are prone to forgetting the content of even very recent conversations, while models with larger contexts typically avoid this pitfall.

Arctic LLM’s context is between ~8,000 and ~24,000 words, depending on the fine-tuning method — far below that of models like Anthropic’s Claude 3 Opus and Google’s Gemini 1.5 Pro.

Snowflake doesn’t mention it in the marketing, but Arctic LLM almost certainly suffers from the same limitations and shortcomings as other generative AI models — namely, hallucinations (i.e. confidently answering requests incorrectly). That’s because Arctic LLM, along with every other generative AI model in existence, is a statistical probability machine — one that, again, has a small context window. It guesses based on vast amounts of examples which data makes the most “sense” to place where (e.g. the word “go” before “the market” in the sentence “I go to the market”). It’ll inevitably guess wrong — and that’s a “hallucination.”
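To make the point about statistical guessing concrete, here is a toy next-word predictor built from raw counts; real models operate on tokens with neural networks, but the underlying move, predicting whatever plausibly comes next, is the same.

    # Toy illustration of the "statistical guessing" described above: pick the
    # most likely next word from observed counts. When the counts are ambiguous,
    # the model is guessing, and a wrong guess is a "hallucination."
    from collections import Counter, defaultdict

    corpus = "i go to the market . i go to the office . i go home .".split()

    next_word = defaultdict(Counter)
    for prev, cur in zip(corpus, corpus[1:]):
        next_word[prev][cur] += 1

    # After "the", the model has seen "market" and "office" equally often.
    print(next_word["the"].most_common())   # [('market', 1), ('office', 1)]
    print(next_word["go"].most_common(1))   # [('to', 2)]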

As Devin writes in his piece, until the next major technical breakthrough, incremental improvements are all we have to look forward to in the generative AI domain. That won’t stop vendors like Snowflake from championing them as great achievements, though, and marketing them for all they’re worth.



Amazon wants to host companies' custom generative AI models | TechCrunch


AWS, Amazon’s cloud computing business, wants to be the go-to place companies host and fine-tune their custom generative AI models.

Today, AWS announced the launch of Custom Model Import (in preview), a new feature in Bedrock, AWS’ enterprise-focused suite of generative AI services, that allows organizations to import and access their in-house generative AI models as fully managed APIs.

Companies’ proprietary models, once imported, benefit from the same infrastructure as other generative AI models in Bedrock’s library (e.g. Meta’s Llama 3, Anthropic’s Claude 3), including tools to expand their knowledge, fine-tune them and implement safeguards to mitigate their biases.
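In practice, a fully managed API means the imported model is called through Bedrock’s runtime interface much like any hosted model. The sketch below shows roughly what that could look like with boto3; the model ARN is a placeholder and the request body format is an assumption, since it varies by model family.

    # A rough sketch (not AWS's documented recipe) of invoking a model through
    # the Bedrock runtime API with boto3. The ARN is a placeholder for whatever
    # identifier Custom Model Import assigns, and the request/response body
    # format is an assumption; it differs by model family.
    import json
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    MODEL_ID = "arn:aws:bedrock:us-east-1:123456789012:imported-model/EXAMPLE"  # placeholder

    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({"prompt": "Summarize our Q3 incident report.", "max_tokens": 256}),
    )
    print(json.loads(response["body"].read()))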

“There have been AWS customers that have been fine-tuning or building their own models outside of Bedrock using other tools,” Vasi Philomin, VP of generative AI at AWS, told TechCrunch in an interview. “This Custom Model Import capability allows them to bring their own proprietary models to Bedrock and see them right next to all of the other models that are already on Bedrock — and use them with all of the workflows that are also already on Bedrock, as well.”

Importing custom models

According to a recent poll from Cnvrg, Intel’s AI-focused subsidiary, the majority of enterprises are approaching generative AI by building their own models and refining them to their applications. Those same enterprises say that they see infrastructure, including cloud compute infrastructure, as their greatest barrier to deployment, per the poll.

With Custom Model Import, AWS aims to rush in to fill the need while maintaining pace with cloud rivals. (Amazon CEO Andy Jassy foreshadowed as much in his recent annual letter to shareholders.)

For some time, Vertex AI, Google’s analog to Bedrock, has allowed customers to upload generative AI models, tailor them and serve them through APIs. Databricks, too, has long provided toolsets to host and tweak custom models, including its own recently released DBRX.

Asked what sets Custom Model Import apart, Philomin asserted that it — and by extension Bedrock — offer a wider breadth and depth of model customization options than the competition, adding that “tens of thousands” of customers today are using Bedrock.

“Number one, Bedrock provides several ways for customers to deal with serving models,” Philomin said. “Number two, we have a whole bunch of workflows around these models — and now customers’ models can stand right next to all of the other models that we have already available. A key thing that most people like about this is the ability to be able to experiment across multiple different models using the same workflows, and then actually take them to production from the same place.”

So what are the alluded-to model customization options?

Philomin points to Guardrails, which lets Bedrock users configure thresholds to filter — or at least attempt to filter — models’ outputs for things like hate speech, violence and private personal or corporate information. (Generative AI models are notorious for going off the rails in problematic ways, including leaking sensitive info; AWS’ have been no exception.) He also highlighted Model Evaluation, a Bedrock tool customers can use to test how well a model — or several — perform across a given set of criteria.

Both Guardrails and Model Evaluation are now generally available following a several-months-long preview.

I feel compelled to note here that Custom Model Import only supports three model architectures at the moment — Hugging Face’s Flan-T5, Meta’s Llama and Mistral’s models — and that Vertex AI and other Bedrock-rivaling services, including Microsoft’s AI development tools on Azure, offer more or less comparable safety and evaluation features (see Azure AI Content Safety, model evaluation in Vertex and so on).

What is unique to Bedrock, though, is AWS’ Titan family of generative AI models. And — coinciding with the release of Custom Model Import — there are several noteworthy developments on that front.

Upgraded Titan models

Titan Image Generator, AWS’ text-to-image model, is now generally available after launching in preview last November. As before, Titan Image Generator can create new images given a text description or customize existing images, for example swapping out an image background while retaining the subjects in the image.

Compared to the preview version, Titan Image Generator in GA can generate images with more “creativity,” said Philomin, without going into detail. (Your guess as to what that means is as good as mine.)

I asked Philomin if he had any more details to share about how Titan Image Generator was trained.

At the model’s debut last November, AWS was vague about which data, exactly, it used in training Titan Image Generator. Few vendors readily reveal such information; they see training data as a competitive advantage and thus keep it and info relating to it close to the chest.

Training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much. Plaintiffs in several cases making their way through the courts reject vendors’ fair use defenses, arguing that text-to-image tools replicate artists’ styles without the artists’ explicit permission and allow users to generate new works resembling artists’ originals for which artists receive no payment.

Philomin would only tell me that AWS uses a combination of first-party and licensed data.

“We have a combination of proprietary data sources, but also we license a lot of data,” he said. “We actually pay copyright owners licensing fees in order to be able to use their data, and we do have contracts with several of them.”

It’s more detail than from November. But I have a feeling that Philomin’s answer won’t satisfy everyone, particularly the content creators and AI ethicists arguing for greater transparency where it concerns generative AI model training.

In lieu of transparency, AWS says it’ll continue to offer an indemnification policy that covers customers in the event a Titan model like Titan Image Generator regurgitates (i.e. spits out a mirror copy of) a potentially copyrighted training example. (Several rivals, including Microsoft and Google, offer similar policies covering their image generation models.)

To address another pressing ethical threat — deepfakes — AWS says that images created with Titan Image Generator will, as during the preview, come with a “tamper-resistant” invisible watermark. Philomin says that the watermark has been made more resistant in the GA release to compression and other image edits and manipulations.

Segueing into less controversial territory, I asked Philomin whether AWS — like Google, OpenAI and others — is exploring video generation given the excitement around (and investment in) the tech. Philomin didn’t say that AWS wasn’t… but he wouldn’t hint at any more than that.

“Obviously, we’re constantly looking to see what new capabilities customers want to have, and video generation definitely comes up in conversations with customers,” Philomin said. “I’d ask you to stay tuned.”

In one last piece of Titan-related news, AWS released the second generation of its Titan Embeddings model, Titan Text Embeddings V2. Titan Text Embeddings V2 converts text to numerical representations called embeddings to power search and personalization applications. So did the first-generation Embeddings model — but AWS claims that Titan Text Embeddings V2 is overall more efficient, cost-effective and accurate.
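As a rough illustration of what an embeddings model is for, the sketch below requests vectors from Bedrock and compares two strings with cosine similarity; the model identifier and the request and response field names are assumptions, so confirm them against AWS’s documentation.

    # A minimal sketch of the embeddings idea described above: turn two strings
    # into vectors via Bedrock, then compare them with cosine similarity. The
    # model ID and request/response schemas are assumptions; consult AWS's docs.
    import json
    import math
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    def embed(text: str) -> list[float]:
        resp = bedrock.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",          # assumed identifier
            body=json.dumps({"inputText": text}),            # assumed request schema
        )
        return json.loads(resp["body"].read())["embedding"]  # assumed response schema

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    print(cosine(embed("reset my password"), embed("how do I change my login credentials?")))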

“What the Embeddings V2 model does is reduce the overall storage [necessary to use the model] by up to four times while retaining 97% of the accuracy,” Philomin claimed, “outperforming other models that are comparable.”

We’ll see if real-world testing bears that out.



Hugging Face releases a benchmark for testing generative AI on health tasks | TechCrunch


Generative AI models are increasingly being brought to healthcare settings — in some cases prematurely, perhaps. Early adopters believe that they’ll unlock increased efficiency while revealing insights that’d otherwise be missed. Critics, meanwhile, point out that these models have flaws and biases that could contribute to worse health outcomes.

But is there a quantitative way to know how helpful, or harmful, a model might be when tasked with things like summarizing patient records or answering health-related questions?

Hugging Face, the AI startup, proposes a solution in a newly released benchmark test called Open Medical-LLM. Created in partnership with researchers at the nonprofit Open Life Science AI and the University of Edinburgh’s Natural Language Processing Group, Open Medical-LLM aims to standardize evaluating the performance of generative AI models on a range of medical-related tasks.

Open Medical-LLM isn’t a from-scratch benchmark, per se, but rather a stitching-together of existing test sets — MedQA, PubMedQA, MedMCQA and so on — designed to probe models for general medical knowledge and related fields, such as anatomy, pharmacology, genetics and clinical practice. The benchmark contains multiple choice and open-ended questions that require medical reasoning and understanding, drawing from material including U.S. and Indian medical licensing exams and college biology test question banks.
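For anyone who wants to poke at the underlying material, the sketch below loads one of the constituent question sets with the Hugging Face datasets library; the dataset identifier is an assumption for illustration, and the benchmark’s own repository lists the exact sources.

    # A small sketch of inspecting one of the question sets Open Medical-LLM
    # draws on, via the Hugging Face `datasets` library. The dataset id below is
    # an assumption for illustration; check the benchmark's repo for the sources.
    from datasets import load_dataset

    medqa = load_dataset("openlifescienceai/medmcqa", split="validation")  # assumed dataset id

    example = medqa[0]
    print(example)  # a multiple-choice medical question with its answer key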

“[Open Medical-LLM] enables researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advancements in the field and ultimately contribute to better patient care and outcome,” Hugging Face wrote in a blog post.


Hugging Face is positioning the benchmark as a “robust assessment” of healthcare-bound generative AI models. But some medical experts on social media cautioned against putting too much stock into Open Medical-LLM, lest it lead to ill-informed deployments.

On X, Liam McCoy, a resident physician in neurology at the University of Alberta, pointed out that the gap between the “contrived environment” of medical question-answering and actual clinical practice can be quite large.

Hugging Face research scientist Clémentine Fourrier, who co-authored the blog post, agreed.

“These leaderboards should only be used as a first approximation of which [generative AI model] to explore for a given use case, but then a deeper phase of testing is always needed to examine the model’s limits and relevance in real conditions,” Fourrier replied on X. “Medical [models] should absolutely not be used on their own by patients, but instead should be trained to become support tools for MDs.”

It brings to mind Google’s experience when it tried to bring an AI screening tool for diabetic retinopathy to healthcare systems in Thailand.

Google created a deep learning system that scanned images of the eye, looking for evidence of retinopathy, a leading cause of vision loss. But despite high theoretical accuracy, the tool proved impractical in real-world testing, frustrating both patients and nurses with inconsistent results and a general lack of harmony with on-the-ground practices.

It’s telling that of the 139 AI-related medical devices the U.S. Food and Drug Administration has approved to date, none use generative AI. It’s exceptionally difficult to test how a generative AI tool’s performance in the lab will translate to hospitals and outpatient clinics, and, perhaps more importantly, how the outcomes might trend over time.

That’s not to suggest Open Medical-LLM isn’t useful or informative. The results leaderboard, if nothing else, serves as a reminder of just how poorly models answer basic health questions. But neither Open Medical-LLM nor any other benchmark is a substitute for carefully thought-out real-world testing.





NeuBird is building a generative AI solution for complex cloud-native environments | TechCrunch


NeuBird founders Goutham Rao and Vinod Jayaraman came from Portworx, a cloud-native storage solution they eventually sold to PureStorage in 2019 for $370 million. It was their third successful exit. 

When they went looking for their next startup challenge last year, they saw an opportunity to combine their cloud-native knowledge, especially around IT operations, with the burgeoning area of generative AI. 

Today NeuBird announced a $22 million investment from Mayfield to get the idea to market. It’s a hefty amount for an early-stage startup, but the firm is likely banking on the founders’ experience to build another successful company.

Rao, the CEO, says that while the cloud-native community has done a good job at solving a lot of difficult problems, it has created increasing levels of complexity along the way. 

“We’ve done an incredible job as a community over the past 10-plus years building cloud-native architectures with service-oriented designs. This added a lot of layers, which is good. That’s a proper way to design software, but this also came at a cost of increased telemetry. There’s just too many layers in the stack,” Rao told TechCrunch.

They concluded that this level of data was making it impossible for human engineers to find, diagnose and solve problems at scale inside large organizations. At the same time, large language models were beginning to mature, so the founders decided to put them to work on the problem.

“We’re leveraging large language models in a very unique way to be able to analyze thousands and thousands of metrics, alerts, logs, traces and application configuration information in a matter of seconds and be able to diagnose what the health of the environment is, detect if there’s a problem and come up with a solution,” he said.

The company is essentially building a trusted digital assistant to the engineering team. “So it’s a digital co-worker that works alongside SREs and ITOps engineers, and monitors all of the alerts and logs looking for issues,” he said. The goal is to reduce the amount of time it takes to respond to and solve an incident from hours to minutes, and they believe that by putting generative AI to work on the problem, they can help companies achieve that goal. 

The founders understand the limitations of large language models, and are looking to reduce hallucinated or incorrect responses by using a limited set of data to train the models, and by setting up other systems that help deliver more accurate responses.

“Because we’re using this in a very controlled manner for a very specific use case for environments we know, we can cross check the results that are coming out of the AI, again through a vector database and see if it’s even making sense and if we’re not comfortable with it, we won’t recommend it to the user.”

Customers can connect directly to their various cloud systems by entering their credentials, and without moving data, NeuBird can use the access to cross-check against other available information to come up with a solution, reducing the overall difficulty associated with getting the company-specific data for the model to work with. 

NeuBird uses various models, including Llama 2 for analyzing logs and metrics. They are using Mistral for other types of analysis. The company actually turns every natural language interaction into a SQL query, essentially turning unstructured data into structured data. They believe this will result in greater accuracy. 
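The pattern is simple to sketch, even if NeuBird’s production system is certainly more involved: prompt a model with the schema and the question, get back SQL, and run it over the normalized tags. In the toy version below, call_llm is a placeholder that returns a canned query rather than calling a real model.

    # A generic sketch of the natural-language-to-SQL pattern described above,
    # not NeuBird's actual implementation. `call_llm` stands in for whatever
    # model API is in use (e.g. Llama 2 or Mistral behind an endpoint).
    import sqlite3

    def call_llm(prompt: str) -> str:
        """Placeholder for an LLM call; a real system would send `prompt` to a model."""
        return "SELECT key, value FROM tags WHERE key = 'pod_status' AND value != 'Running';"

    SCHEMA = "tags(incident_id TEXT, key TEXT, value TEXT)"
    question = "Which pods were not running during the incident?"

    sql = call_llm(f"Schema: {SCHEMA}\nQuestion: {question}\nReturn a single SQL query.")

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (incident_id TEXT, key TEXT, value TEXT)")
    conn.execute("INSERT INTO tags VALUES ('inc-1', 'pod_status', 'CrashLoopBackOff')")
    for row in conn.execute(sql):
        print(row)  # ('pod_status', 'CrashLoopBackOff')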

The early-stage startup is working with design and alpha partners right now refining the idea as they work to bring the product to market later this year. Rao says they took a big chunk of money out of the gate because they wanted the room to work on the problem without having to worry about looking for more money too soon.



Google injects generative AI into its cloud security tools | TechCrunch


At its annual Cloud Next conference in Las Vegas, Google on Tuesday introduced new cloud-based security products and services — in addition to updates to existing products and services — aimed at customers managing large, multi-tenant corporate networks.

Many of the announcements had to do with Gemini, Google’s flagship family of generative AI models.

For example, Google unveiled Gemini in Threat Intelligence, a new Gemini-powered component of the company’s Mandiant cybersecurity platform. Now in public preview, Gemini in Threat Intelligence can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise, as well as summarize open source intelligence reports from around the web.

“Gemini in Threat Intelligence now offers conversational search across Mandiant’s vast and growing repository of threat intelligence directly from frontline investigations,” Sunil Potti, GM of cloud security at Google, wrote in a blog post shared with TechCrunch. “Gemini will navigate users to the most relevant pages in the integrated platform for deeper investigation … Plus, [Google’s malware detection service] VirusTotal now automatically ingests OSINT reports, which Gemini summarizes directly in the platform.”

Elsewhere, Gemini can now assist with cybersecurity investigations in Chronicle, Google’s cybersecurity telemetry offering for cloud customers. Set to roll out by the end of the month, the new capability guides security analysts through their typical workflows, recommending actions based on the context of a security investigation, summarizing security event data and creating breach and exploit detection rules from a chatbot-like interface.

And in Security Command Center, Google’s enterprise cybersecurity and risk management suite, a new Gemini-driven feature lets security teams search for threats using natural language while providing summaries of misconfigurations, vulnerabilities and possible attack paths.

Rounding out the security updates was privileged access manager (in preview), a service that offers just-in-time, time-bound and approval-based access options designed to help mitigate risks tied to privileged access misuse. Google’s also rolling out principal access boundary (in preview, as well), which lets admins implement restrictions on network root-level users so that those users can only access authorized resources within a specifically defined boundary.

Lastly, Autokey (in preview) aims to simplify creating and managing customer encryption keys for high-security use cases, while Audit Manager (also in preview) provides tools for Google Cloud customers in regulated industries to generate proof of compliance for their workloads and cloud-hosted data.

“Generative AI offers tremendous potential to tip the balance in favor of defenders,” Potti wrote in the blog post. “And we continue to infuse AI-driven capabilities into our products.”

Google isn’t the only company attempting to productize generative AI–powered security tooling. Microsoft last year launched a set of services that leverage generative AI to correlate data on attacks while prioritizing cybersecurity incidents. Startups, including Aim Security, are also jumping into the fray, aiming to corner the nascent space.

But with generative AI’s tendency to make mistakes, it remains to be seen whether these tools have staying power.



Intel and others commit to building open generative AI tools for the enterprise | TechCrunch


Can generative AI designed for the enterprise (e.g. AI that autocompletes reports, spreadsheet formulas and so on) ever be interoperable? Along with a coterie of organizations including Cloudera and Intel, the Linux Foundation — the nonprofit organization that supports and maintains a growing number of open source efforts — aims to find out.

The Linux Foundation today announced the launch of the Open Platform for Enterprise AI (OPEA), a project to foster the development of open, multi-provider and composable (i.e. modular) generative AI systems. Under the purview of the Linux Foundation’s LFAI and Data org, which focuses on AI- and data-related platform initiatives, OPEA’s goal will be to pave the way for the release of “hardened,” “scalable” generative AI systems that “harness the best open source innovation from across the ecosystem,” LFAI and Data executive director Ibrahim Haddad said in a press release.

“OPEA will unlock new possibilities in AI by creating a detailed, composable framework that stands at the forefront of technology stacks,” Haddad said. “This initiative is a testament to our mission to drive open source innovation and collaboration within the AI and data communities under a neutral and open governance model.”

In addition to Cloudera and Intel, OPEA — one of the Linux Foundation’s Sandbox Projects, an incubator program of sorts — counts among its members enterprise heavyweights like IBM-owned Red Hat, Hugging Face, Domino Data Lab, MariaDB and VMWare.

So what might they build together exactly? Haddad hints at a few possibilities, such as “optimized” support for AI toolchains and compilers, which enable AI workloads to run across different hardware components, as well as “heterogeneous” pipelines for retrieval-augmented generation (RAG).

RAG is becoming increasingly popular in enterprise applications of generative AI, and it’s not difficult to see why. Most generative AI models’ answers and actions are limited to the data on which they’re trained. But with RAG, a model’s knowledge base can be extended to info outside the original training data. RAG models reference this outside info — which can take the form of proprietary company data, a public database or some combination of the two — before generating a response or performing a task.

A diagram explaining RAG models.
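A bare-bones version of that loop looks something like the sketch below: retrieve the most relevant snippets from an outside knowledge base, then hand them to the model alongside the question. Retrieval here is toy keyword overlap and generate is a placeholder; real pipelines typically use vector embeddings and a hosted LLM.

    # A bare-bones sketch of the RAG loop described above. Retrieval is toy
    # keyword overlap, and `generate` stands in for whatever LLM is being used.
    def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
        q_words = set(question.lower().split())
        scored = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
        return scored[:k]

    def generate(prompt: str) -> str:
        """Placeholder for the underlying generative model."""
        return f"[model answer grounded in]: {prompt!r}"

    knowledge_base = [
        "Refunds are processed within 5 business days.",
        "The enterprise plan includes 24/7 support.",
        "Invoices are issued on the first of each month.",
    ]

    question = "How long do refunds take?"
    context = retrieve(question, knowledge_base)
    print(generate(f"Context: {context}\nQuestion: {question}"))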

Intel offered a few more details in its own press release:

Enterprises are challenged with a do-it-yourself approach [to RAG] because there are no de facto standards across components that allow enterprises to choose and deploy RAG solutions that are open and interoperable and that help them quickly get to market. OPEA intends to address these issues by collaborating with the industry to standardize components, including frameworks, architecture blueprints and reference solutions.

Evaluation will also be a key part of what OPEA tackles.

In its GitHub repository, OPEA proposes a rubric for grading generative AI systems along four axes: performance, features, trustworthiness and “enterprise-grade” readiness. Performance as OPEA defines it pertains to “black-box” benchmarks from real-world use cases. Features is an appraisal of a system’s interoperability, deployment choices and ease of use. Trustworthiness looks at an AI model’s ability to guarantee “robustness” and quality. And enterprise readiness focuses on the requirements to get a system up and running sans major issues.

Rachel Roumeliotis, director of open source strategy at Intel, says that OPEA will work with the open source community to offer tests based on the rubric — and provide assessments and grading of generative AI deployments on request.

OPEA’s other endeavors are a bit up in the air at the moment. But Haddad floated the potential of open model development along the lines of Meta’s expanding Llama family and Databricks’ DBRX. Toward that end, in the OPEA repo, Intel has already contributed reference implementations for a generative-AI-powered chatbot, document summarizer and code generator optimized for its Xeon 6 and Gaudi 2 hardware.

Now, OPEA’s members are very clearly invested (and self-interested, for that matter) in building tooling for enterprise generative AI. Cloudera recently launched partnerships to create what it’s pitching as an “AI ecosystem” in the cloud. Domino offers a suite of apps for building and auditing business-forward generative AI. And VMWare — oriented toward the infrastructure side of enterprise AI — last August rolled out new “private AI” compute products.

The question is — under OPEA — will these vendors actually work together to build cross-compatible AI tools?

There’s an obvious benefit to doing so. Customers will happily draw on multiple vendors depending on their needs, resources and budgets. But history has shown that it’s all too easy to become inclined toward vendor lock-in. Let’s hope that’s not the ultimate outcome here.



Adobe's working on generative video, too | TechCrunch


Adobe says it’s building an AI model to generate video. But it’s not revealing when this model will launch, exactly — or much about it besides the fact that it exists.

Offered as an answer of sorts to OpenAI’s Sora, Google’s Imagen 2 and models from the growing number of startups in the nascent generative AI video space, Adobe’s model — a part of the company’s expanding Firefly family of generative AI products — will make its way into Premiere Pro, Adobe’s flagship video editing suite, sometime later this year, Adobe says.

Like many generative AI video tools today, Adobe’s model creates footage from scratch (either a prompt or reference images) — and it powers three new features in Premiere Pro: object addition, object removal and generative extend.

They’re pretty self-explanatory.

Object addition lets users select a segment of a video clip — the upper third, say, or lower left corner — and enter a prompt to insert objects within that segment. In a briefing with TechCrunch, an Adobe spokesperson showed a still of a real-world briefcase filled with diamonds generated by Adobe’s model.

AI-generated diamonds, courtesy of Adobe.

Object removal removes objects from clips, like boom mics or coffee cups in the background of a shot.

Removing objects with AI. Notice the results aren’t quite perfect.

As for generative extend, it adds a few frames to the beginning or end of a clip (unfortunately, Adobe wouldn’t say how many frames). Generative extend isn’t meant to create whole scenes, but rather add buffer frames to sync up with a soundtrack or hold on to a shot for an extra beat — for instance to add emotional heft.


To address the fears of deepfakes that inevitably crop up around generative AI tools such as these, Adobe says it’s bringing Content Credentials — metadata to identify AI-generated media — to Premiere. Content Credentials, a media provenance standard that Adobe backs through its Content Authenticity Initiative, were already in Photoshop and a component of Adobe’s image-generating Firefly models. In Premiere, they’ll indicate not only which content was AI-generated but which AI model was used to generate it.

I asked Adobe what data — images, videos and so on — were used to train the model. The company wouldn’t say, nor would it say how (or whether) it’s compensating contributors to the data set.

Last week, Bloomberg, citing sources familiar with the matter, reported that Adobe’s paying photographers and artists on its stock media platform, Adobe Stock, up to $120 for submitting short video clips to train its video generation model. The pay’s said to range from around $2.62 per minute of video to around $7.25 per minute depending on the submission, with higher-quality footage commanding correspondingly higher rates.

That’d be a departure from Adobe’s current arrangement with Adobe Stock artists and photographers whose work it’s using to train its image generation models. The company pays those contributors an annual bonus, not a one-time fee, depending on the volume of content they have in Stock and how it’s being used — albeit a bonus that’s subject to an opaque formula and not guaranteed from year to year.

Bloomberg’s reporting, if accurate, depicts an approach in stark contrast to that of generative AI video rivals like OpenAI, which is said to have scraped publicly available web data — including videos from YouTube — to train its models. YouTube’s CEO, Neal Mohan, recently said that use of YouTube videos to train OpenAI’s text-to-video generator would be an infraction of the platform’s terms of service, highlighting the legal tenuousness of OpenAI’s and others’ fair use argument.

Companies including OpenAI are being sued over allegations that they’re violating IP law by training their AI on copyrighted content without providing the owners credit or pay. Adobe seems intent on avoiding this fate, like its sometime generative AI competitors Shutterstock and Getty Images (which also have arrangements to license model training data), and — with its IP indemnity policy — positioning itself as a verifiably “safe” option for enterprise customers.

On the subject of payment, Adobe isn’t saying how much it’ll cost customers to use the upcoming video generation features in Premiere; presumably, pricing’s still being hashed out. But the company did reveal that the payment scheme will follow the generative credits system established with its early Firefly models.

For customers with a paid subscription to Adobe Creative Cloud, generative credits renew at the beginning of each month, with allotments ranging from 25 to 1,000 per month depending on the plan. More complex workloads (e.g. higher-resolution generated images or multiple image generations) require more credits, as a general rule.

The big question in my mind is, will Adobe’s AI-powered video features be worth whatever they end up costing?

The Firefly image generation models so far have been widely derided as underwhelming and flawed compared to Midjourney, OpenAI’s DALL-E 3 and other competing tools. The lack of release time frame on the video model doesn’t instill a lot of confidence that it’ll avoid the same fate. Neither does the fact that Adobe declined to show me live demos of object addition, object removal and generative extend — insisting instead on a prerecorded sizzle reel.

Perhaps to hedge its bets, Adobe says that it’s in talks with third-party vendors about integrating their video generation models into Premiere, as well, to power tools like generative extend and more.

One of those vendors is OpenAI.

Adobe says it’s collaborating with OpenAI on ways to bring Sora into the Premiere workflow. (An OpenAI tie-up makes sense given the AI startup’s overtures to Hollywood recently; tellingly, OpenAI CTO Mira Murati will be attending the Cannes Film Festival this year.) Other early partners include Pika, a startup building AI tools to generate and edit videos, and Runway, which was one of the first vendors to market with a generative video model.

An Adobe spokesperson said the company would be open to working with others in the future.

Now, to be crystal clear, these integrations are more of a thought experiment than a working product at present. Adobe stressed to me repeatedly that they’re in “early preview” and “research” rather than a thing customers can expect to play with anytime soon.

And that, I’d say, captures the overall tone of Adobe’s generative video presser.

Adobe’s clearly trying to signal with these announcements that it’s thinking about generative video, if only in the preliminary sense. It’d be foolish not to — to be caught flat-footed in the generative AI race is to risk losing out on a valuable potential new revenue stream, assuming the economics eventually work out in Adobe’s favor. (AI models are costly to train, run and serve after all.)

But what it’s showing — concepts — isn’t super compelling frankly. With Sora in the wild and surely more innovations coming down the pipeline, the company has much to prove.

