From Digital Age to Nano Age. WorldWide.

Tag: Generative AI

Robotic Automations

Adobe claims its new image generation model is its best yet | TechCrunch

Firefly, Adobe’s family of generative AI models, doesn’t have the best reputation among creatives.

The Firefly image generation model in particular has been derided as underwhelming and flawed compared to Midjourney, OpenAI’s DALL-E 3, and other rivals, with a tendency to distort limbs and landscapes and miss the nuances in prompts. But Adobe is trying to right the ship with its third-generation model, Firefly Image 3, releasing this week during the company’s Max London conference.

The model, now available in Photoshop (beta) and Adobe’s Firefly web app, produces more “realistic” imagery than its predecessor (Image 2) and its predecessor’s predecessor (Image 1) thanks to an ability to understand longer, more complex prompts and scenes as well as improved lighting and text generation capabilities. It should more accurately render things like typography, iconography, raster images and line art, says Adobe, and is “significantly” more adept at depicting dense crowds and people with “detailed features” and “a variety of moods and expressions.”

For what it’s worth, in my brief, unscientific comparison, Image 3 does appear to be a step up from Image 2.

I wasn’t able to try Image 3 myself. But Adobe PR sent a few outputs and prompts from the model, and I managed to run those same prompts through Image 2 on the web to get samples to compare the Image 3 outputs with. (Keep in mind that the Image 3 outputs could’ve been cherry-picked.)

Notice the lighting in this headshot from Image 3 compared to the one below it, from Image 2:

From Image 3. Prompt: “Studio portrait of young woman.”

Same prompt as above, from Image 2.

The Image 3 output looks more detailed and lifelike to my eyes, with shadowing and contrast that’s largely absent from the Image 2 sample.

Here’s a set of images showing Image 3’s scene understanding at play:

From Image 3. Prompt: “An artist in her studio sitting at desk looking pensive with tons of paintings and ethereal.”

Same prompt as above. From Image 2.

Note the Image 2 sample is fairly basic compared to the output from Image 3 in terms of the level of detail and overall expressiveness. There’s some wonkiness with the subject’s shirt in the Image 3 sample (around the waist area), but the pose is more complex than that of the subject from Image 2. (And the clothes in the Image 2 sample are also a bit off.)

Some of Image 3’s improvements can no doubt be traced to a larger and more diverse training data set.

Like Image 2 and Image 1, Image 3 is trained on uploads to Adobe Stock, Adobe’s royalty-free media library, along with licensed and public domain content for which the copyright has expired. Adobe Stock grows all the time, and consequently so too does the available training data set.

In an effort to ward off lawsuits and position itself as a more “ethical” alternative to generative AI vendors who train on images indiscriminately (e.g. OpenAI, Midjourney), Adobe has a program to pay Adobe Stock contributors to the training data set. (We’ll note that the terms of the program are rather opaque, though.) Controversially, Adobe also trains Firefly models on AI-generated images, which some consider a form of data laundering.

Recent Bloomberg reporting revealed AI-generated images in Adobe Stock aren’t excluded from Firefly image-generating models’ training data, a troubling prospect considering those images might contain regurgitated copyrighted material. Adobe has defended the practice, claiming that AI-generated images make up only a small portion of its training data and go through a moderation process to ensure they don’t depict trademarks or recognizable characters or reference artists’ names.

Of course, neither diverse, more “ethically” sourced training data nor content filters and other safeguards guarantee a perfectly flaw-free experience — see users generating people flipping the bird with Image 2. The real test of Image 3 will come once the community gets its hands on it.

New AI-powered features

Image 3 powers several new features in Photoshop beyond enhanced text-to-image.

A new “style engine” in Image 3, along with a new auto-stylization toggle, allows the model to generate a wider array of colors, backgrounds and subject poses. They feed into Reference Image, an option that lets users condition the model on an image whose colors or tone they want their future generated content to align with.

Three new generative tools — Generate Background, Generate Similar and Enhance Detail — leverage Image 3 to perform precision edits on images. The (self-descriptive) Generate Background replaces a background with a generated one that blends into the existing image, while Generate Similar offers variations on a selected portion of a photo (a person or an object, for example). As for Enhance Detail, it “fine-tunes” images to improve sharpness and clarity.

If these features sound familiar, that’s because they’ve been in beta in the Firefly web app for at least a month (and comparable features have been available in Midjourney for much longer than that). This marks their Photoshop debut, in beta.

Speaking of the web app, Adobe isn’t neglecting this alternate route to its AI tools.

To coincide with the release of Image 3, the Firefly web app is getting Structure Reference and Style Reference, which Adobe’s pitching as new ways to “advance creative control.” (Both were announced in March, but they’re now becoming widely available.) With Structure Reference, users can generate new images that match the “structure” of a reference image (say, a head-on view of a race car). Style Reference is essentially style transfer by another name, preserving the content of an image (e.g. elephants on an African safari) while mimicking the style (e.g. pencil sketch) of a target image.

Here’s Structure Reference in action:

Original image.

Transformed with Structure Reference.

And Style Reference:

Original image.

Transformed with Style Reference.
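
Adobe hasn’t said how Style Reference works under the hood. For readers curious about the general idea, the classic neural style transfer recipe represents an image’s “style” as a Gram matrix: the correlations between feature channels, averaged over every spatial position, which keeps texture and color statistics while discarding where things are. The sketch below is a minimal, pure-Python illustration of that technique under stated assumptions: the tiny feature maps are made-up stand-ins for activations a real system would take from a convolutional network, and nothing here should be read as Adobe’s implementation.

```python
from typing import List

# A feature map: one row per channel, each row holding that channel's
# activations flattened over all spatial positions (toy values, hypothetical).
Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Channel-by-channel correlations, averaged over spatial positions."""
    n_channels = len(features)
    n_positions = len(features[0])
    return [
        [
            sum(features[i][k] * features[j][k] for k in range(n_positions)) / n_positions
            for j in range(n_channels)
        ]
        for i in range(n_channels)
    ]


def style_loss(generated: Matrix, style: Matrix) -> float:
    """Sum of squared differences between the two Gram matrices."""
    g_gen = gram_matrix(generated)
    g_sty = gram_matrix(style)
    return sum(
        (a - b) ** 2
        for row_gen, row_sty in zip(g_gen, g_sty)
        for a, b in zip(row_gen, row_sty)
    )


content_a = [[1.0, 2.0], [3.0, 4.0]]
content_b = [[2.0, 1.0], [4.0, 3.0]]  # same values, spatial positions swapped
print(style_loss(content_a, content_b))  # 0.0
```

Because the Gram matrix sums over positions, two feature maps holding the same values in a different spatial arrangement produce identical Gram matrices and a style loss of zero. That is why style transfer systems in this family pair a style loss like this one with a separate content loss that does care about layout.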

I asked Adobe if, with all the upgrades, Firefly image generation pricing would change. Currently, the cheapest Firefly premium plan is $4.99 per month — undercutting competition like Midjourney ($10 per month) and OpenAI (which gates DALL-E 3 behind a $20-per-month ChatGPT Plus subscription).

Adobe said that its current tiers will remain in place for now, along with its generative credit system. It also said that its indemnity policy, which states Adobe will pay copyright claims related to works generated in Firefly, won’t be changing either, nor will its approach to watermarking AI-generated content. Content Credentials — metadata to identify AI-generated media — will continue to be automatically attached to all Firefly image generations on the web and in Photoshop, whether generated from scratch or partially edited using generative features.

Software Development in Sri Lanka

Hugging Face releases a benchmark for testing generative AI on health tasks | TechCrunch

Generative AI models are increasingly being brought to healthcare settings — in some cases prematurely, perhaps. Early adopters believe that they’ll unlock increased efficiency while revealing insights that’d otherwise be missed. Critics, meanwhile, point out that these models have flaws and biases that could contribute to worse health outcomes.

But is there a quantitative way to know how helpful, or harmful, a model might be when tasked with things like summarizing patient records or answering health-related questions?

Hugging Face, the AI startup, proposes a solution in a newly released benchmark test called Open Medical-LLM. Created in partnership with researchers at the nonprofit Open Life Science AI and the University of Edinburgh’s Natural Language Processing Group, Open Medical-LLM aims to standardize evaluating the performance of generative AI models on a range of medical-related tasks.

Open Medical-LLM isn’t a from-scratch benchmark, per se, but rather a stitching-together of existing test sets — MedQA, PubMedQA, MedMCQA and so on — designed to probe models for general medical knowledge and related fields, such as anatomy, pharmacology, genetics and clinical practice. The benchmark contains multiple choice and open-ended questions that require medical reasoning and understanding, drawing from material including U.S. and Indian medical licensing exams and college biology test question banks.
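
Stitching existing test sets together raises a practical question of how to summarize the results. The sketch below scores multiple-choice answers per test set and then takes an unweighted (macro) average, so that a large test set doesn’t drown out a small one. The records are made-up placeholders rather than real model outputs, and whether the Open Medical-LLM leaderboard aggregates scores exactly this way is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each record: (test set name, model's chosen answer, correct answer).
# These entries are hypothetical placeholders, not real benchmark results.
Record = Tuple[str, str, str]


def per_dataset_accuracy(records: List[Record]) -> Dict[str, float]:
    """Fraction of answers that match the key, computed separately per test set."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for dataset, predicted, gold in records:
        total[dataset] += 1
        # Normalize case and whitespace so "b " and "B" count as the same answer.
        if predicted.strip().upper() == gold.strip().upper():
            correct[dataset] += 1
    return {name: correct[name] / total[name] for name in total}


def macro_average(scores: Dict[str, float]) -> float:
    """Average of per-dataset scores, so each test set counts equally."""
    return sum(scores.values()) / len(scores)


records: List[Record] = [
    ("MedQA", "B", "B"),
    ("MedQA", "C", "A"),
    ("PubMedQA", "yes", "yes"),
    ("MedMCQA", "D", "D"),
    ("MedMCQA", "A", "D"),
]
scores = per_dataset_accuracy(records)
print(scores)  # {'MedQA': 0.5, 'PubMedQA': 1.0, 'MedMCQA': 0.5}
print(macro_average(scores))
```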

“[Open Medical-LLM] enables researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advancements in the field and ultimately contribute to better patient care and outcome,” Hugging Face wrote in a blog post.

Image Credits: Hugging Face

Hugging Face is positioning the benchmark as a “robust assessment” of healthcare-bound generative AI models. But some medical experts on social media cautioned against putting too much stock into Open Medical-LLM, lest it lead to ill-informed deployments.

On X, Liam McCoy, a resident physician in neurology at the University of Alberta, pointed out that the gap between the “contrived environment” of medical question-answering and actual clinical practice can be quite large.

Hugging Face research scientist Clémentine Fourrier, who co-authored the blog post, agreed.

“These leaderboards should only be used as a first approximation of which [generative AI model] to explore for a given use case, but then a deeper phase of testing is always needed to examine the model’s limits and relevance in real conditions,” Fourrier replied on X. “Medical [models] should absolutely not be used on their own by patients, but instead should be trained to become support tools for MDs.”

It brings to mind Google’s experience when it tried to bring an AI screening tool for diabetic retinopathy to healthcare systems in Thailand.

Google created a deep learning system that scanned images of the eye, looking for evidence of retinopathy, a leading cause of vision loss. But despite high theoretical accuracy, the tool proved impractical in real-world testing, frustrating both patients and nurses with inconsistent results and a general lack of harmony with on-the-ground practices.

It’s telling that of the 139 AI-related medical devices the U.S. Food and Drug Administration has approved to date, none use generative AI. It’s exceptionally difficult to test how a generative AI tool’s performance in the lab will translate to hospitals and outpatient clinics, and, perhaps more importantly, how the outcomes might trend over time.

That’s not to suggest Open Medical-LLM isn’t useful or informative. The results leaderboard, if nothing else, serves as a reminder of just how poorly models answer basic health questions. But neither Open Medical-LLM nor any other benchmark, for that matter, is a substitute for carefully thought-out real-world testing.

Meta releases Llama 3, claims it's among the best open models available | TechCrunch

Meta has released the latest entry in its Llama series of open source generative AI models: Llama 3. Or, more accurately, the company has open sourced two models in its new Llama 3 family, with the rest to come at an unspecified future date.

Meta describes the new models (Llama 3 8B, which contains 8 billion parameters, and Llama 3 70B, which contains 70 billion parameters) as a “major leap” in performance compared to the previous-gen Llama models, Llama 2 7B and Llama 2 70B. (Parameters essentially define the skill of an AI model on a problem, like analyzing and generating text; higher-parameter-count models are, generally speaking, more capable than lower-parameter-count models.) In fact, Meta says that, for their respective parameter counts, Llama 3 8B and Llama 3 70B, trained on two custom-built 24,000-GPU clusters, are among the best-performing generative AI models available today.

That’s quite a claim to make. So how is Meta supporting it? Well, the company points to the Llama 3 models’ scores on popular AI benchmarks like MMLU (which attempts to measure knowledge), ARC (which attempts to measure skill acquisition) and DROP (which tests a model’s reasoning over chunks of text). As we’ve written about before, the usefulness — and validity — of these benchmarks is up for debate. But for better or worse, they remain one of the few standardized ways by which AI players like Meta evaluate their models.

Llama 3 8B bests other open source models like Mistral’s Mistral 7B and Google’s Gemma 7B, both of which contain 7 billion parameters, on at least nine benchmarks: MMLU, ARC, DROP, GPQA (a set of biology-, physics- and chemistry-related questions), HumanEval (a code generation test), GSM-8K (math word problems), MATH (another mathematics benchmark), AGIEval (a problem-solving test set) and BIG-Bench Hard (a commonsense reasoning evaluation).

Now, Mistral 7B and Gemma 7B aren’t exactly on the bleeding edge (Mistral 7B was released last September), and in a few of the benchmarks Meta cites, Llama 3 8B scores only a few percentage points higher than either. But Meta also makes the claim that the larger-parameter-count Llama 3 model, Llama 3 70B, is competitive with flagship generative AI models including Gemini 1.5 Pro, the latest in Google’s Gemini series.

Image Credits: Meta

Llama 3 70B beats Gemini 1.5 Pro on MMLU, HumanEval and GSM-8K, and — while it doesn’t rival Anthropic’s most performant model, Claude 3 Opus — Llama 3 70B scores better than the weakest model in the Claude 3 series, Claude 3 Sonnet, on five benchmarks (MMLU, GPQA, HumanEval, GSM-8K and MATH).

Image Credits: Meta

For what it’s worth, Meta also developed its own test set covering use cases ranging from coding and creative writing to reasoning and summarization, and (surprise!) Llama 3 70B came out on top against Mistral’s Mistral Medium model, OpenAI’s GPT-3.5 and Claude 3 Sonnet. Meta says that it gated its modeling teams from accessing the set to maintain objectivity, but obviously, given that Meta itself devised the test, the results have to be taken with a grain of salt.

Image Credits: Meta

More qualitatively, Meta says that users of the new Llama models should expect more “steerability,” a lower likelihood to refuse to answer questions, and higher accuracy on trivia questions, questions pertaining to history and STEM fields such as engineering and science, and general coding recommendations. That’s in part thanks to a much larger data set: a collection of 15 trillion tokens, or a mind-boggling ~11 trillion words (using the rough rule of thumb of 0.75 words per token), seven times the size of the Llama 2 training set. (In the AI field, “tokens” refers to subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.”)
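
The “fantastic” example can be reproduced with a toy greedy longest-match tokenizer. This is a minimal sketch: the vocabulary is hypothetical and chosen to produce the split described above, whereas production tokenizers (byte-pair encoding and its relatives) learn vocabularies of many thousands of pieces from data and do not necessarily split on syllables.

```python
from typing import List, Set


def greedy_tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Split `word` into the longest vocabulary pieces, scanning left to right.

    Falls back to a single character when no longer piece is in the vocabulary.
    """
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary or one character long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        tokens.append(word[start:end])
        start = end
    return tokens


# Hypothetical toy vocabulary, chosen to reproduce the article's example.
vocab = {"fan", "tas", "tic", "fa", "ta"}
print(greedy_tokenize("fantastic", vocab))  # ['fan', 'tas', 'tic']
```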

Where did this data come from? Good question. Meta wouldn’t say, revealing only that it drew from “publicly available sources,” that it included four times more code than the Llama 2 training data set, and that 5% of the set is non-English data (in ~30 languages) meant to improve performance on languages other than English. Meta also said it used synthetic data (i.e., AI-generated data) to create longer documents for the Llama 3 models to train on, a somewhat controversial approach due to the potential performance drawbacks.

“While the models we’re releasing today are only fine tuned for English outputs, the increased data diversity helps the models better recognize nuances and patterns, and perform strongly across a variety of tasks,” Meta writes in a blog post shared with TechCrunch.

Many generative AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much. Recent reporting revealed that Meta, in its quest to maintain pace with AI rivals, at one point used copyrighted ebooks for AI training despite the company’s own lawyers’ warnings; Meta and OpenAI are the subject of an ongoing lawsuit brought by authors including comedian Sarah Silverman over the vendors’ alleged unauthorized use of copyrighted data for training.

So what about toxicity and bias, two other common problems with generative AI models (including Llama 2)? Does Llama 3 improve in those areas? Yes, claims Meta.

Meta says that it developed new data-filtering pipelines to boost the quality of its model training data, and that it’s updated its pair of generative AI safety suites, Llama Guard and CybersecEval, to attempt to prevent the misuse of and unwanted text generations from Llama 3 models and others. The company’s also releasing a new tool, Code Shield, designed to detect code from generative AI models that might introduce security vulnerabilities.
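
Meta hasn’t detailed Code Shield’s internals here. As a rough illustration of what scanning model-generated code for risky patterns can look like, the sketch below runs a handful of regular-expression rules over each line. The rule IDs, patterns and the `scan` helper are hypothetical examples, not Code Shield’s rules or API, and a regex pass like this will both miss real problems and flag some harmless code.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (rule id, regex, explanation). Not Code Shield's rule set.
RULES: List[Tuple[str, str, str]] = [
    ("py-eval", r"\beval\s*\(", "eval() executes arbitrary code"),
    ("py-shell-true", r"subprocess\.\w+\(.*shell\s*=\s*True", "shell=True enables shell injection"),
    ("py-pickle-load", r"\bpickle\.loads?\s*\(", "unpickling untrusted data can execute code"),
    ("weak-hash-md5", r"hashlib\.md5\s*\(", "MD5 is unsuitable for security purposes"),
]


def scan(code: str) -> List[Tuple[int, str, str]]:
    """Return (line number, rule id, explanation) for every rule that matches."""
    findings: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for rule_id, pattern, why in RULES:
            if re.search(pattern, line):
                findings.append((lineno, rule_id, why))
    return findings


# Pretend this snippet came back from a code-generating model.
generated = '''import subprocess
user = input()
subprocess.run("ls " + user, shell=True)
print(len(user))
'''
for finding in scan(generated):
    print(finding)  # (3, 'py-shell-true', 'shell=True enables shell injection')
```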

Filtering isn’t foolproof, though — and tools like Llama Guard, CybersecEval and Code Shield only go so far. (See: Llama 2’s tendency to make up answers to questions and leak private health and financial information.) We’ll have to wait and see how the Llama 3 models perform in the wild, inclusive of testing from academics on alternative benchmarks.

Meta says that the Llama 3 models — which are available for download now, and powering Meta’s Meta AI assistant on Facebook, Instagram, WhatsApp, Messenger and the web — will soon be hosted in managed form across a wide range of cloud platforms including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM’s WatsonX, Microsoft Azure, Nvidia’s NIM and Snowflake. In the future, versions of the models optimized for hardware from AMD, AWS, Dell, Intel, Nvidia and Qualcomm will also be made available.

And more capable models are on the horizon.

Meta says that it’s currently training Llama 3 models over 400 billion parameters in size — models with the ability to “converse in multiple languages,” take more data in and understand images and other modalities as well as text, which would bring the Llama 3 series in line with open releases like Hugging Face’s Idefics2.

Image Credits: Meta

“Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context and continue to improve overall performance across core [large language model] capabilities such as reasoning and coding,” Meta writes in a blog post. “There’s a lot more to come.”


NeuBird is building a generative AI solution for complex cloud-native environments | TechCrunch

NeuBird founders Goutham Rao and Vinod Jayaraman came from Portworx, a cloud-native storage solution they eventually sold to Pure Storage in 2020 for $370 million. It was their third successful exit.

When they went looking for their next startup challenge last year, they saw an opportunity to combine their cloud-native knowledge, especially around IT operations, with the burgeoning area of generative AI. 

Today NeuBird announced a $22 million investment from Mayfield to get the idea to market. It’s a hefty amount for an early-stage startup, but the firm is likely banking on the founders’ experience to build another successful company.

Rao, the CEO, says that while the cloud-native community has done a good job at solving a lot of difficult problems, it has created increasing levels of complexity along the way. 

“We’ve done an incredible job as a community over the past 10-plus years building cloud-native architectures with service-oriented designs. This added a lot of layers, which is good. That’s a proper way to design software, but this also came at a cost of increased telemetry. There’s just too many layers in the stack,” Rao told TechCrunch.

They concluded that this volume of telemetry was making it impossible for human engineers to find, diagnose and solve problems at scale inside large organizations. At the same time, large language models were beginning to mature, so the founders decided to put them to work on the problem.

“We’re leveraging large language models in a very unique way to be able to analyze thousands and thousands of metrics, alerts, logs, traces and application configuration information in a matter of seconds and be able to diagnose what the health of the environment is, detect if there’s a problem and come up with a solution,” he said.

The company is essentially building a trusted digital assistant to the engineering team. “So it’s a digital co-worker that works alongside SREs and ITOps engineers, and monitors all of the alerts and logs looking for issues,” he said. The goal is to reduce the amount of time it takes to respond to and solve an incident from hours to minutes, and they believe that by putting generative AI to work on the problem, they can help companies achieve that goal. 

The founders understand the limitations of large language models, and are looking to reduce hallucinated or incorrect responses by using a limited set of data to train the models, and by setting up other systems that help deliver more accurate responses.

“Because we’re using this in a very controlled manner for a very specific use case for environments we know, we can cross check the results that are coming out of the AI, again through a vector database and see if it’s even making sense and if we’re not comfortable with it, we won’t recommend it to the user.”
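
Rao doesn’t spell out how the vector-database cross-check works. One plausible shape, sketched here under assumptions, is to embed a candidate recommendation, compare it by cosine similarity against embeddings of known-good remediations, and withhold it unless the best match clears a threshold. The three-dimensional vectors, entry names, the `vet_recommendation` helper and the 0.8 threshold are all hypothetical; real embeddings would come from a model and have hundreds of dimensions.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vet_recommendation(
    candidate: Sequence[float],
    known_good: Dict[str, List[float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the closest known-good entry if it is similar enough, else None."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, vector in known_good.items():
        score = cosine(candidate, vector)
        if score > best_score:
            best_name, best_score = name, score
    # Below the threshold, the recommendation is withheld from the user.
    return best_name if best_score >= threshold else None


# Hypothetical embeddings of previously validated remediations.
known_good = {
    "restart-crashlooping-pod": [0.9, 0.1, 0.0],
    "scale-up-node-pool": [0.1, 0.9, 0.2],
}
print(vet_recommendation([0.85, 0.15, 0.05], known_good))  # restart-crashlooping-pod
print(vet_recommendation([0.0, 0.0, 1.0], known_good))  # None
```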

Customers connect NeuBird to their various cloud systems by entering their credentials. Without moving any data, NeuBird can then use that access to cross-check against other available information and come up with a solution, which reduces the overall difficulty of getting company-specific data for the model to work with.

NeuBird uses various models, including Llama 2 for analyzing logs and metrics and Mistral for other types of analysis. The company turns every natural language interaction into a SQL query, essentially turning unstructured data into structured data, which it believes will result in greater accuracy.
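
To make the natural-language-to-SQL idea concrete, the sketch below loads a few parsed log lines into an in-memory SQLite table and runs the kind of query a language model might write for the question “Which service is logging the most errors?” The schema, the rows, the query and the `run_generated_query` helper are hypothetical; in the approach NeuBird describes, a model would generate the SQL, which is hard-coded here as a stand-in.

```python
import sqlite3

# Hypothetical log lines already parsed into (service, level, message) rows.
rows = [
    ("checkout", "ERROR", "timeout calling payments"),
    ("checkout", "ERROR", "timeout calling payments"),
    ("payments", "WARN", "slow response"),
    ("search", "ERROR", "index unavailable"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (service TEXT, level TEXT, message TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)

# Stand-in for SQL a language model might write for:
# "Which service is logging the most errors?"
generated_sql = """
    SELECT service, COUNT(*) AS errors
    FROM logs
    WHERE level = 'ERROR'
    GROUP BY service
    ORDER BY errors DESC
    LIMIT 1
"""


def run_generated_query(conn: sqlite3.Connection, sql: str) -> list:
    """Execute model-written SQL only if it is a plain SELECT statement."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT queries are allowed")
    return conn.execute(sql).fetchall()


print(run_generated_query(conn, generated_sql))  # [('checkout', 2)]
```

The `SELECT`-only check is a deliberately simple guard against a model emitting a destructive statement; a production system would also want read-only database credentials and stricter validation of the generated query.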

The early-stage startup is working with design and alpha partners right now refining the idea as they work to bring the product to market later this year. Rao says they took a big chunk of money out of the gate because they wanted the room to work on the problem without having to worry about looking for more money too soon.

Google Cloud Next 2024: Watch the keynote on Gemini AI, enterprise reveals right here | TechCrunch

It’s time for Google’s annual look up to the cloud, this time with a big dose of AI.

At 9 a.m. PT Tuesday, Google Cloud CEO Thomas Kurian kicked off the opening keynote for this year’s Google Cloud Next event, and an archived stream of the keynote is available to watch.

After this week we’ll know more about Google’s attempts to help the enterprise enter the age of AI. From a deeper dive into Gemini, the company’s AI-powered chatbot, to securing AI products and implementing generative AI into cloud applications, Google will continue to cover a wide range of topics.

We’re also keeping tabs on everything Google’s announcing at Cloud Next 2024, from Google Vids to Gemini Code Assist to Google Workspace updates.

And for those more interested in Google’s reveals for developers, the Developer Keynote kicked off at 11:30 a.m. PT Wednesday, and its full stream is also available to watch.

New Google Vids product helps create a customized video with an AI assist | TechCrunch

All of the major vendors have been looking at ways to use AI to help customers develop creative content. On Tuesday at the Google Cloud Next customer conference in Las Vegas, Google introduced a new AI-fueled video creation tool called Google Vids. The tool will become part of the Google Workspace productivity suite when it’s released.

“I want to share something really entirely new. At Google Cloud Next, we’re unveiling Google Vids, a brand new, AI-powered video creation app for work,” Aparna Pappu, VP & GM at Google Workspace said, introducing the tool.

Image Credits: Frederic Lardinois/TechCrunch

The idea is to provide a video creation tool alongside other Workspace tools like Docs and Sheets with a similar ability to create and collaborate in the browser, except in this case, on video. “This is your video editing, writing and production assistant, all in one,” Pappu said. “We help transform the assets you already have — whether marketing copy or images or whatever else in your drive — into a compelling video.”

Like other Google Workspace tools, you can collaborate with colleagues in real time in the browser. “No need to email files back and forth. You and your team can work on the story at the same time with all the same access controls and security that we provide for all of Workspace,” she said.

Image Credits: Google Cloud

Examples of the kinds of videos people are creating with Google Vids include product pitches, training content or celebratory team videos. Like most generative AI tooling, Google Vids starts with a prompt. You enter a description of what you want the video to look like. You can then pull in files from your Google Drive or use stock content provided by Google, and the AI goes to work, creating a storyboard of the video based on your ideas.

You can then reorder the different parts of the video, add transitions, select a template and add an audio track, either by recording the audio yourself or by adding a script for a preset voice to read. Once you’re satisfied, you can generate the video. Along the way, colleagues can comment or make changes, just as with any Google Workspace tool.

Google Vids is currently in limited testing. In June it will roll out to additional testers in Google Labs and will eventually be available for customers with Gemini for Workspace subscriptions.

Image Credits: Frederic Lardinois/TechCrunch

Google releases Imagen 2, a video clip generator | TechCrunch

Google doesn’t have the best track record when it comes to image-generating AI.

In February, the image generator built into Gemini, Google’s AI-powered chatbot, was found to be randomly injecting gender and racial diversity into prompts about people, resulting in images of racially diverse Nazis, among other offensive inaccuracies.

Google pulled the generator, vowing to improve it and eventually re-release it. As we await its return, the company’s launching an enhanced image-generating tool, Imagen 2, inside its Vertex AI developer platform — albeit a tool with a decidedly more enterprise bent. Google announced Imagen 2 at its annual Cloud Next conference in Las Vegas.

Image Credits: Frederic Lardinois/TechCrunch

Imagen 2 — which is actually a family of models, launched in December after being previewed at Google’s I/O conference in May 2023 — can create and edit images given a text prompt, like OpenAI’s DALL-E and Midjourney. Of interest to corporate types, Imagen 2 can render text, emblems and logos in multiple languages, optionally overlaying those elements in existing images — for example, onto business cards, apparel and products.

After launching first in preview, image editing with Imagen 2 is now generally available in Vertex AI along with two new capabilities: inpainting and outpainting. Inpainting and outpainting, features other popular image generators such as DALL-E have offered for some time, can be used to remove unwanted parts of an image, add new components and expand the borders of an image to create a wider field of view.
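
Outpainting can be framed as a special case of inpainting: the original pixels are placed on a larger canvas, and a mask marks the new border pixels the model must fill in, whereas inpainting uses a mask over a region inside the image. The sketch below prepares such a canvas and mask for a tiny grayscale image; the `prepare_outpaint` name and the use of `None` for not-yet-generated pixels are illustrative assumptions, and nothing here reflects how Vertex AI’s API is structured.

```python
from typing import List, Optional, Tuple

Pixel = Optional[int]  # None marks a pixel the model still has to generate


def prepare_outpaint(
    image: List[List[int]], pad: int
) -> Tuple[List[List[Pixel]], List[List[bool]]]:
    """Center `image` on a canvas `pad` pixels larger on every side.

    Returns the canvas and a mask where True means "generate this pixel".
    """
    height, width = len(image), len(image[0])
    new_h, new_w = height + 2 * pad, width + 2 * pad
    canvas: List[List[Pixel]] = [[None] * new_w for _ in range(new_h)]
    mask = [[True] * new_w for _ in range(new_h)]
    for y in range(height):
        for x in range(width):
            canvas[y + pad][x + pad] = image[y][x]
            mask[y + pad][x + pad] = False  # original pixels stay untouched
    return canvas, mask


image = [[10, 20], [30, 40]]
canvas, mask = prepare_outpaint(image, pad=1)
for row in canvas:
    print(row)
print(sum(row.count(True) for row in mask))  # 12
```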

But the real meat of the Imagen 2 upgrade is what Google’s calling “text-to-live images.”

Imagen 2 can now create short, four-second videos from text prompts, along the lines of AI-powered clip generation tools like Runway, Pika and Irreverent Labs. True to Imagen 2’s corporate focus, Google’s pitching live images as a tool for marketers and creatives, such as a GIF generator for ads showing nature, food and animals — subject matter that Imagen 2 was fine-tuned on.

Google says that live images can capture “a range of camera angles and motions” while “supporting consistency over the entire sequence.” But they’re in low resolution for now: 360 pixels by 640 pixels. Google’s pledging that this will improve in the future. 

To allay (or at least attempt to allay) concerns around the potential to create deepfakes, Google says that Imagen 2 will employ SynthID, an approach developed by Google DeepMind, to apply invisible, cryptographic watermarks to live images. Of course, detecting these watermarks — which Google claims are resilient to edits, including compression, filters and color tone adjustments — requires a Google-provided tool that’s not available to third parties.

And no doubt eager to avoid another generative media controversy, Google’s emphasizing that live image generations will be “filtered for safety.” A spokesperson told TechCrunch via email: “The Imagen 2 model in Vertex AI has not experienced the same issues as the Gemini app. We continue to test extensively and engage with our customers.”

Image Credits: Frederic Lardinois/TechCrunch

But generously assuming for a moment that Google’s watermarking tech, bias mitigations and filters are as effective as it claims, are live images even competitive with the video generation tools already out there?

Not really.

Runway can generate 18-second clips in much higher resolutions. Stability AI’s video clip tool, Stable Video Diffusion, offers greater customizability (in terms of frame rate). And OpenAI’s Sora — which, granted, isn’t commercially available yet — appears poised to blow away the competition with the photorealism it can achieve.

So what are the real technical advantages of live images? I’m not really sure. And I don’t think I’m being too harsh.

After all, Google is behind genuinely impressive video generation tech like Imagen Video and Phenaki. Phenaki, one of Google’s more interesting experiments in text-to-video, turns long, detailed prompts into two-minute-plus “movies” — with the caveat that the clips are low resolution, low frame rate and only somewhat coherent.

In light of recent reports suggesting that the generative AI revolution caught Google CEO Sundar Pichai off guard and that the company’s still struggling to maintain pace with rivals, it’s not surprising that a product like live images feels like an also-ran. But it’s disappointing nonetheless. I can’t shake the feeling that there is (or was) a more impressive product lurking in Google’s skunkworks.

Models like Imagen are trained on an enormous number of examples usually sourced from public sites and datasets around the web. Many generative AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much.

I asked, as I always do around announcements pertaining to generative AI models, about the data that was used to train the updated Imagen 2, and whether creators whose work might’ve been swept up in the model training process will be able to opt out at some future point.

Google told me only that its models are trained “primarily” on public web data, drawn from “blog posts, media transcripts and public conversation forums.” Which blogs, transcripts and forums? It’s anyone’s guess.

A spokesperson pointed to Google’s web publisher controls that allow webmasters to prevent the company from scraping data, including photos and artwork, from their websites. But Google wouldn’t commit to releasing an opt-out tool or, alternatively, compensating creators for their (unknowing) contributions — a step that many of its competitors, including OpenAI, Stability AI and Adobe, have taken.

Another point worth mentioning: Text-to-live images isn’t covered by Google’s generative AI indemnification policy, which protects Vertex AI customers from copyright claims related to Google’s use of training data and outputs of its generative AI models. That’s because text-to-live images is technically in preview; the policy only covers generative AI products in general availability (GA).

Regurgitation, or where a generative model spits out a mirror copy of an example (e.g., an image) that it was trained on, is rightly a concern for corporate customers. Studies both informal and academic have shown that the first-gen Imagen wasn’t immune to this, spitting out identifiable photos of people, artists’ copyrighted works and more when prompted in particular ways.

Barring controversies, technical issues or some other major unforeseen setbacks, text-to-live images will enter GA somewhere down the line. But with live images as it exists today, Google’s basically saying: use at your own risk.

Live selling startup CommentSold uses AI to generate shoppable, social-ready clips | TechCrunch

CommentSold, the e-commerce tech startup that provides web and video tools to online retailers, launched a new generative AI-powered tool on Wednesday that can sift through livestreamed footage and generate short product explainer videos for sellers to post to their website, app and social media platforms.

The “AI ClipHero” feature creates short clips from livestreamed selling events, which often last for hours. Instead of making retailers rewatch content and scour it for relevant clips to edit and post, CommentSold’s new tool saves them time by automatically identifying the most interesting parts of the livestream, so that customers who missed the event can get a brief summary of the products. The tool also uses speech recognition to generate captions.
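
CommentSold hasn't said how ClipHero picks highlights or what caption format it produces. As a hypothetical illustration of the captioning step only, this Python sketch assumes a speech recognizer has already produced timestamped transcript segments for the whole livestream. It keeps the segments that overlap a chosen clip window, rebases their times to the clip's start, and writes them out as SubRip (SRT) captions. `Segment`, `clip_segments` and `to_srt` are illustrative names, not CommentSold's API, and SRT is an assumed output format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str

def clip_segments(segments: List[Segment], clip_start: float, clip_end: float) -> List[Segment]:
    """Keep segments that overlap [clip_start, clip_end) and rebase them to the clip's start."""
    clipped = []
    for seg in segments:
        if seg.end <= clip_start or seg.start >= clip_end:
            continue  # entirely outside the clip window
        start = max(seg.start, clip_start) - clip_start
        end = min(seg.end, clip_end) - clip_start
        clipped.append(Segment(start, end, seg.text))
    return clipped

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text}\n")
    return "\n".join(cues)

# Hypothetical transcript segments, timed from the start of a long livestream.
transcript = [
    Segment(3598.0, 3601.0, "Next up is the cable-knit cardigan."),
    Segment(3601.0, 3604.5, "It comes in three colors."),
    Segment(3620.0, 3623.0, "Now for some shipping questions."),
]

clip = clip_segments(transcript, clip_start=3600.0, clip_end=3610.0)
print(to_srt(clip))
```

Rebasing the timestamps matters because a clip posted on its own starts at zero, even if it was cut from an hour into the stream.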

“Shoppable ‘explainer’ videos are the most powerful video commerce medium right now, with TikTok and Instagram becoming the primary way Gen Z discovers, learns about products and purchases products. However, creating shoppable videos [requires] significant production times,” CommentSold CEO Gautam Goswami told TechCrunch.

Image Credits: CommentSold

AI-powered clipping software isn’t new, but not many companies have developed AI-powered tools specifically designed for live commerce. Various startups (Powder, Eklipse, and others), though, have introduced similar features for content creators to capture highlights from gaming streams.

“Companies like TikTok and Twitch have been trying to create AI that can create shoppable videos from live-stream events … CommentSold is now the first provider to launch a commercially available AI, which learns from millions of hours of livestreams in CommentSold’s library to identify and create product explainer videos from livestream selling events,” Goswami said.

In addition to its AI ClipHero feature, CommentSold recently rolled out PopClips, which allows retailers to tag products in a banner at the bottom of each clip to direct customers to the product page and drive more sales. The company also provides tools for custom website and mobile app building, as well as systems to automate inventory, invoices, shipping, and more.

Since launching in 2017, CommentSold has grown to help over 7,000 small- and mid-sized businesses deliver live shopping and e-commerce experiences. According to the company, it has facilitated the sale of over 180 million items, with more than $4.4 billion in lifetime gross merchandise value (GMV), up from $3.8 billion in 2023.

Google injects generative AI into its cloud security tools | TechCrunch

At its annual Cloud Next conference in Las Vegas, Google on Tuesday introduced new cloud-based security products and services — in addition to updates to existing products and services — aimed at customers managing large, multi-tenant corporate networks.

Many of the announcements had to do with Gemini, Google’s flagship family of generative AI models.

For example, Google unveiled Gemini in Threat Intelligence, a new Gemini-powered component of the company’s Mandiant cybersecurity platform. Now in public preview, Gemini in Threat Intelligence can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise, as well as summarize open source intelligence reports from around the web.

“Gemini in Threat Intelligence now offers conversational search across Mandiant’s vast and growing repository of threat intelligence directly from frontline investigations,” Sunil Potti, GM of cloud security at Google, wrote in a blog post shared with TechCrunch. “Gemini will navigate users to the most relevant pages in the integrated platform for deeper investigation … Plus, [Google’s malware detection service] VirusTotal now automatically ingests OSINT reports, which Gemini summarizes directly in the platform.”

Elsewhere, Gemini can now assist with cybersecurity investigations in Chronicle, Google’s cybersecurity telemetry offering for cloud customers. Set to roll out by the end of the month, the new capability guides security analysts through their typical workflows, recommending actions based on the context of a security investigation, summarizing security event data and creating breach and exploit detection rules from a chatbot-like interface.

And in Security Command Center, Google’s enterprise cybersecurity and risk management suite, a new Gemini-driven feature lets security teams search for threats using natural language while providing summaries of misconfigurations, vulnerabilities and possible attack paths.

Rounding out the security updates were privileged access manager (in preview), a service that offers just-in-time, time-bound and approval-based access options designed to help mitigate risks tied to privileged access misuse. Google’s also rolling out principal access boundary (in preview, as well), which lets admins implement restrictions on network root-level users so that those users can only access authorized resources within a specifically defined boundary.
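
Google's announcement doesn't describe how privileged access manager is built. As a rough, hypothetical model of the three properties named above (just-in-time, time-bound and approval-based), this Python sketch treats a grant as valid only if it matches the requested principal and role, has been approved by someone other than the requester, and the current time falls inside its window. `Grant` and `is_access_allowed` are made-up names, not part of any Google Cloud API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class Grant:
    principal: str              # who requested elevated access
    role: str                   # the privileged role being requested
    approved_by: Optional[str]  # None until an approver signs off
    starts_at: datetime         # when the just-in-time window opens
    duration: timedelta         # how long the window stays open

    @property
    def expires_at(self) -> datetime:
        return self.starts_at + self.duration

def is_access_allowed(grant: Grant, principal: str, role: str, now: datetime) -> bool:
    """Allow access only for a matching, independently approved, unexpired grant."""
    if grant.principal != principal or grant.role != role:
        return False
    # Approval-based: require an approver, and not the requester approving themselves.
    if grant.approved_by is None or grant.approved_by == grant.principal:
        return False
    # Time-bound: valid from starts_at up to, but not including, expires_at.
    return grant.starts_at <= now < grant.expires_at

start = datetime(2024, 4, 9, 9, 0, tzinfo=timezone.utc)
grant = Grant("alice@example.com", "db-admin", "bob@example.com", start, timedelta(hours=1))

print(is_access_allowed(grant, "alice@example.com", "db-admin", start + timedelta(minutes=30)))
print(is_access_allowed(grant, "alice@example.com", "db-admin", start + timedelta(minutes=90)))
```

The point of the time window is that elevated access lapses on its own, so a forgotten grant doesn't linger as standing privilege.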

Lastly, Autokey (in preview) aims to simplify creating and managing customer encryption keys for high-security use cases, while Audit Manager (also in preview) provides tools for Google Cloud customers in regulated industries to generate proof of compliance for their workloads and cloud-hosted data.

“Generative AI offers tremendous potential to tip the balance in favor of defenders,” Potti wrote in the blog post. “And we continue to infuse AI-driven capabilities into our products.”

Google isn’t the only company attempting to productize generative AI–powered security tooling. Microsoft last year launched a set of services that leverage generative AI to correlate data on attacks while prioritizing cybersecurity incidents. Startups, including Aim Security, are also jumping into the fray, aiming to corner the nascent space.

But with generative AI’s tendency to make mistakes, it remains to be seen whether these tools have staying power.

Google's Gemini comes to databases | TechCrunch

Google wants Gemini, its family of generative AI models, to power your app’s databases — in a sense.

At its annual Cloud Next conference in Las Vegas, Google announced the public preview of Gemini in Databases, a collection of features underpinned by Gemini to — as the company pitched it — “simplify all aspects of the database journey.” In less jargony language, Gemini in Databases is a bundle of AI-powered, developer-focused tools for Google Cloud customers who are creating, monitoring and migrating app databases.

One piece of Gemini in Databases is Database Studio, an editor for structured query language (SQL), the language used to store and process data in relational databases. Built into the Google Cloud console, Database Studio can generate, summarize and fix certain errors with SQL code, Google says, in addition to offering general SQL coding suggestions through a chatbot-like interface.

Joining Database Studio under the Gemini in Databases brand umbrella is AI-assisted migrations via Google’s existing Database Migration Service. Google’s Gemini models can convert database code and deliver explanations of those changes along with recommendations, according to Google.

Elsewhere, in Google’s new Database Center — yet another Gemini in Databases component — users can interact with databases using natural language and can manage a fleet of databases with tools to assess their availability, security and privacy compliance. And should something go wrong, those users can ask a Gemini-powered bot to offer troubleshooting tips.

“Gemini in Databases enables customers to easily generate SQL; additionally, they can now manage, optimize and govern entire fleets of databases from a single pane of glass; and finally, accelerate database migrations with AI-assisted code conversions,” Andi Gutmans, GM of databases at Google Cloud, wrote in a blog post shared with TechCrunch. “Imagine being able to ask questions like ‘Which of my production databases in east Asia had missing backups in the last 24 hours?’ or ‘How many PostgreSQL resources have a version higher than 11?’ and getting instant insights about your entire database fleet.”
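
To make Gutmans’ example questions concrete, here is one way they could map onto SQL against a hypothetical fleet inventory. Google hasn't published how Database Center translates questions or what its underlying data model looks like, so the `fleet` table, its columns, and the reading of “east Asia” as regions matching `asia-east%` are all assumptions. The sketch uses Python's built-in `sqlite3` with an in-memory database.

```python
import sqlite3

# Hypothetical fleet inventory; Database Center's real data model is not public.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fleet (
    name          TEXT PRIMARY KEY,
    engine        TEXT NOT NULL,
    major_version INTEGER NOT NULL,
    environment   TEXT NOT NULL,
    region        TEXT NOT NULL,
    last_backup   TEXT  -- UTC timestamp text, NULL if never backed up
);
INSERT INTO fleet VALUES
    ('orders-db',    'PostgreSQL', 15, 'prod',    'asia-east1',  datetime('now', '-2 hours')),
    ('billing-db',   'PostgreSQL', 11, 'prod',    'asia-east1',  datetime('now', '-30 hours')),
    ('catalog-db',   'MySQL',       8, 'prod',    'asia-east2',  NULL),
    ('analytics-db', 'PostgreSQL', 14, 'staging', 'us-central1', datetime('now', '-1 hours'));
""")

# "Which of my production databases in east Asia had missing backups in the last 24 hours?"
missing_backups = [row[0] for row in conn.execute("""
    SELECT name FROM fleet
    WHERE environment = 'prod'
      AND region LIKE 'asia-east%'
      AND (last_backup IS NULL OR last_backup < datetime('now', '-24 hours'))
    ORDER BY name
""")]

# "How many PostgreSQL resources have a version higher than 11?"
(newer_postgres,) = conn.execute(
    "SELECT COUNT(*) FROM fleet WHERE engine = 'PostgreSQL' AND major_version > 11"
).fetchone()

print(missing_backups, newer_postgres)
```

Even in this toy version, small interpretation choices (does a never-backed-up database count as “missing”? which regions are “east Asia”?) change the answer, which is where a natural-language layer can quietly go wrong.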

That assumes, of course, that the Gemini models don’t make mistakes from time to time — which is no guarantee.

Regardless, Google’s forging ahead, bringing Gemini to Looker, its business intelligence tool, as well.

Launching in private preview, Gemini in Looker lets users “chat with their business data,” as Google describes it in a blog post. Integrated with Workspace, Google’s suite of enterprise productivity tools, Gemini in Looker spans features such as conversational analytics; report, visualization and formula generation; and automated Google Slides presentation generation.

I’m curious to see whether Gemini in Looker’s report and presentation generation works reliably. Generative AI models don’t exactly have a reputation for accuracy, after all, which could lead to embarrassing, or even mission-critical, mistakes. With any luck, we’ll find out as Cloud Next continues into the week.

Gemini in Databases could be perceived as a response of sorts to top rival Microsoft’s recently launched Copilot in Azure SQL Database, which brought generative AI to Microsoft’s existing fully managed cloud database service. Microsoft is looking to stay a step ahead in the budding AI-driven database race and has also worked to build generative AI into Azure Data Studio, the company’s set of enterprise data management and development tools.
