
Snowflake releases a flagship generative AI model of its own | TechCrunch


All-around, highly generalizable generative AI models were the name of the game once, and they arguably still are. But increasingly, as cloud vendors large and small join the generative AI fray, we’re seeing a new crop of models focused on the deepest-pocketed potential customers: the enterprise.

Case in point: Snowflake, the cloud computing company, today unveiled Arctic LLM, a generative AI model that’s described as “enterprise-grade.” Available under an Apache 2.0 license, Arctic LLM is optimized for “enterprise workloads,” including generating database code, Snowflake says, and is free for research and commercial use.

“I think this is going to be the foundation that’s going to let us — Snowflake — and our customers build enterprise-grade products and actually begin to realize the promise and value of AI,” CEO Sridhar Ramaswamy said in a press briefing. “You should think of this very much as our first, but big, step in the world of generative AI, with lots more to come.”

An enterprise model

My colleague Devin Coldewey recently wrote about how there’s no end in sight to the onslaught of generative AI models. I recommend you read his piece, but the gist is: Models are an easy way for vendors to drum up excitement for their R&D and they also serve as a funnel to their product ecosystems (e.g., model hosting, fine-tuning and so on).

Arctic LLM is no different. Snowflake’s flagship model in a family of generative AI models called Arctic, Arctic LLM — which took around three months, 1,000 GPUs and $2 million to train — arrives on the heels of Databricks’ DBRX, a generative AI model also marketed as optimized for the enterprise space.

Snowflake draws a direct comparison between Arctic LLM and DBRX in its press materials, saying Arctic LLM outperforms DBRX on the two tasks of coding (Snowflake didn’t specify which programming languages) and SQL generation. The company said Arctic LLM is also better at those tasks than Meta’s Llama 2 70B (but not the more recent Llama 3 70B) and Mistral’s Mixtral-8x7B.

Snowflake also claims that Arctic LLM achieves “leading performance” on a popular general language understanding benchmark, MMLU. I’ll note, though, that while MMLU purports to evaluate generative models’ ability to reason through logic problems, it includes tests that can be solved through rote memorization, so take that bullet point with a grain of salt.

“Arctic LLM addresses specific needs within the enterprise sector,” Baris Gultekin, head of AI at Snowflake, told TechCrunch in an interview, “diverging from generic AI applications like composing poetry to focus on enterprise-oriented challenges, such as developing SQL co-pilots and high-quality chatbots.”

Arctic LLM, like DBRX and Google’s top-performing generative model of the moment, Gemini 1.5 Pro, uses a mixture-of-experts (MoE) architecture. MoE architectures essentially break data processing tasks down into subtasks and then delegate them to smaller, specialized “expert” models. So, while Arctic LLM contains 480 billion parameters, it only activates 17 billion at a time — enough to drive its 128 separate expert models. (Parameters essentially define the skill of an AI model on a problem, like analyzing and generating text.)
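
To make the MoE idea concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. The layer sizes, expert count and top-k value are invented for the example; this is not Snowflake’s Arctic LLM code, just a demonstration of why a model’s total parameter count can far exceed the parameters active per token.

```python
# Illustrative MoE routing sketch — hypothetical sizes, not Arctic LLM's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        weights, chosen = torch.topk(
            F.softmax(self.router(x), dim=-1), self.top_k, dim=-1
        )
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token; the rest stay idle,
        # which is why active parameters are a fraction of total parameters.
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(TinyMoELayer()(tokens).shape)  # torch.Size([4, 64])
```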

Snowflake claims that this efficient design enabled it to train Arctic LLM on open public web data sets (including RefinedWeb, C4, RedPajama and StarCoder) at “roughly one-eighth the cost of similar models.”

Running everywhere

Snowflake is providing resources like coding templates and a list of training sources alongside Arctic LLM to guide users through the process of getting the model up and running and fine-tuning it for particular use cases. But, recognizing that those are likely to be costly and complex undertakings for most developers (fine-tuning or running Arctic LLM requires around eight GPUs), Snowflake’s also pledging to make Arctic LLM available across a range of hosts, including Hugging Face, Microsoft Azure, Together AI’s model-hosting service, and enterprise generative AI platform Lamini.

Here’s the rub, though: Arctic LLM will be available first on Cortex, Snowflake’s platform for building AI- and machine learning-powered apps and services. The company’s unsurprisingly pitching it as the preferred way to run Arctic LLM with “security,” “governance” and scalability.

“Our dream here is, within a year, to have an API that our customers can use so that business users can directly talk to data,” Ramaswamy said. “It would’ve been easy for us to say, ‘Oh, we’ll just wait for some open source model and we’ll use it.’ Instead, we’re making a foundational investment because we think [it’s] going to unlock more value for our customers.”

So I’m left wondering: Who’s Arctic LLM really for besides Snowflake customers?

In a landscape full of “open” generative models that can be fine-tuned for practically any purpose, Arctic LLM doesn’t stand out in any obvious way. Its architecture might bring efficiency gains over some of the other options out there. But I’m not convinced that they’ll be dramatic enough to sway enterprises away from the countless other well-known and -supported, business-friendly generative models (e.g. GPT-4).

There’s also a point in Arctic LLM’s disfavor to consider: its relatively small context.

In generative AI, context window refers to input data (e.g. text) that a model considers before generating output (e.g. more text). Models with small context windows are prone to forgetting the content of even very recent conversations, while models with larger contexts typically avoid this pitfall.
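
To make that concrete, here is a toy sketch — not tied to any particular model or tokenizer — of how a fixed context budget forces older conversation turns to be dropped. The word-count “tokens” and the 50-token budget are invented for illustration.

```python
# Toy illustration of a context window: older turns silently fall out
# once the token budget is exhausted. Real models use subword tokenizers
# and much larger budgets; the numbers here are made up.
def fit_to_context(messages, max_tokens=50):
    kept, used = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = len(msg.split())         # crude stand-in for real token counting
        if used + cost > max_tokens:
            break                       # everything older than this is forgotten
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [f"turn {i}: " + "word " * 10 for i in range(20)]
print(len(fit_to_context(history)))     # only the most recent few turns survive
```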

Arctic LLM’s context is between ~8,000 and ~24,000 words, dependent on the fine-tuning method — far below that of models like Anthropic’s Claude 3 Opus and Google’s Gemini 1.5 Pro.

Snowflake doesn’t mention it in the marketing, but Arctic LLM almost certainly suffers from the same limitations and shortcomings as other generative AI models — namely, hallucinations (i.e. confidently answering requests incorrectly). That’s because Arctic LLM, along with every other generative AI model in existence, is a statistical probability machine — one that, again, has a small context window. It guesses, based on vast numbers of examples, which data makes the most “sense” to place where (e.g. the word “go” before “the market” in the sentence “I go to the market”). It’ll inevitably guess wrong — and that’s a “hallucination.”
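
A toy sketch of that guessing process, with entirely made-up scores, shows why the output optimizes for plausibility rather than truth: the model simply picks a statistically likely continuation.

```python
# Next-token prediction in miniature: score candidate continuations,
# convert to probabilities, pick the likeliest. The logits are invented.
import math

def softmax(scores):
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

# Hypothetical scores for the word after "I go to the ..."
logits = {"market": 3.1, "store": 2.8, "moon": 0.2}
probs = softmax(logits)
print(max(probs, key=probs.get))  # "market" — plausible, but plausibility is not truth
```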

As Devin writes in his piece, until the next major technical breakthrough, incremental improvements are all we have to look forward to in the generative AI domain. That won’t stop vendors like Snowflake from championing them as great achievements, though, and marketing them for all they’re worth.



Tesla launches new Model 3 Performance variant to rev up demand | TechCrunch


Tesla has officially revealed a new Performance variant of the recently refreshed Model 3 sedan as the company looks to fight off receding demand.

The new version of the Model 3, which starts at $52,990, gets a new active damping system and adaptive suspension for better handling and comfort, offers 296 miles of battery range, and can sprint from 0 to 60 miles per hour in 2.9 seconds thanks to 510 horsepower.

Compared to the previous Model 3 Performance, the new version has 32% more peak power, 16% more peak torque and 5% less drag. It does all this while consuming less energy than its predecessor, according to Tesla. That’s thanks in part to a new-generation drive unit, and also a rear diffuser and spoiler. The front and rear ends of the car have also benefited from a slight facelift, separating it from the other versions of the newly tweaked Model 3 revealed last year.

The Model 3 Performance still carries with it the wholesale changes made with that recent refresh. That means there’s an ambient light bar wrapping around the cabin interior, better sound dampening and upgraded materials throughout, a stalk-less steering wheel and a new touchscreen display.

Tesla is launching the new Model 3 Performance at a time when the company is coming off one of its worst quarters for deliveries in recent memory, having dropped 20% compared to the fourth quarter of 2023. The impact of that disappointing first quarter is set to be revealed Tuesday when the company publishes its financial results after the market closes.

Tesla is also just one week removed from announcing sweeping layoffs of more than 10% of its global workforce, with the cuts affecting seemingly all corners of the company.

Orders placed Tuesday, at least at the time of publication, show an estimated delivery window of May/June 2024 in North America.



Adobe claims its new image generation model is its best yet | TechCrunch


Firefly, Adobe’s family of generative AI models, doesn’t have the best reputation among creatives.

The Firefly image generation model in particular has been derided as underwhelming and flawed compared to Midjourney, OpenAI’s DALL-E 3, and other rivals, with a tendency to distort limbs and landscapes and miss the nuances in prompts. But Adobe is trying to right the ship with its third-generation model, Firefly Image 3, which is being released this week during the company’s Max London conference.

The model, now available in Photoshop (beta) and Adobe’s Firefly web app, produces more “realistic” imagery than its predecessor (Image 2) and its predecessor’s predecessor (Image 1) thanks to an ability to understand longer, more complex prompts and scenes as well as improved lighting and text generation capabilities. It should more accurately render things like typography, iconography, raster images and line art, says Adobe, and is “significantly” more adept at depicting dense crowds and people with “detailed features” and “a variety of moods and expressions.”

For what it’s worth, in a brief, unscientific comparison, Image 3 does appear to be a step up from Image 2.

I wasn’t able to try Image 3 myself. But Adobe PR sent a few outputs and prompts from the model, and I managed to run those same prompts through Image 2 on the web to get samples to compare the Image 3 outputs with. (Keep in mind that the Image 3 outputs could’ve been cherry-picked.)

Notice the lighting in this headshot from Image 3 compared to the one below it, from Image 2:

From Image 3. Prompt: “Studio portrait of young woman.”

Same prompt as above, from Image 2.

The Image 3 output looks more detailed and lifelike to my eyes, with shadowing and contrast that’s largely absent from the Image 2 sample.

Here’s a set of images showing Image 3’s scene understanding at play:

From Image 3. Prompt: “An artist in her studio sitting at desk looking pensive with tons of paintings and ethereal.”

Same prompt as above. From Image 2.

Note the Image 2 sample is fairly basic compared to the output from Image 3 in terms of the level of detail — and overall expressiveness. There’s wonkiness going on with the subject in the Image 3 sample’s shirt (around the waist area), but the pose is more complex than the subject’s from Image 2. (And Image 2’s clothes are also a bit off.)

Some of Image 3’s improvements can no doubt be traced to a larger and more diverse training data set.

Like Image 2 and Image 1, Image 3 is trained on uploads to Adobe Stock, Adobe’s royalty-free media library, along with licensed and public domain content for which the copyright has expired. Adobe Stock grows all the time, and consequently so too does the available training data set.

In an effort to ward off lawsuits and position itself as a more “ethical” alternative to generative AI vendors who train on images indiscriminately (e.g. OpenAI, Midjourney), Adobe has a program to pay Adobe Stock contributors to the training data set. (We’ll note that the terms of the program are rather opaque, though.) Controversially, Adobe also trains Firefly models on AI-generated images, which some consider a form of data laundering.

Recent Bloomberg reporting revealed AI-generated images in Adobe Stock aren’t excluded from Firefly image-generating models’ training data, a troubling prospect considering those images might contain regurgitated copyrighted material. Adobe has defended the practice, claiming that AI-generated images make up only a small portion of its training data and go through a moderation process to ensure they don’t depict trademarks or recognizable characters or reference artists’ names.

Of course, neither diverse, more “ethically” sourced training data nor content filters and other safeguards guarantee a perfectly flaw-free experience — see users generating people flipping the bird with Image 2. The real test of Image 3 will come once the community gets its hands on it.

New AI-powered features

Image 3 powers several new features in Photoshop beyond enhanced text-to-image.

A new “style engine” in Image 3, along with a new auto-stylization toggle, allows the model to generate a wider array of colors, backgrounds and subject poses. They feed into Reference Image, an option that lets users condition the model on an image whose colors or tone they want their future generated content to align with.

Three new generative tools — Generate Background, Generate Similar and Enhance Detail — leverage Image 3 to perform precision edits on images. The (self-descriptive) Generate Background replaces a background with a generated one that blends into the existing image, while Generate Similar offers variations on a selected portion of a photo (a person or an object, for example). As for Enhance Detail, it “fine-tunes” images to improve sharpness and clarity.

If these features sound familiar, that’s because they’ve been in beta in the Firefly web app for at least a month (and Midjourney for much longer than that). This marks their Photoshop debut — in beta.

Speaking of the web app, Adobe isn’t neglecting this alternate route to its AI tools.

To coincide with the release of Image 3, the Firefly web app is getting Structure Reference and Style Reference, which Adobe’s pitching as new ways to “advance creative control.” (Both were announced in March, but they’re now becoming widely available.) With Structure Reference, users can generate new images that match the “structure” of a reference image — say, a head-on view of a race car. Style Reference is essentially style transfer by another name, preserving the content of an image (e.g. elephants on an African safari) while mimicking the style (e.g. pencil sketch) of a target image.

Here’s Structure Reference in action:

Original image.

Transformed with Structure Reference.

And Style Reference:

Original image.

Transformed with Style Reference.

I asked Adobe if, with all the upgrades, Firefly image generation pricing would change. Currently, the cheapest Firefly premium plan is $4.99 per month — undercutting competition like Midjourney ($10 per month) and OpenAI (which gates DALL-E 3 behind a $20-per-month ChatGPT Plus subscription).

Adobe said that its current tiers will remain in place for now, along with its generative credit system. It also said that its indemnity policy, which states Adobe will pay copyright claims related to works generated in Firefly, won’t be changing either, nor will its approach to watermarking AI-generated content. Content Credentials — metadata to identify AI-generated media — will continue to be automatically attached to all Firefly image generations on the web and in Photoshop, whether generated from scratch or partially edited using generative features.





Poe introduces a price-per-message revenue model for AI bot creators | TechCrunch


Bot creators now have a new way to make money with Poe, the Quora-owned AI chatbot platform. On Monday, the company introduced a revenue model that allows creators to set a per-message price for their bots so they can make money whenever a user messages them. The addition follows an October 2023 release of a revenue-sharing program that would give bot creators a cut of the earnings when their users subscribed to Poe’s premium product.

First launched by Quora in February 2023, Poe offers users the ability to sample a variety of AI chatbots, including those from ChatGPT maker OpenAI, Anthropic, Google, and others. The idea is to give consumers an easy way to toy with new AI technologies all in one place while also giving Quora a potential source of new content.

The company’s revenue models offer a new twist on the creator economy by rewarding AI enthusiasts who generate “prompt bots,” as well as developer-built server bots that integrate with Poe’s AI.

Last fall, Quora announced it would begin a revenue-sharing program with bot creators and said it would “soon” open up the option for creators to set a per-message fee on their bots. Although it’s been nearly 5 months since that announcement — hardly “soon” — the latter is now going live.

Quora CEO Adam D’Angelo explained on Monday that Poe users will only see a per-message price in points for each bot, drawn from the same point allowance they have as either a free user or a Poe subscriber. Creators, however, will be paid in dollars, he said.

“This pricing mechanism is important for developers with substantial model inference or API costs,” D’Angelo noted in a post on X. “Our goal is to enable a thriving ecosystem of model developers and bot creators who build on top of models, and covering these operational costs is a key part of that,” he added.

The new revenue model could spur the development of new kinds of bots, including in areas like tutoring, knowledge, assistants, analysis, storytelling, and image generation, D’Angelo believes.

The offering is currently available to U.S. bot creators only but will expand globally in the future. It joins the creator monetization program that pays up to $20 per user who subscribes to Poe thanks to a creator’s bots.

Alongside the per-message revenue model, Poe also launched an enhanced analytics dashboard that displays average earnings for creators’ bots across paywalls, subscriptions, and messages. Its insights are updated daily and will allow creators to get a better handle on how their pricing drives bot usage and revenue.





Google open sources tools to support AI model development | TechCrunch


In a typical year, Cloud Next — one of Google’s two major annual developer conferences, the other being I/O — almost exclusively features managed and otherwise closed source, gated-behind-locked-down-APIs products and services. But this year, whether to foster developer goodwill or advance its ecosystem ambitions (or both), Google debuted a number of open source tools primarily aimed at supporting generative AI projects and infrastructure.

The first, MaxDiffusion, which Google actually quietly released in February, is a collection of reference implementations of various diffusion models — models like the image generator Stable Diffusion — that run on XLA devices. “XLA” stands for Accelerated Linear Algebra, an admittedly awkward acronym referring to a technique that optimizes and speeds up specific types of AI workloads, including fine-tuning and serving.

Google’s own tensor processing units (TPUs) are XLA devices, as are recent Nvidia GPUs.

Beyond MaxDiffusion, Google’s launching JetStream, a new engine to run generative AI models — specifically text-generating models (so not Stable Diffusion). Currently limited to TPUs, with GPU compatibility supposedly coming in the future, JetStream offers up to 3x higher “performance per dollar” for models like Google’s own Gemma 7B and Meta’s Llama 2, Google claims.

“As customers bring their AI workloads to production, there’s an increasing demand for a cost-efficient inference stack that delivers high performance,” Mark Lohmeyer, Google Cloud’s GM of compute and machine learning infrastructure, wrote in a blog post shared with TechCrunch. “JetStream helps with this need … and includes optimizations for popular open models such as Llama 2 and Gemma.”

Now, “3x” improvement is quite a claim to make, and it’s not exactly clear how Google arrived at that figure. Using which generation of TPU? Compared to which baseline engine? And how’s “performance” being defined here, anyway?

I’ve asked Google all these questions and will update this post if I hear back.

Second-to-last on the list of Google’s open source contributions are new additions to MaxText, Google’s collection of text-generating AI models targeting TPUs and Nvidia GPUs in the cloud. MaxText now includes Gemma 7B, OpenAI’s GPT-3 (a predecessor of GPT-4), Llama 2 and models from AI startup Mistral — all of which Google says can be customized and fine-tuned to developers’ needs.

“We’ve heavily optimized [the models’] performance on TPUs and also partnered closely with Nvidia to optimize performance on large GPU clusters,” Lohmeyer said. “These improvements maximize GPU and TPU utilization, leading to higher energy efficiency and cost optimization.”

Finally, Google’s collaborated with Hugging Face, the AI startup, to create Optimum TPU, which provides tooling to bring certain AI workloads to TPUs. The goal is to reduce the barrier to entry for getting generative AI models onto TPU hardware, according to Google — in particular text-generating models.

But at present, Optimum TPU is a bit bare-bones. The only model it works with is Gemma 7B. And Optimum TPU doesn’t yet support training generative models on TPUs — only running them.

Google’s promising improvements down the line.



Vista Equity to take revenue optimization platform Model N private in $1.25B deal | TechCrunch


Model N, a platform used by companies such as Johnson & Johnson, AstraZeneca, and AMD to automate decisions related to pricing, incentives, and compliance, is going private in a $1.25 billion deal with private equity firm Vista Equity Partners. The acquisition underscores how PE firms continue to scoop up tech companies that have struggled to perform well in public markets in the last couple of years.

Vista Equity is doling out $30 per share in the all-cash transaction, representing a 12% premium on Friday’s closing price, and 16% on its 30-day average.

This is Vista Equity’s fifth such acquisition in the past 18 months, following Avalara ($8.4 billion); KnowBe4 ($4.6 billion); Duck Creek Technologies ($2.6 billion); and EngageSmart ($4 billion).

Founded in 1999, Model N’s software integrates with various data sources and internal systems to help companies analyze trends, pricing efficacy, market demand, and more. The platform is typically used in industries such as pharmaceuticals and life sciences, where there may be complex pricing structures, and where regulatory or market changes can impact business.

The San Mateo-headquartered company went public on the New York Stock Exchange (NYSE) in 2013, and it has generally performed well in the intervening years — particularly since around 2019, when its market cap steadily started to increase, hitting an all-time high of $1.6 billion last year. However, its valuation has generally hovered below the $1 billion mark for the past six months, sparking Vista Equity Partners into action today.

Vista said that it expects the transaction to close in the middle of 2024, though it is of course subject to the usual conditions, including shareholder approval.



Tesla slashes Model Y inventory prices by as much as $7,000 | TechCrunch


Tesla is dropping prices of unsold Model Y SUVs in the U.S. by thousands of dollars in an attempt to clear out an unprecedented backlog of inventory.

Many long-range and performance Model Ys are now selling for $5,000 less than their original price, while rear-wheel drive versions are seeing even bigger cuts of more than $7,000.

The discounts come as Tesla once again made far more vehicles than it sold in the last quarter. The company built 433,371 vehicles in the first quarter but only shipped 386,810, likely adding more than 40,000 EVs to its inventory glut. (Some of those vehicles were likely in transit, though Tesla didn’t say how many.) The company has built more cars than it shipped in seven of the last eight quarters, Bloomberg News noted Friday.

In January, Tesla warned sales growth could be “notably lower” in 2024 compared to previous years — a trend that has bothered every player in the market, from big automakers like Ford to struggling upstarts like Lucid.

Tesla went through a typical end-of-quarter push to deliver as many cars as it could over the last few weeks, with lead designer Franz von Holzhausen once again pitching in to get them out the door in the final days. But Tesla also tried to boost sales in other ways. It announced a $1,000 price hike was coming to the Model Y, its most popular vehicle, on April 1. Tesla CEO Elon Musk also started mandating demos of the company’s advanced driver assistance system to all potential buyers. That software package costs $12,000 and can be a huge boost to the profit Tesla makes on a vehicle.

Musk has more or less admitted that Tesla has had to work harder to drum up demand for its vehicles lately. He has largely blamed the struggle on high interest rates, all while his company dramatically cut prices on the Model Y and Model 3 throughout 2023.





OpenAI announces Tokyo office and GPT-4 model optimized for the Japanese language | TechCrunch


OpenAI is expanding to Japan, announcing today a new Tokyo hub and plans for a GPT-4 model optimized specifically for the Japanese language.

The ChatGPT-maker opened its first international office in London last year, followed by its inaugural European Union (EU) office in Dublin a few months later. Tokyo will represent OpenAI’s first office in Asia and fourth globally (including its San Francisco HQ), with CEO Sam Altman highlighting Japan’s “rich history of people and technology coming together to do more” among its reasons for setting up a formal presence in the region.

OpenAI’s global expansion efforts so far have been strategic, insofar as the U.K. is a major hub for AI talent while the EU is currently driving the AI regulatory agenda. Japan, meanwhile, is also positioned prominently in the AI revolution, most recently as the G7 chair and President of the G7’s Hiroshima AI Process, an initiative centered around AI governance and pushing for safe and trustworthy AI.

Its choice of who will head up its new Japanese business is also notable. OpenAI Japan will be led by Tadao Nagasaki, who joins the company from Amazon Web Services (AWS), where he headed up Amazon’s cloud computing subsidiary in the region for the past 12 years — so it’s clear that OpenAI is really targeting the enterprise segment with this latest expansion.

Enterprising

As President of OpenAI Japan, Nagasaki will be tasked with building a local team on the ground and doubling down on OpenAI’s recent growth in Japan, which has seen it secure customers including Daikin, Rakuten, and Toyota. Those companies are using ChatGPT’s enterprise-focused incarnation, which sports additional privacy, data analysis, and customization options on top of the standard consumer-grade ChatGPT.

OpenAI says ChatGPT is also already being used by local governments to “improve the efficiency of public services in Japan.”

GPT-4 customized for Japanese. Image Credits: OpenAI

While ChatGPT has long been conversant in multiple languages, including Japanese, optimizing the latest version of the underlying GPT large language model (LLM) for Japanese will give it an enhanced understanding of the nuances of the language, including cultural comprehension, which should make it more effective, particularly in business settings such as customer service and content creation. OpenAI also says that the custom model comes with improved performance, meaning it should be faster and more cost-effective than its predecessor.

For now, OpenAI is giving early access to the GPT-4 custom model to some local businesses, with access gradually opened up via the OpenAI API “in the coming months.”



OctoAI wants to make private AI model deployments easier with OctoStack | TechCrunch


OctoAI (formerly known as OctoML) announced the launch of OctoStack, its new end-to-end solution for deploying generative AI models in a company’s private cloud, be that on-premises or in a virtual private cloud from one of the major vendors, including AWS, Google Cloud and Microsoft Azure, as well as CoreWeave, Lambda Labs, Snowflake and others.

In its early days, OctoAI focused almost exclusively on optimizing models to run more effectively. Based on the Apache TVM machine learning compiler framework, the company then launched its TVM-as-a-Service platform and, over time, expanded that into a fully fledged model-serving offering that combined its optimization chops with a DevOps platform. With the rise of generative AI, the team then launched the fully managed OctoAI platform to help its users serve and fine-tune existing models. OctoStack, at its core, is that OctoAI platform, but for private deployments.


OctoAI CEO and co-founder Luis Ceze told me the company has over 25,000 developers on the platform and hundreds of paying customers who use it in production. A lot of these companies, Ceze said, are GenAI-native companies. The market of traditional enterprises wanting to adopt generative AI is significantly larger, though, so it’s maybe no surprise that OctoAI is now going after them as well with OctoStack.

“One thing that became clear is that, as the enterprise market is going from experimentation last year to deployments, one, all of them are looking around because they’re nervous about sending data over an API,” Ceze said. “Two: a lot of them have also committed their own compute, so why am I going to buy an API when I already have my own compute? And three, no matter what certifications you get and how big of a name you have, they feel like their AI is precious like their data and they don’t want to send it over. So there’s this really clear need in the enterprise to have the deployment under your control.”

Ceze noted that the team had been building out the architecture to offer both its SaaS and hosted platform for a while now. And while the SaaS platform is optimized for Nvidia hardware, OctoStack can support a far wider range of hardware, including AMD GPUs and AWS’s Inferentia accelerator, which in turn makes the optimization challenge quite a bit harder (while also playing to OctoAI’s strengths).

Deploying OctoStack should be straightforward for most enterprises, as OctoAI delivers the platform with ready-to-go containers and their associated Helm charts for deployments. For developers, the API remains the same, no matter whether they are targeting the SaaS product or OctoAI in their private cloud.
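
As a hypothetical sketch of what that “same API, different deployment target” property implies for application code, only the base URL would need to change between the hosted service and a private OctoStack install. The endpoint path, payload shape, environment variable names and response field below are invented for illustration and are not OctoAI’s documented API.

```python
# Hypothetical client sketch — the route, payload and response field are
# invented; only the idea (swap the base URL, keep the code) comes from
# the article.
import os
import requests

# Point at either the hosted SaaS or a private OctoStack endpoint.
BASE_URL = os.environ.get("OCTO_BASE_URL", "https://api.example-hosted-service.com")
API_TOKEN = os.environ.get("OCTO_TOKEN", "")

def generate(prompt: str) -> str:
    resp = requests.post(
        f"{BASE_URL}/v1/generate",                       # invented route
        json={"prompt": prompt, "max_tokens": 128},      # invented payload
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["text"]                           # invented response field
```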

The canonical enterprise use case remains using text summarization and retrieval-augmented generation (RAG) to allow users to chat with their internal documents, but some companies are also fine-tuning these models on their internal code bases to run their own code generation models (similar to what GitHub now offers to Copilot Enterprise users).
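
For readers unfamiliar with the pattern, here is a bare-bones, self-contained sketch of the RAG flow: retrieve the internal documents most relevant to a question, then place them into the prompt of a text model. The keyword-overlap scorer and the stubbed generate() call are placeholders, not any particular vendor’s API.

```python
# Minimal RAG sketch. Real systems use embeddings and a vector index for
# retrieval and a hosted or privately deployed LLM for generation; both
# are stubbed out here.
def retrieve(question, documents, top_k=2):
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),  # crude word overlap
        reverse=True,
    )
    return ranked[:top_k]

def generate(prompt):
    return f"[model output for a {len(prompt.split())}-word prompt]"  # placeholder

docs = [
    "Expense reports must be filed within 30 days of travel.",
    "The VPN requires multi-factor authentication for all employees.",
    "Quarterly revenue figures live in the finance data warehouse.",
]
question = "How many days do I have to file an expense report?"
context = "\n".join(retrieve(question, docs))
print(generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
```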

For many enterprises, being able to do that in a secure environment that is strictly under their control is what now enables them to put these technologies into production for their employees and customers.

“For our performance- and security-sensitive use case, it is imperative that the models which process calls data run in an environment that offers flexibility, scale and security,” said Dali Kaafar, founder and CEO at Apate AI. “OctoStack lets us easily and efficiently run the customized models we need, within environments that we choose, and deliver the scale our customers require.”

