From Digital Age to Nano Age. WorldWide.

Tag: genAI


EU warns Microsoft it could be fined billions over missing GenAI risk info | TechCrunch


The European Union has warned Microsoft that it could be fined up to 1% of its global annual turnover under the bloc’s online governance regime, the Digital Services Act (DSA), after the company failed to respond to a legally binding request for information (RFI) that focused on its generative AI tools. Back in March, the […]




Anon is building an automated authentication layer for the GenAI age | TechCrunch


As the notion of the AI agent takes hold, and more tasks are completed without a human involved, a new kind of authentication will be required to make sure only agents with the proper approval can access particular resources. Anon, an early-stage startup, is helping developers add automated authentication in a safe and secure way.

On Wednesday, the company announced a $6.5 million investment — and that the product is now generally available to all.

The founders came up with the idea for this company out of necessity. Their first idea was actually building an AI agent, but CEO Daniel Mason says they quickly came up against a problem around authentication — simply put, how to enter a username and password automatically and safely. “We kept running into this hard edge of customers wanting us to do X, but we couldn’t do X unless we had this delegated authentication system,” Mason told TechCrunch.

He began asking around about how other AI startups were handling authentication, and found there weren’t really any good answers. “In fact, a lot of the solutions people were using were actually quite a bit less secure. They were mostly inheriting authentication credentials from a user’s local machine or browser-based permissions,” he said.

As they explored the problem in more depth, they realized it made for a better company than their original AI agent idea. At that point, they pivoted to building a developer tool: an automated authentication layer designed for AI-driven applications and workflows. The solution is delivered as a software development kit (SDK) and lets developers incorporate authentication for a specific service with a few lines of code, as sketched below. “We want to sit at that authentication level and really build access permissioning, and our customers are specifically the developers,” Mason said.
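The article doesn’t show Anon’s actual API, so the sketch below is purely illustrative: the package name, `AnonClient`, `create_session` and `run` are all hypothetical stand-ins for what “a few lines of code” could look like from the developer’s side.

```python
# Purely illustrative: Anon's real SDK surface isn't shown in the article.
# The package name, AnonClient, create_session and run are hypothetical.
from anon_sdk import AnonClient  # hypothetical import path

client = AnonClient(api_key="an_live_...")  # placeholder key

# Ask the end user to delegate access to one specific service. Per the
# zero-trust design described below, raw credentials stay with the user.
session = client.create_session(service="linkedin", user_id="user_123")

# The agent then acts through the delegated session, never the credentials.
session.run(action="send_message", payload={"to": "jane", "body": "Hi!"})
```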

The company is addressing security concerns about an automated authentication tool by building toward a zero-trust architecture that protects credentials in a few key ways. For starters, Anon never controls the credentials itself; those are held by the end user. There is also an encryption layer in which half the key is held by the user and half by Anon, and both halves are required to unlock the encryption. Finally, the user always retains ultimate control.
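Anon hasn’t published its actual scheme, but the “half the key with the user, half with Anon” arrangement can be illustrated with a minimal two-of-two XOR secret share, in which neither half alone reveals anything about the key:

```python
# A minimal sketch of the split-key idea: a two-of-two XOR secret share.
# Neither half alone reveals anything about the key; both are required.
# Anon hasn't published its actual scheme; this only illustrates the concept.
import secrets

def split(key: bytes) -> tuple[bytes, bytes]:
    user_half = secrets.token_bytes(len(key))                 # random share
    anon_half = bytes(a ^ b for a, b in zip(key, user_half))  # key XOR share
    return user_half, anon_half

def combine(user_half: bytes, anon_half: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(user_half, anon_half))

key = secrets.token_bytes(32)                # e.g., an AES-256 session key
user_half, anon_half = split(key)
assert combine(user_half, anon_half) == key  # both halves required
```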

“Our platform is such that as a user, when I grant access, I still maintain control of that session — I’m the ultimate holder of the password, username, 2FA — and so even in the event of our system, or even a customer system, getting compromised, they do not have access to those root credentials,” company co-founder and CTO Kai Aichholz said.

The founders recognize that other companies, large and small, will probably take a stab at solving this problem, but they are banking on a head start and a broad vision to help them stave off eventual competitors. “We’re looking to basically become a one-stop integration platform where you can come and build these actions and build the automation and know that you’re doing it in a way that’s secure and your end user credentials are secure and the automations are going to happen,” Mason said.

The $6.5 million investment breaks down into two tranches: a pre-seed of around $2 million at launch and a seed that closed at the end of last year for around $4.5 million. Investors include Union Square Ventures and Abstract Ventures, which led the seed, and Impatient Ventures and ex/ante, which led the pre-seed, along with several industry angels.



UK's antitrust enforcer sounds the alarm over Big Tech's grip on GenAI | TechCrunch


The U.K.’s competition watchdog, the Competition and Markets Authority (CMA), has sounded a warning over Big Tech’s entrenching grip on the advanced AI market, with CEO Sarah Cardell expressing “real concerns” over how the sector is developing.

In an Update Paper on foundational AI models published Thursday, the CMA cautioned over increasing interconnection and concentration between developers in the cutting-edge tech sector responsible for the boom in generative AI tools.

The CMA’s paper points to the recurring presence of Google, Amazon, Microsoft, Meta and Apple (aka GAMMA) across the AI value chain: compute, data, model development, partnerships, release and distribution platforms. And while the regulator also emphasized that it recognizes that partnership arrangements “can play a pro-competitive role in the technology ecosystem,” it coupled that with a warning that “powerful partnerships and integrated firms” can pose risks to competition that run counter to open markets.

Image Credits: CMA’s Foundation Models Update Paper

“We are concerned that the FM [foundational model] sector is developing in ways that risk negative market outcomes,” the CMA wrote, referencing a type of AI that’s developed with large amounts of data and compute power and may be used to underpin a variety of applications.

“In particular, the growing presence across the FM value chain of a small number of incumbent technology firms, which already hold positions of market power in many of today’s most important digital markets, could profoundly shape FM-related markets to the detriment of fair, open and effective competition, ultimately harming businesses and consumers, for example by reducing choice and quality, and by raising prices,” it warned.

The CMA undertook an initial review of the top end of the AI market last May and went on to publish a set of principles for “responsible” generative AI development that it said would guide its oversight of the fast-moving market. However, Will Hayter, senior director of the CMA’s Digital Markets Unit, told TechCrunch last fall that it was not in a rush to regulate advanced AI because it wanted to give the market a chance to develop.

Since then, the watchdog has stepped in to scrutinize the cozy relationship between OpenAI, the developer behind the viral AI chatbot ChatGPT, and Microsoft, a major investor in OpenAI. Its update paper remarks on the giddy pace of change in the market. For example, it flagged research by the U.K.’s internet regulator, Ofcom, in a report last year that found 31% of adults and 79% of 13- to 17-year-olds in the U.K. have used a generative AI tool, such as ChatGPT, Snapchat My AI or Bing Chat (aka Copilot). So there are signs the CMA is revising its initial chillaxed position on the GenAI market amid the commercial “whirlwind” sucking up compute, data and talent.

Its Update Paper identifies three “key interlinked risks to fair, effective, and open competition,” as it puts it, which the omnipresence of GAMMA speaks to: (1) Firms controlling “critical inputs” for developing foundational models (known as general-purpose AI models), which might allow them to restrict access and build a moat against competition; (2) tech giants’ ability to exploit dominant positions in consumer- or business-facing markets to distort choice for GenAI services and restrict competition in deployment of these tools; and (3) partnerships involving key players, which the CMA says “could exacerbate existing positions of market power through the value chain.”

Image Credits: CMA

In a speech delivered Thursday in Washington, D.C., at a legal event focused on generative AI, Cardell pointed to the “winner-take-all dynamics” seen in earlier web dev eras, when Big Tech built and entrenched their Web 2.0 empires while regulators sat on their heels. She said it’s important that competition enforcers don’t repeat the same mistakes with this next generation of digital development.

“The benefits we wish to see flowing from [advanced AI], for businesses and consumers, in terms of quality, choice and price, and the very best innovations, are much more likely in a world where those firms are themselves subject to fair, open and effective competition, rather than one where they are simply able to leverage foundation models to further entrench and extend their existing positions of power in digital markets,” she said, adding: “So we believe it is important to act now to ensure that a small number of firms with unprecedented market power don’t end up in a position to control not just how the most powerful models are designed and built, but also how they are embedded and used across all parts of our economy and our lives.”

How is the CMA going to intervene at the top end of the AI market? It does not have concrete measures to announce, as yet, but Cardell said it’s closely tracking GAMMA’s partnerships and stepping up its use of merger review to see whether any of these arrangements fall within existing merger rules.

That would unlock formal powers of investigation, and even the ability to block connections it deems anti-competitive. But for now the CMA has not gone that far, despite clear and growing concerns about cozy GAMMA GenAI ties. Its review of the links between OpenAI and Microsoft — for example, to determine whether the partnership constitutes a “relevant merger situation” — continues.

“Some of these arrangements are quite complex and opaque, meaning we may not have sufficient information to assess this risk without using our merger control powers to build that understanding,” Cardell also told the audience, explaining the challenges of trying to understand the power dynamics of the AI market without unlocking formal merger review powers. “It may be that some arrangements falling outside the merger rules are problematic, even if not ultimately remediable through merger control. They may even have been structured by the parties to seek to avoid the scope of merger rules. Equally some arrangements may not give rise to competition concerns.”

“By stepping up our merger review, we hope to gain more clarity over which types of partnerships and arrangements may fall within the merger rules, and under what circumstances competition concerns may arise — and that clarity will also benefit the businesses themselves,” she added.

The CMA’s Update Paper sets out some “indicative factors” that Cardell said may trigger greater concern about, and attention to, FM partnerships, such as the partners’ upstream power over AI inputs and their downstream power over distribution channels. She also said the watchdog will be looking closely at the nature of each partnership and the level of “influence and alignment of incentives” between partners.

Meanwhile, the U.K. regulator is urging AI giants to follow the seven development principles it set out last fall to steer market developments onto responsible rails where competition and consumer protection are baked in. (The short version of what it wants to see is: accountability, access, diversity, choice, flexibility, fair dealing, and transparency.)

“We’re committed to applying the principles we have developed and to using all legal powers at our disposal — now and in the future — to ensure that this transformational and structurally critical technology delivers on its promise,” Cardell said in a statement.



What is Elon Musk's Grok chatbot and how does it work? | TechCrunch


You might’ve heard of Grok, X’s answer to OpenAI’s ChatGPT. It’s a chatbot and, in that sense, behaves as you’d expect — answering questions about current events, pop culture and so on. But unlike other chatbots, Grok has “a bit of wit,” as X owner Elon Musk puts it, and “a rebellious streak.”

Long story short, Grok is willing to speak to topics that are usually off-limits to other chatbots, like polarizing political theories and conspiracies. And it’ll use less-than-polite language while doing so — for example, responding to the question “When is it appropriate to listen to Christmas music?” with “Whenever the hell you want.”

But ostensibly, Grok’s biggest selling point is its ability to access real-time X data — an ability no other chatbots have, thanks to X’s decision to gatekeep that data. Ask it “What’s happening in AI today?” and Grok will piece together a response from very recent headlines, while ChatGPT will provide only vague answers that reflect the limits of its training data (and filters on its web access). Earlier this week, Musk pledged that he would open source Grok, without revealing precisely what that meant.

So, you’re probably wondering: How does Grok work? What can it do? And how can I access it? You’ve come to the right place. We’ve put together this handy guide to help explain all things Grok. We’ll keep it up to date as Grok changes and evolves.

How does Grok work?

Grok is the invention of xAI, Elon Musk’s AI startup — a company reportedly in the process of raising billions in venture capital. (Developing AI is expensive.)

Underpinning Grok is a generative AI model called Grok-1, developed over the course of months on a cluster of “tens of thousands” of GPUs (according to an xAI blog post). To train it, xAI sourced data from the web (dated up to Q3 2023) and from feedback from human assistants that xAI refers to as “AI tutors.”

On popular benchmarks, Grok-1 is about as capable as Meta’s open source Llama 2 chatbot model and surpasses OpenAI’s GPT-3.5, xAI claims.

Image Credits: xAI

Human-guided feedback, or reinforcement learning from human feedback (RLHF), is the way most AI-powered chatbots are fine-tuned these days. RLHF involves training a generative model, then gathering additional information to train a “reward” model and fine-tuning the generative model with the reward model via reinforcement learning.
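As a rough illustration of those three stages, here is a toy sketch: a “policy” over three canned replies, a stand-in reward model (as if fit to human preference labels), and REINFORCE-style updates that shift probability toward high-reward replies. Real RLHF runs this loop over large neural networks, typically with a KL penalty against the base model; none of that detail is captured here.

```python
import math, random

# Stage 1: a "pretrained" policy over three canned replies (toy stand-in).
replies = ["Sure, here are the steps...", "I'm not sure.", "whatever."]
logits = [0.0, 0.0, 0.0]

def probs():
    w = [math.exp(l) for l in logits]
    s = sum(w)
    return [x / s for x in w]

# Stage 2: a stand-in reward model, as if trained on human preference labels.
REWARD = {0: 1.0, 1: 0.2, 2: -1.0}

# Stage 3: REINFORCE-style updates nudge the policy toward high-reward replies.
lr = 0.1
for _ in range(2000):
    p = probs()
    i = random.choices(range(3), weights=p)[0]
    # Softmax policy gradient: d/dlogit_j log p(i) = 1[j == i] - p_j
    for j in range(3):
        logits[j] += lr * REWARD[i] * ((1.0 if j == i else 0.0) - p[j])

best = max(range(3), key=lambda j: probs()[j])
print(replies[best])  # overwhelmingly the helpful reply after training
```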

RLHF is quite good at “teaching” models to follow instructions — but not perfect. Like other models, Grok is prone to hallucinating, sometimes offering misinformation and false timelines when asked about news. And these errors can be severe — like wrongly claiming that the Israel–Palestine conflict had reached a cease-fire when it hadn’t.

For questions that stretch beyond its knowledge base, Grok leverages “real-time access” to info on X (and from Tesla, according to Bloomberg). And, similar to ChatGPT, the model has internet-browsing capabilities, enabling it to search the web for up-to-date information about topics.

Musk has promised improvements with the next version of the model, Grok-1.5, set to arrive later this year.

Grok-1.5, which features an upgraded context window (see this post on GPT-4 for an explanation of context windows and their effects), could drive features to summarize whole threads and replies, Musk said in an X Spaces conversation, and suggest post content.

How do I access Grok?

To get access to Grok, you need to have an X account. You also need to fork over $16 per month (or $168 per year) for an X Premium+ plan.

X Premium+ is the highest-priced subscription on X, as it removes all the ads in the For You and Following feeds. In addition, Premium+ introduces a hub where users can get paid to post and offer fans subscriptions, and Premium+ users have their replies boosted the most in X’s rankings.

Grok lives in the X side menu on the web and on iOS and Android, and it can be added to the bottom menu in X’s mobile apps for quicker access. Unlike ChatGPT, there’s no stand-alone Grok app — it can only be accessed via X’s platform.

What can — and can’t — Grok do?

Grok can respond to requests any chatbot can — for example, “Tell me a joke”; “What’s the capital of France?”; “What’s the weather like today?”; and so on. But it has its limits.

Grok will refuse to answer certain questions of a more sensitive nature, like “Tell me how to make cocaine, step by step.” Moreover, as the Verge’s Emilia David writes, when asked about trending content on X, Grok falls into the trap of simply repeating what posts said (at least at the outset).

Unlike some other chatbot models, Grok is also text-only; it can’t understand the content of images, audio or videos, for example. But xAI has previously said that its intention is to extend the underlying model to these modalities, and Musk has pledged to add art-generation capabilities to Grok along the lines of those currently offered by ChatGPT.

“Fun” mode and “regular” mode

Grok has two modes to adjust its tone: “fun” mode (which Grok defaults to) and “regular” mode.

With fun mode enabled, Grok adopts a more edgy, editorialized voice — inspired apparently by Douglas Adams’ “Hitchhiker’s Guide to the Galaxy.”

Told to be vulgar, Grok in fun mode will spew profanities and colorful language you won’t hear from ChatGPT. Ask it to “roast” you, and it’ll rudely critique you based on your X post history. Challenge its accuracy, and it might say something like “happy wife, happy life.”

Grok in fun mode also spews more falsehoods.

Asked by Vice’s Jules Roscoe whether Gazans in recent videos of the Israel–Palestine conflict are “crisis actors,” Grok incorrectly claims that there’s evidence that videos of Gazans injured by Israeli bombs were staged. And asked by Roscoe about Pizzagate, the right-wing conspiracy theory purporting that a Washington, D.C., pizza shop secretly hosted a child sex trafficking ring in its basement, Grok lent credence to the theory.

Grok’s responses in regular mode are more grounded. The chatbot still produces errors, like getting timelines of events and dates wrong, but they tend not to be as egregious as those it makes in fun mode.

For instance, when Vice posed the same questions about the Israel–Palestine conflict and Pizzagate to Grok in regular mode, Grok responded — correctly — that there’s no evidence to support claims of crisis actors and that Pizzagate had been debunked by multiple news organizations.

Political views

Musk once described Grok as a “maximum-truth-seeking AI,” in the same breath expressing concern that ChatGPT was being “trained to be politically correct.” But Grok as it exists today isn’t exactly down-the-middle in its political views.

Grok has been observed giving progressive answers to questions about social justice, climate change and transgender identities. In fact, one researcher found its responses on the whole to be left-wing and libertarian — even more so than ChatGPT’s.

Here is Forbes’ Paul Tassi reporting:

Grok has said it would vote for Biden over Trump because of his views on social justice, climate change and healthcare. Grok has spoken eloquently about the need for diversity and inclusion in society. And Grok stated explicitly that trans women are women, which led to an absurd exchange where Musk acolyte Ian Miles Cheong tells a user to “train” Grok to say the “right” answer, ultimately leading him to change the input to just … manually tell Grok to say no.

Now, will Grok always be this woke? Perhaps not. Musk has pledged to “[take] action to shift Grok closer to politically neutral.” Time will tell what results.





SiMa.ai secures $70M funding to introduce a multimodal GenAI chip | TechCrunch


SiMa.ai, a Silicon Valley–based startup producing embedded machine learning (ML) system-on-chip (SoC) platforms, today announced that it has raised a $70 million extension funding round as it plans to bring its second-generation chipset, specifically built for multimodal generative AI processing, to market.

According to Gartner, the global market for AI chips is forecast to more than double from its 2023 level to $119.4 billion by 2027. However, only a few players have started producing dedicated semiconductors for AI applications, and most of the prominent contenders initially focused on supporting AI in the cloud. Nonetheless, various reports predict significant growth in edge AI, where the hardware processing AI computations sits closer to the data-gathering source rather than in a centralized cloud. SiMa.ai, named after “seema,” the Hindi word for “boundary,” aims to leverage this shift by offering its edge AI SoC to organizations across the industrial manufacturing, retail, aerospace, defense, agriculture and healthcare sectors.

The San Jose–headquartered startup, which targets the market segment between 5W and 25W of energy usage, launched its first ML SoC to bring AI and ML through an integrated software-hardware combination. This includes its proprietary chipset and no-code software called Palette. The combination has already been used by over 50 companies globally, Krishna Rangasayee, the founder and CEO of SiMa.ai, told TechCrunch.

The startup touts that its current generation of the ML SoC delivered the highest FPS/W results on the MLPerf benchmark across the MLPerf Inference 4.0 closed, edge and power division categories. However, the first-generation chipset was focused on classic computer vision.

As the demand for GenAI is growing, SiMa.ai is set to introduce its second-generation ML SoC in the first quarter of 2025 with an emphasis on providing its customers with multimodal GenAI capability. The new SoC will be an “evolutionary change” over its predecessor with “a few architectural tunings” over the existing ML chipset, Rangasayee said. He added that the fundamental concepts would remain the same.

The new GenAI SoC would adapt to any framework, network, model and sensor — similar to the company’s existing ML platform — and will also be compatible with any modality, including audio, speech, text and image. It would work as a single-edge platform for all AI across computer vision, transformers and multimodal GenAI, the startup said.

“You cannot predict the future, but you can pick the vector and say, hey, that’s the vector I want to bet on. And I want to continue evolving around my vector. That’s kind of the approach that we took architecturally,” said Rangasayee. “But fundamentally, we really haven’t walked away or had to drastically change our architecture. This is also the benefit of us taking a software-centric architecture that allows more flexibility and nimbleness.”

SiMa.ai has Taiwan’s TSMC as the manufacturing partner for both its first- and second-generation AI chipsets and Arm Holdings as the provider for its compute subsystem. The second-generation chipset will be based on TSMC’s 6nm process technology and include Synopsys EV74 embedded vision processors for pre- and post-processing in computer vision applications.

The startup counts incumbents like NXP, Texas Instruments, STMicro, Renesas, Microchip Technology and Nvidia, as well as AI chip startups like Hailo, among the competition. However, it considers Nvidia the primary competitor — just like other AI chip startups.

Rangasayee told TechCrunch that while Nvidia is “fantastic in the cloud,” it has not built a platform for the edge. He believes that Nvidia lacks adequate power efficiency and software for edge AI. Similarly, he asserted that other startups building AI chipsets do not solve system problems and are just offering ML acceleration.

“Amongst all of our peers, Hailo has done a really good job. And it’s not us being better than them. But from our perspective, our value proposition is quite different,” he said.

The founder continued that SiMa.ai delivers higher performance and better power efficiency than Hailo. He also said SiMa.ai’s system software is quite different and effective for GenAI.

“As long as we’re solving customer problems, and we are better at doing that than anybody else, we are in a good place,” he said.

SiMa.ai’s fresh all-equity funding, led by Maverick Capital and with participation from Point72 and Jericho, extends the startup’s $30 million Series B round, initially announced in May 2022. Existing investors, including Amplify Partners, Dell Technologies Capital, Fidelity Management and Lip-Bu Tan also participated in the additional investment. With this fundraising, the five-year-old startup has raised a total of $270 million.

The company currently has 160 employees, 65 of whom are at its R&D center in Bengaluru, India. SiMa.ai plans to grow that headcount by adding new roles and extending its R&D capability. It also wants to develop a go-to-market team for Indian customers. Further, the startup plans to scale its customer-facing teams globally, starting with Korea and Japan, followed by Europe and the U.S.

“The computational intensity of generative AI has precipitated a paradigm shift in data center architecture. The next phase in this evolution will be widespread adoption of AI at the edge. Just as the data center has been revolutionized, the edge computing landscape is poised for a complete transformation. SiMa.ai possesses the essential trifecta of a best-in-class team, cutting-edge technology, and forward momentum, positioning it as a key player for customers traversing this tectonic shift. We’re excited to join forces with SiMa.ai to seize this once-in-a-generation opportunity,” said Andrew Homan, senior managing director at Maverick Capital, in a statement.



OpenAI built a voice cloning tool, but you can't use it… yet | TechCrunch


As deepfakes proliferate, OpenAI is refining the tech used to clone voices — but the company insists it’s doing so responsibly.

Today marks the preview debut of OpenAI’s Voice Engine, an expansion of the company’s existing text-to-speech API. Under development for about two years, Voice Engine allows users to upload any 15-second voice sample to generate a synthetic copy of that voice. But there’s no date for public availability yet, giving the company time to respond to how the model is used and abused.

“We want to make sure that everyone feels good about how it’s being deployed — that we understand the landscape of where this tech is dangerous and we have mitigations in place for that,” Jeff Harris, a member of the product staff at OpenAI, told TechCrunch in an interview.

Training the model

The generative AI model powering Voice Engine has been hiding in plain sight for some time, Harris said.

The same model underpins the voice and “read aloud” capabilities in ChatGPT, OpenAI’s AI-powered chatbot, as well as the preset voices available in OpenAI’s text-to-speech API. And Spotify’s been using it since early September to dub podcasts for high-profile hosts like Lex Fridman in different languages.
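Those preset voices are reachable today through that public text-to-speech API; the 15-second sample cloning itself is not. A minimal call with the current `openai` Python SDK looks roughly like this (file name arbitrary):

```python
# Calls OpenAI's existing text-to-speech endpoint with a preset voice.
# Voice Engine's 15-second voice cloning is NOT exposed here; this is the
# preset-voice API that the article says shares the same underlying model.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

speech = client.audio.speech.create(
    model="tts-1",   # standard quality; "tts-1-hd" is the HD tier
    voice="alloy",   # one of the preset voices
    input="This is a short demonstration of the preset voices.",
)

with open("preset_voice.mp3", "wb") as f:
    f.write(speech.read())  # the response body is the encoded audio
```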

I asked Harris where the model’s training data came from — a bit of a touchy subject. He would only say that the Voice Engine model was trained on a mix of licensed and publicly available data.

Models like the one powering Voice Engine are trained on an enormous number of examples — in this case, speech recordings — usually sourced from public sites and data sets around the web. Many generative AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much.

OpenAI is already being sued over allegations the company violated IP law by training its AI on copyrighted content, including photos, artwork, code, articles and e-books, without providing the creators or owners credit or pay.

OpenAI has licensing agreements in place with some content providers, like Shutterstock and the news publisher Axel Springer, and allows webmasters to block its web crawler from scraping their site for training data. OpenAI also lets artists “opt out” of and remove their work from the data sets that the company uses to train its image-generating models, including its latest DALL-E 3.
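The crawler opt-out works through a site’s robots.txt file: blocking OpenAI’s documented GPTBot user agent keeps a site’s pages out of future training scrapes. For example:

```
User-agent: GPTBot
Disallow: /
```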

But OpenAI offers no such opt-out scheme for its other products. And in a recent statement to the U.K.’s House of Lords, OpenAI suggested that it’s “impossible” to create useful AI models without copyrighted material, asserting that fair use — the legal doctrine that allows for the use of copyrighted works to make a secondary creation as long as it’s transformative — shields it where it concerns model training.

Synthesizing voice

Surprisingly, Voice Engine isn’t trained or fine-tuned on user data. That’s owing in part to the ephemeral way in which the model — a combination of a diffusion process and transformer — generates speech.

“We take a small audio sample and text and generate realistic speech that matches the original speaker,” said Harris. “The audio that’s used is dropped after the request is complete.”

As he explained it, the model is simultaneously analyzing the speech data it pulls from and the text data meant to be read aloud, generating a matching voice without having to build a custom model per speaker.

It’s not novel tech. A number of startups have delivered voice cloning products for years, from ElevenLabs to Replica Studios to Papercup to Deepdub to Respeecher. So have Big Tech incumbents such as Amazon, Google and Microsoft — the last of which, incidentally, is a major OpenAI investor.

Harris claimed that OpenAI’s approach delivers overall higher-quality speech.

We also know it will be priced aggressively. Although OpenAI removed Voice Engine’s pricing from the marketing materials it published today, in documents viewed by TechCrunch, Voice Engine is listed as costing $15 per one million characters, or ~162,500 words. That would fit Dickens’ “Oliver Twist” with a little room to spare. (An “HD” quality option costs twice that, but confusingly, an OpenAI spokesperson told TechCrunch that there’s no difference between HD and non-HD voices. Make of that what you will.)

That translates to around 18 hours of audio, making the price somewhat south of $1 per hour. That’s indeed cheaper than what one of the more popular rival vendors, ElevenLabs, charges — $11 for 100,000 characters per month. But it does come at the expense of some customization.
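Those conversions check out, assuming a typical narration speed of about 150 words per minute (our assumption; the other figures come from the article):

```python
# Back-of-the-envelope check of the figures above. The speaking rate is an
# assumption; the character, word and price figures come from the article.
PRICE_PER_M_CHARS = 15.00    # Voice Engine: $15 per one million characters
WORDS_PER_M_CHARS = 162_500  # the article's characters-to-words estimate
WORDS_PER_MINUTE = 150       # assumed average narration speed

hours = WORDS_PER_M_CHARS / WORDS_PER_MINUTE / 60
print(f"~{hours:.0f} hours of audio, ${PRICE_PER_M_CHARS / hours:.2f}/hour")
# -> ~18 hours of audio, $0.83/hour

# ElevenLabs' quoted $11 per 100,000 characters, scaled to the same unit:
print(f"ElevenLabs: ${11 * 10} per million characters, vs. $15 here")
```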

Voice Engine doesn’t offer controls to adjust the tone, pitch or cadence of a voice. In fact, it doesn’t offer any fine-tuning knobs or dials at the moment, although Harris notes that any expressiveness in the 15-second voice sample will carry on through subsequent generations (for example, if you speak in an excited tone, the resulting synthetic voice will sound consistently excited). We’ll see how the quality of the reading compares with other models when they can be compared directly.

Voice talent as commodity

Voice actor salaries on ZipRecruiter range from $12 to $79 per hour — a lot more expensive than Voice Engine, even on the low end (actors with agents will command a much higher price per project). Were it to catch on, OpenAI’s tool could commoditize voice work. So, where does that leave actors?

The talent industry wouldn’t be caught unawares, exactly — it’s been grappling with the existential threat of generative AI for some time. Voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them. Voice work — particularly cheap, entry-level work — is at risk of being eliminated in favor of AI-generated speech.

Now, some AI voice platforms are trying to strike a balance.

Replica Studios last year signed a somewhat contentious deal with SAG-AFTRA to create and license copies of the media artist union members’ voices. The organizations said that the arrangement established fair and ethical terms and conditions to ensure performer consent while negotiating terms for uses of synthetic voices in new works, including video games.

ElevenLabs, meanwhile, hosts a marketplace for synthetic voices that allows users to create a voice, verify and share it publicly. When others use a voice, the original creators receive compensation — a set dollar amount per 1,000 characters.

OpenAI will establish no such labor union deals or marketplaces, at least not in the near term, and requires only that users obtain “explicit consent” from the people whose voices are cloned, make “clear disclosures” indicating which voices are AI-generated and agree not to use the voices of minors, deceased people or political figures in their generations.

“How this intersects with the voice actor economy is something that we’re watching closely and really curious about,” Harris said. “I think that there’s going to be a lot of opportunity to sort of scale your reach as a voice actor through this kind of technology. But this is all stuff that we’re going to learn as people actually deploy and play with the tech a little bit.”

Ethics and deepfakes

Voice cloning apps can be — and have been — abused in ways that go well beyond threatening the livelihoods of actors.

The infamous message board 4chan, known for its conspiratorial content, used ElevenLabs’ platform to share hateful messages mimicking celebrities like Emma Watson. The Verge’s James Vincent was able to tap AI tools to maliciously, quickly clone voices, generating samples containing everything from violent threats to racist and transphobic remarks. And over at Vice, reporter Joseph Cox documented generating a voice clone convincing enough to fool a bank’s authentication system.

There are fears bad actors will attempt to sway elections with voice cloning. And they’re not unfounded: In January, a phone campaign employed a deepfaked President Biden to deter New Hampshire citizens from voting — prompting the FCC to move to make future such campaigns illegal.

So aside from banning deepfakes at the policy level, what steps is OpenAI taking, if any, to prevent Voice Engine from being misused? Harris mentioned a few.

First, Voice Engine is only being made available to an exceptionally small group of developers — around 10 — to start. OpenAI is prioritizing use cases that are “low risk” and “socially beneficial,” Harris says, like those in healthcare and accessibility, in addition to experimenting with “responsible” synthetic media.

A few early Voice Engine adopters include Age of Learning, an edtech company that’s using the tool to generate voice-overs from previously cast actors, and HeyGen, a storytelling app leveraging Voice Engine for translation. Livox and Lifespan are using Voice Engine to create voices for people with speech impairments and disabilities, and Dimagi is building a Voice Engine-based tool to give feedback to health workers in their primary languages.

[Audio samples: generated voices from Lifespan and from Livox]

Second, clones created with Voice Engine are watermarked using a technique OpenAI developed that embeds inaudible identifiers in recordings. (Other vendors including Resemble AI and Microsoft employ similar watermarks.) Harris didn’t promise that there aren’t ways to circumvent the watermark, but described it as “tamper resistant.”
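OpenAI hasn’t disclosed how its watermark works, and genuinely tamper-resistant schemes are far more sophisticated, but the general idea of hiding an inaudible identifier in audio can be shown with the classic least-significant-bit toy scheme on 16-bit PCM samples:

```python
# Toy illustration only, not OpenAI's technique: hide an identifier in the
# least significant bit of 16-bit PCM samples, which changes each sample's
# amplitude by at most 1/32768 (inaudible).
def embed(samples: list[int], tag_bits: list[int]) -> list[int]:
    marked = [(s & ~1) | b for s, b in zip(samples, tag_bits)]
    return marked + samples[len(tag_bits):]  # leave the rest untouched

def extract(samples: list[int], n_bits: int) -> list[int]:
    return [s & 1 for s in samples[:n_bits]]

pcm = [1000, -2413, 377, 8190, -12, 904, 5, -77]  # toy 16-bit samples
tag = [1, 0, 1, 1, 0, 1, 0, 0]                    # toy identifier bits
assert extract(embed(pcm, tag), len(tag)) == tag
```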

“If there’s an audio clip out there, it’s really easy for us to look at that clip and determine that it was generated by our system and the developer that actually did that generation,” Harris said. “So far, it isn’t open sourced — we have it internally for now. We’re curious about making it publicly available, but obviously, that comes with added risks in terms of exposure and breaking it.”

Third, OpenAI plans to provide members of its red teaming network, a contracted group of experts that help inform the company’s AI model risk assessment and mitigation strategies, access to Voice Engine to suss out malicious uses.

Some experts argue that AI red teaming isn’t exhaustive enough and that it’s incumbent on vendors to develop tools to defend against harms that their AI might cause. OpenAI isn’t going quite that far with Voice Engine — but Harris asserts that the company’s “top principle” is releasing the technology safely.

General release

Depending on how the preview goes and the public reception to Voice Engine, OpenAI might release the tool to its wider developer base, but at present, the company is reluctant to commit to anything concrete.

Harris did give a sneak peek at Voice Engine’s roadmap, though, revealing that OpenAI is testing a security mechanism that has users read randomly generated text as proof that they’re present and aware of how their voice is being used. This could give OpenAI the confidence it needs to bring Voice Engine to more people, Harris said — or it might just be the beginning.
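A sketch of what such a liveness check could look like, with `transcribe` left as a hypothetical stand-in for a real speech-to-text call (nothing here is OpenAI’s actual mechanism):

```python
# The user reads a freshly generated random phrase, as Harris describes.
# `transcribe` is a hypothetical stand-in for a real ASR call.
import secrets

WORDS = ["orange", "river", "seven", "quiet", "lamp", "pilot", "stone", "maple"]

def make_challenge(n_words: int = 4) -> str:
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))

def transcribe(audio: bytes) -> str:
    raise NotImplementedError  # stand-in for a speech-to-text model call

def verify(challenge: str, audio: bytes) -> bool:
    # A replayed or pre-generated clip is unlikely to match a fresh phrase.
    return transcribe(audio).strip().lower() == challenge.lower()

print("Please read aloud:", make_challenge())
```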

“What’s going to keep pushing us forward in terms of the actual voice matching technology is really going to depend on what we learn from the pilot, the safety issues that are uncovered and the mitigations that we have in place,” he said. “We don’t want people to be confused between artificial voices and actual human voices.”

And on that last point we can agree.


