From Digital Age to Nano Age. WorldWide.

Tag: voice

Robotic Automations

Buymeacoffee's founder has built an AI-powered voice note app | TechCrunch


AI-powered tools like OpenAI’s Whisper have enabled many apps to make transcription an integral part of their feature set for personal note-taking, and the space has quickly flourished as a result. Apps like AudioPen, Cleft Notes, and TalkNotes have proliferated across app stores and the Internet, but most offer a pretty limited feature set: They […]

© 2024 TechCrunch. All rights reserved. For personal use only.


Software Development in Sri Lanka


Gen Z is losing its political voice on social media | TechCrunch


President Joe Biden signed the bill this week that could ban TikTok from the U.S. if its parent company ByteDance doesn’t sell the platform. According to young political content creators, the ban could decimate Gen Z’s access to political news and information.

“An unfortunately large amount of 18- to 24-year-olds find out information about local elections from TikTok, so my heart is breaking,” Emma Mont, a political content creator, told TechCrunch. According to the Pew Research Center, about a third of American adults between ages 18 and 29 regularly get their news from TikTok.

“I think it’s going to have an impact not only on the people who provide information, but also the people who receive that information,” Mont said. “Part of the reason I make the content I do is that I know there’s someone who’s watching and this is the first time they’re ever gonna learn about Roe v. Wade, or whatever I’m talking about.”

For most content creators, the transition away from TikTok is difficult, but not insurmountable — many full-time creators already cultivate multi-platform followings, rather than depending on one platform, in preparation for this exact kind of worst-case scenario (remember Vine?).

Instagram Reels is a clear alternative to TikTok, but for political creators, it’s not a real option. As of March, Instagram filters out political content from accounts that users don’t already follow, making it all but impossible for political creators and activists to reach a wider audience.

“I think it’s ridiculous,” said Pratika Katiyar, a Northeastern University student and research assistant at Harvard’s Berkman Klein Center for Internet & Society. “There’s no need for Instagram to limit political content. That’s just driving users away from their platforms.”

Even before Instagram’s recent policy update, users alleged that their posts about the war in Gaza were being suppressed. Meta communications director Andy Stone chalked up these complaints to a “bug” that had “nothing to do with the subject matter” of the posts.

“I post a lot on my [Instagram] story about politics and the work I’m doing, and it’s becoming really, really hard,” Katiyar told TechCrunch. “There’s no way to get visibility anymore on Instagram, and now with the limiting of political content, I just fear that’s being compounded.”

These gripes have been so prevalent among creators that Instagram head Adam Mosseri addressed the issue on Threads.

“Before some of you say ‘the algorithm’ is the culprit, understand that ranking and recommendations *increase* the amount of posts people get to,” Mosseri wrote.

Lawmakers are adamant that this bill isn’t a ban. Rather, they say it’s forced divestiture of TikTok from its Chinese parent company. But ByteDance could have a hard time finding an American company that can afford to buy TikTok without raising antitrust concerns. Even if it does find a buyer, the Chinese government has the power to block a forced sale anyway.

All the while, President Biden’s reelection campaign is posting multiple TikToks per day, accumulating over 300,000 followers since creating the account in February.

“I’m even more surprised that Biden signed it into law,” TikTok creator Annie Silkaitis told TechCrunch. “I think it’s going to be such a hot topic this year, his campaign being on the app while he’s actively trying to ban it or force them to sell it. It just feels very hypocritical.”

An obstacle for Biden’s campaign

Biden’s decision to set up shop on TikTok makes sense: It’s a platform where more than 170 million Americans spend their time. This is especially true of younger voters, who are part of a key voting bloc with a historically low turnout. But Biden’s presence on the app, which he’s helping to ban, rubs users the wrong way.

“Being on TikTok is a brilliant campaign move, but I do think it’s a bit of a shot in the foot to take it away,” Mont said. “How do you come to terms with these two true things, that you’re banning TikTok and your campaign has had a lot of traction on TikTok?”

In any case, if TikTok does get banned, it won’t get removed from app stores until solidly after Election Day. Per the version of the bill that Biden signed, ByteDance has nine months to divest TikTok, with a 90-day possible extension. Plus, TikTok is expected to mount a substantial legal challenge against the legislation.

Biden’s stance on TikTok may still impact him in November, though.

“With TikTok being banned, that was one of the biggest news sources for Gen Z. It was a place where people felt like their voices were heard. And now that’s being taken away,” Katiyar said. “I think that’s concerning for how the election is going to turn out. And I do think people will hesitate to vote now… We feel like no one is really listening to our concerns right now.”

Voter turnout in the 18- to 29-year-old bloc is already expected to be lower in 2024 than 2020, a Harvard Youth Poll shows.

Not only does this move hurt Biden’s chance at securing the youth vote, but he’s also failing to capitalize on the power of the internet. Though the Biden campaign has been meeting with creators, the president’s organic reach could be limited if online activists feel complacent about his run.

Online momentum can shape an election. During the 2020 election cycle, for example, teens across the U.S. organized online for Senator Ed Markey (D-MA), dubbing themselves the “Markeyverse.” Most of them weren’t even eligible to vote in the Massachusetts Senate race, whether due to their age or residence, but supported the senator for his stance on curbing climate change. This network of Markey fan accounts helped propel the incumbent to victory over a formidable challenger, Representative Joe Kennedy III.

“Engaging young people online in a way that speaks to them gets them excited about political races that they might otherwise have not had any kind of stake in,” Mont said.

But TikTok users are unlikely to rally behind Biden in any way that’s reminiscent of the Markeyverse.

Some creators are frustrated about their lack of context for the TikTok ban. While the Senate has been party to closed-door briefings about TikTok’s threat to national security, very little information has emerged in public hearings. Those hearings have mostly served to show how little our legislators understand about the internet — last year, Representative Richard Hudson (R-NC) asked TikTok CEO Shou Zi Chew whether TikTok accesses Wi-Fi.

“If President Biden went out today and said China is intentionally putting X-Y-Z on your TikTok feed, I’d be like, ‘Okay, thank you for telling me, that’s all I needed.’ But it’s all very like, ‘Oh, we don’t understand the algorithm.’ Well, we don’t understand a lot of algorithms!” Mont said. “My biggest gripe about all of this as a political content creator is like, how much data do Mark Zuckerberg and Elon Musk have access to?”

Creators likely won’t be getting any answers soon. For now, they’re locked in limbo.

“It’s something that I’m gonna probably be talking about every day until anything happens, which likely won’t be for another year or two, which is scary to think,” said Silkaitis. “How drawn out is this going to be?”





Google Workspace users will soon get voice prompting in Gmail and tabs in Docs | TechCrunch


Google continues to bring more AI-driven features to its Workspace productivity applications.

At its Cloud Next conference in Las Vegas, the company on Tuesday announced that Google Workspace subscribers will soon be able to use voice prompts to kick off the AI-based “Help me write” feature in Gmail while on the go. Google is also launching a new feature in Gmail for Workspace that can instantly turn a rough email draft into a more polished one.

Image Credits: Google

These features will come to paying subscribers first. When asked about this in a press conference ahead of Tuesday’s announcements, Google’s Aparna Pappu noted that the company has “a long history of doing really useful, high-utility features with AI for all our users — including smart reply and smart compose. As we figure out how these work and get feedback from our users, we’ll consider expanding it to all our users.”

Workspace, which according to Google has about 3 billion users and over 10 million paying customers, was one of the first Google services to lean into the AI boom.

In addition to these new AI features, Google is adding a few other capabilities to the Workspace suite. These include notifications in Sheets, which will send out a customizable alert when a certain field changes, for example. Sheets will also get a new set of templates to make starting a new spreadsheet easier.

And Docs, Google’s browser-based MS Word competitor, is getting support for tabs so “you can organize information in a single document instead of linking to multiple documents or searching through Drive to find what you’re looking for.” That’s a nifty feature and could be quite useful for workflows where you’d otherwise copy and paste a bunch of documents into one long one.  

Docs is also getting full-bleed cover images, and for those really large companies that use Workspace, Chat can now handle up to 500,000 members. Thanks to Google’s partnership with Mio, messaging interoperability with Slack and Teams is now an option, too.



OpenAI built a voice cloning tool, but you can't use it… yet | TechCrunch


As deepfakes proliferate, OpenAI is refining the tech used to clone voices — but the company insists it’s doing so responsibly.

Today marks the preview debut of OpenAI’s Voice Engine, an expansion of the company’s existing text-to-speech API. Under development for about two years, Voice Engine allows users to upload any 15-second voice sample to generate a synthetic copy of that voice. But there’s no date for public availability yet, giving the company time to respond to how the model is used and abused.

“We want to make sure that everyone feels good about how it’s being deployed — that we understand the landscape of where this tech is dangerous and we have mitigations in place for that,” Jeff Harris, a member of the product staff at OpenAI, told TechCrunch in an interview.

Training the model

The generative AI model powering Voice Engine has been hiding in plain sight for some time, Harris said.

The same model underpins the voice and “read aloud” capabilities in ChatGPT, OpenAI’s AI-powered chatbot, as well as the preset voices available in OpenAI’s text-to-speech API. And Spotify’s been using it since early September to dub podcasts for high-profile hosts like Lex Fridman in different languages.

I asked Harris where the model’s training data came from — a bit of a touchy subject. He would only say that the Voice Engine model was trained on a mix of licensed and publicly available data.

Models like the one powering Voice Engine are trained on an enormous number of examples — in this case, speech recordings — usually sourced from public sites and data sets around the web. Many generative AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much.

OpenAI is already being sued over allegations the company violated IP law by training its AI on copyrighted content, including photos, artwork, code, articles and e-books, without providing the creators or owners credit or pay.

OpenAI has licensing agreements in place with some content providers, like Shutterstock and the news publisher Axel Springer, and allows webmasters to block its web crawler from scraping their site for training data. OpenAI also lets artists “opt out” of and remove their work from the data sets that the company uses to train its image-generating models, including its latest DALL-E 3.

But OpenAI offers no such opt-out scheme for its other products. And in a recent statement to the U.K.’s House of Lords, OpenAI suggested that it’s “impossible” to create useful AI models without copyrighted material, asserting that fair use — the legal doctrine that allows for the use of copyrighted works to make a secondary creation as long as it’s transformative — shields it where it concerns model training.

Synthesizing voice

Surprisingly, Voice Engine isn’t trained or fine-tuned on user data. That’s owing in part to the ephemeral way in which the model — a combination of a diffusion process and transformer — generates speech.

“We take a small audio sample and text and generate realistic speech that matches the original speaker,” said Harris. “The audio that’s used is dropped after the request is complete.”

As he explained it, the model is simultaneously analyzing the speech data it pulls from and the text data meant to be read aloud, generating a matching voice without having to build a custom model per speaker.

It’s not novel tech. A number of startups have delivered voice cloning products for years, from ElevenLabs to Replica Studios to Papercup to Deepdub to Respeecher. So have Big Tech incumbents such as Amazon, Google and Microsoft — the last of which, incidentally, is a major OpenAI investor.

Harris claimed that OpenAI’s approach delivers overall higher-quality speech.

We also know it will be priced aggressively. Although OpenAI removed Voice Engine’s pricing from the marketing materials it published today, in documents viewed by TechCrunch, Voice Engine is listed as costing $15 per one million characters, or ~162,500 words. That would fit Dickens’ “Oliver Twist” with a little room to spare. (An “HD” quality option costs twice that, but confusingly, an OpenAI spokesperson told TechCrunch that there’s no difference between HD and non-HD voices. Make of that what you will.)

That translates to around 18 hours of audio, making the price somewhat south of $1 per hour. That’s indeed cheaper than what one of the more popular rival vendors, ElevenLabs, charges — $11 for 100,000 characters per month. But it does come at the expense of some customization.
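Those figures pencil out. Here’s a quick back-of-the-envelope check; the characters-per-word ratio comes from OpenAI’s own quoted numbers, while the narration pace is an assumption for illustration, not a figure from either vendor:

```python
# Back-of-the-envelope cost per hour of generated speech.
# Prices are the ones quoted above; the narration pace is an assumption.

CHARS_PER_WORD = 1_000_000 / 162_500   # OpenAI's quoted chars-to-words ratio
WORDS_PER_MINUTE = 150                 # assumed typical narration pace

def cost_per_hour(price_dollars: float, chars_covered: int) -> float:
    """Dollars per hour of speech for a plan priced at price_dollars per chars_covered."""
    words = chars_covered / CHARS_PER_WORD
    hours = words / WORDS_PER_MINUTE / 60
    return price_dollars / hours

print(f"Voice Engine: ~${cost_per_hour(15, 1_000_000):.2f}/hour")   # roughly $0.83
print(f"ElevenLabs:   ~${cost_per_hour(11, 100_000):.2f}/hour")     # roughly $6.09
```

At a slower or faster reading pace the absolute numbers shift, but the roughly 7x gap between the two services holds.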

Voice Engine doesn’t offer controls to adjust the tone, pitch or cadence of a voice. In fact, it doesn’t offer any fine-tuning knobs or dials at the moment, although Harris notes that any expressiveness in the 15-second voice sample will carry on through subsequent generations (for example, if you speak in an excited tone, the resulting synthetic voice will sound consistently excited). We’ll see how the quality of the reading compares with other models when they can be compared directly.

Voice talent as commodity

Voice actor salaries on ZipRecruiter range from $12 to $79 per hour — a lot more expensive than Voice Engine, even on the low end (actors with agents will command a much higher price per project). Were it to catch on, OpenAI’s tool could commoditize voice work. So, where does that leave actors?

The talent industry wouldn’t be caught unawares, exactly — it’s been grappling with the existential threat of generative AI for some time. Voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them. Voice work — particularly cheap, entry-level work — is at risk of being eliminated in favor of AI-generated speech.

Now, some AI voice platforms are trying to strike a balance.

Replica Studios last year signed a somewhat contentious deal with SAG-AFTRA to create and license copies of the media artist union’s members’ voices. The organizations said the arrangement established fair and ethical terms to ensure performer consent while negotiating terms for the use of synthetic voices in new works, including video games.

ElevenLabs, meanwhile, hosts a marketplace for synthetic voices that allows users to create a voice, verify and share it publicly. When others use a voice, the original creators receive compensation — a set dollar amount per 1,000 characters.

OpenAI will establish no such labor union deals or marketplaces, at least not in the near term, and requires only that users obtain “explicit consent” from the people whose voices are cloned, make “clear disclosures” indicating which voices are AI-generated and agree not to use the voices of minors, deceased people or political figures in their generations.

“How this intersects with the voice actor economy is something that we’re watching closely and really curious about,” Harris said. “I think that there’s going to be a lot of opportunity to sort of scale your reach as a voice actor through this kind of technology. But this is all stuff that we’re going to learn as people actually deploy and play with the tech a little bit.”

Ethics and deepfakes

Voice cloning apps can be — and have been — abused in ways that go well beyond threatening the livelihoods of actors.

The infamous message board 4chan, known for its conspiratorial content, used ElevenLabs’ platform to share hateful messages mimicking celebrities like Emma Watson. The Verge’s James Vincent was able to tap AI tools to maliciously, quickly clone voices, generating samples containing everything from violent threats to racist and transphobic remarks. And over at Vice, reporter Joseph Cox documented generating a voice clone convincing enough to fool a bank’s authentication system.

There are fears bad actors will attempt to sway elections with voice cloning. And they’re not unfounded: In January, a phone campaign employed a deepfaked President Biden to deter New Hampshire citizens from voting — prompting the FCC to move to make future such campaigns illegal.

So aside from banning deepfakes at the policy level, what steps is OpenAI taking, if any, to prevent Voice Engine from being misused? Harris mentioned a few.

First, Voice Engine is only being made available to an exceptionally small group of developers — around 10 — to start. OpenAI is prioritizing use cases that are “low risk” and “socially beneficial,” Harris says, like those in healthcare and accessibility, in addition to experimenting with “responsible” synthetic media.

A few early Voice Engine adopters include Age of Learning, an edtech company that’s using the tool to generate voice-overs from previously cast actors, and HeyGen, a storytelling app leveraging Voice Engine for translation. Livox and Lifespan are using Voice Engine to create voices for people with speech impairments and disabilities, and Dimagi is building a Voice Engine-based tool to give feedback to health workers in their primary languages.

Here are generated voices from Lifespan:


And here’s one from Livox:

Second, clones created with Voice Engine are watermarked using a technique OpenAI developed that embeds inaudible identifiers in recordings. (Other vendors including Resemble AI and Microsoft employ similar watermarks.) Harris didn’t promise that there aren’t ways to circumvent the watermark, but described it as “tamper resistant.”

“If there’s an audio clip out there, it’s really easy for us to look at that clip and determine that it was generated by our system and the developer that actually did that generation,” Harris said. “So far, it isn’t open sourced — we have it internally for now. We’re curious about making it publicly available, but obviously, that comes with added risks in terms of exposure and breaking it.”
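OpenAI hasn’t published how its watermark works. As a rough illustration of the general idea of embedding machine-readable but inaudible identifiers in audio, here is a toy least-significant-bit watermark over 16-bit PCM samples; a real tamper-resistant scheme would be far more sophisticated, and this one wouldn’t survive compression or resampling:

```python
import numpy as np

def embed_watermark(samples: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide payload bits in the least significant bit of 16-bit PCM samples.

    Toy illustration only: LSB changes are inaudible but are trivially
    destroyed by lossy compression, unlike a production watermark.
    """
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    if len(bits) > len(samples):
        raise ValueError("audio too short for payload")
    out = samples.copy()
    out[: len(bits)] = (out[: len(bits)] & ~1) | bits
    return out

def extract_watermark(samples: np.ndarray, nbytes: int) -> bytes:
    """Read the payload back out of the low bits."""
    bits = (samples[: nbytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()

audio = np.zeros(48_000, dtype=np.int16)          # one second of silence at 48 kHz
marked = embed_watermark(audio, b"voice-engine")  # hypothetical generator ID
print(extract_watermark(marked, len(b"voice-engine")))  # prints b'voice-engine'
```

The payload here is a hypothetical generator ID; the point is only that the audio is perceptually unchanged (each sample moves by at most 1) while a detector that knows the scheme can recover the identifier.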

Third, OpenAI plans to provide members of its red teaming network, a contracted group of experts that help inform the company’s AI model risk assessment and mitigation strategies, access to Voice Engine to suss out malicious uses.

Some experts argue that AI red teaming isn’t exhaustive enough and that it’s incumbent on vendors to develop tools to defend against harms that their AI might cause. OpenAI isn’t going quite that far with Voice Engine — but Harris asserts that the company’s “top principle” is releasing the technology safely.

General release

Depending on how the preview goes and the public reception to Voice Engine, OpenAI might release the tool to its wider developer base, but at present, the company is reluctant to commit to anything concrete.

Harris did give a sneak peek at Voice Engine’s roadmap, though, revealing that OpenAI is testing a security mechanism that has users read randomly generated text as proof that they’re present and aware of how their voice is being used. This could give OpenAI the confidence it needs to bring Voice Engine to more people, Harris said — or it might just be the beginning.

“What’s going to keep pushing us forward in terms of the actual voice matching technology is really going to depend on what we learn from the pilot, the safety issues that are uncovered and the mitigations that we have in place,” he said. “We don’t want people to be confused between artificial voices and actual human voices.”

And on that last point we can agree.



Rabbit partners with ElevenLabs to power voice commands on its device | TechCrunch


Hardware maker Rabbit has partnered with ElevenLabs to power voice commands on its devices. Rabbit is set to ship the first batch of r1 devices next month after the device drew a ton of attention at the Consumer Electronics Show (CES) at the start of the year.

The Rabbit r1 will ship with ElevenLabs’ tech powering both users’ voice commands and the way the pocket AI device talks back to them. At launch, the feature will be available only in English, with a single voice option. ElevenLabs said that while the r1 was designed for voice interaction from the start, the company’s low-latency models will make interactions more human-like.

“We’re working with rabbit to bring the future of human-device interaction closer. Our collaboration is about making the r1 a truly dynamic co-pilot,” ElevenLabs CEO Mati Staniszewski said in a prepared statement.

In January, Rabbit said that it would use Perplexity AI’s technology to answer users’ questions on the device.

Earlier this week, Rabbit said that its first batch of $199 r1s will leave the factory by March 31 and reach users within a few weeks. The company said users will be able to interact with chatbots, get answers from Perplexity, use bidirectional translation, order rides and food, and play music through the device right out of the box.

The company’s CEO, Jesse Lyu, said earlier this month at a StrictlyVC event that Rabbit is close to 100,000 device orders.

Earlier this year, ElevenLabs raised an $80 million Series B from investors including Andreessen Horowitz, former GitHub CEO Nat Friedman and entrepreneur Daniel Gross, reaching unicorn status. The company has been focusing on voice cloning services for creating audiobooks and for dubbing movies, TV shows, ads and video game characters. Most recently, India’s audio platform PocketFM, which raised $103 million from Lightspeed, said that it is using ElevenLabs’ services to let creators convert their writing into audio series.

But ElevenLabs has faced its fair share of criticism: users have tried to fool a bank’s authentication system, 4chan users have mimicked celebrities, and journalists have documented how easy it is to set up voice clones that generate problematic content. The startup has rolled out a tool to detect speech created by its own platform, and it is working on a tool that detects synthesized audio more broadly, which it plans to distribute to third parties.

