Crypto? AI? Internet co-creator Robert Kahn already did it… decades ago | TechCrunch


Robert Kahn has been a consistent presence on the internet since its creation — obviously, since he was its co-creator. But like many tech pioneers, his resumé is longer than that; in fact, his work prefigured such ostensibly modern ideas as AI agents and blockchain. TechCrunch chatted with Kahn about how, really, nothing has changed since the ’70s.

The interview was conducted on the occasion of Kahn (who goes by Bob in conversation) being awarded the IEEE Medal of Honor this week — you can watch the ceremony and speeches here.

Sound familiar? Last year the IEEE gave that same honor to Vint Cerf, Kahn’s partner in creating the protocols underpinning the internet and web. They’ve taken different paths but share a tempered optimism about the world of technology, and a sense that everything old is new again.

This interview has been edited for length and clarity.

A lot of the problems, technical and otherwise, that we’re facing now in computing and the internet are problems that we’ve seen and maybe even solved before. I’m curious whether you find anything particularly familiar about the challenges that we’re facing today.

Kahn: Well, I don’t think anything really surprises me. I mean, I was concerned right from the get-go that the internet had the potential to be misused. But in the early days it was a very willing set of collaborators from the research community who all principally knew each other, or at least knew of each other. And so there wasn’t much that went wrong. If you have only 100 people that don’t know each other, maybe that’s workable, but if you’ve got a billion people, you know, you get a little bit of everything in society.

[CERN leadership] actually approached me with the possibility of setting up a consortium, which they later set up at MIT… and I had too many questions, most of them probably off-putting, like what about misinformation or disinformation? How are you going to control what goes on it? I thought there were approaches; in fact, we were working on some. And so, in some ways, I’m not terribly surprised — I am disappointed that approaches that could have made a difference were not adopted.

I was reading about your “knowbots” — this is a very similar thing to an AI agent: something empowered to go out and interact in a less structured way than an API call or a simple crawl.

The whole idea was launched in the form of a mobile program [i.e. the program is mobile, not for mobiles]; we called them knowbots, which was short for knowledge robots. You told it what you wanted to do and launched it — you know, make airplane reservations, check your email, look at the news, let you know about things that might affect you. It just freed you up; it would be doing your bidding on the internet.

We essentially made it available just about when the very first cybersecurity threat was occurring: the Morris worm, back in the late ’80s. The timing couldn’t have been more unfortunate. It was done by accident by some guy, but you know, people looked and said, hey, when you’re going to have these bad things happen, we don’t want other people’s programs showing up on our machines. So we just kind of put it on the back burner.

But out of that came something that was, I think, very useful. We called it the digital object architecture. You probably follow some of the work on cryptocurrency. Well, cryptocurrency is like taking a dollar bill and getting rid of the paper, right, and then being able to work with the value of money on the net. The digital object architecture was like taking the mobile programs and getting rid of the mobility. The same information is there, except you get to it in different ways.

Robert Kahn accepting the IEEE Medal of Honor.

It’s interesting that you bring up the digital object architecture and crypto in the same sort of sentence. We have the DOI system; I see it primarily in scientific literature, where of course it’s tremendously useful. But as a general system, I saw a lot of similarities with the idea of cryptographically signed ledgers and sort of canonical locations for digital objects.

You know, it’s a shame that people think that these digital objects have to only be copyrighted material. I wrote a paper called “Representing Values in Digital Objects”… I think we called them digital entities, just for technical reasons. I believe it was the first paper that actually talked about the equivalent of cryptocurrency.

But we’ve been talking about linking blocks for the last… going back to the space age, when you wanted to communicate with the distant parts of outer space, you didn’t want to have to wait minutes or hours of transmission delay back to Earth to get something corrected. You want to have blocks that are in transit linked together. So when the next block arrives a millisecond later, you can figure out what went wrong with the block before it. And that’s what blockchains are about.
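Hash-linking is the standard way to get the property Kahn describes: each block commits to the one before it, so a later block exposes corruption or tampering in an earlier one. A minimal sketch of that general idea (an illustration only, not Kahn’s space-era protocol or any particular blockchain):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def make_block(payload: str, prev_hash: str) -> dict:
    """A block records its payload plus the hash of the previous block."""
    body = {"payload": payload, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def chain(payloads) -> list[dict]:
    """Link blocks so each one commits to everything before it."""
    blocks, prev = [], GENESIS
    for p in payloads:
        blocks.append(make_block(p, prev))
        prev = blocks[-1]["hash"]
    return blocks

def verify(blocks) -> bool:
    """Recompute every hash; any tampering breaks a later link."""
    prev = GENESIS
    for b in blocks:
        body = {"payload": b["payload"], "prev_hash": b["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if b["prev_hash"] != prev or b["hash"] != expected:
            return False
        prev = b["hash"]
    return True

blocks = chain(["telemetry-1", "telemetry-2", "telemetry-3"])
assert verify(blocks)
blocks[1]["payload"] = "tampered"  # corrupt a block in transit...
assert not verify(blocks)          # ...and the chain detects it
```

The point of the structure is exactly what Kahn notes: a receiver never has to wait for a round trip to Earth to learn that an earlier block went wrong; the very next block reveals it.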

In the digital object architecture, we’re talking about digital objects being able to communicate with other digital objects. That’s not people sitting at keyboards. You know, you can send a digital object or mobile program into a machine and ask it to interact with another digital object that may be representative of a book, to get inside that book, do work, and interact with that system. Or you know, like an airplane — people think airplanes need to interact with other airplanes for the purposes of collision avoidance and the like, and cars need to talk to cars because they don’t want to bang into each other. But what if cars need to talk with airplanes? Since these objects can be anything you can represent in digital form, you’ve potentially got everything interacting with everything. That’s a different notion of the internet than, you know, a high-speed telecommunications circuit.

Right, it’s about whether objects need to talk with objects, and enabling that as a protocol, whether it’s an airplane or a car. In the so-called Internet of Things you have a connected doorbell, a connected oven, a connected fridge, but they’re all connected via private APIs to private servers. It’s not about a protocol; it’s just about having a really bad software service living inside your fridge.

I really believe that most of the entities that would have had a natural interest in the internet had hopes that their own approach would be the thing that took over [rather than TCP/IP]. Whether it was Bell Systems or IBM or Xerox, Hewlett Packard, everybody had their own approach. But what happened was they kind of bottomed out. You had to be able to show interoperability; you couldn’t go in and ask for everybody to get rid of all their old stuff and take your stuff. So they couldn’t pick one company’s approach — so they were sort of stuck with the stuff we did at DARPA. That’s an interesting story in its own right, but I don’t think you should write about that (laughs).

If every house you walked into had a different power plug, you’d have a major problem. But the real issue is you can’t see it until you implement it.

I don’t think you can rely on government to take the lead. I don’t think you can rely on industry to take the lead, because you might have 5 or 10 different industries that are all competing with each other. They can’t agree on whether there should be a standard until they’ve exhausted all other options. And who’s going to take the lead? It needs to be rethought at the national level. And I think the universities have a role to play here. But they may not necessarily know it yet.

We’re seeing a big reinvestment in the US chip industry. I know that you were closely involved in the late ’70s and early ’80s with some of the nuts and bolts, working with people who helped define the computing architecture of the period, which has of course informed future architectures. I’m curious what you think about the evolution of the hardware industry.

I think the big problem right now, which the administration has clearly noted, is that we haven’t maintained a leadership role in manufacturing of semiconductors here. It’s come from Taiwan, South Korea, China. We’re trying to fix that, and I applaud that. But the bigger issue is probably going to be personnel. Who’s going to man those sites? I mean, you build manufacturing capability, but do you need to import the people from Korea and Taiwan? OK, let’s teach it in schools… who knows enough to teach it in schools? Are you going to import people to teach in the schools? Workforce development is going to be a big part of the problem. But I think we were there before; we can get there again.


OpenAI built a voice cloning tool, but you can't use it… yet | TechCrunch


As deepfakes proliferate, OpenAI is refining the tech used to clone voices — but the company insists it’s doing so responsibly.

Today marks the preview debut of OpenAI’s Voice Engine, an expansion of the company’s existing text-to-speech API. Under development for about two years, Voice Engine allows users to upload any 15-second voice sample to generate a synthetic copy of that voice. But there’s no date for public availability yet, giving the company time to respond to how the model is used and abused.

“We want to make sure that everyone feels good about how it’s being deployed — that we understand the landscape of where this tech is dangerous and we have mitigations in place for that,” Jeff Harris, a member of the product staff at OpenAI, told TechCrunch in an interview.

Training the model

The generative AI model powering Voice Engine has been hiding in plain sight for some time, Harris said.

The same model underpins the voice and “read aloud” capabilities in ChatGPT, OpenAI’s AI-powered chatbot, as well as the preset voices available in OpenAI’s text-to-speech API. And Spotify’s been using it since early September to dub podcasts for high-profile hosts like Lex Fridman in different languages.

I asked Harris where the model’s training data came from — a bit of a touchy subject. He would only say that the Voice Engine model was trained on a mix of licensed and publicly available data.

Models like the one powering Voice Engine are trained on an enormous number of examples — in this case, speech recordings — usually sourced from public sites and data sets around the web. Many generative AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much.

OpenAI is already being sued over allegations the company violated IP law by training its AI on copyrighted content, including photos, artwork, code, articles and e-books, without providing the creators or owners credit or pay.

OpenAI has licensing agreements in place with some content providers, like Shutterstock and the news publisher Axel Springer, and allows webmasters to block its web crawler from scraping their site for training data. OpenAI also lets artists “opt out” of and remove their work from the data sets that the company uses to train its image-generating models, including its latest DALL-E 3.

But OpenAI offers no such opt-out scheme for its other products. And in a recent statement to the U.K.’s House of Lords, OpenAI suggested that it’s “impossible” to create useful AI models without copyrighted material, asserting that fair use — the legal doctrine that allows for the use of copyrighted works to make a secondary creation as long as it’s transformative — shields it where it concerns model training.

Synthesizing voice

Surprisingly, Voice Engine isn’t trained or fine-tuned on user data. That’s owing in part to the ephemeral way in which the model — a combination of a diffusion process and transformer — generates speech.

“We take a small audio sample and text and generate realistic speech that matches the original speaker,” said Harris. “The audio that’s used is dropped after the request is complete.”

As he explained it, the model is simultaneously analyzing the speech data it pulls from and the text data meant to be read aloud, generating a matching voice without having to build a custom model per speaker.

It’s not novel tech. A number of startups have delivered voice cloning products for years, from ElevenLabs to Replica Studios to Papercup to Deepdub to Respeecher. So have Big Tech incumbents such as Amazon, Google and Microsoft — the last of which is a major OpenAI investor, incidentally.

Harris claimed that OpenAI’s approach delivers overall higher-quality speech.

We also know it will be priced aggressively. Although OpenAI removed Voice Engine’s pricing from the marketing materials it published today, in documents viewed by TechCrunch, Voice Engine is listed as costing $15 per one million characters, or ~162,500 words. That would fit Dickens’ “Oliver Twist” with a little room to spare. (An “HD” quality option costs twice that, but confusingly, an OpenAI spokesperson told TechCrunch that there’s no difference between HD and non-HD voices. Make of that what you will.)

That translates to around 18 hours of audio, making the price somewhat south of $1 per hour. That’s indeed cheaper than what one of the more popular rival vendors, ElevenLabs, charges — $11 for 100,000 characters per month. But it does come at the expense of some customization.
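The quoted figures hold up under two assumptions the article leaves implicit (both are estimates of mine, not OpenAI’s numbers): roughly 6.15 characters per English word, as implied by the 162,500-word figure, and a narration pace of about 150 words per minute. A quick sanity check:

```python
# Back-of-envelope check of the quoted Voice Engine pricing.
# Assumed conversion rates (not from OpenAI): ~6.15 characters per
# English word and a ~150 words-per-minute narration pace.
PRICE_PER_MILLION_CHARS = 15.00        # USD, per the documents TechCrunch viewed
CHARS_PER_WORD = 1_000_000 / 162_500   # ≈ 6.15, implied by the article's word count
WORDS_PER_MINUTE = 150                 # typical narration pace (assumption)

words = 1_000_000 / CHARS_PER_WORD         # 162,500 words per million characters
hours = words / WORDS_PER_MINUTE / 60      # ≈ 18 hours of audio
price_per_hour = PRICE_PER_MILLION_CHARS / hours

print(f"{words:,.0f} words ≈ {hours:.1f} h of audio")
print(f"≈ ${price_per_hour:.2f} per hour of generated speech")

# For comparison, ElevenLabs' quoted $11 per 100,000 characters
# works out to $110 per million characters.
elevenlabs_per_million = 11 * 10
print(f"ElevenLabs: ${elevenlabs_per_million} per million characters")
```

Under those assumptions the per-hour cost lands around $0.83, consistent with “somewhat south of $1 per hour.”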

Voice Engine doesn’t offer controls to adjust the tone, pitch or cadence of a voice. In fact, it doesn’t offer any fine-tuning knobs or dials at the moment, although Harris notes that any expressiveness in the 15-second voice sample will carry on through subsequent generations (for example, if you speak in an excited tone, the resulting synthetic voice will sound consistently excited). We’ll see how the quality of the reading compares with other models when they can be compared directly.

Voice talent as commodity

Voice actor salaries on ZipRecruiter range from $12 to $79 per hour — a lot more expensive than Voice Engine, even on the low end (actors with agents will command a much higher price per project). Were it to catch on, OpenAI’s tool could commoditize voice work. So, where does that leave actors?

The talent industry wouldn’t be caught unawares, exactly — it’s been grappling with the existential threat of generative AI for some time. Voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them. Voice work — particularly cheap, entry-level work — is at risk of being eliminated in favor of AI-generated speech.

Now, some AI voice platforms are trying to strike a balance.

Replica Studios last year signed a somewhat contentious deal with SAG-AFTRA to create and license copies of the media artist union’s members’ voices. The organizations said that the arrangement established fair and ethical terms and conditions to ensure performer consent while negotiating terms for uses of synthetic voices in new works, including video games.

ElevenLabs, meanwhile, hosts a marketplace for synthetic voices that allows users to create a voice, verify and share it publicly. When others use a voice, the original creators receive compensation — a set dollar amount per 1,000 characters.

OpenAI will establish no such labor union deals or marketplaces, at least not in the near term, and requires only that users obtain “explicit consent” from the people whose voices are cloned, make “clear disclosures” indicating which voices are AI-generated and agree not to use the voices of minors, deceased people or political figures in their generations.

“How this intersects with the voice actor economy is something that we’re watching closely and really curious about,” Harris said. “I think that there’s going to be a lot of opportunity to sort of scale your reach as a voice actor through this kind of technology. But this is all stuff that we’re going to learn as people actually deploy and play with the tech a little bit.”

Ethics and deepfakes

Voice cloning apps can be — and have been — abused in ways that go well beyond threatening the livelihoods of actors.

The infamous message board 4chan, known for its conspiratorial content, used ElevenLabs’ platform to share hateful messages mimicking celebrities like Emma Watson. The Verge’s James Vincent was able to tap AI tools to maliciously, quickly clone voices, generating samples containing everything from violent threats to racist and transphobic remarks. And over at Vice, reporter Joseph Cox documented generating a voice clone convincing enough to fool a bank’s authentication system.

There are fears bad actors will attempt to sway elections with voice cloning. And they’re not unfounded: In January, a phone campaign employed a deepfaked President Biden to deter New Hampshire citizens from voting — prompting the FCC to move to make future such campaigns illegal.

So aside from banning deepfakes at the policy level, what steps is OpenAI taking, if any, to prevent Voice Engine from being misused? Harris mentioned a few.

First, Voice Engine is only being made available to an exceptionally small group of developers — around 10 — to start. OpenAI is prioritizing use cases that are “low risk” and “socially beneficial,” Harris says, like those in healthcare and accessibility, in addition to experimenting with “responsible” synthetic media.

A few early Voice Engine adopters include Age of Learning, an edtech company that’s using the tool to generate voice-overs from previously cast actors, and HeyGen, a storytelling app leveraging Voice Engine for translation. Livox and Lifespan are using Voice Engine to create voices for people with speech impairments and disabilities, and Dimagi is building a Voice Engine-based tool to give feedback to health workers in their primary languages.

The original article embeds sample voices generated for Lifespan and Livox (audio not reproduced here).
Second, clones created with Voice Engine are watermarked using a technique OpenAI developed that embeds inaudible identifiers in recordings. (Other vendors including Resemble AI and Microsoft employ similar watermarks.) Harris didn’t promise that there aren’t ways to circumvent the watermark, but described it as “tamper resistant.”

“If there’s an audio clip out there, it’s really easy for us to look at that clip and determine that it was generated by our system and the developer that actually did that generation,” Harris said. “So far, it isn’t open sourced — we have it internally for now. We’re curious about making it publicly available, but obviously, that comes with added risks in terms of exposure and breaking it.”
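OpenAI hasn’t published how its watermark works. Purely as an illustration of the general idea of embedding inaudible identifiers in audio, here is a toy least-significant-bit watermark over 16-bit PCM samples. Everything here is hypothetical, and unlike the “tamper resistant” scheme Harris describes, this one is trivially stripped and would not survive compression or editing:

```python
# Toy inaudible audio watermark: hide an identifier in the least
# significant bits of 16-bit PCM samples. This is NOT OpenAI's
# (unpublished) technique; real watermarks spread the signal so it
# survives compression, resampling and deliberate tampering.

def embed(samples: list[int], tag: bytes) -> list[int]:
    """Overwrite one LSB per sample with the tag's bits (little-endian)."""
    bits = [(byte >> i) & 1 for byte in tag for i in range(8)]
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # ±1 change: inaudible at 16-bit depth
    return out

def extract(samples: list[int], n_bytes: int) -> bytes:
    """Read the LSBs back and reassemble the identifier."""
    bits = [s & 1 for s in samples[: n_bytes * 8]]
    return bytes(
        sum(bits[b * 8 + i] << i for i in range(8)) for b in range(n_bytes)
    )

audio = [((i * 37) % 4000) - 2000 for i in range(1024)]  # fake PCM samples
tag = b"dev-42"  # hypothetical developer identifier
marked = embed(audio, tag)
assert extract(marked, len(tag)) == tag
assert max(abs(a - b) for a, b in zip(audio, marked)) <= 1  # imperceptible
```

Tying the recovered identifier back to a specific developer, as Harris describes, would then just be a lookup against issued identifiers.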

Third, OpenAI plans to provide members of its red teaming network, a contracted group of experts that help inform the company’s AI model risk assessment and mitigation strategies, access to Voice Engine to suss out malicious uses.

Some experts argue that AI red teaming isn’t exhaustive enough and that it’s incumbent on vendors to develop tools to defend against harms that their AI might cause. OpenAI isn’t going quite that far with Voice Engine — but Harris asserts that the company’s “top principle” is releasing the technology safely.

General release

Depending on how the preview goes and the public reception to Voice Engine, OpenAI might release the tool to its wider developer base, but at present, the company is reluctant to commit to anything concrete.

Harris did give a sneak peek at Voice Engine’s roadmap, though, revealing that OpenAI is testing a security mechanism that has users read randomly generated text as proof that they’re present and aware of how their voice is being used. This could give OpenAI the confidence it needs to bring Voice Engine to more people, Harris said — or it might just be the beginning.

“What’s going to keep pushing us forward in terms of the actual voice matching technology is really going to depend on what we learn from the pilot, the safety issues that are uncovered and the mitigations that we have in place,” he said. “We don’t want people to be confused between artificial voices and actual human voices.”

And on that last point we can agree.


