
Humane’s Ai Pin considers life beyond the smartphone | TechCrunch


Nothing lasts forever. Nowhere is the truism more apt than in consumer tech. This is a land inhabited by the eternally restless — always on the hunt for the next big thing. The smartphone has, by all accounts, had a good run. Seventeen years after the iPhone made its public debut, the devices continue to reign. Over the last several years, however, the cracks have begun to show.

The market plateaued, as sales slowed and ultimately contracted. Last year was punctuated by stories citing the worst demand in a decade, leaving an entire industry asking the same simple question: what’s next? If there was an easy answer, a lot more people would currently be a whole lot richer.

Smartwatches have had a moment, though these devices are largely regarded as accessories, augmenting the smartphone experience. As for AR/VR, the best you can really currently say is that — after a glacial start — the jury is still very much out on products like the Meta Quest and Apple Vision Pro.

When it began to tease its existence through short, mysterious videos in the summer of 2022, Humane promised a glimpse of the future. The company pitched an approach every bit as human-centered as its name implied. It was, at the very least, well funded to the tune of more than $100 million (now $230 million), and it featured an AI element.

The company’s first product, the Humane Ai Pin, arrives this week. It suggests a world where being plugged in doesn’t require having one’s eyes glued to a screen in every waking moment. It’s largely — but not wholly — hands-free. A tap to the front touch panel wakes up the system. Then it listens — and learns.

Beyond the smartphone

Image Credits: Darrell Etherington/TechCrunch

Humane couldn’t ask for better timing. While the startup has been operating largely in stealth for the past seven years, its market debut comes as the trough of smartphone excitement intersects with the crest of generative AI hype. The company’s bona fides contributed greatly to pre-launch excitement. Founders Bethany Bongiorno and Imran Chaudhri were previously well-placed at Apple. OpenAI’s Sam Altman, meanwhile, was an early and enthusiastic backer.

Excitement around smart assistants like Siri, Alexa and Google Home began to ebb in the last few years, but generative AI platforms like OpenAI’s ChatGPT and Google’s Gemini have rushed in to fill that vacuum. The world is enraptured with plugging a few prompts into a text field and watching as the black box spits out a shiny new image, song or video. It’s novel enough to feel like magic, and consumers are eager to see what role it will play in our daily lives.

That’s the Ai Pin’s promise. It’s a portal to ChatGPT and its ilk from the comfort of our lapels, and it does this with a meticulous attention to hardware design befitting its founders’ origins.

Press coverage around the startup has centered on the story of two Apple executives having grown weary of the company’s direction — or lack thereof. Sure, post-Steve Jobs Apple has had successes in the form of the Apple Watch and AirPods, but while Tim Cook is well equipped to create wealth, he’s never been painted as a generational creative genius like his predecessor.

If the world needs the next smartphone, perhaps it also needs the next Apple to deliver it. It’s a concept Humane’s founders are happy to play into. The story of the company’s founding, after all, originates inside the $2.6 trillion behemoth.

Start spreading the news

Image Credits: Alexander Spatari (opens in a new window) / Getty Images

In late March, TechCrunch paid a visit to Humane’s New York office. The feeling was tangibly different from that of our trip to the company’s San Francisco headquarters in the waning months of 2023. The earlier event buzzed with the manic energy of an Apple Store. It was controlled and curated, beginning with a small presentation from Bongiorno and Chaudhri and culminating in a series of stations, staffed by Humane employees, designed to give a crash course on the product’s feature set and origins.

Things in Manhattan were markedly subdued by comparison. The celebratory buzz that accompanies product launches had dissipated into something more formal, with employees focused on dotting the i’s and crossing the t’s in the final push before launch. The intervening months provided plenty of confirmation that the Ai Pin wasn’t the only game in town.

January saw the Rabbit R1’s CES launch. The startup opted for a handheld take on generative AI devices. The following month, Samsung welcomed customers to “the era of Mobile AI.”  The “era of generative AI” would have been more appropriate, as the hardware giant leveraged a Google Gemini partnership aimed at relegating its bygone smart assistant Bixby to a distant memory. Intel similarly laid claim to the “AI PC,” while in March Apple confidently labeled the MacBook Air the “world’s best consumer laptop for AI.”

At the same time, Humane stumbled through news of a small round of layoffs and a slight delay in preorder fulfillment. Both can be written off as products of the immense difficulty of launching a first-generation hardware product — especially under the kind of intense scrutiny few startups see.

For the second meeting with Bongiorno and Chaudhri, we gathered around a conference table. The first goal was an orientation with the device, ahead of review. I’ve increasingly turned down these sorts of meeting requests post-pandemic, but the Ai Pin represents a novel enough paradigm to justify the sit-down. Humane also sent me home with a 30-minute intro video designed to familiarize users — not the sort of thing most folks require when, say, upgrading a phone.

More interesting to me, however, was the prospect of sitting down with the founders for the sort of wide-ranging interview we weren’t able to do during last year’s San Francisco event. Now that most of the mystery is gone, Chaudhri and Bongiorno were more open about discussing the product — and company — in-depth.

Origin story

Humane co-founders Bethany Bongiorno and Imran Chaudhri.

One Infinite Loop is the only place one can reasonably open the Humane origin story. The startup’s founders met on Bongiorno’s first day at Apple in 2008, not long after the launch of the iPhone App Store. Chaudhri had been at the company for 13 years at that point, having joined at the depths of the company’s mid-90s struggles. Jobs would return to the company two years later, following its acquisition of NeXT.

Chaudhri’s 22 years with the company saw him working as director of Design on both the hardware and software sides of projects like the Mac and iPhone. Bongiorno worked as project manager for iOS, macOS and what would eventually become iPadOS. The pair married in 2016 and left Apple the same year.

“We began our new life,” says Bongiorno, “which involves thinking a lot about where the industry was going and what we were passionate about.” The pair started consulting work. However, Bongiorno describes a seemingly mundane encounter that would change their trajectory soon after.

Image Credits: Humane

“We had gone to this dinner, and there was a family sitting next to us,” she says. “There were three kids and a mom and dad, and they were on their phones the entire time. It really started a conversation about the incredible tool we built, but also some of the side effects.”

Bongiorno adds that she arrived home one day in 2017 to see Chaudhri pulling apart electronics. He had also typed out a one-page descriptive vision for the company that would formally be founded as Humane later the same year.

According to Bongiorno, Humane’s first hardware device never strayed too far from Chaudhri’s early mockups. “The vision is the same as what we were pitching in the early days,” she says. That consistency extends to the Ai Pin’s most head-turning feature, a built-in projector that lets wearers use the surface of their hand as a kind of makeshift display. It’s a tacit acknowledgement that, for all of the talk about the future of computing, screens are still the best method for accomplishing certain tasks.

Much of the next two years were spent exploring potential technologies and building early prototypes. In 2018, the company began discussing the concept with advisors and friends, before beginning work in earnest the following year.

Staring at the sun

In July 2022, Humane tweeted, “It’s time for change, not more of the same.” The message, which reads as much like a tagline as a mission statement, was accompanied by a minute-long video. It opens in dramatic fashion on a rendering of an eclipse. A choir sings in a bombastic — almost operatic — fashion, as the camera pans down to a crowd. As the moon obscures the sunlight, their faces are illuminated by their phone screens. The message is not subtle.

The crowd opens to reveal a young woman in a tank top. Her head lifts up. She is now staring directly into the eclipse (not advised). There are lyrics now, “If I had everything, I could change anything,” as she pushes forward to the source of the light. She holds her hand to the sky. A green light illuminates her palm in the shape of the eclipse. This last bit is, we’ll soon discover, a reference to the Ai Pin’s projector. The marketing team behind the video is keenly aware that, while it’s something of a secondary feature, it’s the most likely to grab public attention.

As a symbol, the eclipse has become deeply ingrained in the company’s identity. The green eclipse on the woman’s hand is also Humane’s logo. It’s built into the Ai Pin’s design language, as well. A metal version serves as the connection point between the pin and its battery packs.

Image Credits: Brian Heater

The company is so invested in the motif that it held an event on October 14, 2023, to coincide with a solar eclipse. The device comes in three colors: Eclipse, Equinox and Lunar, and it’s almost certainly no coincidence that this current big news push is happening mere days after another North American solar eclipse.

However, it was on the runway of a Paris fashion show in September that the Ai Pin truly broke cover. The world got its first good look at the product as it was magnetically secured to the lapels of models’ suit jackets. It was a statement, to be sure. Though its founders had left Apple a half-dozen years prior, they were still very much invested in industrial design, creating a product meant to double as a fashion accessory (your mileage will vary).

The design had evolved somewhat since conception. For one thing, the top of the device, which houses the sensors and projector, is now angled downward, so the Pin’s vantage point is roughly the same as its wearer’s. An earlier version with a flatter surface would unintentionally angle the pin upward when worn on certain chest types. Nailing down a more universal design required a lot of trial and error with a lot of different people of different shapes and sizes.

“There’s an aspect of this particular hardware design that has to be compassionate to who’s using it,” says Chaudhri. “It’s very different when you have a handheld aspect. It feels more like an instrument or a tool […] But when you start to have a more embodied experience, the design of the device has to be really understanding of who’s wearing it. That’s where the compassion comes from.”

Year of the Rabbit?

Image Credits: rabbit

Then came competition. When it was unveiled at CES on January 9, the Rabbit R1 stole the show.

“The phone is an entertainment device, but if you’re trying to get something done it’s not the highest efficiency machine,” CEO and founder Jesse Lyu noted at the time. “To arrange dinner with a colleague we needed four or five different apps to work together. Large language models are a universal solution for natural language; we want a universal solution for these services — they should just be able to understand you.”

While the R1’s product design is novel in its own right, it’s arguably a more traditional piece of consumer electronics than Ai Pin. It’s handheld and has buttons and a screen. At its heart, however, the functionality is similar. Both are designed to supplement smartphone usage and are built around a core of LLM-trained AI.

The device’s price point also contributed to its initial buzz. At $200, it’s a fraction of the Ai Pin’s $699 starting price. The more familiar form factor also likely comes with a smaller learning curve than Humane’s product.

Asked about the device, Bongiorno makes the case that another competitor only validates the space. “I think it’s exciting that we kind of sparked this new interest in hardware,” she says. “I think it’s awesome. Fellow builders. More of that, please.”

She adds, however, that the excitement wasn’t necessarily there at Humane from the outset. “We talked about it internally at the company. Of course people were nervous. They were like, ‘what does this mean?’ Imran and I got in front of the company and said, ‘guys, if there weren’t people who followed us, that means we’re not doing the right thing. Then something’s wrong.’”

Bongiorno further suggests that Rabbit is focused on a different use case, as its product requires focus similar to that of a smartphone — though both Bongiorno and Chaudhri have yet to use the R1.

A day after Rabbit unveiled the product, Humane confirmed that it had laid off 10 employees — amounting to 4% of its workforce. It’s a small fraction of an already small headcount, but the timing wasn’t great, a few months ahead of the product’s official launch. The news also saw long-time CTO Patrick Gates exit the C-suite for an advisory role.

“The honest truth is we’re a company that is constantly going through evolution,” Bongiorno says of the layoffs. “If you think about where we were five years ago, we were in R&D. Now we are a company that’s about to ship to customers, that’s about to have to operate in a different way. Like every growing and evolving company, changes are going to happen. It’s actually really healthy and important to go through that process.”

The following month, the company announced that its pins would now be shipping in mid-April. It was a slight delay from the original March ship date, though Chaudhri offers something of a Bill Clinton-style “it depends on what your definition of ‘is’ is” answer. The company, he suggests, defines “shipping” as leaving the factory, rather than the more industry-standard definition of shipping to customers.

“We said we were shipping in March and we are shipping in March,” he says. “The devices leave the factory. The rest is on the U.S. government and how long they take when they hold things in place — tariffs and regulations and other stuff.”

Money moves

Image Credits: Brian Heater

No one invests $230 million in a startup out of the goodness of their heart. Sooner or later, backers will be looking for a return. Integral to Humane’s path to positive cashflow is a subscription service that’s required to use the thing. The $699 price tag comes with 90 days free, then after that, you’re on the hook for $24 a month.

That fee brings talk, text and data from T-Mobile, cloud storage and — most critically — access to the Ai Bus, which is foundational to the device’s operation. Humane describes it thusly, “An entirely new AI software framework, the Ai Bus, brings Ai Pin to life and removes the need to download, manage, or launch apps. Instead, it quickly understands what you need, connecting you to the right AI experience or service instantly.”

Investors, of course, love to hear about subscriptions. Hell, even Apple relies on service revenue for growth as hardware sales have slowed.

Bongiorno alludes to internal projections for revenue, but won’t go into specifics for the timeline. She adds that the company has also discussed an eventual path to IPO even at this early stage in the process.

“If we weren’t, that would not be responsible for any company,” she says. “These are things that we care deeply about. Our vision for Humane from the beginning was that we wanted to build a company where we could build a lot of things. This is our first product, and we have a large roadmap that Imran is really passionate about of where we want to go.”

Chaudhri adds that the company “graduated beyond sketches” for those early products. “We’ve got some early photos of things that we’re thinking about, some concept pieces and some stuff that’s a lot more refined than those sketches when it was a one-man team. We are pretty passionate about the AI space and what it actually means to productize AI.”





Watch: Meta's new Llama 3 models give open-source AI a boost


New AI models from Meta are making waves in technology circles. The two new models, part of the Facebook parent company’s Llama line of artificial intelligence tools, are both open-source, helping them stand apart from competing offerings from OpenAI and other well-known names.

Meta’s new Llama models come in two sizes, with the Llama 3 8B model featuring eight billion parameters and the Llama 3 70B model some seventy billion parameters. The more parameters, the more capable the model, generally speaking, but not every AI task needs the largest possible model.

The company’s new models, which were trained on clusters of 24,000 GPUs, perform well across the benchmarks Meta put them up against, besting some rivals’ models that were already on the market. For those of us not competing to build and release the most capable or largest AI models, what matters is that they are still getting better with time. And work. And a lot of compute.
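For readers who want to poke at the models themselves, Meta distributes the weights through the usual open channels, including Hugging Face. As a minimal sketch (assuming you have accepted Meta’s license and that the checkpoint ID below matches how the weights are actually published), loading the 8B instruct variant with the transformers library looks roughly like this:

    import torch
    from transformers import pipeline

    # Checkpoint ID as published on Hugging Face; access requires accepting Meta's license.
    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

    # Build a text-generation pipeline; bfloat16 keeps the 8B model within a single modern GPU.
    generator = pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )

    # Ask the model a question and print the continuation.
    output = generator(
        "Explain why parameter count matters in a language model.",
        max_new_tokens=100,
    )
    print(output[0]["generated_text"])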

While Meta takes an open-source approach to its AI work, its competitors often prefer a more closed-source approach. OpenAI, despite its name and history, offers access to its models, but not their source code. There’s a healthy debate in the world of AI regarding which approach is better, for both speed of development and safety. After all, some technologists — and some computing doomers, to be clear — are worried that AI tech is developing too fast and could prove dangerous to democracies and more.

For now, Meta is keeping the AI fires alight, offering a new challenge to its peers and rivals to best their latest. Hit play, and let’s talk about it!



ChatGPT is a squeeze away with Nothing’s upgraded earbuds | TechCrunch


Nothing today announced a pair of refreshes to its earbud line. The naming conventions are a touch convoluted here, but the Nothing Ear is an update to the Nothing Ear (2), while the Nothing Ear (a) is more of a spiritual successor to the Nothing Ear Stick.

The most notable bit of today’s news, however, is probably Nothing’s embrace of ChatGPT this time out. As the “AI smartphones” of the world battle with devices like Humane’s Ai Pin and the Rabbit R1 for mindshare, the London-based mobile company seems to have skipped a step by integrating the tech into its new earbuds.

More specifically, if you have the ChatGPT app installed on a connected Nothing handset, you’ll be able to ask the generative AI program questions with a pinch of the headphones’ stem. Think Siri/Google Assistant/Alexa-style access on a pair of earbuds, only this one taps directly into OpenAI’s wildly popular platform.

“Nothing will also improve the Nothing smartphone user experience in Nothing OS by embedding system-level entry points to ChatGPT, including screenshot sharing and Nothing-styled widgets,” the company notes. The feature will be exclusive to the two new earbuds.

Preorders for both buds start today. Nothing says the Ear buds bring improved sound over their predecessors, courtesy of a new driver system. That, meanwhile, has allowed for more space for the battery, with up to 25% more life than the Ear (2). A “smart” active noise-canceling system adapts accordingly to environmental noise and checks for “leakage” between the buds and the ear canal.

The Ear (a) also brings noise-canceling improvements, but what Nothing really seems to be pushing is the bright yellow colorway, which bucks the black and white aesthetic that has defined its devices.

The Ear and Ear (a) are both reasonably priced at $149 and $99, respectively. Shipping starts April 22.



OpenAI plans new Tokyo office, Tesla lays off thousands | TechCrunch


It’s only Monday morning, but it already feels like Thursday given the sheer amount of news that’s flowing in.

We have two critical headlines for you today:

  1. OpenAI is planning to open an office in Tokyo and launch a new GPT-4 model for the Japanese language. The U.S., EU and China are all racing for leadership in AI, and OpenAI’s foray into Japan could expand the list of leading blocs and nations.
  2. Tesla is cutting more than 10% of its total global workforce. CEO Elon Musk told employees in an internal email that the cuts were aimed at eliminating role duplications, but the company has been seeing its sales start to slow down, and some concern around waning demand for EVs could be playing a part in the decision to slash costs.

There’s lots more going on: The price ranges for Rubrik’s IPO have been leaked; ShareChat has suffered a valuation haircut; and global smartphone sales are picking up again. Hit play to catch up on what’s going to be the talk of Tech Twitter this week:

Equity is TechCrunch’s flagship podcast and airs every Monday, Wednesday and Friday. You can subscribe to us on Apple Podcasts, Overcast, Spotify and all the casts.

You also can follow Equity on X and Threads at @EquityPod.

For the full interview transcript, for those who prefer reading over listening, read on, or check out our full archive of episodes over at Simplecast.





OpenAI announces Tokyo office and GPT-4 model optimized for the Japanese language | TechCrunch


OpenAI is expanding to Japan, announcing today a new Tokyo hub and plans for a GPT-4 model optimized specifically for the Japanese language.

The ChatGPT-maker opened its first international office in London last year, followed by its inaugural European Union (EU) office in Dublin a few months later. Tokyo will represent OpenAI’s first office in Asia and fourth globally (including its San Francisco HQ), with CEO Sam Altman highlighting Japan’s “rich history of people and technology coming together to do more” among its reasons for setting up a formal presence in the region.

OpenAI’s global expansion efforts so far have been strategic, insofar as the U.K. is a major hub for AI talent while the EU is currently driving the AI regulatory agenda. Japan, meanwhile, is also positioned prominently in the AI revolution, most recently as the G7 chair and President of the G7’s Hiroshima AI Process, an initiative centered around AI governance and pushing for safe and trustworthy AI.

Its choice of who will head up its new Japanese business is also notable. OpenAI Japan will be led by Tadao Nagasaki, who joins the company from Amazon Web Services (AWS), where he headed up Amazon’s cloud computing subsidiary in the region for the past 12 years — so it’s clear that OpenAI is really targeting the enterprise segment with this latest expansion.

Enterprising

As President of OpenAI Japan, Nagasaki will be tasked with building a local team on the ground and doubling down on OpenAI’s recent growth in Japan, which has seen it secure customers including Daikin, Rakuten and Toyota. Those companies are using ChatGPT’s enterprise-focused incarnation, which sports additional privacy, data analysis and customization options on top of the standard consumer-grade ChatGPT.

OpenAI says ChatGPT is also already being used by local governments to “improve the efficiency of public services in Japan.”

GPT-4 customized for Japanese. Image Credits: OpenAI

While ChatGPT has long been conversant in multiple languages, including Japanese, optimizing the latest version of the underlying GPT large language model (LLM) for Japanese will give it an enhanced understanding of the nuances of the language, including cultural comprehension, which should make it more effective in business settings such as customer service and content creation. OpenAI also says the custom model comes with improved performance, meaning it should be faster and more cost-effective than its predecessor.

For now, OpenAI is giving early access to the GPT-4 custom model to some local businesses, with access gradually opened up via the OpenAI API “in the coming months.”
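Once that access opens up, calling the Japanese-optimized model should look like any other chat completion request. Here is a minimal sketch with the OpenAI Python SDK; the model name is a placeholder, since OpenAI hasn’t published the real identifier yet:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # "gpt-4-japanese-preview" is a hypothetical placeholder; swap in the real model ID once OpenAI publishes it.
    response = client.chat.completions.create(
        model="gpt-4-japanese-preview",
        messages=[
            {"role": "system", "content": "You are a polite customer-service assistant. Reply in Japanese."},
            {"role": "user", "content": "注文した商品がまだ届いていません。"},
        ],
    )
    print(response.choices[0].message.content)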



Sam Altman gives up control of OpenAI Startup Fund, resolving unusual corporate venture structure | TechCrunch


OpenAI CEO Sam Altman has transferred formal control of the firm’s eponymously named corporate venture fund to Ian Hathaway, OpenAI confirmed to TechCrunch.

The OpenAI Startup Fund, launched in 2021, was initially set up with Altman as its named controller. The arrangement could have presented a major issue to the company if he had not been reinstated as OpenAI’s CEO following his brief ouster in November. The fund’s initial GP structure was intended as a temporary arrangement, and Altman made no personal investment, nor did he have any financial interest, a spokesperson explained. 

The news was earlier reported by Axios.

Hathaway joined OpenAI in 2021 and played a key role managing the Startup Fund, leading investments in Ambience Healthcare, Cursor, Harvey and Speak. He was previously an investor with Haystack, according to his LinkedIn profile.

Last year, the fund had $175 million in commitments, and now holds $325 million in gross net asset value, according to an SEC filing. Investors included Microsoft and other external backers. The unit invests in early-stage AI-driven companies in fields like healthcare, law and education.

The Startup Fund has backed at least 16 other startups, according to PitchBook data. They include Descript, a collaborative editing platform valued at $553 million last year, and Ghost Autonomy, which develops software for autonomous driving.



OpenAI expands its custom model training program | TechCrunch


OpenAI is expanding a program, Custom Model, to help enterprise customers develop tailored generative AI models using its technology for specific use cases, domains and applications.

Custom Model launched last year at OpenAI’s inaugural developer conference, DevDay, offering companies an opportunity to work with a group of dedicated OpenAI researchers to train and optimize models for specific domains. “Dozens” of customers have enrolled in Custom Model since. But OpenAI says that, in working with this initial crop of users, it’s come to realize the need to grow the program to further “maximize performance.”

Hence assisted fine-tuning and custom-trained models.

Assisted fine-tuning, a new component of the Custom Model program, leverages techniques beyond fine-tuning — such as “additional hyperparameters and various parameter efficient fine-tuning methods at a larger scale,” in OpenAI’s words — to enable organizations to set up data training pipelines, evaluation systems and other supporting infrastructure toward bolstering model performance on particular tasks.
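OpenAI doesn’t detail its internal tooling, but parameter-efficient fine-tuning is a well-established family of techniques in the open-source world. As a rough illustration of the general idea (not OpenAI’s implementation), this is how LoRA adapters are attached to a small open model with Hugging Face’s peft library so that only a sliver of the weights gets trained:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Any causal LM works for the demonstration; GPT-2 is small enough to run anywhere.
    base = AutoModelForCausalLM.from_pretrained("gpt2")

    # LoRA injects low-rank adapter matrices into the attention projections ("c_attn" in GPT-2).
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["c_attn"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)

    # Only the adapter weights are trainable, typically well under 1% of the total parameters.
    model.print_trainable_parameters()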

As for custom-trained models, they’re custom models built with OpenAI — using OpenAI’s base models and tools (e.g. GPT-4) — for customers that “need to more deeply fine-tune their models” or “imbue new, domain-specific knowledge,” OpenAI says.

OpenAI gives the example of SK Telecom, the Korean telecommunications giant, which worked with OpenAI to fine-tune GPT-4 to improve its performance in “telecom-related conversations” in Korean. Another customer, Harvey — which is building AI-powered legal tools with support from the OpenAI Startup Fund, OpenAI’s AI-focused venture arm — teamed up with OpenAI to create a custom model for case law that incorporated hundreds of millions of words of legal text and feedback from licensed expert attorneys.

“We believe that in the future, the vast majority of organizations will develop customized models that are personalized to their industry, business, or use case,” OpenAI writes in a blog post. “With a variety of techniques available to build a custom model, organizations of all sizes can develop personalized models to realize more meaningful, specific impact from their AI implementations.”

Image Credits: OpenAI

OpenAI is flying high, reportedly nearing an astounding $2 billion in annualized revenue. But there’s surely internal pressure to maintain pace, particularly as the company plots a $100 billion data center co-developed with Microsoft (if reports are to be believed). The cost of training and serving flagship generative AI models isn’t coming down anytime soon after all, and consulting work like custom model training might just be the thing to keep revenue growing while OpenAI plots its next moves.

Fine-tuned and custom models could also lessen the strain on OpenAI’s model serving infrastructure. Tailored models are in many cases smaller and more performant than their general-purpose counterparts, and — as the demand for generative AI reaches a fever pitch — no doubt present an attractive solution for a historically compute-capacity-challenged OpenAI.

Alongside the expanded Custom Model program and custom model building, OpenAI today unveiled new model fine-tuning features for developers working with GPT-3.5, including a new dashboard for comparing model quality and performance, support for integrations with third-party platforms (starting with the AI developer platform Weights & Biases) and enhancements to tooling. Mum’s the word on fine-tuning for GPT-4, however, which launched in early access during DevDay.
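The underlying fine-tuning workflow itself is unchanged: upload a JSONL file of example conversations, then start a job against GPT-3.5. A minimal sketch with the OpenAI Python SDK (the file path and its contents are placeholders):

    from openai import OpenAI

    client = OpenAI()

    # Upload a JSONL file of chat-formatted training examples (placeholder path).
    training_file = client.files.create(
        file=open("training_examples.jsonl", "rb"),
        purpose="fine-tune",
    )

    # Start a fine-tuning job on GPT-3.5.
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",
    )

    # Poll the job; once it succeeds, the resulting model ID can be passed to chat.completions.create().
    print(client.fine_tuning.jobs.retrieve(job.id).status)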



OpenAI built a voice cloning tool, but you can't use it… yet | TechCrunch


As deepfakes proliferate, OpenAI is refining the tech used to clone voices — but the company insists it’s doing so responsibly.

Today marks the preview debut of OpenAI’s Voice Engine, an expansion of the company’s existing text-to-speech API. Under development for about two years, Voice Engine allows users to upload any 15-second voice sample to generate a synthetic copy of that voice. But there’s no date for public availability yet, giving the company time to respond to how the model is used and abused.

“We want to make sure that everyone feels good about how it’s being deployed — that we understand the landscape of where this tech is dangerous and we have mitigations in place for that,” Jeff Harris, a member of the product staff at OpenAI, told TechCrunch in an interview.

Training the model

The generative AI model powering Voice Engine has been hiding in plain sight for some time, Harris said.

The same model underpins the voice and “read aloud” capabilities in ChatGPT, OpenAI’s AI-powered chatbot, as well as the preset voices available in OpenAI’s text-to-speech API. And Spotify’s been using it since early September to dub podcasts for high-profile hosts like Lex Fridman in different languages.
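The preset-voice half of that stack is already public. As a minimal sketch, generating speech with one of the stock voices through the OpenAI Python SDK looks like this (the input sentence is just an example; Voice Engine’s cloned voices are not exposed through this endpoint):

    from openai import OpenAI

    client = OpenAI()

    # Generate speech with one of the preset voices from the existing text-to-speech API.
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input="Nothing lasts forever. Nowhere is the truism more apt than in consumer tech.",
    )

    # Write the resulting MP3 to disk.
    response.stream_to_file("sample.mp3")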

I asked Harris where the model’s training data came from — a bit of a touchy subject. He would only say that the Voice Engine model was trained on a mix of licensed and publicly available data.

Models like the one powering Voice Engine are trained on an enormous number of examples — in this case, speech recordings — usually sourced from public sites and data sets around the web. Many generative AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much.

OpenAI is already being sued over allegations the company violated IP law by training its AI on copyrighted content, including photos, artwork, code, articles and e-books, without providing the creators or owners credit or pay.

OpenAI has licensing agreements in place with some content providers, like Shutterstock and the news publisher Axel Springer, and allows webmasters to block its web crawler from scraping their site for training data. OpenAI also lets artists “opt out” of and remove their work from the data sets that the company uses to train its image-generating models, including its latest DALL-E 3.

But OpenAI offers no such opt-out scheme for its other products. And in a recent statement to the U.K.’s House of Lords, OpenAI suggested that it’s “impossible” to create useful AI models without copyrighted material, asserting that fair use — the legal doctrine that allows for the use of copyrighted works to make a secondary creation as long as it’s transformative — shields it where it concerns model training.

Synthesizing voice

Surprisingly, Voice Engine isn’t trained or fine-tuned on user data. That’s owing in part to the ephemeral way in which the model — a combination of a diffusion process and transformer — generates speech.

“We take a small audio sample and text and generate realistic speech that matches the original speaker,” said Harris. “The audio that’s used is dropped after the request is complete.”

As he explained it, the model is simultaneously analyzing the speech data it pulls from and the text data meant to be read aloud, generating a matching voice without having to build a custom model per speaker.

It’s not novel tech. A number of startups have delivered voice cloning products for years, from ElevenLabs to Replica Studios to Papercup to Deepdub to Respeecher. So have Big Tech incumbents such as Amazon, Google and Microsoft — the last of which, incidentally, is a major OpenAI investor.

Harris claimed that OpenAI’s approach delivers overall higher-quality speech.

We also know it will be priced aggressively. Although OpenAI removed Voice Engine’s pricing from the marketing materials it published today, in documents viewed by TechCrunch, Voice Engine is listed as costing $15 per one million characters, or ~162,500 words. That would fit Dickens’ “Oliver Twist” with a little room to spare. (An “HD” quality option costs twice that, but confusingly, an OpenAI spokesperson told TechCrunch that there’s no difference between HD and non-HD voices. Make of that what you will.)

That translates to around 18 hours of audio, making the price somewhat south of $1 per hour. That’s indeed cheaper than what one of the more popular rival vendors, ElevenLabs, charges — $11 for 100,000 characters per month. But it does come at the expense of some customization.
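The back-of-the-envelope math is easy to check. A quick sketch using the figures above (and treating ElevenLabs’ $11-per-100,000-character plan as a flat per-character rate purely for comparison):

    # Voice Engine: $15 per 1,000,000 characters (per the documents TechCrunch viewed).
    openai_rate = 15 / 1_000_000      # dollars per character

    # ElevenLabs: $11 per 100,000 characters per month, treated here as a simple per-character rate.
    elevenlabs_rate = 11 / 100_000    # dollars per character

    # Roughly 18 hours of audio per million characters, per the article's estimate.
    hours_per_million_chars = 18
    cost_per_hour = (openai_rate * 1_000_000) / hours_per_million_chars

    print(f"Voice Engine: ${cost_per_hour:.2f} per hour of audio")                     # about $0.83
    print(f"ElevenLabs is ~{elevenlabs_rate / openai_rate:.1f}x the per-character price")  # about 7.3x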

Voice Engine doesn’t offer controls to adjust the tone, pitch or cadence of a voice. In fact, it doesn’t offer any fine-tuning knobs or dials at the moment, although Harris notes that any expressiveness in the 15-second voice sample will carry on through subsequent generations (for example, if you speak in an excited tone, the resulting synthetic voice will sound consistently excited). We’ll see how the quality of the reading compares with other models when they can be compared directly.

Voice talent as commodity

Voice actor salaries on ZipRecruiter range from $12 to $79 per hour — a lot more expensive than Voice Engine, even on the low end (actors with agents will command a much higher price per project). Were it to catch on, OpenAI’s tool could commoditize voice work. So, where does that leave actors?

The talent industry wouldn’t be caught unawares, exactly — it’s been grappling with the existential threat of generative AI for some time. Voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them. Voice work — particularly cheap, entry-level work — is at risk of being eliminated in favor of AI-generated speech.

Now, some AI voice platforms are trying to strike a balance.

Replica Studios last year signed a somewhat contentious deal with SAG-AFTRA to create and license copies of the media artist union members’ voices. The organizations said that the arrangement established fair and ethical terms and conditions to ensure performer consent while negotiating terms for uses of synthetic voices in new works, including video games.

ElevenLabs, meanwhile, hosts a marketplace for synthetic voices that allows users to create a voice, verify and share it publicly. When others use a voice, the original creators receive compensation — a set dollar amount per 1,000 characters.

OpenAI will establish no such labor union deals or marketplaces, at least not in the near term, and requires only that users obtain “explicit consent” from the people whose voices are cloned, make “clear disclosures” indicating which voices are AI-generated and agree not to use the voices of minors, deceased people or political figures in their generations.

“How this intersects with the voice actor economy is something that we’re watching closely and really curious about,” Harris said. “I think that there’s going to be a lot of opportunity to sort of scale your reach as a voice actor through this kind of technology. But this is all stuff that we’re going to learn as people actually deploy and play with the tech a little bit.”

Ethics and deepfakes

Voice cloning apps can be — and have been — abused in ways that go well beyond threatening the livelihoods of actors.

The infamous message board 4chan, known for its conspiratorial content, used ElevenLabs’ platform to share hateful messages mimicking celebrities like Emma Watson. The Verge’s James Vincent was able to tap AI tools to maliciously, quickly clone voices, generating samples containing everything from violent threats to racist and transphobic remarks. And over at Vice, reporter Joseph Cox documented generating a voice clone convincing enough to fool a bank’s authentication system.

There are fears bad actors will attempt to sway elections with voice cloning. And they’re not unfounded: In January, a phone campaign employed a deepfaked President Biden to deter New Hampshire citizens from voting — prompting the FCC to move to make future such campaigns illegal.

So aside from banning deepfakes at the policy level, what steps is OpenAI taking, if any, to prevent Voice Engine from being misused? Harris mentioned a few.

First, Voice Engine is only being made available to an exceptionally small group of developers — around 10 — to start. OpenAI is prioritizing use cases that are “low risk” and “socially beneficial,” Harris says, like those in healthcare and accessibility, in addition to experimenting with “responsible” synthetic media.

A few early Voice Engine adopters include Age of Learning, an edtech company that’s using the tool to generate voice-overs from previously cast actors, and HeyGen, a storytelling app leveraging Voice Engine for translation. Livox and Lifespan are using Voice Engine to create voices for people with speech impairments and disabilities, and Dimagi is building a Voice Engine-based tool to give feedback to health workers in their primary languages.

Here are generated voices from Lifespan:


And here’s one from Livox:

Second, clones created with Voice Engine are watermarked using a technique OpenAI developed that embeds inaudible identifiers in recordings. (Other vendors including Resemble AI and Microsoft employ similar watermarks.) Harris didn’t promise that there aren’t ways to circumvent the watermark, but described it as “tamper resistant.”

“If there’s an audio clip out there, it’s really easy for us to look at that clip and determine that it was generated by our system and the developer that actually did that generation,” Harris said. “So far, it isn’t open sourced — we have it internally for now. We’re curious about making it publicly available, but obviously, that comes with added risks in terms of exposure and breaking it.”

Third, OpenAI plans to provide members of its red teaming network, a contracted group of experts that help inform the company’s AI model risk assessment and mitigation strategies, access to Voice Engine to suss out malicious uses.

Some experts argue that AI red teaming isn’t exhaustive enough and that it’s incumbent on vendors to develop tools to defend against harms that their AI might cause. OpenAI isn’t going quite that far with Voice Engine — but Harris asserts that the company’s “top principle” is releasing the technology safely.

General release

Depending on how the preview goes and the public reception to Voice Engine, OpenAI might release the tool to its wider developer base, but at present, the company is reluctant to commit to anything concrete.

Harris did give a sneak peek at Voice Engine’s roadmap, though, revealing that OpenAI is testing a security mechanism that has users read randomly generated text as proof that they’re present and aware of how their voice is being used. This could give OpenAI the confidence it needs to bring Voice Engine to more people, Harris said — or it might just be the beginning.

“What’s going to keep pushing us forward in terms of the actual voice matching technology is really going to depend on what we learn from the pilot, the safety issues that are uncovered and the mitigations that we have in place,” he said. “We don’t want people to be confused between artificial voices and actual human voices.”

And on that last point we can agree.



Why Amazon is betting $4B on Anthropic’s AI success


The current AI wave is a never-ending barrage of news items. To understand what I mean, ask yourself how long you spent considering the fact that Amazon put another $2.75 billion into Anthropic last week. Right?

We’ve become inured to the capital influx that is now common in AI, even as the headline numbers get bigger. Sure, Amazon is slinging cash at Anthropic, but single-digit billions are chump change compared to what some companies have planned. Hell, even smaller tech companies — compared to the true giants — are spending to stay on the cutting edge.

So as we digest Amazon’s latest, let’s do a quick rewind through some of the largest AI rounds in the last few quarters and ask ourselves why some Big Tech corps get busy with their checkbooks.



Understanding humanoid robots | TechCrunch


Robots made their stage debut the day after New Year’s 1921. More than half-a-century before the world caught its first glimpse of George Lucas’ droids, a small army of silvery humanoids took to the stages of the First Czechoslovak Republic. They were, for all intents and purposes, humanoids: two arms, two legs, a head — the whole shebang.

Karel Čapek’s play, R.U.R (Rossumovi Univerzální Roboti), was a hit. It was translated into dozens of languages and played across Europe and North America. The work’s lasting legacy, however, was its introduction of the word “robot.” The meaning of the term has evolved a good bit in the intervening century, as Čapek’s robots were more organic than machine.

Decades of science fiction have, however, ensured that the public image of robots hasn’t strayed too far from its origins. For many, the humanoid form is still the platonic robot ideal — it’s just that the state of technology hasn’t caught up to that vision. Earlier this week, Nvidia held its own on-stage robot parade at its GTC developer conference, as CEO Jensen Huang was flanked by images of a half-dozen humanoids.

While the concept of the general-purpose humanoid has, in essence, been around longer than the word “robot,” until recently its realization has seemed wholly out of grasp. We’re very much not there yet, but for the first time, the concept has appeared over the horizon.

What is a “general-purpose humanoid?”

Image Credits: Nvidia

Before we dive any deeper, let’s get two key definitions out of the way. When we talk about “general-purpose humanoids,” the fact is that both terms mean different things to different people. Most people take a Justice Potter Stewart “I know it when I see it” approach to both in conversation.

For the sake of this article, I’m going to define a general-purpose robot as one that can quickly pick up skills and essentially do any task a human can do. One of the big sticking points here is that multi-purpose robots don’t suddenly go general-purpose overnight.

Because it’s a gradual process, it’s difficult to say precisely when a system has crossed that threshold. There’s a temptation to go down a bit of a philosophical rabbit hole with that latter bit, but for the sake of keeping this article under book length, I’m going to go ahead and move on to the other term.

I received a bit of (largely good-natured) flack when I referred to Reflex Robotics’ system as a humanoid. People pointed out the plainly obvious fact that the robot doesn’t have legs. Putting aside for a moment that not all humans have legs, I’m fine calling the system a “humanoid” or more specifically a “wheeled humanoid.” In my estimation, it resembles the human form closely enough to fit the bill.

A while back, someone at Agility took issue when I called Digit “arguably a humanoid,” suggesting that there was nothing arguable about it. What’s clear is that the robot isn’t as faithful an attempt to recreate the human form as some of the competition. I will admit, however, that I may be somewhat biased, having tracked the robot’s evolution from its precursor Cassie, which more closely resembled a headless ostrich (listen, we all went through an awkward period).

Another element I tend to consider is the degree to which the humanlike form is used to perform humanlike tasks. This element isn’t absolutely necessary, but it’s an important part of the spirit of humanoid robots. After all, proponents of the form factor will quickly point out the fact that we’ve built our worlds around humans, so it makes sense to build humanlike robots to work in that world.

Adaptability is another key point used to defend the deployment of bipedal humanoids. Robots have had factory jobs for decades now, and the vast majority of them are single-purpose. That is to say, they were built to do a single thing very well a lot of times. This is why automation has been so well-suited for manufacturing — there’s a lot of uniformity and repetition, particularly in the world of assembly lines.

Brownfield vs. greenfield

Image Credits: Brian Heater

The terms “greenfield” and “brownfield” have been in common usage for several decades across various disciplines. The former is the older of the two, describing undeveloped land (quite literally, a green field). Coined to contrast with the earlier term, brownfield refers to development on existing sites. In the world of warehouses, it’s the difference between building something from scratch or working with something that’s already there.

There are pros and cons to both. Brownfields are generally more time- and cost-effective, as they don’t require starting from scratch, while greenfields afford the opportunity to build a site entirely to spec. Given infinite resources, most corporations will opt for a greenfield. Imagine the performance of a space built from the ground up with automated systems in mind. That’s a pipedream for most organizations, so when it comes time to automate, a majority of companies seek out brownfield solutions — doubly so when they’re first dipping their toes into the robotic waters.

Given that most warehouses are brownfield, it ought to come as no surprise that the same can be said for the robots designed for these spaces. Humanoids fit neatly into this category — in fact, in a number of respects, they are among the brownest of brownfield solutions. This gets back to the earlier point about building humanoid robots for their environments. You can safely assume that most brownfield factories were designed with human workers in mind. That often comes with elements like stairs, which present an obstacle for wheeled robots. How large that obstacle ultimately is depends on a lot of factors, including layout and workflow.

Baby steps

Image Credits: Figure

Call me a wet blanket, but I’m a big fan of setting realistic expectations. I’ve been doing this job for a long time and have survived my share of hype cycles. There’s an extent to which they can be useful, in terms of building investor and customer interest, but it’s entirely too easy to fall prey to overpromises. This includes both stated promises around future functionality and demo videos.

I wrote about the latter last month in a post cheekily titled, “How to fake a robotics demo for fun and profit.” There are a number of ways to do this, including hidden teleoperation and creative editing. I’ve heard whispers that some firms are speeding up videos without disclosing the information. In fact, that’s the origin of humanoid firm 1X’s name — all of its demos are run at 1X speed.

Most in the space agree that disclosure is important — even necessary — on such products, but there aren’t strict standards in place. One could argue that you’re wading into a legal gray area if such videos play a role in convincing investors to plunk down large sums of money. At the very least, they set wildly unrealistic expectations among the public — particularly those who are inclined to take truth-stretching executives’ words as gospel.

That can only serve to harm those who are putting in the hard work while operating in reality with the rest of us. It’s easy to see how hope quickly diminishes when systems fail to live up to those expectations.

The timeline to real-world deployment contains two primary constraints. The first is mechatronic: i.e. what the hardware is capable of. The second is software and artificial intelligence. Without getting into a philosophical debate around what qualifies as artificial general intelligence (AGI) in robots, one thing we can certainly say is that progress has been, and will continue to be, gradual.

As Huang noted at GTC the other week, “If we specified AGI to be something very specific, a set of tests where a software program can do very well — or maybe 8% better than most people — I believe we will get there within five years.” That’s on the optimistic end of the timeline I’ve heard from most experts in the field. A range of five to 10 years seems common.

Before hitting anything resembling AGI, humanoids will start as single-purpose systems, much like their more traditional counterparts. Pilots are designed to prove out that these systems can do one thing well at scale before moving onto the next. Most people are looking at tote moving for that lowest-hanging fruit. Of course, your average Kiva/Locus AMR can move totes around all day, but those systems lack the mobile manipulators required to move payloads on and off themselves. That’s where robot arms and end effectors come in, whether or not they happen to be attached to something that looks human.

Speaking to me the other week at the Modex show in Atlanta, Dexterity founding engineer Robert Sun floated an interesting point: humanoids could provide a clever stopgap on the way to lights out (fully automated) warehouses and factories. Once full automation is in place, you won’t necessarily require the flexibility of a humanoid. But can we reasonably expect these systems to be fully operational in time?

“Transitioning all logistics and warehousing work to roboticized work, I thought humanoids could be a good transition point,” Sun said. “Now we don’t have the human, so we’ll put the humanoid there. Eventually, we’ll move to this automated lights-out factory. Then the issue of humanoids being very difficult makes it hard to put them in the transition period.”

Take me to the pilot

Image Credits: Apptronik/Mercedes

The current state of humanoid robotics can be summed up in one word: pilot. It’s an important milestone, but one that doesn’t necessarily tell us everything. Pilot announcements arrive as press releases heralding the early stage of a potential partnership. Both parties love them.

For the startup, they represent real, provable interest. For the big corporation, they signal to shareholders that the firm is engaging with the state of the art. Rarely, however, are real figures mentioned. Those generally enter the picture when we start discussing purchase orders (and even then, often not).

The past year has seen a number of these announced. BMW is working with Figure, while Mercedes has enlisted Apptronik. Once again, Agility has a head start on the rest, having completed its pilots with Amazon — we are, however, still waiting for word on the next step. It’s particularly telling that, in spite of the long-term promise of general-purpose systems, just about everyone in the space is beginning with the same basic functionality.

Two legs to stand on

Image Credits: Brian Heater

At this point, the clearest path to AGI should look familiar to anyone with a smartphone. Boston Dynamics’ Spot deployment provides a clear real-world example of how the app store model can work with industrial robots. While there’s a lot of compelling work being done in the world of robot learning, we’re a ways off from systems that can figure out new tasks and correct mistakes on the fly at scale. If only robotics manufacturers could leverage third-party developers in a manner similar to phonemakers.

Interest in the category has increased substantially in recent months, but speaking personally, the needle hasn’t moved too much in either direction for me since late last year. We’ve seen some absolutely killer demos, and generative AI presents a promising future. OpenAI is certainly hedging its bets, first investing in 1X and — more recently — Figure.

A lot of smart people have faith in the form factor and plenty of others remain skeptical. One thing I’m confident saying, however, is that whether or not future factories will be populated with humanoid robots on a meaningful scale, all of this work will amount to something. Even the most skeptical roboticists I’ve spoken to on the subject have pointed to the NASA model, where the race to land humans on the moon led to the invention of products we use on Earth to this day.

We’re going to see continued breakthroughs in robotic learning, mobile manipulation and locomotion (among others) that will impact the role automation plays in our daily life one way or another.

