
Creators of Sora-powered short explain AI-generated video's strengths and limitations | TechCrunch


OpenAI’s video generation tool Sora took the AI community by surprise in February with fluid, realistic video that seems miles ahead of competitors. But the carefully stage-managed debut left out a lot of details — details that have been filled in by a filmmaker given early access to create a short using Sora.

Shy Kids is a digital production team based in Toronto that OpenAI picked as one of a few teams to produce short films, essentially for OpenAI’s promotional purposes, though the team was given considerable creative freedom in creating “air head.” In an interview with visual effects news outlet fxguide, post-production artist Patrick Cederberg described “actually using Sora” as part of his work.

Perhaps the most important takeaway for most is simply this: While OpenAI’s post highlighting the shorts lets the reader assume they more or less emerged fully formed from Sora, the reality is that these were professional productions, complete with robust storyboarding, editing, color correction, and post work like rotoscoping and VFX. Just as Apple says “shot on iPhone” but doesn’t show the studio setup, professional lighting, and color work after the fact, the Sora post only talks about what it lets people do, not how they actually did it.

Cederberg’s interview is interesting and quite non-technical, so if you’re interested at all, head over to fxguide and read it. But here are some interesting nuggets about using Sora that tell us that, as impressive as it is, the model is perhaps less of a giant leap forward than we thought.

Control is still the thing that is the most desirable and also the most elusive at this point. … The closest we could get was just being hyper-descriptive in our prompts. Explaining wardrobe for characters, as well as the type of balloon, was our way around consistency because shot to shot / generation to generation, there isn’t the feature set in place yet for full control over consistency.

In other words, matters that are simple in traditional filmmaking, like choosing the color of a character’s clothing, take elaborate workarounds and checks in a generative system, because each shot is created independently of the others. That could obviously change, but it is certainly much more laborious at the moment.

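As a rough illustration of the workaround Cederberg describes, here is a minimal, hypothetical sketch (not Shy Kids’ actual pipeline; the character description and shot list are invented) of how a team might bake one fixed wardrobe-and-character description into every per-shot prompt, so that each independent generation at least starts from the same words:

```python
# Hypothetical sketch: repeat a fixed character/wardrobe description in every
# per-shot prompt, since the model offers no built-in cross-shot consistency controls.

CHARACTER = (
    "a man in a worn brown corduroy jacket and a mustard scarf, whose head is a "
    "slightly under-inflated yellow balloon with no face and no string attached"
)

SHOTS = [
    "wide shot, he waits alone at a bus stop on an overcast morning",
    "medium shot, he browses records in a cluttered thrift store",
    "close-up, wind tugs at the balloon as he looks up at the sky",
]

def build_prompt(shot_description: str) -> str:
    """Combine the fixed character description with one shot's action and setting."""
    return f"{CHARACTER}; {shot_description}; shot on 35mm film, natural light"

for shot in SHOTS:
    print(build_prompt(shot))
```
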
Sora outputs had to be watched for unwanted elements as well: Cederberg described how the model would routinely generate a face on the balloon that the main character has for a head, or a string hanging down the front. These had to be removed in post, another time-consuming process, if they couldn’t get the prompt to exclude them.

Precise timing and movements of characters or the camera aren’t really possible: “There’s a little bit of temporal control about where these different actions happen in the actual generation, but it’s not precise … it’s kind of a shot in the dark,” said Cederberg.

For example, timing a gesture like a wave is a very approximate, suggestion-driven process, unlike manual animations. And a shot like a pan upward on the character’s body may or may not reflect what the filmmaker wants — so the team in this case rendered a shot composed in portrait orientation and did a crop pan in post. The generated clips were also often in slow motion for no particular reason.

Example of a shot as it came out of Sora and how it ended up in the short. Image Credits: Shy Kids

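For readers curious what a “crop pan in post” amounts to mechanically, here is a minimal, hypothetical sketch using ffmpeg driven from Python (this is not the team’s actual tooling; the filename and the four-second timing are invented). It takes a tall generated clip and slides a landscape crop window from the bottom of the frame to the top, approximating the upward pan described above:

```python
# Hypothetical sketch of a "crop pan" done in post with ffmpeg: sweep a crop window
# (full width, half the input height) from the bottom of a tall clip to the top
# over the first four seconds, approximating an upward camera pan.
import subprocess

INPUT = "sora_portrait_shot.mp4"   # invented placeholder filename
OUTPUT = "crop_pan_up.mp4"

# The crop filter re-evaluates x/y per frame; 't' is the timestamp in seconds.
# y slides from (ih - oh) at t=0 to 0 at t=4; the comma inside min() must be
# escaped as '\,' so the filtergraph parser doesn't treat it as a filter separator.
crop_pan = "crop=w=iw:h=ih/2:x=0:y=(ih-oh)*(1-min(t/4\\,1))"

subprocess.run(["ffmpeg", "-y", "-i", INPUT, "-vf", crop_pan, OUTPUT], check=True)
```
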
In fact, even the everyday language of filmmaking, like “panning right” or “tracking shot,” produced inconsistent results in general, Cederberg said, which the team found pretty surprising.

“The researchers, before they approached artists to play with the tool, hadn’t really been thinking like filmmakers,” he said.

As a result, the team did hundreds of generations, each 10 to 20 seconds, and ended up using only a handful. Cederberg estimated the ratio at 300:1 — but of course we would probably all be surprised at the ratio on an ordinary shoot.

The team actually did a little behind-the-scenes video explaining some of the issues they ran into, if you’re curious. Like a lot of AI-adjacent content, the comments are pretty critical of the whole endeavor — though not quite as vituperative as the AI-assisted ad we saw pilloried recently.

The last interesting wrinkle pertains to copyright: If you ask Sora to give you a “Star Wars” clip, it will refuse. And if you try to get around it with “robed man with a laser sword on a retro-futuristic spaceship,” it will also refuse, as by some mechanism it recognizes what you’re trying to do. It also refused to do an “Aronofsky type shot” or a “Hitchcock zoom.”

On one hand, it makes perfect sense. But it does prompt the question: If Sora knows what these are, does that mean the model was trained on that content, the better to recognize that it is infringing? OpenAI, which keeps its training data cards close to the vest — to the point of absurdity, as with CTO Mira Murati’s interview with Joanna Stern — will almost certainly never tell us.

As for Sora and its use in filmmaking, it’s clearly a powerful and useful tool in its place, but its place is not “creating films out of whole cloth.” Yet. As another villain once famously said, “that comes later.”





Former Snap AI chief launches Higgsfield to take on OpenAI's Sora video generator | TechCrunch


OpenAI captivated the tech world a few months back with a generative AI model, Sora, that turns scene descriptions into original videos — no cameras or film crews required. But Sora has so far been tightly gated, and the firm seems to be aiming it toward well-funded creatives like Hollywood directors — not hobbyists or small-time marketers, necessarily.

Alex Mashrabov, the former head of generative AI at Snap, sensed an opportunity. So he launched Higgsfield AI, an AI-powered video creation and editing platform designed for more tailored, personalized applications.

Powered by a custom text-to-video model, Higgsfield’s first app, Diffuse, can generate videos from scratch or take a selfie and generate a clip starring that person.

“Our target audience is creators of all types,” Mashrabov told TechCrunch in an interview, “from regular users who want to create fun content with their friends to social content creators looking to try a new content format to social media marketers who want their brand to stand out.”

Mashrabov came to Snap by way of AI Factory, his previous startup, which Snap acquired in 2020 for $166 million. While at Snap, Mashrabov helped build products like AR effects and filters for Snapchat, including Cameos, as well as Snapchat’s controversial My AI chatbot.

Higgsfield — which Mashrabov co-launched several months ago with Yerzat Dulat, an AI researcher specializing in generative video — offers a curated set of pre-generated clips, a tool to upload reference media (i.e. images and videos) and a prompt editor that lets users describe the characters, actions and scenes they wish to depict. Using Diffuse, users can insert themselves directly into an AI-generated scene, or have their digital likeness mimic things — like dance moves — captured in other videos.

Image Credits: Higgsfield

“Our model supports highly realistic movements and expressions,” Mashrabov said. “We’re pioneering ‘world models’ for consumers, which will allow us to build best-in-class video generation and editing with a great level of control.”

Higgsfield isn’t the only generative video startup going head to head with OpenAI. Runway was one of the first on the scene, and its tools continue to improve. There’s also Haiper, which has the backing of two DeepMind alums and over $13 million in venture cash.

Mashrabov argues that Diffuse will stand out thanks to its mobile-first, social-forward go-to-market strategy.

“By prioritizing iOS and Android apps instead of desktop workflows, we enable creators to create compelling social media content anytime and anywhere,” Mashrabov said. “Indeed, by building on mobile, we’re able to prioritize ease of use and consumer-friendly features from day one.”

Higgsfield is also running lean. Mashrabov says that the generative models underpinning the platform were developed by a 16-person team in less than nine months and trained on a cluster of 32 GPUs (which might sound like a lot, but is a fraction of the tens of thousands OpenAI uses). And Higgsfield has only raised $8 million to date, the bulk of which came from a recent seed funding tranche led by Menlo Ventures.

Image Credits: Higgsfield

To stay one step ahead of rivals, Higgsfield plans to put the seed cash toward building an improved video editor that’ll let users modify characters and objects in videos, and toward training more powerful video generation models specifically for social media use cases. In fact, Mashrabov sees social media — and social media marketing — as Higgsfield’s principal money-making niche.

While Diffuse is currently free to use, Mashrabov envisions a future where marketers pay some sort of fee or subscription for premium features, or for volume or large-scale campaigns.

“We believe Higgsfield unlocks an incredible level of realism and content production use cases for social media marketers,” he said. “We constantly hear from CMOs and creative directors that they need to optimize content production budgets and shorten timelines while still delivering impactful content. So we believe video generative AI solutions will be a core solution in helping them to achieve it.”

Of course, Higgsfield isn’t immune from the broader challenges facing generative AI startups.

It’s well-established that generative AI models like the kind powering Diffuse can “regurgitate” training data. Why’s that problematic? Well, if the models were trained on copyrighted content without permission or some sort of licensing agreement in place, those models’ users could unwittingly generate a copyright-infringing work — exposing them to lawsuits.

Image Credits: Higgsfield

Mashrabov wouldn’t reveal the source of Higgsfield’s training data (other than to say it comes from “multiple publicly available” places), and also wouldn’t say whether Higgsfield would retain user data to train future models, which might not sit right with some business customers. He did note that Diffuse users can request that their data be deleted at any time through the app.

Digital “cloning” platforms like Higgsfield are also ripe for abuse, as the wildfire spread of deepfakes on social media in recent months has shown.

In a similar vein, Higgsfield could make it easier to steal creators’ content. For instance, a user need only upload a video of someone else’s choreography to generate a clip of themselves performing that same choreography.

I asked Mashrabov what safeguards Higgsfield uses to prevent abuse, and — while he wouldn’t go into specifics — he claimed that the platform employs a mix of automated and manual moderation.

“We’ve decided to gradually roll out the product and test in select markets first, so that we can monitor where there’s the potential for abuse and evolve the product as necessary,” Mashrabov added.

We’ll have to wait and see how well that works in practice.

