From Digital Age to Nano Age. WorldWide.

Tag: train

OpenAI inks deal to train AI on Reddit data | TechCrunch


OpenAI has reached a deal with Reddit to use the social news site’s data for training AI models. In a blog post on OpenAI’s press relations site, the company said that the Reddit partnership will provide it access to “real-time, structured and unique content” — e.g. posts and replies — from Reddit, allowing its tools […]


Sony Music warns tech companies over 'unauthorized' use of its content to train AI | TechCrunch


Sony Music Group has sent letters to more than 700 tech companies and music streaming services to warn them not to use its music to train AI without explicit permission. The letter, which was obtained by TechCrunch, says Sony Music has “reason to believe” that the recipients of the letter “may already have made […]


Photo-sharing community EyeEm will license users’ photos to train AI if they don’t delete them | TechCrunch


EyeEm, the Berlin-based photo-sharing community that was sold to Spanish company Freepik last year after going bankrupt, is now licensing its users’ photos to train AI models. Earlier this month, the company informed users via email that it was adding a new clause to its Terms & Conditions that would grant it the rights to use users’ content to “train, develop, and improve software, algorithms, and machine-learning models.” Users were given 30 days to opt out by removing all their content from EyeEm’s platform. Otherwise, they were consenting to this use of their work.

At the time of its 2023 acquisition, EyeEm’s photo library included 160 million images, and the platform had nearly 150,000 users. The company said it would merge its community with Freepik’s over time.

Once thought of as a possible challenger to Instagram — or at least “Europe’s Instagram” — EyeEm had dwindled to a staff of three before selling to Freepik, TechCrunch’s Ingrid Lunden previously reported. Joaquin Cuenca Abela, CEO of Freepik, hinted at the company’s possible plans for EyeEm, saying it would explore how to bring more AI into the equation for creators on the platform.

As it turns out, that meant selling their work to train AI models.

Now, EyeEm’s updated Terms & Conditions read as follows:

8.1 Grant of Rights – EyeEm Community

By uploading Content to EyeEm Community, you grant us regarding your Content the non-exclusive, worldwide, transferable and sublicensable right to reproduce, distribute, publicly display, transform, adapt, make derivative works of, communicate to the public and/or promote such Content.

This specifically includes the sublicensable and transferable right to use your Content for the training, development and improvement of software, algorithms and machine learning models. In case you do not agree to this, you should not add your Content to EyeEm Community.

The rights granted in this section 8.1 regarding your Content remains valid until complete deletion from EyeEm Community and partner platforms according to section 13. You can request the deletion of your Content at any time. The conditions for this can be found in section 13.

Section 13 details a complicated process for deletions that begins with deleting photos directly — which would not affect content previously shared to EyeEm Magazine or social media, the company notes. To delete content from EyeEm Market (where photographers sold their photos) or other content platforms, users would have to submit a request to [email protected], provide the Content ID numbers for the photos they wanted to delete, and specify whether the photos should be removed from their account as well or from EyeEm Market only.

Of note, the notice says these deletions from EyeEm Market and partner platforms could take up to 180 days. Yes, that’s right: requested deletions take up to 180 days, but users only have 30 days to opt out. That means the only way to opt out within the window is to manually delete photos one by one.

Worse still, the company adds that:

You hereby acknowledge and agree that your authorization for EyeEm to market and license your Content according to sections 8 and 10 will remain valid until the Content is deleted from EyeEm and all partner platforms within the time frame indicated above. All license agreements entered into before complete deletion and the rights of use granted thereby remain unaffected by the request for deletion or the deletion.

Section 8 is where licensing rights to train AI are detailed. In Section 10, EyeEm informs users they will forgo their right to any payouts for their work if they delete their account — something users may think to do to avoid having their data fed to AI models. Gotcha!

EyeEm’s move is an example of how AI models are being trained on the back of users’ content, sometimes without their explicit consent. Though EyeEm did offer an opt-out procedure of sorts, any photographer who missed the announcement would have lost the right to dictate how their photos were to be used going forward. Given that EyeEm’s status as a popular Instagram alternative had significantly declined over the years, many photographers may have forgotten they had ever used it in the first place. They certainly may have ignored the email, if it wasn’t already in a spam folder somewhere.

Those who did notice the changes were upset that they were given only 30 days’ notice and no option to bulk delete their contributions, making it more painful to opt out.

Requests for comment sent to EyeEm weren’t immediately answered, but given that the opt-out countdown had a 30-day deadline, we’ve opted to publish before hearing back.

This sort of dishonest behavior is why users today are considering a move to the open social web. The federated platform Pixelfed, which runs on the same ActivityPub protocol that powers Mastodon, is capitalizing on the EyeEm situation to attract users.

In a post on its official account, Pixelfed announced “We will never use your images to help train AI models. Privacy First, Pixels Forever.”





Vana plans to let users rent out their Reddit data to train AI | TechCrunch


In the generative AI boom, data is the new oil. So why shouldn’t you be able to sell your own?

From big tech firms to startups, AI makers are licensing e-books, images, videos, audio and more from data brokers, all in the pursuit of training up more capable (and more legally defensible) AI-powered products. Shutterstock has deals with Meta, Google, Amazon and Apple to supply millions of images for model training, while OpenAI has signed agreements with several news organizations to train its models on news archives.

In many cases, the individual creators and owners of that data haven’t seen a dime of the cash changing hands. A startup called Vana wants to change that.

Anna Kazlauskas and Art Abal, who met in a class at the MIT Media Lab focused on building tech for emerging markets, co-founded Vana in 2021. Prior to Vana, Kazlauskas studied computer science and economics at MIT, eventually leaving to launch a fintech automation startup, Iambiq, out of Y Combinator. Abal, a corporate lawyer by training, was an associate at The Cadmus Group, a Boston-based consulting firm, before heading up impact sourcing at data annotation company Appen.

With Vana, Kazlauskas and Abal set out to build a platform that lets users “pool” their data — including chats, speech recordings and photos — into data sets that can then be used for generative AI model training. They also want to create more personalized experiences — for instance, a daily motivational voicemail based on your wellness goals, or an art-generating app that understands your style preferences — by fine-tuning public models on that data.

“Vana’s infrastructure in effect creates a user-owned data treasury,” Kazlauskas told TechCrunch. “It does this by allowing users to aggregate their personal data in a non-custodial way … Vana allows users to own AI models and use their data across AI applications.”

Here’s how Vana pitches its platform and API to developers:

The Vana API connects a user’s cross-platform personal data … to allow you to personalize your application. Your app gains instant access to a user’s personalized AI model or underlying data, simplifying onboarding and eliminating compute cost concerns … We think users should be able to bring their personal data from walled gardens, like Instagram, Facebook and Google, to your application, so you can create amazing personalized experience from the very first time a user interacts with your consumer AI application.
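Vana hasn’t published endpoint details in that pitch, but the developer flow it describes — an app querying a user’s personalized model instead of ingesting their raw data — might look something like the sketch below. This is purely illustrative: the base URL, endpoint path, credential and JSON field names are all invented for the example, not Vana’s documented API.

```python
# Hypothetical sketch of the flow Vana's pitch describes: the app asks
# the user's personal model for output rather than pulling raw data.
# The base URL, endpoint path and JSON fields are invented for
# illustration; they are not Vana's documented API.
import requests

API_BASE = "https://api.example-vana.dev/v1"  # placeholder URL
API_KEY = "sk-your-app-key"                   # placeholder credential

def ask_user_model(user_id: str, prompt: str) -> str:
    """Query a user's personalized model on the app's behalf."""
    resp = requests.post(
        f"{API_BASE}/users/{user_id}/model/generate",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["text"]

print(ask_user_model("user-123", "Draft today's motivational voicemail script."))
```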

Creating an account with Vana is fairly simple. After confirming your email, you can attach data to a digital avatar (like selfies, a description of yourself and voice recordings) and explore apps built using Vana’s platform and data sets. The app selection ranges from ChatGPT-style chatbots and interactive storybooks to a Hinge profile generator.


Now why, you might ask — in this age of increased data privacy awareness and ransomware attacks — would someone ever volunteer their personal info to an anonymous startup, much less a venture-backed one? (Vana has raised $20 million to date from Paradigm, Polychain Capital and other backers.) Can any profit-driven company really be trusted not to abuse or mishandle any monetizable data it gets its hands on?


In response to that question, Kazlauskas stressed that the whole point of Vana is for users to “reclaim control over their data,” noting that Vana users have the option to self-host their data rather than store it on Vana’s servers and control how their data’s shared with apps and developers. She also argued that, because Vana makes money by charging users a monthly subscription (starting at $3.99) and levying a “data transaction” fee on devs (e.g. for transferring data sets for AI model training), the company is disincentivized to exploit users and the troves of personal data they bring with them.

“We want to create models owned and governed by users who all contribute their data,” Kazlauskas said, “and allow users to bring their data and models with them to any application.”

Now, while Vana isn’t selling users’ data to companies for generative AI model training (or so it claims), it wants to allow users to do this themselves if they choose — starting with their Reddit posts.

This month, Vana launched what it’s calling the Reddit Data DAO (Decentralized Autonomous Organization), a program that pools multiple users’ Reddit data (including their karma and post history) and lets them decide together how that combined data is used. After joining with a Reddit account, submitting a request to Reddit for their data and uploading that data to the DAO, users gain the right to vote alongside other members on decisions like licensing the combined data to generative AI companies for a shared profit.
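Vana hasn’t detailed the DAO’s governance mechanics, but the flow described above — contribute a data export to join, then vote as a group on licensing deals — reduces to something like this minimal sketch. The simple-majority rule and the data structures are assumptions made for illustration, not Vana’s implementation.

```python
# Minimal sketch of the pool-then-vote flow described above. The
# simple-majority rule and the data structures are assumptions made
# for illustration; Vana's actual governance may differ.
from dataclasses import dataclass, field

@dataclass
class RedditDataDAO:
    members: dict = field(default_factory=dict)  # username -> data export
    votes: dict = field(default_factory=dict)    # username -> True/False

    def join(self, username: str, exported_data: dict) -> None:
        """Contributing a Reddit data export makes a user a voting member."""
        self.members[username] = exported_data

    def vote(self, username: str, approve: bool) -> None:
        if username not in self.members:
            raise ValueError("only members who contributed data may vote")
        self.votes[username] = approve

    def licensing_approved(self) -> bool:
        """Assume a deal needs yes-votes from a majority of all members."""
        return sum(self.votes.values()) > len(self.members) / 2
```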

It’s an answer of sorts to Reddit’s recent moves to commercialize data on its platform.

Reddit previously didn’t gate access to posts and communities for generative AI training purposes. But it reversed course late last year, ahead of its IPO. Since the policy change, Reddit has raked in over $203 million in licensing fees from companies including Google.

“The broad idea [with the DAO is] to free user data from the major platforms that seek to hoard and monetize it,” Kazlauskas said. “This is a first and is part of our push to help people pool their data into user-owned data sets for training AI models.”

Unsurprisingly, Reddit — which isn’t working with Vana in any official capacity — isn’t pleased about the DAO.

Reddit banned Vana’s subreddit dedicated to discussion about the DAO. And a Reddit spokesperson accused Vana of “exploiting” its data export system, which is designed to comply with data privacy regulations like the GDPR and California Consumer Privacy Act.

“Our data arrangements allow us to put guardrails on such entities, even on public information,” the spokesperson told TechCrunch. “Reddit does not share non-public, personal data with commercial enterprises, and when Redditors request an export of their data from us, they receive non-public personal data back from us in accordance with applicable laws. Direct partnerships between Reddit and vetted organizations, with clear terms and accountability, matters, and these partnerships and agreements prevent misuse and abuse of people’s data.”

But does Reddit have any real reason to be concerned?

Kazlauskas envisions the DAO growing to the point where it impacts the amount Reddit can charge customers for its data. That’s a long ways off, assuming it ever happens; the DAO has just over 141,000 members, a tiny fraction of Reddit’s 73-million-strong user base. And some of those members could be bots or duplicate accounts.

Then there’s the matter of how to fairly distribute payments that the DAO might receive from data buyers.

Currently, the DAO awards “tokens” — cryptocurrency — to users corresponding to their Reddit karma. But karma might not be the best measure of quality contributions to the data set — particularly in smaller Reddit communities with fewer opportunities to earn it.
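The article doesn’t spell out the allocation formula, but “tokens corresponding to karma” suggests a pro-rata split: each member’s share of any payout is proportional to their share of the pooled karma. A sketch of that arithmetic (all numbers made up) also shows the weakness noted above — a high-quality contributor from a small community earns little:

```python
# Hypothetical pro-rata split: assume each member's payout share equals
# their share of the pooled karma. All numbers are made up.
def distribute(payout: float, karma: dict) -> dict:
    total = sum(karma.values())
    return {user: payout * k / total for user, k in karma.items()}

print(distribute(1000.0, {"alice": 9_000, "bob": 900, "carol": 100}))
# {'alice': 900.0, 'bob': 90.0, 'carol': 10.0}
# carol may write excellent posts in a small subreddit, yet her low
# karma leaves her with 1% of the payout.
```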

Kazlauskas floats the idea that members of the DAO could choose to share their cross-platform and demographic data, making the DAO potentially more valuable and incentivizing sign-ups. But that would also require users to place even more trust in Vana to treat their sensitive data responsibly.

Personally, I don’t see Vana’s DAO reaching critical mass. The roadblocks standing in the way are far too many. I do think, however, that it won’t be the last grassroots attempt to assert control over the data increasingly being used to train generative AI models.

Startups like Spawning are working on ways to allow creators to impose rules guiding how their data is used for training while vendors like Getty Images, Shutterstock and Adobe continue to experiment with compensation schemes. But no one’s cracked the code yet. Can it even be cracked? Given the cutthroat nature of the generative AI industry, it’s certainly a tall order. But perhaps someone will find a way — or policymakers will force one.





Modal raises $25M to train corporate workers on data and AI | TechCrunch


A few years ago, Darren Shimkus, ex-president of Udemy, had a conversation with Dennis Yang about skills building.

Shimkus was of the belief that building skills in the corporate sector was a difficult, but not intractable, challenge — one that could perhaps be solved with the right technology. He brought it up to Yang, who had been Udemy’s CEO.

“At Udemy, Yang and I solved the ‘access’ problem to learning — anyone at any company can find great video content on the skills they want to pick up,” Shimkus told TechCrunch. “But it turns out that solving access to video isn’t enough on its own.”

One thing led to another, and soon, Shimkus and Yang had a new startup on their hands: Modal.

Modal provides personalized technical skills training for a company’s staff, offering on-demand coaching and a pedagogical approach that groups users into semi-structured online learning communities.

“Our offering assesses every member of a team, identifies gaps in their skill sets, and creates a custom plan for each team member that minimizes the ‘skill risk’ companies face in hitting their strategic goals,” Shimkus explained. “We support all kinds of goals — modernization efforts, digital transformation and even training new employees from an acquisition.”

The upskilling market is a crowded space, occupied by startups like GrowthSpace, Learnsoft, Pollen, Scaler, Workera and others. So how does Modal plan to make a splash?

First, Shimkus says, by homing in on hot trends: data and AI. Modal’s initial set of e-learning courses focuses exclusively on these, which seems like a wise strategic choice given today’s market.

“The rise of AI is bringing more visibility to data teams than ever before,” Shimkus said. “Expectations are through the roof, and many teams are realizing they need to rapidly develop their AI capabilities, broad technical acumen and even the business skills of their teams.”


Modal’s second advantage is its emphasis on real-world application, asserts Shimkus. As learners make their way through Modal’s courses, the coach they’re paired with helps them contextualize and apply key concepts.

“From the perspective of the learner, our inclusion of applied practice and coaching really set us apart from traditional e-learning platforms,” Shimkus continued. “We think our direct competitors are relatively few.”

It seems to be an effective sales pitch. Modal, which only charges companies when a staffer completes a course, has over 100 clients at present, with the majority coming from the Fortune 1000, per Shimkus.

Modal recently raised $25 million in a funding round led by Left Lane Capital, bringing its total raised to $32 million. Now it plans to grow its team to “support incoming demand and expand Modal’s offering to organizations worldwide.”

“We’ve been fortunate that the rise of generative AI has driven a critical need for upskilling in companies — no one can afford to miss out on transforming their teams and businesses,” Shimkus said. “It’s hard in today’s ever-changing workplace landscape to predict what your teams need, meaning most leaders don’t have a reliable way to plan for and improve their team’s skills. Modal is built for this scenario.”

