
2024-07-16 19:08:27| Engadget

Some of the world's largest tech companies trained their AI models on a dataset that included transcripts of more than 173,000 YouTube videos without permission, a new investigation from Proof News has found. The dataset, which was created by a nonprofit company called EleutherAI, contains transcripts of YouTube videos from more than 48,000 channels and was used by Apple, NVIDIA and Anthropic, among other companies. The findings of the investigation spotlight AI's uncomfortable truth: the technology is largely built on the backs of data siphoned from creators without their consent or compensation.

The dataset doesn't include any videos or images from YouTube, but contains video transcripts from the platform's biggest creators, including Marques Brownlee and MrBeast, as well as large news publishers like The New York Times, the BBC, and ABC News. Subtitles from videos belonging to Engadget are also part of the dataset.

"Apple has sourced data for their AI from several companies," Brownlee posted on X. "One of them scraped tons of data/transcripts from YouTube videos, including mine," he added. "This is going to be an evolving problem for a long time."

"Apple has sourced data for their AI from several companies. One of them scraped tons of data/transcripts from YouTube videos, including mine. Apple technically avoids "fault" here because they're not the ones scraping. But this is going to be an evolving problem for a long time." https://t.co/U93riaeSlY Marques Brownlee (@MKBHD), July 16, 2024

YouTube, Apple, NVIDIA, Anthropic and EleutherAI did not respond to a request for comment from Engadget.

So far, AI companies haven't been transparent about the data used to train their models. Earlier this month, artists and photographers criticized Apple for failing to reveal the source of training data for Apple Intelligence, the company's own spin on generative AI coming to millions of Apple devices this year.

YouTube, the world's largest repository of videos, is in particular a goldmine of not only transcripts but also audio, video, and images, making it an attractive dataset for training AI models. Earlier this year, OpenAI's chief technology officer, Mira Murati, evaded questions from The Wall Street Journal about whether the company used YouTube videos to train Sora, OpenAI's upcoming AI video generation tool. "I'm not going to go into the details of the data that was used, but it was publicly available or licensed data," Murati said at the time. Both YouTube CEO Neal Mohan and Alphabet CEO Sundar Pichai have said that companies using data from YouTube to train their AI models was a violation of the platform's terms of service.

If you want to see if subtitles from your YouTube videos or from your favorite channels are part of the dataset, head over to Proof News' lookup tool.

This article originally appeared on Engadget at https://www.engadget.com/apple-nvidia-and-anthropic-reportedly-used-youtube-transcripts-without-permission-to-train-ai-models-170827317.html?src=rss


Category: Marketing and Advertising

 

