Inside Big Tech's underground race to buy AI training data
==========

Tech giants such as Google, Meta, and Microsoft-backed OpenAI are racing to buy AI training data. They initially trained generative AI models like ChatGPT on data scraped from the internet for free, and they now face lawsuits from copyright holders. Photobucket, a former image-hosting site, is in talks with multiple tech companies to license its 13 billion photos and videos for AI training. The rates discussed range from 5 cents to $1 per photo and more than $1 per video. Tech companies are also quietly paying for content locked behind paywalls and login screens. The opaque AI data market is estimated at around $2.5 billion and could grow to $30 billion within a decade. AI model makers are securing their data-supply chains through deals with content owners and data brokers. Dedicated AI data firms are also emerging; they license real-world content and build networks of contract workers. The practice raises concerns about user privacy and the risk that personal data will be used in AI models without consent. Licensing old internet archives also raises ethical issues.

#AiTrainingData #TechGiants #GenerativeAiModels #Photobucket #DataDeals #DataBrokers #UserPrivacy #EthicalConcerns

https://m.economictimes.com/tech/technology/inside-big-techs-underground-race-to-buy-ai-training-data/articleshow/109079254.cms