As AI technologies like ChatGPT explode in popularity, a quiet battle is unfolding over who controls access to the data that feeds these models. Major media, publishing, and tech giants are taking steps to lock down their content, wary of it being used to train AI without permission or compensation.
Multiple data owners are taking this stance:
✺ News/Media: The New York Times, The Washington Post, CNN, Reuters, ABC, Bloomberg, etc. have blocked AI scrapers.
✺ Publishers: Hearst, Condé Nast, The Walt Disney Company.
✺ Tech platforms: Reddit, Stack Overflow.
✺ Creators: Fan fiction writers, photographers, artists.
The core argument is that their intellectual property and creative work have untapped value as AI training data. By restricting access, they can force licensing deals or collaboration with AI firms, ensuring that their IP and creative output are monetized.
The result may be SPIKING DATA COSTS for AI companies as quality sources dry up or charge fees. Startups with limited resources could struggle to access broad training data.
Of course, these costs will likely be passed on to consumers, either directly or through some form of harvesting users' personal data.
Innovation Risk? We may see power consolidate around the tech giants that have already amassed proprietary datasets.
Yet the trajectory of AI also depends on the pace of innovation in synthetic data and unsupervised training techniques. As models become more sample-efficient and robust, their reliance on huge datasets could diminish. More collaborative win-win relationships between AI firms and content creators may also emerge.
Either way, the era of freely scraping the internet for AI training looks to be ending. Data is becoming an increasingly precious asset. Those who control rich repositories of high-quality data are poised to shape the future development of AI.
The question is whether we end up with more closed, privatized AI systems or more creative solutions that allow these technologies to benefit all.
► What are your thoughts on this emerging tension between content creators and AI companies?
► Might this lead to a new model in which artists and creators capture more value by disintermediating the traditional players? (I believe Adobe is exploring options to support the creative community.)
► What impact could rising data costs have on the speed of AI innovation?
I'd love to hear different perspectives on what you're thinking.