What makes a “good” generative AI product? How does one generative AI product distinguish itself from another? As more AI products enter the market and competition increases, these are important questions that many people are asking. The answers will depend in large part on the quality of the content used to train the AI models that power these generative AI products.
For an AI model to produce the desired output, it must first be trained on a large, comprehensive, accurate and reliable set of content. Many models are trained on content that is available on the internet but has not necessarily been vetted. This can lead to product development problems. For example, if multiple AI models are trained on the same readily available content, their algorithms may begin to draw comparable inferences and produce similar or identical outputs. The AI model and its output are then no longer unique and are therefore less valuable to the end user.
Furthermore, training on content whose origin has not been evaluated in any way can create risk from many angles. For example, does using certain content for AI training implicate third-party intellectual property rights? These complex IP issues are currently being considered by the courts. And what about content bias and other inaccuracies, and how is that risk accounted for?
This has opened the door to a growing licensing market in which AI developers can source licensed training content from more reliable sources. In this crowded AI product market, AI developers aim not only to mitigate risk during the training process, but also to distinguish their AI models from peer technology. By training AI models on different content and data, they aim to create algorithms that produce unique and more valuable outputs, using training data that other developers are unlikely to be using.
At the same time, content creators and owners are exploring ways to monetize their content in the AI marketplace while restricting the use of their intellectual property to train AI models without explicit permission.
High-quality content usually comes at a premium, and the AI training space is no exception. A relatively untapped source of quality content that serves the goals of both AI developers and content creators and owners is B-roll and other archival footage. B-roll and archival content offer a new revenue stream to content owners who previously had no way of monetizing this content. For AI developers, this content provides a large cache of potentially unique training material that carries less risk.
Content creators on platforms such as YouTube, Instagram and TikTok have already joined this valuable market by selling unused video content to AI developers. According to Bloomberg News, each of these deals is worth thousands of dollars, with AI developers paying between $1 and $4 per minute of footage. The value of the footage is determined by a variety of factors, including the quality of the video, whether it is unique and how readily available it is to other AI developers.
This licensing model offers benefits to both content owners and AI developer licensees. On the one hand, content owners can now monetize unused or underutilized assets while maintaining control over their content through selective licensing. They may negotiate contractual provisions that limit use cases and protect against unauthorized use or reproduction of premium content. For example, content creators may seek to prevent their content from being used in AI models that could harm their business or advance the business of competitors.
On the other hand, for AI developers, this licensed content offers four important benefits. The first is differentiation from competitors through exclusive, private training content in a landscape where many AI models are trained on the same public data. This can provide a competitive advantage for AI developers looking to create the next cutting-edge AI model.
The second is quality assurance through the use of human-generated content. This helps avoid the growing problem of AI models whose output quality and originality decline because they were trained on content previously generated by AI.
The third is mitigation of potential security risks through content with known data provenance, that is, a detailed record of the data's origin, the individuals involved, and the creation, transformation and manipulation of the data. This helps prevent malicious training data from corrupting AI models.
Fourth, licensing content for training provides considerable certainty with respect to intellectual property and other third-party risk concerns. These benefits of exclusive content come at a price, however, because they increase the cost of development and training.
One company that has been making headlines recently is DeepSeek, a Chinese AI model similar to ChatGPT that has become one of the most downloaded apps in the world. DeepSeek has made a name for itself among the AI models on the market because of its low development and training costs. However, there is speculation that DeepSeek was trained using distillation, a fairly common training technique in which data is extracted from a larger, more capable AI model, allowing a new model to be trained at a fraction of the cost. Despite DeepSeek’s popularity in the AI industry, the practice of distillation underscores the value of seeking out exclusive content and data for training purposes.
This emerging B-roll licensing market represents unique common ground on which AI developers and content owners can build mutually beneficial relationships. To take advantage of this opportunity, content creators and owners need to develop procedures to review and identify suitable assets, incorporate clear license restrictions to protect premium content, ensure appropriate rights clearance, and establish a pricing strategy based on factors such as the exclusivity, quality and type of content.
Meanwhile, AI developers need to focus on establishing due diligence procedures for content validation and quality assessment, mechanisms for handling potential reproduction of licensed content in outputs, and scalable data management systems that protect content while managing multiple license relationships. If both sides of the equation are able to carry out these steps, the result could be an environment in which both content owners and AI developers can thrive.
Adrian Perry and Robin Polaschuk are partners at Covington & Burling and members of its Entertainment and Media Industry Group.