Chinese internet search provider Baidu has updated its Wikipedia-like encyclopedia service to prevent Google and Microsoft’s Bing from viewing its content.
This change was observed in the latest update to the Baidu Baike robots.txt file, which denies access to Googlebot and Bingbot crawlers.
According to the Wayback Machine, the change took effect on August 8. Previously, search engines Google and Bing were allowed to index Baidu Baike’s central repository, which contains around 30 million entries, although some targeted subdomains on the website were restricted.
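For readers unfamiliar with how a robots.txt file turns crawlers away, the mechanism can be sketched with Python's standard-library `urllib.robotparser`. The rules below are illustrative only and are not the actual contents of Baidu Baike's file; the URL and the `ExampleBot` user agent are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: NOT the real Baidu Baike robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def check_access(robots_txt, agents, url):
    """Return a dict mapping each user agent to whether it may fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical URL and a hypothetical third crawler name, for illustration.
result = check_access(
    SAMPLE_ROBOTS_TXT,
    ["Googlebot", "Bingbot", "ExampleBot"],
    "https://example.com/item/some-entry",
)

for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Note that robots.txt is a voluntary convention: it tells well-behaved crawlers what not to fetch, but it does not remove pages that a search engine has already indexed or cached.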
Baidu’s move comes amid growing demand for large datasets used to train artificial intelligence models and applications. It follows similar steps taken by other companies to protect online content. In July, Reddit blocked various search engines, except Google, from indexing its posts and discussions. Google has a financial deal with Reddit for access to data for training its AI services.
Microsoft last year considered restricting rival search engine operators’ access to internet search data, according to sources, a move that was most relevant for those who use the data for chatbots and generative AI services.
Meanwhile, the Chinese Wikipedia, with its 1.43 million entries, remains available to search engine crawlers. According to an investigation by the South China Morning Post, Baidu Baike entries still appear in search results on both Bing and Google. Search engines may be continuing to use old cached content.
The move comes as generative AI developers around the world are increasingly teaming up with content publishers to access the highest quality content for their projects. For example, OpenAI recently inked a deal with Time magazine to access its entire archive, dating back to the magazine’s first day over a century ago. A similar deal was struck with the Financial Times in April.
Baidu’s decision to block major search engines from accessing Baidu Baike content highlights the growing importance of data in the age of AI. As companies invest heavily in AI development, the value of large, curated datasets has increased significantly. This is changing how online platforms manage access to content, with many choosing to restrict access to their data or monetize it.
As the AI industry continues to evolve, more companies will likely reevaluate their data-sharing policies, leading to further changes in how information is indexed and accessed on the internet.
(Photo by Kelly McClintock)