Put “Google” and “Reddit” in the same sentence and you’re bound to get a cacophony of sighs from those in the online publishing biz. Well, we might now hear more sighs from the average internet user, too, as it looks like Google’s the only search engine that can currently scrape Reddit to put new posts in its search results.
404 Media clocked on to this and yesterday pointed out that search engines other than Google, such as Bing and DuckDuckGo, aren’t showing any Reddit results from the last week. This does seem to be the case, and you can test it yourself by going to another search engine like DuckDuckGo, searching for “site:reddit.com” and setting it to only display results from the past week. As of the time of writing, no results come up for such a search on DuckDuckGo, but they do on Google.
This seems to be because of changes to Reddit’s robots.txt file. Robots.txt is a file that pretty much every website has, and it tells bots, such as search engine crawlers, which pages on the site they’re “disallowed” from scraping. In addition to preventing search engines from scraping some pages, this file has been useful for websites looking to prevent their data being scraped for AI training, since it can disallow AI crawlers specifically.
It looks like Reddit, however, has recently changed its robots.txt to disallow any bot at all from scraping the website. You don’t have to take our word for this, either: you can check for yourself by visiting https://www.reddit.com/robots.txt. The bottom couple of lines on the page essentially tell any bot that it’s not allowed to scrape any of Reddit’s pages. And if there’s no scraping, there’s no displaying in search results. That’s how search engines work: to simplify, they scrape, they rank, and they display when users search for related terms.
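For the curious, here is a rough sketch of how a well-behaved crawler might read rules like these, using Python's standard `urllib.robotparser`. The two rule sets are assumptions written for illustration: the first is a blanket disallow of the kind described above (not a copy of Reddit's live file), and the second is a made-up selective policy. The bot names `ExampleSearchBot` and `ExampleAIBot` and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed blanket rule: every bot ("*") is disallowed from every path ("/").
# Illustrative only; not copied from Reddit's live robots.txt.
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""

# Hypothetical selective policy: one made-up AI crawler is blocked entirely,
# while everyone else is only kept out of /private/.
SELECTIVE_RULES = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `rules` permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse in-memory text, no network call
    return parser.can_fetch(user_agent, url)


URL = "https://www.reddit.com/r/example/"  # hypothetical page

print(is_allowed(BLANKET_RULES, "ExampleSearchBot", URL))    # False
print(is_allowed(SELECTIVE_RULES, "ExampleAIBot", URL))      # False
print(is_allowed(SELECTIVE_RULES, "ExampleSearchBot", URL))  # True
```

Under the blanket rule, no bot that honours the file gets to fetch anything, which is why new pages would stop turning up in that bot's search index. Bear in mind the check only describes what a crawler that respects robots.txt would do; the file itself doesn't technically stop anyone.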
But Google’s still managing to display new Reddit results in search results, which means it’s somehow able to access Reddit’s information despite the robots.txt disallow.
If we start to wonder whether the reason behind all this is to do with Google partnering with Reddit—a partnership that gives Google sole access to Reddit’s site content for AI training—we have the following reassurance.
Tim Rathschmidt, a spokesperson for Reddit, told The Verge, “This is not at all related to our recent partnership with Google,” continuing, “We have been in discussions with multiple search engines. We have been unable to reach agreements with all of them, since some are unable or unwilling to make enforceable promises regarding their use of Reddit content, including their use for AI.”
However, to my ears, this makes it sound like the issue is still down to Reddit’s partnership with Google, just indirectly. If the partnership contract gives Google exclusive rights to Reddit’s data for AI training, then it would make sense that Reddit wouldn’t allow other search engines to scrape the website if it’s “unable to reach agreements.”
If anything, this explanation pushes things back a rung and makes me think there’s just one more mark against the original Google-Reddit partnership: Reddit seemingly now can’t allow other search engines to scrape its site unless they’re willing to make “enforceable” promises about their use of Reddit content for AI. (Your guess is as good as mine on what “enforceable” means, here.)
Epic Games CEO Tim Sweeney says it’s “part of a disconcerting acceleration of monopolies expanding to further block competition and take from users.”
Sweeney’s full post reads: “This is part of a disconcerting acceleration of monopolies expanding to further block competition and take from users. Search engines use to provide links to relevant content. Now they spam users with ads intermixed with scraped and content laundered by AI without attribution.” https://t.co/xMvzir3DPA (July 24, 2024)
By the way, all this comes months after Google started pushing Reddit threads up its search rankings for various terms. Google’s Search Liaison Danny Sullivan explained the decision on X (via SEO Roundtable), saying it was made because “actual searchers seem to like it. They proactively seek it out. It makes sense for us to be showing it to keep the search results relevant and satisfying for everyone.”
There’s been lots of talk in publishing land about how devastating this Reddit-boosting change has been or could be for some smaller publications and independent sites, but there hasn’t been much fuss from end-users. That might be because Google’s right and people do want Reddit high up in their search results.
Well, maybe that particular issue was mostly a problem for publishers, but this latest one is certainly more of an issue for end-users. That is, unless end-users are fine with ever-greater search monopolisation by Google.
Think of it this way: If Google’s right and end-users really do care so much about Reddit results in their searches, then it seems Google now has exclusive search engine access to one of the things its end-users most care about. Why would anyone go elsewhere?
None of this even touches on the risks of the partnership already in place, which allows Google exclusive access to Reddit for AI training, with Reddit being arguably one of the world’s biggest digital public squares. One can’t help but wonder what the US, UK, and EU governmental bodies that have just agreed to work to prevent monopoly in the AI industry make of it.