If you’ve been browsing the web and hit a 404 error page or an unexpected redirection, you’ve seen link rot in action. Over time, the links that hold the web together break, threatening our shared cultural history. Here’s a look at why that happens.
What Is Link Rot?
Link rot is the gradual breaking of links on websites over time, which leaves behind broken or dead links. By “broken link,” we mean a link that no longer points to the target its author intended when the link was first made. When you click one of these broken links, you get a 404 error or land on the wrong page or website.
Link rot is common. A 2021 Harvard study examined hyperlinks in over 550,000 New York Times articles from 1996 to 2019 and found that 25% of links to specific pages were inaccessible, with the rate of decay growing dramatically depending on how old the links were (for example, about 6% of 2018 links were dead versus 72% of 1998 links). Another study found that out of a set of 360 links gathered in 1995, only 1.6% still worked in 2016.
Why Does Link Rot Happen?
The web is a fluid, decentralized medium with no central authority, so content can become unavailable at any time without warning. Servers come and go, websites shut down, services migrate to new hosts, software gets updated, publications shift to new content management platforms without migrating their content, domains expire, and more.
There’s another related problem on the web called “content drift,” where the link still works but the content at the other end has changed since the link was made. That can cause trouble because the original author of the link intended to point to different information.
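To make the difference between link rot and content drift concrete, here is a minimal Python sketch that sorts a link into one of four buckets. The function names (fingerprint, classify_link) and the bucket labels are made up for illustration, and the sketch assumes you have already fetched the page and saved a fingerprint of it back when the link was first made. It is a rough heuristic, not a definitive checker: a redirect is not always a sign of trouble, and a byte-for-byte comparison will flag harmless changes such as rotating ads.

```python
import hashlib
from typing import Optional


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest of the raw page body."""
    return hashlib.sha256(body).hexdigest()


def classify_link(status: int, requested_url: str, final_url: str,
                  saved_fingerprint: Optional[str], body: bytes) -> str:
    """Classify one already-fetched link (hypothetical labels).

    status            -- HTTP status code of the final response
    requested_url     -- the URL as written in the original link
    final_url         -- the URL actually reached after any redirects
    saved_fingerprint -- fingerprint taken when the link was made, or None
    body              -- the page body that came back this time
    """
    if status >= 400:
        # 404 and other error responses: the link is dead.
        return "dead"
    if final_url != requested_url:
        # The link resolves, but possibly to the wrong page or website.
        return "redirected"
    if saved_fingerprint is not None and fingerprint(body) != saved_fingerprint:
        # Same address, different content: content drift.
        return "drifted"
    return "ok"
```

Keeping the fetch separate from the classification means the same function works with whatever HTTP client you prefer, and it can be tried out without touching the network.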
What’s So Bad About Losing Old Websites?
It’s the nature of the world that things decay and disappear. Keeping information alive is an active process that takes time, energy, and effort. So the main problem with link rot is not necessarily that we need to store all information forever, but that electronic information and references have potentially become more fragile and vulnerable than the paper ones that were primarily used in the past.
Many authors of journalistic articles, academic papers, and even court decisions use web links as a citation mechanism, providing vital sources of context for the information they present. It’s been a problem for Wikipedia too. As Jonathan Zittrain explained in a 2021 article about link rot for The Atlantic, “Sourcing is the glue that holds humanity’s knowledge together. It’s what allows you to learn more about what’s only briefly mentioned in an article like this one, and for others to double-check the facts as I represent them to be.”
If links break and sources become unavailable, it’s much harder for a reader to judge whether the author has honestly and accurately represented the original source of information. And even beyond linking, some websites provide information online that can’t be found anywhere else. Losing those pages creates gaps in humanity’s collective knowledge and holes in the fabric of our shared culture.
What’s the Solution to Link Rot?
Experts consider link rot and content drift to be endemic to the web as it is currently designed. That means it’s a part of the web’s fundamental nature that will not go away unless we try to actively correct or mitigate it.
One of the most effective solutions to the link rot problem so far emerged in 1996 with the Internet Archive, which has maintained a public archive of billions of web pages for the past 25 years. If you find a broken link, visit the Internet Archive’s Wayback Machine and paste the link into its search bar. If the site has been captured, you’ll be able to browse the results. Or if the site recently went down, it may be possible to view the original content from a cached copy that Google stores.
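If you’d rather skip the search bar, you can build a Wayback Machine address yourself. The Python sketch below only assembles URLs and makes no network requests. It assumes the commonly used web.archive.org/web/ address pattern and the Internet Archive’s availability lookup at archive.org/wayback/available. The function names (wayback_url, availability_query) are made up for illustration, and whether a capture actually exists for a given page is something only the archive can tell you.

```python
from urllib.parse import urlencode


def wayback_url(url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine address for a page.

    timestamp is an optional string of digits such as "20060101"
    (YYYYMMDD); the archive is expected to pick a capture near that
    date. With no timestamp, it should land on a recent capture.
    """
    if timestamp:
        return f"https://web.archive.org/web/{timestamp}/{url}"
    return f"https://web.archive.org/web/{url}"


def availability_query(url: str, timestamp: str = "") -> str:
    """Build a lookup URL for the archive's availability service,
    which answers in JSON and reports the closest capture, if any."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    # urlencode percent-escapes the ":" and "/" inside the target URL.
    return "https://archive.org/wayback/available?" + urlencode(params)


print(wayback_url("http://example.com/page", "20060101"))
print(availability_query("http://example.com/page"))
```

Paste the first printed address into a browser to jump toward a capture from around that date. The second is meant for scripts that want to check for a capture before showing a reader an archived link.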
Beyond the Internet Archive, a Harvard-led project called Perma.cc captures permanent versions of websites with the aim of long-term academic and legal citation. A consortium of libraries maintains the links, so they should stick around for a while. The goal is to create links that don’t rot—they should persist as long as the Perma.cc archive is maintained.
Other potential solutions to link rot are still on the bleeding edge, including potential Web 3.0 solutions and distributed data hosting thanks to protocols such as IPFS. Ironically, though, it’s possible that hundreds of years from now, the only websites from this era that survive will be those that people printed out on paper. Stay safe out there!