Archive.org: Safe?

Here’s everything about archive.org being safe to use:

In general, archive.org is a safe site to use, and its online services are useful, interesting, and extremely unlikely to bring any harm to you.

Downloads are also safe.

Archive.org does not harbor malicious software and works hard to keep malicious code off the website.

It’s also legal to use.

So if you want to learn all about archive.org and why it’s safe, then this article is for you.

Let’s get started!


What Is Archive.org?

Archive.org is a nonprofit project that tries to archive a lot of things on the internet. What does that mean?

It means that archive.org creates copies of websites, videos, books, and a whole lot more so that people can browse these things as they choose.

That’s why the organization refers to itself as a digital library.

The Internet Archive, the nonprofit behind archive.org, was founded in 1996, and the Wayback Machine opened to the public in 2001.

According to the website, it has archived over 681 billion web pages.

You can browse through specific websites or look through massive libraries of internet data at this website.

One particular point of note with archive.org is the Wayback Machine.

This is the nickname for the massive archive of websites you can find at archive.org.

You can view websites that are no longer in existence.

You can also see previous versions of websites.

This allows you to see how things have changed over time, and you can even view sites and stories from before corrections or changes were made.
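
If you like to see the mechanics, archived pages live at addresses made of a timestamp followed by the original address. Here is a minimal sketch in Python that builds such an address. The function name wayback_url is my own invention for illustration, and the Wayback Machine serves the capture closest to the timestamp you ask for, so the date does not have to match a capture exactly.

```python
from datetime import datetime

# Address prefix used by the Wayback Machine for archived captures.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Build the address of the capture closest to `when` (hypothetical helper)."""
    # Timestamps are written as YYYYMMDDhhmmss with no separators.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


print(wayback_url("http://example.com/", datetime(2010, 1, 1)))
# prints https://web.archive.org/web/20100101000000/http://example.com/
```

Paste the printed address into a browser, and you land on whichever capture sits nearest to that date.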

The point of the project is to preserve information, and that’s exactly what it does.

Overall, it’s a fairly massive project, and it does all of this without any subscription fees, upfront payments, or even advertisements. It truly is free to use.

How Does Archive.org Work? (3 Ways)

If that sounds interesting, you might have some questions.

For instance, how can one organization back up the entire internet?

Isn’t that impossible?

Well, there are two answers to that question.

First, the entire internet isn’t backed up at archive.org.

I’ll explain this in more detail in a bit, but by data volume the archives only cover roughly 0.0002% of the internet.

So, that certainly makes things feel a little more attainable.

Perhaps more importantly, archive.org uses a few very clever techniques and systems that allow it to store so much information.

Add to that the substantial investments made by the organization, and you have a project that very much does what it claims to do.

#1 Web Crawlers

One of the most important tools used by archive.org is web crawlers.

These are bits of automated software that go from website to website looking at and collecting data.

Google uses web crawlers to help build its search engine databases.

Archive.org uses them to find and catalog sites across the internet.

This means that the web crawlers are building the archives even without the participation of website owners.
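
To make the idea concrete, here is a toy crawler in Python. It is only a sketch of the general technique, not archive.org's actual software. The three-page site in FAKE_WEB is made up so the example runs without touching the network, and the names LinkExtractor and crawl are my own.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site standing in for the real web (assumption for the demo).
FAKE_WEB = {
    "http://site.test/": '<a href="/about">About</a> <a href="/news">News</a>',
    "http://site.test/about": '<a href="/">Home</a>',
    "http://site.test/news": '<a href="/about">About</a> <a href="/missing">Gone</a>',
}


def crawl(start_url, fetch):
    """Follow links breadth-first and keep a copy of every page that loads."""
    seen = {start_url}
    queue = deque([start_url])
    captured = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched, nothing to keep
            continue
        captured[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # turn "/about" into a full address
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return captured


pages = crawl("http://site.test/", FAKE_WEB.get)
print(sorted(pages))
```

Starting from a single address, the crawler finds the other two pages on its own and quietly skips the dead link. Nobody who owns those pages had to do anything.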

Despite that, website owners can participate in two important ways. If they want, they can manually archive their website with archive.org.

That ensures that they will be included in the library, even if the web crawlers haven’t visited the site yet.

The other option is to not be archived.

Archive.org has made it so that any website that wants to be excluded from its archives can be with very little effort.

#2 Archiving

Speaking of archiving, this is where clever programming is important.

Archiving techniques are not exactly new, but they are clever, and they’re essential to how archive.org operates.

Here’s the gist.

With archiving techniques, you can store data more efficiently.

Instead of storing every single one and zero found on every single website in the database, the archive can instead store specific information that allows it to reconstruct the website.

It means that archive.org is not exactly taking perfect snapshots of web pages. 

Instead, it’s looking at code and storing the instructions that would allow the site to be rebuilt, and it’s noting any differences that it finds as compared to other archives of the same website.

Ultimately, this allows archive.org to use substantially less data storage than would otherwise be necessary for such a project.
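
Here is one simple way that kind of saving can work, sketched in Python. This is an illustration of the general idea of not storing the same thing twice. It is not archive.org's real storage format, and the SnapshotStore class and its methods are hypothetical names I made up for the example.

```python
import hashlib


class SnapshotStore:
    """Keeps every capture on record but stores identical content only once."""

    def __init__(self):
        self.blobs = {}     # fingerprint -> page content, stored one time
        self.captures = []  # (url, timestamp, fingerprint) for every capture

    def capture(self, url, timestamp, content: bytes) -> bool:
        """Record a capture; return True if new content had to be stored."""
        fingerprint = hashlib.sha256(content).hexdigest()
        is_new = fingerprint not in self.blobs
        if is_new:
            self.blobs[fingerprint] = content
        self.captures.append((url, timestamp, fingerprint))
        return is_new

    def replay(self, url, timestamp) -> bytes:
        """Return the page as it looked at a recorded capture."""
        for captured_url, captured_at, fingerprint in self.captures:
            if captured_url == url and captured_at == timestamp:
                return self.blobs[fingerprint]
        raise KeyError((url, timestamp))


store = SnapshotStore()
store.capture("http://site.test/", "2020-01-01", b"<h1>Hello</h1>")
store.capture("http://site.test/", "2020-02-01", b"<h1>Hello</h1>")  # unchanged
store.capture("http://site.test/", "2020-03-01", b"<h1>Hello, world</h1>")
print(len(store.captures), "captures,", len(store.blobs), "stored copies")
# prints 3 captures, 2 stored copies
```

The February capture costs almost nothing because the page had not changed since January. Multiply that by billions of pages, and the savings add up quickly.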

#3 Lots of Storage

Despite every effort to be efficient, archive.org still needs a lot of storage space to operate.

The organization has a dedicated data center (and possibly more than that) just for this purpose.

As of December 2020, archive.org was holding more than 70 petabytes of data.

That is tens of thousands of times more data than you will find on the average home computer.

At the same time, it’s nothing compared to the whole internet. The internet was estimated to consist of 40 zettabytes of data at the end of 2020.

A zettabyte is 1 million times bigger than a petabyte, so as you can see, archive.org isn’t even scratching the surface of archiving the entire internet.
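
If you want to check the math yourself, here is the arithmetic in a few lines of Python. It uses only the two figures quoted above (70 petabytes and 40 zettabytes) and the decimal definitions of those units.

```python
PETABYTE = 10**15   # bytes, decimal definition
ZETTABYTE = 10**21  # bytes, decimal definition

archive_bytes = 70 * PETABYTE     # archive.org, December 2020
internet_bytes = 40 * ZETTABYTE   # estimated size of the internet, end of 2020

# How many petabytes fit in one zettabyte, and what share the archive holds.
petabytes_per_zettabyte = ZETTABYTE // PETABYTE
share = archive_bytes / internet_bytes

print(f"{petabytes_per_zettabyte:,} petabytes per zettabyte")
print(f"Archive share of the internet: {share:.6%}")
```

The share comes out to a little under two ten-thousandths of one percent, which is why I say the archive isn't even scratching the surface.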

All of that said, archive.org is holding an absolutely monstrous amount of data, and you can see that for yourself by browsing the archives.

You won’t get through half of it in a lifetime.

Is Archive.org Safe?

But before you do, you’ll want a better answer to the original question.

Is archive.org safe? 

I haven’t really gone into the motivations or organizational structure of archive.org up to this point.

So, first I’ll revisit the short answer.

Yes, archive.org is safe.

When you think about how the site operates and why it exists, this isn’t a surprising answer.

The entire point of archive.org is to preserve information in the digital library.

The organization wants to make archives reliable and accessible so that anyone can access the library any time they want.

Considering that motivation, you can reasonably assume that safety is important to archive.org, and you would be right to do so.

Still, I’ll cover some of the specific safety concerns that often come up and why I still think this is a safe website.

Is It Safe to Archive Your Website on Archive.org?

If you archive your website with the Wayback Machine, will it cause any harm?

The simple answer is no.

This practice is completely safe.

For starters, you’re only uploading data when you archive your own website.

So, there’s nothing malicious to try to download in the first place.

There is one minor consideration.

If you do archive your website, the source code will become available to anyone using archive.org.

So, if you have trade secrets that might be attached to that source code, archiving could be a problem for you.

Still, you don’t have to worry that creating an archive will harm the website in any way.

You’re not actually making any changes to your own website. You’re simply uploading archive data to a third-party server.

Is It Safe to Download Content From Archive.org?

Uploading is fine, but what if you download a movie?

Couldn’t that cause problems?

Again, the answer is that archive.org is typically safe, but this safety rating isn’t quite as strong as the one above.

Here’s the deal.

Any user can potentially upload content to archive.org.

So, someone with malicious intent could upload something bad to the site.

Then, you could conceivably come across that malicious download, and you would be in trouble.

The site is designed to try to mitigate such things, but considering the massive volumes of data at play, it’s possible for something to slip through the cracks and give you a problem.

Is It Safe to Visit Old Websites on Archive.org?

The answer is yes once more.

You can visit pretty much any website you want at archive.org, and the experience will be pretty safe.

Let me nip something in the bud here. 

There have been instances in the past where old websites hosted malicious code, and that code made its way into the Wayback Machine.

But archive.org has long since addressed these issues.

Today, I can’t name any concrete examples of archived websites that could harm you.

It’s not something archive.org wants for users, and it’s not really the most efficient way for malicious parties to try to attack people anyway.

Archive.org is far from being the most visited website on the internet.

Let me put this another way.

Browsing sites on archive.org is not more dangerous than simply using Google.

Is Archive.org Legal?

This is another important safety question, and for archive.org, you’re once again in the clear.

The organization works hard to adhere to copyright laws.

So, if you want to download content from the library, it should be perfectly legal for you.

There are instances where people upload things that they don’t have the right to share.

When that happens, copyright owners can lodge complaints with archive.org, and the site will take down any forbidden content.

Because of this, you aren’t participating in any known illicit or illegal behavior when you use the site.

You’re downloading things that you can reasonably presume are legal to download.

Archive.org is sincerely dedicated to preserving the internet.

It’s not trying to break copyright laws, and if you download from the site, you can reasonably trust that it’s permitted.

Just browse a few movies, and you’ll get a better idea.

This is not the site where you go and find the latest blockbuster movie for download.

It’s a lot more like the videos you might find in a dusty corner of your public library.

Author

  • Theresa McDonough

    Tech entrepreneur and founder of Tech Medic, who has become a prominent advocate for the Right to Repair movement. She has testified before the US Federal Trade Commission and been featured on CBS Sunday Morning, helping influence change within the tech industry.