Here’s everything about whether Amazon S3 has ever lost data permanently:
Yes, Amazon S3 has permanently lost data.
According to Amazon’s own numbers, the service likely loses small amounts of data every single day.
That said, S3 provides robust data protection that outperforms the vast majority of similar services, and the rate of data loss is exceedingly small for any user.
So if you want to learn all about Amazon’s S3 data safety, then you’re in the right place.
What Does Amazon Say About Amazon S3 Data Losses?
In the S3 FAQ, Amazon makes very strong statements about reliability and data protection.
Specifically, the company says that S3 storage classes such as S3 Glacier Deep Archive are designed for 99.999999999% durability (often called 11 nines) over a given year.
As Amazon puts it, “If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years.”
That’s a bold claim, and it suggests that data loss on S3 is basically a non-issue.
Let’s look at this in another way.
How much data would S3 need to be holding to expect a single lost file each year?
That comes out to 100,000,000,000 (100 billion) objects.
That’s a lot of objects, and a whole lot more than any one user or company would store there.
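The arithmetic behind these figures can be checked with a short sketch. It assumes that 99.999999999% annual durability means each object has a 1-in-100-billion chance of being lost in a given year. The constant and function names are made up for illustration.

```python
from fractions import Fraction

# Assumption: 99.999999999% annual durability means each object has a
# 1-in-100-billion chance of being lost in a given year.
ANNUAL_LOSS_RATE = Fraction(1, 10**11)


def expected_losses_per_year(object_count: int) -> Fraction:
    """Average number of objects lost per year for a given object count."""
    return object_count * ANNUAL_LOSS_RATE


def objects_for_one_loss_per_year() -> int:
    """Object count at which the expected loss is one object per year."""
    return int(1 / ANNUAL_LOSS_RATE)


# Amazon's FAQ example: 10,000,000 stored objects.
faq_losses = expected_losses_per_year(10_000_000)
years_per_loss = 1 / faq_losses

print(years_per_loss)                   # 10000
print(objects_for_one_loss_per_year())  # 100000000000
```

Exact fractions are used instead of floats so that the tiny loss rate does not introduce rounding noise.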
But, if S3 has enough stuff stored, then data loss could become a regular occurrence.
According to Jeff Barr (Vice President and Chief Evangelist at Amazon Web Services), S3 was storing 100 trillion objects in March of 2021.
That’s more than enough to expect some data loss every year.
In fact, that’s enough to expect a thousand lost objects every year across the entire system.
You could break that down to roughly three lost objects every day.
So, even by Amazon’s own claimed metrics, S3 is almost certainly losing a small amount of data permanently.
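As a rough check of the system-wide numbers, here is a minimal sketch. It again assumes a 1-in-100-billion annual loss chance per object and takes the 100 trillion object count at face value. The variable names are illustrative.

```python
# Assumption: one object in every 100 billion is lost per year,
# derived from the 99.999999999% durability figure.
ONE_IN = 10**11

# Object count reported for March 2021: 100 trillion.
TOTAL_OBJECTS = 100 * 10**12

losses_per_year = TOTAL_OBJECTS / ONE_IN
losses_per_day = losses_per_year / 365

print(losses_per_year)           # 1000.0
print(round(losses_per_day, 2))  # 2.74
```

These are averages under the stated assumption, not measured losses.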
That said, when we put these numbers into perspective, S3 is offering extremely high levels of data protection.
Any one user storing fewer than 100 million objects can reasonably expect to lose absolutely no data in their lifetime, since even at that size the expected loss is about one object every 1,000 years.
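To put the single-user case in numbers, the sketch below estimates the chance of losing at least one object over a number of years. It assumes each object is lost independently with a 1-in-100-billion chance per year, which is a simplification. The function name and the 50-year horizon are hypothetical.

```python
import math

# Assumption: independent per-object, per-year loss chance implied by
# 99.999999999% durability.
ANNUAL_LOSS_RATE = 1e-11


def prob_any_loss(object_count: int, years: int) -> float:
    """Chance of losing at least one object, assuming independent losses."""
    trials = object_count * years
    # 1 - (1 - p)^trials, written with log1p/expm1 to stay accurate
    # when p is extremely small.
    return -math.expm1(trials * math.log1p(-ANNUAL_LOSS_RATE))


# A user holding 100 million objects for 50 years.
print(f"{prob_any_loss(100_000_000, 50):.1%}")

# A user holding 1 million objects for 50 years.
print(f"{prob_any_loss(1_000_000, 50):.3%}")
```

Under these assumptions, even a very large account is far more likely than not to go decades without a single lost object.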
What Do Amazon S3 Users Say?
But, all of that is only based on Amazon’s claims.
Do real-world applications and use cases live up to that guarantee?
One way we can explore answers to this question is to look through user interactions and see if they have lost data.
With a deep dive into reviews and forums, you can find that this question has been posed plenty of times.
It would be an undertaking to fully examine every single review and post, but a cursory look reveals something significant.
Only one case of randomly lost data can readily be found using this search (see below).
You could potentially find more if you looked harder, but the fact that it takes so much effort is already telling.
If data loss were significantly more common than Amazon claims, you would probably see more complaints.
One Clear Example of Data Loss on Amazon S3
Let’s look at this one clear example.
This was posted by Scott Bonds on Quora quite some time ago.
According to this post, Amazon did, in fact, permanently lose some data.
The user was informed in December of 2012 that four files were lost.
Amazon cited a software bug, and of those four files, two were permanently lost while two were permanently truncated (lost some but not all data in each file).
This user claims to have hundreds of millions of files stored on S3, and this is the only overt case that is easy to find.
That suggests that, judging by user reports, Amazon is living up to its claims.
How Does Amazon S3 Protect Data?
S3 doesn’t really reinvent the wheel when it comes to data storage.
It adheres to the basic principle of redundancy.
Every file in the system is backed up on multiple devices.
So, if any one device has a problem, the others still contain the information.
When the failing device is replaced, the redundant devices rewrite the files to the new device.
So, in order for you to lose data, every device holding a copy has to fail before the data can be rewritten.
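A toy model shows why redundancy helps so much. It assumes devices fail independently and uses a hypothetical 1% chance that any one device fails before its copy can be rebuilt. Neither that figure nor the copy counts come from Amazon, and the function name is made up.

```python
def prob_all_copies_lost(device_failure_prob: float, copies: int) -> float:
    """Chance that every copy fails in the same window.

    Assumes failures are independent, which real hardware only
    approximates.
    """
    return device_failure_prob ** copies


# Hypothetical: each device has a 1% chance of failing before repair.
FAILURE_PROB = 0.01

for copies in (1, 2, 3):
    chance = prob_all_copies_lost(FAILURE_PROB, copies)
    print(f"{copies} copies: {chance:.6%} chance of losing the file")
```

Each extra copy multiplies the risk by the per-device failure chance, so the odds of total loss shrink very quickly.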
This is just the first step for S3.
You see, most data centers follow this same principle.
Because of that, most data centers only lose information when the entire center experiences a problem.
S3 has redundancy to counter this issue too.
Every file on S3 is backed up to multiple devices in different physical locations.
So, if a catastrophe destroyed everything at one data center, S3 would not permanently lose a single file stored there, because every last scrap of data is also backed up in other centers.
It’s a level of robust design that smaller companies might not be able to afford.
Amazon, being such a massive organization, had the freedom to make these incredible investments in data protection, and that helped to establish S3 as one of the most reliable data storage options in the world.
Does Amazon Really Live Up to the Guarantee For Amazon S3?
Outside of user reviews, we can take a few more approaches to see if Amazon is keeping its promises relating to data integrity.
One thing we can explore is refunds related to AWS.
This is a little tricky.
Amazon is a publicly traded company, so you can track a fair amount of money as it moves into and out of the company, but the average person is not privy to a detailed analysis of all refunds that occur within the organization.
What can be said is that S3 refunds are not significant enough to show up on public accounting records for Amazon.
The company does have a clear refund policy for S3, so if problems were widespread, it could lead to significant findings in Amazon’s financial reports.
Such findings aren’t in the reports, so we can reasonably conclude that S3 refunds are not commonplace.
Another approach that makes sense is to look at what kinds of major companies and data users put trust in S3.
Amazon provides a list of notable users, and they include companies like Sysco and GE Healthcare.
These are huge organizations with a lot at stake.
When you go through the whole list, it’s clear that companies with a lot to lose put trust in Amazon S3.
It indicates that Amazon is likely keeping its promises in terms of data integrity.
None of these methods alone can prove that S3 is as reliable as Amazon claims, but as you add more and more research to the pile, the conclusion doesn’t change.
What Are the Risks of Amazon S3? (2 Problems)
Despite the strong guarantees and apparent real-world capabilities of S3, it’s not a perfect system.
As you have seen, a few files are likely lost every single day.
On top of that, it’s possible to lose data without S3 really messing up.
So, what risks exist for your data that is stored on S3?
The one that causes files to go missing every day can be chalked up to software bugs.
S3 is a complicated system, and issues that arise within the software can lose files and have done so.
It’s worth considering, but as you saw in previous sections, the rate at which this happens is extremely low.
Any single file on S3 is quite safe from bugs.
There are two other potential problems: user deletion and “acts of God.”
#1 User Deletion
User deletion is easily the biggest risk to data stored on S3.
The data storage is a service, and you have the right to delete files that you store on S3.
That’s pretty normal.
But, if you delete something accidentally or later realize you should not have deleted it, it’s still gone.
S3 does offer some recovery options, such as versioning, but they only help if they were enabled before the deletion.
The system is designed to allow you to delete files as you see fit, and secretly backing them up would not serve that need very well.
Another common issue arises with large organizations.
When multiple people have access to the same S3 account, the risk of mistaken deletion goes up.
Miscommunication within the organization can lead to someone deleting files that should have been preserved.
It’s one of the most common ways that users lose data on S3, and it certainly happens more often than random bugs.
#2 “Acts of God”
The biggest fear some people have about remote data storage is a large-scale disaster.
This could be a natural disaster like a hurricane or earthquake.
It could also relate to manmade disasters like war or economic depression.
In any of these cases, S3 does become vulnerable, but it’s still a better system than most in this regard.
Because all data is backed up to multiple locations, a single natural disaster is unlikely to cause permanent data losses.
Losing a location could slow down S3 and make it harder to use for a while, but the data is still reliable in that scenario.
Even in the case of geopolitical issues, S3 servers exist on multiple continents.
So, the redundancy can still hold up against a major disruption in any single country that hosts servers.
The only so-called act of God that represents a real risk to S3 data is Amazon canceling the service.
If the company chose to shut down the servers, then all data would be lost, but there is no reason to suspect that happening any time soon.
The servers have been live for roughly 15 years, and there are no announced plans to change that.