(This blog is part two of a six-part series on how HelioCampus uses AWS to support our data analytics platform)
Today’s data processes can generate massive amounts of data at various points in the data flow. Organizing that data is a challenge without an easy way to store it in the cloud. Amazon Simple Storage Service (Amazon S3) gives organizations of all sizes a cost-efficient, secure, and scalable way to store data without compromising the performance of data retrieval.
Using Amazon S3 for data storage lets us build a data lake to retrieve and store copies of data, develop efficient processes to load our data warehouse, and archive frozen copies of data. It also gives our internal teams and our customers an easy, secure way to load their own data from any location so that it can be overlaid with our data models.
Amazon S3 stores data as objects rather than as files in a directory hierarchy or as blocks on a disk. An object is a file along with any optional metadata describing it. Objects are stored in a container called a “bucket,” which works somewhat like a top-level folder in the cloud. You choose the AWS Region in which S3 stores your bucket, and you control who can access it. Access can be granted to users, groups, or roles, and requests are authenticated with either long-term access keys or temporary security credentials (more on this later). Object metadata, together with bucket-level settings, enables several critical capabilities, such as versioning objects, tagging buckets for cost allocation, choosing encryption options, controlling and logging access, and hosting static websites.
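To make buckets, Regions, and object metadata more concrete, here is a minimal Python sketch using the AWS SDK for Python (boto3). It is an illustration, not HelioCampus's code: the bucket name, key, metadata, and tags are hypothetical, and SSE-S3 (`AES256`) is only one of the available encryption options. The two helpers just build the keyword arguments for boto3's `create_bucket` and `put_object` calls, so they can be inspected without AWS access; `upload` makes the real calls and needs boto3 plus AWS credentials.

```python
from urllib.parse import urlencode


def create_bucket_args(bucket: str, region: str) -> dict:
    """Keyword arguments for the boto3 S3 client's create_bucket call."""
    args = {"Bucket": bucket}
    # Outside us-east-1 the target Region is passed as a LocationConstraint;
    # for us-east-1 the constraint must be omitted.
    if region != "us-east-1":
        args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return args


def put_object_args(bucket: str, key: str, body: bytes,
                    metadata: dict, tags: dict) -> dict:
    """Keyword arguments for the boto3 S3 client's put_object call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # User-defined metadata, stored with the object as x-amz-meta-* headers.
        "Metadata": metadata,
        # Object tags, sent as a URL-encoded "k1=v1&k2=v2" string.
        "Tagging": urlencode(tags),
        # Server-side encryption with S3-managed keys (SSE-S3).
        "ServerSideEncryption": "AES256",
    }


def upload(bucket: str, region: str, key: str, body: bytes,
           metadata: dict, tags: dict) -> None:
    """Create a versioned bucket in `region` and store one object in it.

    Requires boto3 and AWS credentials; it is not called in this sketch.
    """
    import boto3  # deferred so the helpers above work without boto3 installed

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_args(bucket, region))
    # Versioning is a bucket-level setting that keeps prior object versions.
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
    )
    s3.put_object(**put_object_args(bucket, key, body, metadata, tags))
```

For example, `create_bucket_args("example-analytics-lake", "us-west-2")` adds a `LocationConstraint` of `us-west-2`, while the same call for `us-east-1` returns only the bucket name.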
The cost of storing data in S3 depends on how much data you have and how frequently you need to access it. At the time of writing, S3 offers six storage classes that trade performance against price. Classes designed for frequently accessed data charge more per gigabyte stored but no retrieval fee, while infrequent-access and archive classes charge less for storage but add retrieval fees and, for archive classes, longer retrieval times. For a small monthly monitoring and automation fee per object, S3 Intelligent-Tiering can even choose an optimal access tier for your objects automatically by analyzing their access patterns.
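One way to act on these trade-offs, not described in this post, is to pair an S3 Lifecycle rule (which moves objects to cheaper classes as they age) with Intelligent-Tiering for data whose access pattern is unknown. The Python sketch below uses boto3; the bucket, the `snapshots/` prefix, and the 30- and 365-day thresholds are hypothetical choices, and `GLACIER` is the API name for S3 Glacier Flexible Retrieval. As before, `lifecycle_config` only builds the configuration, and `apply_storage_policies` needs boto3 and AWS credentials.

```python
def lifecycle_config(prefix: str, ia_days: int = 30,
                     archive_days: int = 365) -> dict:
    """Lifecycle rule moving objects under `prefix` to colder storage classes."""
    # S3 requires objects to be at least 30 days old before a lifecycle
    # transition to STANDARD_IA.
    if ia_days < 30:
        raise ValueError("STANDARD_IA transitions need ia_days >= 30")
    # Objects should move to colder classes as they age, so keep days increasing.
    if archive_days <= ia_days:
        raise ValueError("archive_days must be greater than ia_days")
    return {
        "Rules": [
            {
                "ID": "tier-" + prefix.strip("/"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_storage_policies(bucket: str, prefix: str, key: str,
                           body: bytes) -> None:
    """Attach the lifecycle rule and upload one object to Intelligent-Tiering.

    Requires boto3 and AWS credentials; it is not called in this sketch.
    """
    import boto3  # deferred so lifecycle_config works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle_config(prefix)
    )
    # Let Intelligent-Tiering choose the access tier for this object.
    s3.put_object(Bucket=bucket, Key=key, Body=body,
                  StorageClass="INTELLIGENT_TIERING")
```

A fixed lifecycle schedule suits data with a predictable access pattern, such as archived snapshots; Intelligent-Tiering fits better when you cannot predict how often an object will be read.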
This is just one of the AWS tools that HelioCampus uses to support data analytics. Check back periodically, as we will be sharing posts on additional tools in the future.