October 15, 2021 | Analytics, Data Science, Featured

    Leveraging AWS Redshift for Data Warehousing

    (This blog is part three of a six-part series on how HelioCampus uses AWS to support our data analytics platform)

    Amazon Redshift is a scalable, fully managed cloud data warehouse service that delivers fast query performance on gigabytes to petabytes of data, for anywhere from a few users to thousands. It achieves this performance with a massively parallel processing (MPP) architecture, in which a leader node orchestrates data distribution and query execution across a set of compute nodes; together these form a data warehouse cluster. Each compute node has its own dedicated CPU, memory and storage. Client applications simply connect to the cluster using standard SQL clients and drivers, while AWS manages optimization and storage in the backend. As your workload grows, you can increase both compute and storage capacity either with a few clicks or programmatically through the Redshift APIs.
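
    To make the leader/compute split more concrete, here is a minimal Python sketch of the scatter-gather idea behind an MPP query. It is a toy model, not Redshift's actual implementation: the node count, the CRC32-based hashing, and all function names are assumptions made for illustration.

```python
import zlib

NUM_COMPUTE_NODES = 4  # assumed cluster size, for illustration only


def node_for(key: str) -> int:
    """Pick a compute node by hashing the distribution key (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_COMPUTE_NODES


def distribute(rows):
    """Leader step: spread (key, value) rows across the compute nodes."""
    nodes = [[] for _ in range(NUM_COMPUTE_NODES)]
    for key, value in rows:
        nodes[node_for(key)].append((key, value))
    return nodes


def partial_sum(node_rows):
    """Compute-node step: aggregate only the rows stored locally."""
    return sum(value for _, value in node_rows)


def leader_total(rows):
    """Leader step: combine the partial results returned by each node."""
    return sum(partial_sum(node_rows) for node_rows in distribute(rows))


# Hypothetical data: 1,000 students with 1 to 5 credits each.
rows = [(f"student-{i}", i % 5 + 1) for i in range(1000)]
total = leader_total(rows)
print(total)
```

    Each node only touches its own share of the rows, so in a real cluster the per-node work can run in parallel and the leader only has to merge small partial results.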

    Unlike traditional database systems, which store data row by row, Redshift stores data in a columnar fashion. This results in significantly higher compression, which, combined with the MPP architecture, results in faster retrieval of massive amounts of data.
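
    A small Python sketch can show why a columnar layout tends to compress better. It applies run-length encoding to the same hypothetical table laid out row by row and column by column. The data and helper names are made up for illustration, and run-length encoding stands in for the broader family of column encodings; this is not a description of Redshift's storage internals.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical table: (student_id, state, year), 1,000 rows.
rows = [(i, "MD" if i < 500 else "VA", 2021) for i in range(1000)]

# Row-based layout: values of different columns are interleaved on disk.
row_major = [value for row in rows for value in row]

# Columnar layout: each column's values are stored together.
columns = list(zip(*rows))

row_runs = len(run_length_encode(row_major))
column_runs = sum(len(run_length_encode(column)) for column in columns)

print(row_runs, column_runs)
```

    In the row layout no two neighboring values match, so nothing collapses. In the columnar layout the state and year columns shrink to a handful of runs, because similar values sit next to each other.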

    Compared to its competitors, Redshift out of the box provides significantly better price performance, measured as price per TB per run. Price performance for tuned queries can be improved even further (even as the data grows) with simple tuning techniques such as materialized views for repeated queries, concurrency scaling to dynamically allocate compute capacity for jobs scheduled at peak times, and good warehouse design choices with proper distribution keys and sort keys. The charts below show benchmarking results for a cloud data warehouse workload derived from the Transaction Processing Performance Council’s TPC Benchmark DS (TPC-DS) methodology (http://tpc.org/tpcds/default5.asp).
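
    As a rough illustration of the tuning techniques mentioned above, the following Python sketch builds Redshift DDL for a table with a distribution key and a compound sort key, plus a materialized view for a repeated aggregate query. The table, column, and view names are hypothetical, and the helper functions are assumptions for this example rather than part of any Redshift client library. Identifiers are interpolated directly into the SQL, so this is suitable only for trusted, hard-coded names.

```python
def create_table_ddl(table, columns, dist_key, sort_keys):
    """Build a CREATE TABLE statement with a DISTKEY and a compound SORTKEY."""
    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {column_sql}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"COMPOUND SORTKEY ({', '.join(sort_keys)});"
    )


def create_materialized_view_ddl(view, query):
    """Build a CREATE MATERIALIZED VIEW statement for a repeated query."""
    return f"CREATE MATERIALIZED VIEW {view} AS\n{query};"


# Hypothetical fact table: rows for one student land on the same node,
# and data is sorted by term so term-range filters can skip blocks.
enrollment_ddl = create_table_ddl(
    "enrollment",
    [("student_id", "BIGINT"), ("term_code", "VARCHAR(10)"), ("credits", "SMALLINT")],
    dist_key="student_id",
    sort_keys=["term_code", "student_id"],
)

# Hypothetical repeated dashboard query, precomputed as a materialized view.
credits_by_term_ddl = create_materialized_view_ddl(
    "credits_by_term",
    "SELECT term_code, SUM(credits) AS total_credits\n"
    "FROM enrollment\n"
    "GROUP BY term_code",
)

print(enrollment_ddl)
print(credits_by_term_ddl)
```

    The generated statements can be run from any standard SQL client connected to the cluster. After new data is loaded, a materialized view like this one would typically be brought up to date with REFRESH MATERIALIZED VIEW credits_by_term;.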

    At HelioCampus, we have deployed more than 25 (and growing) production-level Redshift clusters that power the data analytics platform for our clients.

    Redshift is just one of the AWS tools that HelioCampus uses to support data analytics. Continue to check our blog periodically as we will be sharing more posts on additional tools in the future.
