In this post, we look at Redshift vs RDS and the 6 significant differences between the two.
To quote Lao Tzu, “A journey of a thousand miles, starts with a single step”. Similarly, implementing the right AWS services requires knowing that first step – choosing the right database service. AWS currently offers 7 different flavors of databases, ranging from relational, document, NoSQL, in-memory, graph and data warehouse. Redshift is Amazon’s data warehouse offering and is a very powerful petabyte scale relational database. This relational status of Redshift can make it a very tempting choice for transactional processing.
If you would like to test your understanding of Redshift, try our Amazon Sagemaker generated quiz.
Amazon Redshift is a Relational Database Management system (RDBMS) based on PostgreSQL 8.0.2. However, it not meant to be used as a relational database for Online Transaction Processing (OLTP). Redshift supports relational database like features, but its architectural components are optimized for analysis on large datasets, i.e., OLAP.
Amazon RDS on the other hand is a Relational Database service offering meant for Online Transaction Processing (OLTP), with database engines from Aurora, MySQL, MariaDB, PostgreSQL, Oracle Database and SQL Server. Except for Aurora, all other database engines are offered by third party vendors.
Redshift is a Fully Managed Service, which means management tasks such as hardware provisioning, software patching, setup, configuration, monitoring, failure recovery, and backups are all automatically handled for you.
Amazon RDS is a mix of Managed and Fully Managed Services. Aurora, the relational database by Amazon is fully managed, while the rest of the third-party database engines are Managed Services. As part of the managed services, RDS handles infrastructure capacity provisioning, backups, patching and data replication. All other management tasks need to be handled by an administrator.
Redshift uses Columnar model for data storage. In Columnar storage, data is stored and retrieved in columns as opposed to rows. This reduces the storage space required and increases data retrieval efficiency. In addition, since a column stores data of the same type, compression encodings specific to the data type can be applied further reducing disk space consumption and I/O.
Amazon RDS uses Row-Wise data storage. In this model, data is stored one row at a time, potentially across multiple data blocks. This is the typical storage model for relational databases. However, this is not a design flaw, rather meant to increase the efficiency of read and write transactions. RDS is meant for transaction processing systems.
Amazon Redshift sorts the data before storing it in a table. Sort Keys enable efficient use of range range-restricted predicates, which basically means more efficient table scans and faster return of query results. There are two types of sort keys in Redshift: Compound and Interleaved. For details on how to create tables in Redshift, click here.
Amazon RDS does not have the concept of Sort Keys.
Amazon Redshift does not enforce primary key, foreign key and unique constraints. You could define these constraints if they exist in the transactional database. In Redshift these constraints will be used for informational purposes only; however, they do get used by the query optimizer to create optimized queries.
All relational database constraints are enforced in Amazon RDS. However these are enforced based on the individual database engine rules.
Amazon Redshift does not support Read Replicas.
Read replica is a feature of Amazon RDS, where a copy of the source database instance is created in the same or separate AWS Region. This replicated instance is then asynchronously updated from the source database instance. This Read Replica instance only supports reads. Having a Read Replica enables better scaling, disaster recovery, reporting and data warehousing capabilities.
Currently Maria DB, MySQL, Oracle and PostgrSQL are the only database engines that support Read Replica in RDS.