obstkel.com logo

10 Redshift Create Table Examples to make you an expert

redshift create table

In this blog post, let us look at some Redshift Create Table Examples, 10 to be exact !

However, before we get started, what exactly is Redshift?

Amazon Redshift is a cloud based data warehouse service by AWS. For those of us who have used PostgreSQL, you will be surprised to know that Redshift is built on PostgreSQL. However, the similarities end there. Redshift is designed specifically for Online Analytical Processing (OLAP) and is not meant to be used for Online Transaction Processing (OLTP) applications.

A table in Redshift is similar to a table in a relational database. However, before you get started, make sure you understand the data types in Redshift, usage and limitations.  With over 23 parameters, you can create tables with different levels of complexity. So, let’s get started!

    1. Redshift CREATE TEMP Table
    2. Create Table with COLUMN Defaults
    3. Create Table with IDENTITY
    4. Create Table with Redshift Sort Key
    5. Create Table with ENCODING
    6. Create Table with Distribution
    7. Create Table with PRIMARY KEY
    8. Redshift Create Table LIKE
    9. Create Table IF NOT EXISTS
    10. Redshift Create Table AS



Related post
: Which Redshift data types should I use ? 

1. Redshift Temp table

In Amazon Redshift, temp (temporary) tables are useful in data processing because they let you store and process intermediate results without saving the data.

These tables exist only for the duration of the session in which they were created. At the end of the session the table structure and the data are purged.

Prefixing the table name with “#creates a redshift temp table. You could also choose to use keywords TEMPORARY or TEMP instead of using the hash/pound sign.

Redshift temp tables get created in a separate session-specific schema and lasts only for the duration of the session. For this reason, you can name a temporary table the same as a permanent table and still not generate any errors. 

The table below lists the Redshift Create temp table syntax in a database. At a minimum, parameters table_name, column_name and data_type are required to define a temp table.

Example1: Using hashtag (#) to create a Redshift temp table
CREATE TABLE #employees
(
employee_id      integer(30),
first_name       varchar(30),
last_name        varchar(30),
email            varchar(60),
phone            varchar(15),
hire_date        date
) ;
Example2: Using keyword TEMPOARY to create a Redshift temp table
CREATE TEMPORARY TABLE employees
(
employee_id      integer(30),
first_name       varchar(30),
last_name        varchar(30),
email            varchar(60),
phone            varchar(15),
hire_date        date
) ;
Example3: Using keyword TEMP to create a Redshift temp table
CREATE TEMP TABLE employees
(
employee_id      integer(30),
first_name       varchar(30),
last_name        varchar(30),
email            varchar(60),
phone            varchar(15),
hire_date        date
) ;

2. Create Table With COLUMN Defaults

If you don’t already, then get in the habit of setting Column Defaults during table creation. It is a good practice and avoids unnecessary NULL issues during analytics.

In Redshift, column defaults are set using the keyword DEFAULT, followed by the expression default_expr. The data type of default_expr has to match the column data type.


For example, the default for a DATE datatype should be a date value. Likewise, the default for an INTEGER should be a numeric value. 

The table below provides an example for creating a Redshift table with COLUMN defaults. At a minimum, parameters table_name, column_namedata_type, DEFAULT  and default_expr are required.

CREATE TABLE employees
(
employee_id      integer(30),
first_name       varchar(30) default 'John',
last_name        varchar(30) default 'Doe',
email            varchar(60) default 'john.doe@xyz.com',
phone            varchar(15) default '000-000-0000',
hire_date        date        default '1901-01-01',
sales_id         integer      default 0
);

3. Create Table With IDENTITY

An IDENTITY column in Redshift is similar to a KEY field in a relational database. It is used to auto generate unique values for a column.


The syntax for defining one is IDENTITY (seed, step), where seed is the starting value and step is the value to increment by. In order to define a column as an IDENTITY column, the data type has to be an INT or BIGINT.

The table below provides an example for creating a Redshift table with an IDENTITY Column. At a minimum, parameters table_name, column_namedata_type and keyword IDENTITY is required.


Related: Master the frequently used Redshift data functions by developers.

CREATE TABLE employees
(
employee_id      int(30)      identity(0,1),
first_name       varchar(30) default 'John',
last_name        varchar(30) default 'Doe',
email            varchar(60) default 'john.doe@xyz.com',
phone            varchar(15) default '000-000-0000',
hire_date        date        default '1901-01-01',
sales_id         integer      default 0
);

4. Create Table With Redshift Sort Key (SORTKEY)

Unlike relational databases, data in a Redshift table is stored in sorted order. The order of sort is determined by setting one or more columns in a table as the sort key. This architecture helps improve SQL query performance in Redshift.

A Redshift Sort Key (SORTKEY) can be set at the column level, or at the table level.

When a SORT key is set at the column level, it is called a single-column sort key. The syntax for setting such as a column is by appending the keyword “SORTKEY” at the end of the column name. Example 1 below illustrates how to do this.

 

A table level Redshift Sort Key (SORTKEY) is a bit trickier than a column level sort key. You have to choose a between a Redshift Compound SORTKEY or a Redshift Interleaved SORTKEY.

A Compound SORTKEY is a combination of multiple columns in sorted order. The order in which the columns are listed in the SORTKEY determines the performance benefit from a Compound SORTKEY.

The syntax for a Compound SORTKEY is “compound sortkey (column1, column2…)”.

Use a Compound SORTKEY if your SQL query depends on where clauses from multiple sorted fields. Example 2 below illustrates how to do this.

A Redshift Interleaved SORTKEY on the other hand is a collection of multiple columns in a table, with each column assigned equal weight regardless of the order in which they are listed. 

Keep the following points in mind when choosing an interleaved sort key in Redshift.

  • You have select queries filtering on multiple columns from a table ( example: where first_name like ‘Joh%’ and last_name like ‘Sm%’ ).

  • If you have multiple select queries filtering on single columns from a table ( example: where first_name like ‘Joh%’ , where last_name like ‘Sm%’ ).

  • You want a maximum of 8 columns to be part of the key.

  • The table stores large amounts of data ( 1 MB or more ).

  • Limited number of sort/merge operations required.

The syntax for an INTERLEAVED SORTKEY is “interleaved sortkey (column1,column2…)”. Example 3 below illustrates how to do this.

5. Redshift Create Table with ENCODING

Data Compression in Redshift helps reduce storage requirements and increases SQL query performance. In AWS Redshift, Compression is set at the column level. This allows more space in memory to be allocated for data analysis during SQL query execution.

Redshift currently supports eight column level compression encodings: Raw, Byte dictionary, Delta, LZO, Mostlyn, Run-length, Text and Zstandard.

Redshift recommends using Automatic Compression instead of manually setting Compression Encodings for columns. Automatic Compression can only be set when data is loaded into an empty table. You do this using the COPY command.

This does not mean you cannot set Automatic Compression on a table with data in it. You have one of two options.

  • Run an ANALYZE COMPRESSION command, which produces a compression analysis report for each column. Use this report to then manually set the Compression Encodings.

  • Create a backup of the existing table in your database using the CTAS command or in S3. Delete all the rows from the existing table using the TRUNCATE command. Run the COPY command to load the data from the backup table or backup S3 file.
CREATE TABLE employees
(
employee_id      int(30)      sortkey,
first_name       varchar(30) default 'John',
last_name        varchar(30) default 'Doe',
email            varchar(60) default 'john.doe@xyz.com',
phone            varchar(15) default '000-000-0000',
hire_date        date        default '1901-01-01',
sales_id         integer    encode mostly16
);

6. Create Table With Distribution

In Redshift, Distribution style defines how data is allocated across the compute nodes in a cluster. Data distribution across the compute nodes plays a key role in determining storage utilization, query and overall system performance. 

Redshift has 4 data distribution styles: AUTO, EVEN, KEY and ALL.

  • AUTO is the default distribution style in Redshift. When the distribution style is set to AUTO, Redshift assigns EVEN or ALL depending on the table data size.

  • In EVEN distribution, rows are distributed across the node slices in a round robin fashion. Use EVEN if you do not plan on joining tables.

  • In a KEY distribution style, one of the columns in the table is set as the distribution key (DISTKEY). The rows in the table are then distributed based on this column. Key distribution is great if you plan on joining tables.

  • In an ALL distribution style, all rows of a table are copied to every single compute node. ALL distribution increases the storage requirements for a table, and also slows down DML queries. The best use case for an ALL distribution style is when tables are not updated frequently.


The keyword and syntax for setting  distribution styles are listed below.

7. Redshift Create Table with PRIMARY KEY

In this Redshift Create table example, let us look at how to create one with a Primary Key.

Redshift Primary Key uniquely identifies all the records in a relational database. In a Redshift table, Primary Key constraints are for informational purposes only; they are not enforced. 

Primary Key constraints can be set at the column level or at the table level. If you need a single column to be set as a Primary Key, then set it at the column level. On the other hand, if you need multiple columns to form a composite Primary Key, set it at the table level. 

Listed below is a Redshift Create table example with PRIMARY KEY constraints set at the column level and field level.

8. Redshift Create Table LIKE

The Redshift Create Table LIKE clause is not the same as the Create Table AS clause. This is a very important difference you have to remember!

What is the difference you ask?

The Create table LIKE clause is used to create a table based on the structure of a parent table, whereas the Create table AS (CTAS) clause is used to create a table based on a SQL query.

Use the Redshift Create Table LIKE clause when you want to create a table that inherits attributes such as Distribution Style, Sort Keys, NULL and Backup.

However, the table you created does not inherit any PRIMARY Key and FOREIGN Key constraints.

The syntax for creating a table using the LIKE clause is –  “LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] “. At a minimum, the LIKE keyword and parent_table need to be specified. If you want the column level default values copied over from the parent table, then add the INCLUDING DEFAULTS clause. 

Listed  below is a Redshift Create Table example with the LIKE clause. In this example, we are creating a backup of the parent table  employees including its defaults. The child table is named employees_bkp.

CREATE TABLE employees_bkp (LIKE employees INCLUDING DEFAULTS);

Specifying “INCLUDE DEFAULTS” in your create table clause inherits the column level  defaults.

If you want to Create a table without the column defaults, i.e., you want to set NULL defaults, you could use “EXCLUDING DEFAULTS” or skip the statement as shown below

CREATE TABLE employees_bkp (LIKE employees);
or
CREATE TABLE employees_bkp (LIKE employees EXCLUDING DEFAULTS);

9. Create Table IF NOT EXISTS

In Redshift you can add a parameter to your Create table statement to check if the table already exists. If it does, then the statement does not create a table and returns a message instead of the standard abort error.

You add the “IF NOT EXISTS” clause to the create table statement to enable this check. Keep in mind, this check does not validate the table structure, rather just the table name.

For example, if you had a table “STUDENTS” in Redshift with columns FIRST_NAME, LAST_NAME, ADDRESS, ZIPCODE, PHONE and you were trying to create a new table with the same name “STUDENTS” but with columns NAME, STREET, POSTAL, CONTACT, the new table creation would fail and return a message.

CREATE TABLE IF NOT EXISTS employees
(
employee_id      int(30)      sortkey,
first_name       varchar(30) default 'John',
last_name        varchar(30) default 'Doe',
email            varchar(60) default 'john.doe@xyz.com',
phone            varchar(15) default '000-000-0000',
hire_date        date        default  '1901-01-01',
sales_id         integer   
);

10. Redshift Create Table as Select

In Redshift, Create Table As (CTAS) statements are used to create tables from the results of select queries. The CTAS statement inherits the table structure and the data from the SQL query.

However, default values, identity columns and constraints are not inherited.  Distribution Keys and Sort Keys get allocated based on the SQL query plan.

Listed below are two examples using CTAS. The first one (EXAMPLE 1) is based on a simple select statement and the second one (EXAMPLE 2) explicitly sets the Sort Keys and Distribution Keys.

Summing things up

Hopefully these Redshift Create Table examples helped you get a better understanding of the syntax and formatting. 

A key area where most of us run into issues is when loading data into Redshift tables. Often times the data needs to be prepared before loading them into tables. 

A good rule of thumb is to stage your data in Amazon S3, transform it using a tool like AWS Athena and then load them into Redshift. If you are new to Athena, we have an excellent blog post on Amazon Athena SQL.

Related: Redshift Coalesce: One for all and all for NULL

Table of Contents

Amazon Redshift helpful links

Amazon Redshift Documentation

This is the latest version of Amazon Redshift Documentation

Amazon Redshift & Analytics

Another great blog post by Nick Corbett, AWS Professional Services on Agile Analytics with Amazon Redshift

Building a Data Warehouse

This is an excellent blog post from AWS by Stephen Borg on Building a Data Warehouse using Amazon Redshift

Interested in our services ?

email us at : info@obstkel.com

Copyright 2022 © OBSTKEL LLC. All rights Reserved