10 Redshift Create Table Examples – updated for 2020

redshift create table image

In this blog post, lets look at some Redshift Create Table Examples. 10 to be exact ! A table in Redshift is similar to a table in a relational database. With over 23 parameters,  you can create tables with different levels of complexity.

In this post,  we look at 10 examples of Redshift Create Table statements.

1. Redshift Create TEMP table

You can create temporary tables using Redshift Create Table statements.  Temporary tables are useful in data processing because they let you store and process intermediate results without saving the data. These tables exist only for the duration of the session in which they were created. At the end of the session the table structure and the data is purged.

In Redshift, prefixing  a table with “#” creates a temporary table. Keywords TEMPORARY or TEMP can be used as well.This table is created in a separate session-specific schema and lasts only for the duration of the session. The temporary table can be named the same as a permanent table and will not generate any errors. 

The table below lists the syntax for creating a temporary table in a Redshift database. At a minimum, parameters table_name, column_name and data_type are required to define a temporary table.

CREATE TABLE #employees
(
employee_id      integer(30),
first_name       varchar(30),
last_name        varchar(30),
email            varchar(60),
phone            varchar(15),
hire_date        date
) ;

CREATE TEMPORARY TABLE employees
(
employee_id      integer(30),
first_name       varchar(30),
last_name        varchar(30),
email            varchar(60),
phone            varchar(15),
hire_date        date
) ;

CREATE TEMP TABLE employees
(
employee_id      integer(30),
first_name       varchar(30),
last_name        varchar(30),
email            varchar(60),
phone            varchar(15),
hire_date        date
) ;

2. Create Table With COLUMN Defaults

If you don’t already, then get in the habit of setting Column Defaults  during table creation. It is a good practice and  avoids unnecessary NULL issues during analytics.

In Redshift, Column Defaults are set using the keyword DEFAULT, followed by the expression default_expr. The data type of default_expr has to match the column data type. For example, the default for a DATE datatype should be a date value. Like wise, the default for an INTEGER should be a numeric value. 

The table below provides an example for creating a Redshift table with COLUMN defaults. At a minimum, parameters table_name, column_name, data_type, DEFAULT  and default_expr are required.

CREATE TABLE employees
(
employee_id      integer(30),
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
);

3. Create Table With IDENTITY

An IDENTITY column in Redshift is similar to a KEY field in a relational database. It is used to auto generate unique values for a column. The syntax for defining one is IDENTITY(seed, step), where seed is the starting value and step is the value to increment by. In order to define a column as an IDENTITY column, the data type has to be an INT or BIGINT.

The table below provides an example for creating a Redshift table with an IDENTITY Column. At a minimum, parameters table_name, column_namedata_type and keyword  IDENTITY is required.

CREATE TABLE employees
(
employee_id      int(30)     identity(0,1),
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
);

4. Create Table With SORTKEY

Unlike relational databases, data in a Redshift table is stored in sorted order. The order of sort is determined by setting one or more columns in a table as the sort key. This architecture helps improve SQL query performance.

A Redshift SORTKEY can be set at the column level, or at the table level. When a SORT key is set at the column level, it is called a single-column sort key. The syntax for setting such as a column is by appending the keyword “SORTKEY” at the end of the column name. Example 1 below illustrates how to do this.

A table level Redshift SORTKEY is a bit trickier than a column level sort key. There are two types of table level sort keys: Compound SORTKEY and Interleaved SORTKEY.

A Compound SORTKEY is a combination of multiple columns in sorted order. The order in which the columns are listed in the SORTKEY determines the performance benefit from a Compound SORTKEY. The syntax for a Compound SORTKEY is “compound sortkey (column1,column2…)”. Use a Compound SORTKEY if your SQL query depends on where clauses from multiple sorted fields. Example 2 below illustrates how to do this.

An Interleaved SORTKEY on the other hand is a collection of multiple columns in a table, with each column assigned equal weight.The syntax for an INTERLEAVED SORTKEY is “interleaved sortkey (column1,column2…)”. Use an Interleaved SORTKEY if your sql where clause goes against different column fields. Example 3 below illustrates how to do this.

CREATE TABLE employees
(
employee_id      int(30)     sortkey,
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
) ;

CREATE TABLE employees
(
employee_id      int(30)     identity(0,1),
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
) compound sortkey (employee_id, last_name);

CREATE TABLE employees
(
employee_id      int(30)     identity(0,1),
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
) interleaved sortkey (hire_date, sales_id);

5. Create Table with ENCODING

Data Compression in Redshift helps reduce storage requirements and increases SQL query performance. In AWS Redshift, Compression is set at the column level. This allows more space in memory to be allocated for data analysis during SQL query execution. Redshift currently supports eight column level compression encodings: Raw, Byte dictionary, Delta, LZO, Mostlyn, Run-length, Text and Zstandard.

Redshift recommends using Automatic Compression instead of manually setting Compression Encodings for columns. Automatic Compression can only be set when data is loaded into an empty table. You do this using the COPY command.

This does not mean you cannot set Automatic Compression on a table with data in it. You have one of two options.

  • Run an ANALYZE COMPRESSION command, which produces a compression analysis report for each column. Use this report to then manually set the Compression Encodings.
  • Create a backup of the existing table in your database using the CTAS command or in S3. Delete all the rows from the existing table using the TRUNCATE command. Run the COPY command to load the data from the backup table or backup S3 file.

CREATE TABLE employees
(
employee_id      int(30)     sortkey,
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer    encode mostly16
);

6. Create Table With Distribution

In Redshift, Distribution style defines how data is allocated across the compute nodes in a cluster. Data distribution across the compute nodes plays a key role in determining storage utilization, query and overall system performance.  Redshift has 4 data distribution styles: AUTO, EVEN, KEY and ALL.

    • AUTO is the default distribution style in Redshift. When the distribution style is set to AUTO, Redshift assigns EVEN or ALL depending on the table data size.
    • In EVEN distribution, rows are distributed across the node slices in a round robin fashion. Use EVEN if you do not plan on joining tables.
    • In a KEY distribution style, one of the columns in the table is set as the distribution key (DISTKEY). The rows in the table are then distributed based on this column. Key distribution is great if you plan on joining tables.
    • In an ALL distribution style, all rows of a table are copied to every single compute node. ALL distribution increases the storage requirements for a table, and also slows down DML queries. The best use case for an ALL distribution style is when tables are not updated frequently.

The keyword and syntax for setting  distribution styles are listed below.

CREATE TABLE employees
(
employee_id      int(30)     sortkey,
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
) diststyle AUTO;

CREATE TABLE employees
(
employee_id      int(30)     identity(0,1),
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
) diststyle EVEN;

CREATE TABLE employees
(
employee_id      int(30)     identity(0,1),
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer    distkey
) diststyle KEY;

CREATE TABLE employees
(
employee_id      int(30)     identity(0,1),
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
) diststyle ALL;

7. Create Table With PRIMARY KEY

A Primary Key uniquely identifies all the records in a relational database. In a Redshift table, Primary Key constraints are for informational purposes only; they are not enforced. 

Primary Key constraints can be set at the column level or at the table level. If you need a single column to be set as a Primary Key, then set it at the column level. On the other hand, if you need multiple columns to form a composite Primary Key, set it at the table level. 

Listed below is a Redshift Create table example with PRIMARY KEY constraints set at the column level and field level.

CREATE TABLE employees
(
employee_id      int(30)     PRIMARY KEY,
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0
);

CREATE TABLE employees
(
employee_id      int(30),
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer     default 0, 
primary key (last_name,hire_date)
);

8. Redshift Create Table LIKE

The LIKE clause in a Redshift Create table statement lets your create a table based on the structure of a parent table. When such a child table is created, it inherits some of the attributes like Distribution Style, Sort Keys, NULL and Backup. However, the child table does not inherit any PRIMARY Key and FOREIGN Key constraints.

The syntax for creating a table using the LIKE clause is “LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] “. At a minimum, the LIKE keyword and parent_table need to be specified. If you want the column level default values copied over from the parent table, then add the INCLUDING DEFAULTS clause. 

Listed  below is a Redshift Create Table example with the LIKE clause. In this example, we are creating a backup of the parent table  employees including its defaults. The child table is named employees_bkp.

CREATE TABLE employees_bkp (LIKE employees INCLUDING DEFAULTS);

9. Create Table IF NOT EXISTS

In Redshift you can add a parameter to your Create table statement to check if the table already exists. If it does, then the statement does not create a table and returns a message instead of the standard abort error. You add the “IF NOT EXISTS” clause to the create table statement to enable this check. Keep in mind, this check does not validate the table structure, rather just the table name.

For example, if you had a table “STUDENTS” in Redshift with columns FIRST_NAME, LAST_NAME, ADDRESS, ZIPCODE, PHONE and you were trying to create a new table with the same name “STUDENTS” but with columns NAME, STREET, POSTAL, CONTACT, the new table creation would fail and return a message. 

CREATE TABLE IF NOT EXISTS employees
(
employee_id      int(30)     sortkey,
first_name       varchar(30) default ‘John’,
last_name        varchar(30) default ‘Doe’,
email            varchar(60) default ‘john.doe@xyz.com’,
phone            varchar(15) default ‘000-000-0000’,
hire_date        date        default ‘1901-01-01’,
sales_id         integer   
);

10. Redshift Create Table AS

In Redshift, Create Table As (CTAS) statements are used to create tables from the results of select queries. The CTAS statement inherits the table structure and the data from the SQL query. However, default values, identity columns and constraints are not inherited.  Distribution Keys and Sort Keys get allocated based on the SQL query plan.

Listed below are two examples using CTAS. The first one (EXAMPLE 1) is based on a simple select statement and the second one (EXAMPLE 2) explicitly sets the Sort Keys and Distribution Keys.

CREATE TABLE employees_bkp AS Select * from employees;

CREATE TABLE employees_bkp
distkey(sales_id),
sortkey(employee_id)
AS
select * from employees;

Want to test your Redshift skills ?  Try a Quiz.

Amazon Redshift helpful links

Amazon Redshift Documentation

This is the latest version of Amazon Redshift Documentation

Amazon Redshift & Analytics

Another great blog post by Nick Corbett, AWS Professional Services on Agile Analytics with Amazon Redshift

Service Offerings by Obstkel

Get to know the AWS Cloud Services offered by Obstkel

Building a Data Warehouse

This is an excellent blog post from AWS by Stephen Borg on Building a Data Warehouse using Amazon Redshift

Email us at : info@obstkel.com

Copyright 2020 © OBSTKEL LLC. All rights Reserved.
Scroll to Top