
In this blog post, let us look at some Redshift Create Table Examples, 10 to be exact!
However, before we get started, what exactly is Redshift?
Amazon Redshift is a cloud-based data warehouse service by AWS. If you have used PostgreSQL, you may be surprised to know that Redshift is built on PostgreSQL. However, the similarities end there. Redshift is designed specifically for Online Analytical Processing (OLAP) and is not meant to be used for Online Transaction Processing (OLTP) applications.
A table in Redshift is similar to a table in a relational database. However, before you get started, make sure you understand the Redshift data types, their usage, and their limitations. With over 23 parameters, you can create tables with different levels of complexity. So, let's get started!
Related post: Which Redshift data types should I use?
In Amazon Redshift, temp (temporary) tables are useful in data processing because they let you store and process intermediate results without saving the data. 
These tables exist only for the duration of the session in which they were created. At the end of the session the table structure and the data are purged.
Prefixing the table name with "#" creates a Redshift temp table. You can also use the keyword TEMPORARY or TEMP instead of the hash/pound sign.
Redshift temp tables are created in a separate, session-specific schema and last only for the duration of the session. For this reason, you can give a temporary table the same name as a permanent table without generating any errors.
The examples below show the Redshift Create temp table syntax. At a minimum, the parameters table_name, column_name and data_type are required to define a temp table.
Example 1: Using the hash/pound sign (#) to create a Redshift temp table
CREATE TABLE #employees
(
employee_id integer,
first_name varchar(30),
last_name varchar(30),
email varchar(60),
phone varchar(15),
hire_date date
) ;
Example 2: Using the keyword TEMPORARY to create a Redshift temp table
CREATE TEMPORARY TABLE employees
(
employee_id integer,
first_name varchar(30),
last_name varchar(30),
email varchar(60),
phone varchar(15),
hire_date date
) ;
Example 3: Using the keyword TEMP to create a Redshift temp table
CREATE TEMP TABLE employees
(
employee_id integer,
first_name varchar(30),
last_name varchar(30),
email varchar(60),
phone varchar(15),
hire_date date
) ;
If you haven't already, get in the habit of setting Column Defaults during table creation. It is good practice and avoids unnecessary NULL issues during analytics.
In Redshift, column defaults are set using the keyword DEFAULT, followed by the expression default_expr. The data type of default_expr has to match the column data type. 
For example, the default for a DATE datatype should be a date value. Likewise, the default for an INTEGER should be a numeric value. 
Listed below is an example of creating a Redshift table with COLUMN defaults. At a minimum, the parameters table_name, column_name, data_type, DEFAULT and default_expr are required.
CREATE TABLE employees
(
employee_id integer,
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
);
An IDENTITY column in Redshift is similar to an auto-incrementing key column in a relational database. It is used to auto-generate unique values for a column.
The syntax for defining one is IDENTITY (seed, step), where seed is the starting value and step is the value to increment by. In order to define a column as an IDENTITY column, the data type has to be an INT or BIGINT.
Listed below is an example of creating a Redshift table with an IDENTITY column. At a minimum, the parameters table_name, column_name, data_type and the keyword IDENTITY are required.
Related: Master the frequently used Redshift data functions by developers.
CREATE TABLE employees
(
employee_id int identity(0,1),
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
);
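Once the table is created, you simply omit the identity column on insert and Redshift fills it in. A quick sketch (the sample values are illustrative; note that Redshift guarantees identity values are unique, but not that they are consecutive):

```sql
-- Insert a row without supplying employee_id; Redshift generates it
-- from the IDENTITY(0,1) definition (starting at 0, stepping by 1).
INSERT INTO employees (first_name, last_name, email, phone, hire_date, sales_id)
VALUES ('Jane', 'Smith', 'jane.smith@xyz.com', '111-111-1111', '2020-06-15', 42);

-- The generated value appears in the identity column.
SELECT employee_id, first_name, last_name FROM employees;
```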
Unlike relational databases, data in a Redshift table is stored in sorted order. The order of sort is determined by setting one or more columns in a table as the sort key. This architecture helps improve SQL query performance in Redshift.
A Redshift Sort Key (SORTKEY) can be set at the column level, or at the table level.
When a SORT key is set at the column level, it is called a single-column sort key. To define one, append the keyword "SORTKEY" after the column definition. Example 1 below illustrates how to do this.
CREATE TABLE employees
(
employee_id int sortkey,
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
) ;
A table level Redshift Sort Key (SORTKEY) is a bit trickier than a column level sort key. You have to choose between a Redshift Compound SORTKEY and a Redshift Interleaved SORTKEY.
A Compound SORTKEY is a combination of multiple columns in sorted order. The order in which the columns are listed in the SORTKEY determines the performance benefit from a Compound SORTKEY. 
The syntax for a Compound SORTKEY is “compound sortkey (column1, column2…)”. 
Use a Compound SORTKEY if your SQL query depends on where clauses from multiple sorted fields. Example 2 below illustrates how to do this.
CREATE TABLE employees
(
employee_id int identity(0,1),
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
) compound sortkey (employee_id, last_name);
A Redshift Interleaved SORTKEY on the other hand is a collection of multiple columns in a table, with each column assigned equal weight regardless of the order in which they are listed.  
When choosing an interleaved sort key in Redshift, keep in mind that it adds maintenance overhead (interleaved tables need periodic VACUUM REINDEX) and that it is a poor fit for columns with monotonically increasing values, such as identity columns, dates, or timestamps.
The syntax for an INTERLEAVED SORTKEY is “interleaved sortkey (column1,column2…)”. Example 3 below illustrates how to do this.
CREATE TABLE employees
(
employee_id int identity(0,1),
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
) interleaved sortkey (hire_date, sales_id);
Data Compression in Redshift helps reduce storage requirements and increases SQL query performance. In AWS Redshift, Compression is set at the column level. This allows more space in memory to be allocated for data analysis during SQL query execution. 
Redshift currently supports several column level compression encodings, including Raw, AZ64, Byte-dictionary, Delta, LZO, Mostly, Run-length, Text and Zstandard.
Redshift recommends using Automatic Compression instead of manually setting Compression Encodings for columns. Automatic Compression can only be set when data is loaded into an empty table. You do this using the COPY command.
This does not mean you cannot apply compression to a table that already contains data. You have two options: reload the data into a new, empty table using the COPY command, or run ANALYZE COMPRESSION against the existing table and rebuild it with the recommended encodings.
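As a sketch of the second option, ANALYZE COMPRESSION samples the data in a populated table and reports a recommended encoding per column; you can then rebuild the table with those encodings and deep-copy the rows. The table and column names follow the running example, and the specific encodings shown are illustrative, not the command's actual output:

```sql
-- Report recommended encodings for a populated table (samples its data).
ANALYZE COMPRESSION employees;

-- Rebuild with the recommended encodings, then deep-copy the rows.
CREATE TABLE employees_new
(
employee_id int encode az64,         -- encoding choices here are illustrative;
first_name varchar(30) encode zstd,  -- use what ANALYZE COMPRESSION reports
last_name varchar(30) encode zstd,
email varchar(60) encode zstd,
phone varchar(15) encode zstd,
hire_date date encode az64,
sales_id integer encode mostly16
);

INSERT INTO employees_new SELECT * FROM employees;
DROP TABLE employees;
ALTER TABLE employees_new RENAME TO employees;
```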
CREATE TABLE employees
(
employee_id int sortkey,
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer encode mostly16
);
In Redshift, Distribution style defines how data is allocated across the compute nodes in a cluster. Data distribution across the compute nodes plays a key role in determining storage utilization, query and overall system performance.  
Redshift has 4 data distribution styles: AUTO, EVEN, KEY and ALL.
The keyword and syntax for setting distribution styles are listed below.
CREATE TABLE employees
(
employee_id int sortkey,
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
) diststyle AUTO;
CREATE TABLE employees
(
employee_id int identity(0,1),
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
) diststyle EVEN;
CREATE TABLE employees
(
employee_id int identity(0,1),
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer distkey
) diststyle KEY;
CREATE TABLE employees
(
employee_id int identity(0,1),
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
) diststyle ALL;
In this Redshift Create table example, let us look at how to create one with a Primary Key.
A Redshift Primary Key uniquely identifies each record in a table. In a Redshift table, Primary Key constraints are for informational purposes only; they are not enforced.
Primary Key constraints can be set at the column level or at the table level. If you need a single column to be set as a Primary Key, then set it at the column level. On the other hand, if you need multiple columns to form a composite Primary Key, set it at the table level. 
Listed below is a Redshift Create table example with PRIMARY KEY constraints set at the column level and at the table level.
CREATE TABLE employees
(
employee_id int PRIMARY KEY,
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0
);
CREATE TABLE employees
(
employee_id int,
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer default 0,
primary key (last_name,hire_date)
);
The Redshift Create Table LIKE clause is not the same as the Create Table AS clause. This is a very important difference you have to remember!
What is the difference you ask?
The Create table LIKE clause is used to create a table based on the structure of a parent table, whereas the Create table AS (CTAS) clause is used to create a table based on a SQL query.
Use the Redshift Create Table LIKE clause when you want to create a table that inherits attributes such as Distribution Style, Sort Keys, NULL and Backup.
However, the new table does not inherit any PRIMARY KEY or FOREIGN KEY constraints.
The syntax for creating a table using the LIKE clause is –  “LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] “. At a minimum, the LIKE keyword and parent_table need to be specified. If you want the column level default values copied over from the parent table, then add the INCLUDING DEFAULTS clause. 
Listed below is a Redshift Create Table example with the LIKE clause. In this example, we are creating a backup of the parent table employees including its defaults. The child table is named employees_bkp.
CREATE TABLE employees_bkp (LIKE employees INCLUDING DEFAULTS);
Specifying "INCLUDING DEFAULTS" in your create table clause inherits the column level defaults.
If you want to create a table without the column defaults, i.e., you want NULL defaults, you can use "EXCLUDING DEFAULTS" or simply omit the clause, as shown below.
CREATE TABLE employees_bkp (LIKE employees);
or
CREATE TABLE employees_bkp (LIKE employees EXCLUDING DEFAULTS);
In Redshift you can add a parameter to your Create table statement to check if the table already exists. If it does, then the statement does not create a table and returns a message instead of the standard abort error. 
You add the “IF NOT EXISTS” clause to the create table statement to enable this check. Keep in mind, this check does not validate the table structure, rather just the table name.
For example, if you had a table “STUDENTS” in Redshift with columns FIRST_NAME, LAST_NAME, ADDRESS, ZIPCODE, PHONE and you were trying to create a new table with the same name “STUDENTS” but with columns NAME, STREET, POSTAL, CONTACT, the new table creation would fail and return a message.
CREATE TABLE IF NOT EXISTS employees
(
employee_id int sortkey,
first_name varchar(30) default 'John',
last_name varchar(30) default 'Doe',
email varchar(60) default 'john.doe@xyz.com',
phone varchar(15) default '000-000-0000',
hire_date date default '1901-01-01',
sales_id integer
);
In Redshift, Create Table As (CTAS) statements are used to create tables from the results of select queries. The CTAS statement inherits the table structure and the data from the SQL query. 
However, default values, identity columns and constraints are not inherited.  Distribution Keys and Sort Keys get allocated based on the SQL query plan.
Listed below are two examples using CTAS. The first one (EXAMPLE 1) is based on a simple select statement and the second one (EXAMPLE 2) explicitly sets the Sort Keys and Distribution Keys.
CREATE TABLE employees_bkp AS Select * from employees;
CREATE TABLE employees_bkp
distkey(sales_id)
sortkey(employee_id)
AS
select * from employees;
Hopefully these Redshift Create Table examples helped you get a better understanding of the syntax and formatting.
A key area where most of us run into issues is loading data into Redshift tables. Oftentimes the data needs to be prepared before loading it into tables.
A good rule of thumb is to stage your data in Amazon S3, transform it using a tool like AWS Athena, and then load it into Redshift. If you are new to Athena, we have an excellent blog post on Amazon Athena SQL.
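The load step from S3 is done with the COPY command. A minimal sketch for the running employees example follows; the bucket path and IAM role ARN are placeholders you would replace with your own, and the CSV options assume a comma-delimited file with a header row:

```sql
-- Load staged CSV data from S3 into the employees table.
-- 's3://my-bucket/...' and the IAM role ARN below are placeholders.
COPY employees
FROM 's3://my-bucket/staging/employees/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV
IGNOREHEADER 1
COMPUPDATE ON;  -- applies automatic compression when the target table is empty
```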
Related: Redshift Coalesce: One for all and all for NULL
This is the latest version of the Amazon Redshift Documentation.
Another great blog post by Nick Corbett, AWS Professional Services on Agile Analytics with Amazon Redshift
This is an excellent blog post from AWS by Stephen Borg on Building a Data Warehouse using Amazon Redshift