obstkel.com logo

Spark SQL Date Functions – Complete list with examples

spark sql date functons

In this post we will address Spark SQL Date Functions, its syntax and what it does. There are 28 Spark SQL Date functions, meant to address string to date, date to timestamp, timestamp to date, date additions, subtractions and current date conversions.

Spark SQL is the Apache Spark module for processing structured data. There are a couple of different ways to to execute Spark SQL queries.

  • API: When writing and executing Spark SQL from Scala, Java, Python or R, a SparkSession is still the entry point. Once a SparkSession has been established, a DataFrame or a Dataset needs to be created on the data before Spark SQL can be executed.

  • Spark SQL CLI: This Spark SQL Command Line interface is a lifesaver for writing and testing out SQL. However, the SQL is executed against Hive, so make sure test data exists in some capacity.

For experimenting with the various Spark SQL Date Functions, using the Spark SQL CLI is definitely the recommended approach. Listed below are 28 Spark Date functions as of Spark 3.0.0 documentation release.

1. add_months

Syntax: add_months(start_date, num_months)

What it does: Returns a date with months specified by num_months added.

SELECT add_months('2020-08-01',4);
Output from SQL statement: 2020-12-01

2. current_date

Syntax: current_date()

What it does: The Spark SQL current date function returns the date as of the beginning of your query execution.  There are two variations for the spark sql current date syntax. You can specify it with the parenthesis as current_date()or as current_date. They both return the current date in the default format ‘YYYY-MM-DD’.

SELECT current_date();
or
SELECT current_date;
Output from SQL statement: 2020-08-14

3. current_timestamp

Syntax: current_timestamp()

What it does: Returns the current timestamp.

SELECT current_timestamp();
Output from SQL statement: 2020-08-16 12:20:09.507

4. date

Syntax: date(expr)

What it does: Converts an expression in string format to a date.

SELECT date('2020-08-16');
Output from SQL statement: 2020-08-16

5. date_add

Syntax: date_add(start_date, num_days)

What it does: Returns the date plus number of days specified by integer num_days.

SELECT date_add('2020-08-16',5);
Output from SQL statement: 2020-08-21

6. date_sub

Syntax: date_sub(start_date, num_days)

What it does: Subtracts days specified by integer num_days from a given date.

SELECT date_sub('2020-08-16',5);
Output from SQL statement: 2020-08-11

7. date_format

Syntax: date_format(timestamp, fmt)

What it does: The Spark SQL date format function returns a given timestamp or date as a string, in the format specified. 


Example1: Return month from a given date using Spark date format function.

SELECT date_format('2020-08-15', "M");
  Output from SQL statement: 8


Example2: Return a date in string format YYYYMMDD using Spark SQL date format function.

SELECT date_format('2020-08-15', "yyyyMMDD");
  Output from SQL statement: 20200815


The table below lists all the date format elements for Spark SQL.

ElementDescription
aam-pm-of-day
Dday-of-year
dday-of-month
Eday-of-week
Fweek-of-month
Gera
hclock-hour-of-am-pm (1-12)
Hhour-of-day (0-23)
Khour-of-am-pm (0-11)
kclock-hour-of-day (1-24)
mminute-of-hour
M/Lmonth-of-year
Olocalized zone-offset
Q/qquarter-of-year
ssecond-of-minute
Sfraction-of-second
Vtime-zone ID
Xzone-offset ‘Z’ for zero
xzone-offset
yyear
ztime-zone name
Zzone-offset

8. date_trunc

Syntax: date_trunc(fmt,ts)

What it does: Returns a timestamp specified as (ts) truncated to the unit specified by format (fmt) [“YEAR”, “YYYY”, “YY”, “MON”, “MONTH”, “MM”, “DAY”, “DD”, “HOUR”, “MINUTE”, “SECOND”, “WEEK”, “QUARTER”].

SELECT date_trunc('YEAR', '2020-08-15T06:25:10.234'); 
Output from SQL statement: 2020-01-01 00:00:00

9. datediff

Syntax: datediff(endDate, startDate)

What it does: The Spark datediff  function returns the difference between two given dates, endDate and startDate

When using Spark datediff, make sure you specify the greater or max date first (endDate) followed by the lesser or minimum date (startDate). If not you will end up with a negative date.

Another point to keep in mind, DO NOT use the months_between function instead of the Spark SQL datediff function. The months_between function is calculated based on a 31 days per month assumption. The end result will not be the same!

SELECT date_trunc('YEAR', '2020-08-15T06:25:10.234'); 
Output from SQL statement: 2020-01-01 00:00:00

10. day

Syntax: day(date)

What it does: Returns the day from the date or timestamp. 

SELECT day('2020-08-15');
Output from SQL statement: 15

11. dayofmonth

Syntax: dayofmonth(date)

What it does: Returns the day from a given date or timestamp. This function is the same as the day function.

SELECT dayofmonth('2020-08-15');
Output from SQL statement: 15

12. dayofweek

Syntax: dayofweek(date)

What it does: Returns the day of week for a given date or timestamp. This can get a bit tricky if you don’t get the basics right. The 15th of September, 2020 for example falls on the third week of the month, and is the third day of the week.

SELECT dayofweek('2020-09-15');
Output from SQL statement: 3

13. dayofyear

Syntax: dayofyear(date)

What it does: Returns the day of year from the date or timestamp. Think 1-365.

SELECT dayofyear('2020-07-15');
Output from SQL statement: 228

14. month

Syntax: month(date)

What it does: Returns the month of the year from the date or timestamp.

SELECT month('2020-07-15');
Output from SQL statement: 7

15. minute

Syntax: minute(timestamp)

What it does: Returns the minute portion from a timestamp.

SELECT minute('2020-08-15 09:24:30');
Output from SQL statement: 24

16. months_between

Syntax: months_between(timestamp1, timestamp2)

What it does: Returns the number of months between the two timestamps based on 31 days in a month.

SELECT months_between('2020-05-01 09:35:02', '2020-01-01');
Output from SQL statement: 4

17. next_day

Syntax: next_day(start_date, day_of_week)

What it does: Returns the day after the start_date specified by day_of_week. Day of week can be specified as ‘MON’, ‘TUE’, ‘WED’, ‘THU’, ‘FRI’, ‘SAT’, ‘SUN’ or as ‘MO’, ‘TU’, ‘WE’, ‘TH’, ‘FR’, ‘SA’, ‘SU’.

SELECT next_day('2020-08-10', 'MON');
Output from SQL statement: 2020-08-17

18. now

Syntax: now()

What it does: Returns the current timestamp.

SELECT now();
Output from SQL statement: 2020-08-20 12:29:05.914

19. quarter

Syntax: quarter(date)

What it does: Given a date, this function returns the quarter in which the date falls.

SELECT quarter('2020-08-20');
Output from SQL statement: 3

20. second

Syntax: second(timestamp)

What it does: Given a timestamp, this function returns the second portion of it.

SELECT second('2020-08-20 10:18:25');
Output from SQL statement: 25

21. timestamp

Syntax: timestamp(expr)

What it does: Casts a given expression into a timestamp. This function works the same as to_timestamp.

SELECT timestamp('2020-08-15', 'yyyy-MM-dd');
Output from SQL statement: 2020-08-15 00:00:00

22. to_date

Syntax: to_date(date_str[, fmt])

What it does: Converts a string to date type in the format specified by fmt.

SELECT to_date('2020-08-14', 'yyyy-MM-dd');
Output from SQL statement: 2020-08-14

23. to_timestamp

Syntax: to_timestamp(timestamp[, fmt])

What it does: Converts a string to timestamp specified by fmt. For format syntax refer to the second tab named ” Format Patterns“.

ElementDescription
aam-pm-of-day
Dday-of-year
dday-of-month
Eday-of-week
Fweek-of-month
Gera
hclock-hour-of-am-pm (1-12)
Hhour-of-day (0-23)
Khour-of-am-pm (0-11)
kclock-hour-of-day (1-24)
mminute-of-hour
M/Lmonth-of-year
Olocalized zone-offset
Q/qquarter-of-year
ssecond-of-minute
Sfraction-of-second
Vtime-zone ID
Xzone-offset ‘Z’ for zero
xzone-offset
yyear
ztime-zone name
Zzone-offset
SELECT to_timestamp('2020-08-15', 'yyyy-MM-dd');
Output from SQL statement: 2020-08-15 00:00:00

24. to_unix_timestamp

Syntax: to_unix_timestamp(expr[, pattern])

What it does: Given a data or a timestamp, this function returns it as a UNIX timestamp. This function works the same as the to_unix_timestamp() function.

SELECT to_unix_timestamp('2020-08-15', 'yyyy-MM-dd');
Output from SQL statement: 1597937061

25. to_utc_timestamp

Syntax: to_utc_timestamp(timestamp, timezone)

What it does: Given a timestamp, this function returns it in the given time zone. For a list of timezones, please refer to tz database time zones.

SELECT to_utc_timestamp('2020-08-15', 'America/New_York');
Output from SQL statement: 2020-08-15 14:15:08

26. unix_timestamp

Syntax: unix_timestamp(timeEXP, [expr[, format]])

What it does: Given a date/timestamp, this function converts it into UNIX time. In case you are not sure, a UNIX timestamp is the number of seconds since the first of January 1970.

SELECT unix_timestamp('2020-08-15', 'yyyy-MM-dd');
Output from SQL statement: 1597937061

27. weekofyear

Syntax: weekofyear(date)

What it does: Returns the week of the year from the given date.

SELECT weekofyear('2020-08-20');
Output from SQL statement: 34

28. year

Syntax: year(date/timestamp)

What it does: For a given date or timestamp, this function returns the year portion of it.

SELECT year('2020-08-01');
Output from SQL statement: 2020

Table of Contents

Spark SQL Functions

Official documentation on Spark SQL functions, based on documentation version 2.3.0

Datasets & DataFrames

Link to the latest Window Function documentation from the Database Developers Guide

Interested in our services ?

email us at : info@obstkel.com

Copyright 2022 © OBSTKEL LLC. All rights Reserved