Spark SQL Date Functions – Complete list with examples

In this post we will address Spark SQL Date Functions, its syntax and what it does. There are 28 Spark SQL Date functions, meant to address string to date, date to timestamp, timestamp to date, date additions, subtractions and current date conversions. If you are a beginner to Spark SQL, please read our post on Spark tutorial for beginners: Apache Spark Concepts for a refresher. Spark SQL is the Apache Spark module for processing structured data. There are a couple of different ways to begin executing Spark SQL queries.

  • API : When writing and executing Spark SQL from Scala, Java, Python or R, a SparkSession is still the entry point. Once a SparkSession has been established, a DataFrame or a Dataset needs to be created on the data before Spark SQL can be executed.

  • Spark SQL CLI : This Spark SQL Command Line interface is a lifesaver for writing and testing out SQL. However, the SQL is executed against Hive, so make sure test data exists in some capacity.

For experimenting with the various Spark SQL Date Functions, using the Spark SQL CLI is definitely the recommended approach. The table below lists the 28 Spark Date functions as of Spark 3.0.0 documentation release.

If you need to familiarize your self with Spark basics, do read our blog post on Spark SQL Functions – Listed by Category and Spark Scala examples.

1. add_months

Syntax: add_months(start_date, num_months)

What it does: Returns a date with months specified by num_months added

SELECT add_months('2020-08-01',4);

Output from SQL statement: 2020-12-01

2. current_date

Syntax: current_date()

What it does: The Spark SQL current date function returns the date as of the beginning of your query execution. 

There are two variations for the spark sql current date syntax. You can specify it with the parenthesis as current_date() or as current_date. They both return the current date in the default format ‘YYYY-MM-DD’.

SELECT current_date();
or
SELECT current_date;

Output from SQL statement: 2020-08-14

3. current_timestamp

Syntax: current_timestamp()

What it does: Returns the current timestamp

SELECT current_timestamp();
Output from SQL statement: 2020-08-16 12:20:09.507

4. date

Syntax: date(expr)

What it does: Converts an expression in string format to a date

SELECT date('2020-08-16');
Output from SQL statement: 2020-08-16

5. date_add

Syntax: date_add(start_date, num_days)

What it does: Returns the date plus number of days specified by integer num_days

SELECT date_add('2020-08-16',5);
Output from SQL statement: 2020-08-21

6. date_sub

Syntax: date_sub(start_date, num_days)

What it does: Subtracts days specified by integer num_days from a given date

SELECT date_sub('2020-08-16',5);
Output from SQL statement: 2020-08-11

7. date_format

Syntax: date_format(timestamp, fmt)

What it does: Returns a given timestamp or date in the format specified.

SELECT date_format('2020-08-15', "M");

Output from SQL statement: 8
ElementDescription
aam-pm-of-day
Dday-of-year
dday-of-month
Eday-of-week
Fweek-of-month
Gera
hclock-hour-of-am-pm (1-12)
Hhour-of-day (0-23)
Khour-of-am-pm (0-11)
kclock-hour-of-day (1-24)
mminute-of-hour
M/Lmonth-of-year
Olocalized zone-offset
Q/qquarter-of-year
ssecond-of-minute
Sfraction-of-second
Vtime-zone ID
Xzone-offset ‘Z’ for zero
xzone-offset
yyear
ztime-zone name
Zzone-offset

8. date_trunc

Syntax: date_trunc(fmt,ts)

 

 

What it does: Returns a timestamp specified as (ts) truncated to the unit specified by format (fmt) [“YEAR”, “YYYY”, “YY”, “MON”, “MONTH”, “MM”, “DAY”, “DD”, “HOUR”, “MINUTE”, “SECOND”, “WEEK”, “QUARTER”]

SELECT date_trunc('YEAR', '2020-08-15T06:25:10.234'); 
Output from SQL statement: 2020-01-01 00:00:00

9. datediff

Syntax: datediff(endDate, startDate)

What it does: The Spark datediff  function returns the difference between two given dates, endDate and startDate. 

When using Spark datediff, make sure you specify the greater or max date first (endDate) followed by the lesser or minimum date (startDate). If not you will end up with a negative date.

Another point to keep in mind, DO NOT use the months_between function instead of the Spark SQL datediff function. The months_between function is calculated based on an a 31 days per month assumption. The end result will not be the same !

SELECT datediff('2020-08-15', '2020-08-01'); 
Output from SQL statement: 14

10. day

Syntax: day(date)

What it does: Returns the day from the date or timestamp 

SELECT day('2020-08-15');
Output from SQL statement: 15

11. dayofmonth

Syntax: dayofmonth(date)

What it does: Returns the day from a given date or timestamp. This function is the same as the day function

SELECT dayofmonth('2020-08-15');
Output from SQL statement: 15

12. dayofweek

Syntax: dayofweek(date)

 

 

What it does: Returns the day of week for a given date or timestamp. This can get a bit tricky if you don’t get the basics right. The 15th of September, 2020 for example falls on the third week of the month, and is the third day of the week

SELECT dayofweek('2020-09-15');
Output from SQL statement: 3

13. dayofyear

Syntax: dayofyear(date)

What it does: Returns the day of year from the date or timestamp. Think 1-365

SELECT dayofyear('2020-07-15');
Output from SQL statement: 228

14. month

Syntax: month(date)

What it does: Returns the month of the year from the date or timestamp

SELECT month('2020-07-15');
Output from SQL statement: 7

15. minute

Syntax: minute(timestamp)

What it does: Returns the minute portion from a timestamp

SELECT minute('2020-08-15 09:24:30');
Output from SQL statement: 24

16. months_between

Syntax: months_between(timestamp1, timestamp2)

What it does: Returns the number of months between the two timestamps based on 31 days in a month

SELECT months_between('2020-05-01 09:35:02', '2020-01-01');
Output from SQL statement: 4

17. next_day

Syntax: next_day(start_date, day_of_week)

 

What it does: Returns the day after the start_date specified by day_of_week. Day of week can be specified as ‘MON’, ‘TUE’, ‘WED’, ‘THU’, ‘FRI’, ‘SAT’, ‘SUN’ or as ‘MO’, ‘TU’, ‘WE’, ‘TH’, ‘FR’, ‘SA’, ‘SU’

SELECT next_day('2020-08-10', 'MON');
Output from SQL statement: 2020-08-17

18. now

Syntax: now()

What it does: Returns the current timestamp

SELECT now();
Output from SQL statement: 2020-08-20 12:29:05.914

19. quarter

Syntax: quarter(date)

What it does: Given a date, this function returns the quarter in which the date falls

SELECT quarter('2020-08-20');
Output from SQL statement: 3

20. second

Syntax: second(timestamp)

What it does: Given a timestamp, this function returns the second portion of it

SELECT second('2020-08-20 10:18:25');
Output from SQL statement: 25

21. timestamp

Syntax: timestamp(expr)

What it does: Casts a given expression into a timestamp. This function works the same as to_timestamp.

SELECT timestamp('2020-08-15', 'yyyy-MM-dd');
Output from SQL statement: 2020-08-15 00:00:00

22. to_date

Syntax: to_date(date_str[, fmt])

What it does: Converts a string to date type in the format specified by fmt

SELECT to_date('2020-08-14', 'yyyy-MM-dd');
Output from SQL statement: 2020-08-14

23. to_timestamp

Syntax: to_timestamp(timestamp[, fmt])

What it does: Converts a string to timestamp specified by fmt. For format syntax refer to the second tab named ” Format Patterns

SELECT to_timestamp('2020-08-15', 'yyyy-MM-dd');

Output from SQL statement: 2020-08-15 00:00:00
ElementDescription
aam-pm-of-day
Dday-of-year
dday-of-month
Eday-of-week
Fweek-of-month
Gera
hclock-hour-of-am-pm (1-12)
Hhour-of-day (0-23)
Khour-of-am-pm (0-11)
kclock-hour-of-day (1-24)
mminute-of-hour
M/Lmonth-of-year
Olocalized zone-offset
Q/qquarter-of-year
ssecond-of-minute
Sfraction-of-second
Vtime-zone ID
Xzone-offset ‘Z’ for zero
xzone-offset
yyear
ztime-zone name
Zzone-offset

24. to_unix_timestamp

Syntax: to_unix_timestamp(expr[, pattern])

 

What it does: Given a data or a timestamp, this function returns it as a UNIX timestamp. This function works the same as the to_unix_timestamp() function.

SELECT to_unix_timestamp('2020-08-15', 'yyyy-MM-dd');
Output from SQL statement: 1597937061

25. to_utc_timestamp

Syntax: to_utc_timestamp(timestamp, timezone)

 

What it does: Given a timestamp, this function returns it in the given time zone. For a list of timezones, please refer to tz database time zones

SELECT to_utc_timestamp('2020-08-15', 'America/New_York');
Output from SQL statement: 2020-08-15 14:15:08

26. unix_timestamp

Syntax: unix_timestamp(timeEXP, [expr[, format]])

 

What it does: Given a date/timestamp, this function converts it into UNIX time. In case you are not sure, a UNIX timestamp is the number of seconds since the first of January 1970

SELECT unix_timestamp('2020-08-15', 'yyyy-MM-dd');
Output from SQL statement: 1597937061

27. weekofyear

Syntax: weekofyear(date)

What it does: Returns the week of the year from the given date

SELECT weekofyear('2020-08-20');
Output from SQL statement: 34

28. year

Syntax: year(date/timestamp)

What it does: For a given date or timestamp, this function returns the year portion of it.

SELECT year('2020-08-01');
Output from SQL statement: 2020

Spark SQL Related Resources

Programming Guide

The official Apache Spark v3.0.0 Spark SQL Programming Guide.

Application Programming Interface

The official Apache Spark SQL API documentation

Email us at : info@obstkel.com

Copyright 2020 © OBSTKEL LLC. All rights Reserved.
Scroll to Top