There are two time formats that we deal with - Date and DateTime (timestamp).
In PySpark, we use:
- to_date() for generating Date
- to_timestamp() for generating DateTime (timestamp), up to microsecond precision.
to_date()
Documentation link - pyspark.sql.functions.to_date
Parameters:
- col: Column or str - column values to convert.
- format: str (optional) - format string used to convert timestamp values.
Returns: Column - date value as pyspark.sql.types.DateType type.
BEST PRACTICE The format parameter is optional, but it is best practice to specify it whenever you use to_date() (or any time-based casting function), even when you're using the default format.
Example:
from pyspark.sql import functions as F
# create dataframe for example
df = spark.createDataFrame([("2024-01-28 10:30:00",)], ["datetime_str"])
# cast string col "datetime_str" to date col "date"
df_casted = df.select(
F.to_date(col=F.col("datetime_str"), format="yyyy-MM-dd HH:mm:ss")
.alias("date")
)
# show the dataframe
df_casted.show()
Output:
+----------+
| date|
+----------+
|2024-01-28|
+----------+
to_timestamp()
Documentation link - pyspark.sql.functions.to_timestamp
Parameters:
- col: Column or str - input column of values to convert.
- format: str (optional) - format string used to convert timestamp values.
Returns: Column - timestamp value as pyspark.sql.types.TimestampType type.
BEST PRACTICE The format parameter is optional, but it is best practice to specify it whenever you use to_timestamp() (or any time-based casting function), even when you're using the default format.
Example:
from pyspark.sql import functions as F
# create dataframe for example
df = spark.createDataFrame([("2023-01-28 10:30:00",)], ["datetime_str"])
# cast string col "datetime_str" to datetime col "datetime"
format_string = "yyyy-MM-dd HH:mm:ss"
df_casted = df.select(
F.to_timestamp(col=F.col("datetime_str"), format=format_string)
.alias("datetime")
)
# show the dataframe
df_casted.show()
Output:
+-------------------+
| datetime|
+-------------------+
|2023-01-28 10:30:00|
+-------------------+
All date & time formatters
NOTE:
PySpark uses Java datetime formatters, not Python datetime formatters.
This is a list of all the formatters you can use to build the format string for your data:
| Symbol | Meaning | Presentation | Examples |
|---|---|---|---|
| G | era | text | AD; Anno Domini |
| y | year | year | 2020; 20 |
| D | day-of-year | number(3) | 189 |
| M/L | month-of-year | month | 7; 07; Jul; July |
| d | day-of-month | number(2) | 28 |
| Q/q | quarter-of-year | number/text | 3; 03; Q3; 3rd quarter |
| E | day-of-week | text | Tue; Tuesday |
| F | aligned day of week in month | number(1) | 3 |
| a | am-pm-of-day | am-pm | PM |
| h | clock-hour-of-am-pm (1-12) | number(2) | 12 |
| K | hour-of-am-pm (0-11) | number(2) | 0 |
| k | clock-hour-of-day (1-24) | number(2) | 0 |
| H | hour-of-day (0-23) | number(2) | 0 |
| m | minute-of-hour | number(2) | 30 |
| s | second-of-minute | number(2) | 55 |
| S | fraction-of-second | fraction | 978 |
| V | time-zone ID | zone-id | America/Los_Angeles; Z; -08:30 |
| z | time-zone name | zone-name | Pacific Standard Time; PST |
| O | localized zone-offset | offset-O | GMT+8; GMT+08:00; UTC-08:00 |
| X | zone-offset 'Z' for zero | offset-X | Z; -08; -0830; -08:30; -083015; -08:30:15 |
| x | zone-offset | offset-x | +0000; -08; -0830; -08:30; -083015; -08:30:15 |
| Z | zone-offset | offset-Z | +0000; -0800; -08:00 |
| ' | escape for text | delimiter | |
| '' | single quote | literal | ' |
| [ | optional section start | | |
| ] | optional section end | | |
Source - Documentation - Datetime Patterns for Formatting and Parsing
This section took more time to create than expected. The only place where you get a list of these formatters is at a URL buried in a runtime exception's message:
SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.DATETIME_PATTERN_RECOGNITION] You may get a different result due to the upgrading to Spark >= 3.0:
Fail to recognize pattern in the DateTimeFormatter.
1) You can set "spark.sql.legacy.timeParserPolicy" to "LEGACY" to restore the behavior before Spark 3.0.
2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html.