pySparkGuide tag-guide

Remove duplicate rows from dataframe
Remove duplicate rows using distinct() and dropDuplicates()
2024-09-18 | 1 min | guide (12), cleaning (1), workflow (2)

Cast string column to Date or DateTime using to_date() or to_timestamp()
Convert a string column to Date or DateTime using to_date() and to_timestamp()
2024-01-28 | 3 min | guide (12)

GROUP BY in pySpark
SQL GROUP BY clause in the form of pySpark functions
2024-01-22 | 4 min | guide (12), sql (5)

CASE in pySpark
SQL CASE clause in the form of pySpark functions
2024-01-17 | 5 min | guide (12), sql (5)

Filter rows in pySpark
SQL WHERE clause in the form of pySpark functions
2024-01-17 | 3 min | guide (12), sql (5)

Window functions with pySpark
How to use window functions in pySpark
2023-12-19 | 13 min | guide (12), sql (5)

JSON - read, set schema and write with pySpark
Load data, set a schema, and save data in various file formats using the DataFrameReader and DataFrameWriter APIs.
2023-12-18 | 13 min | guide (12), file-io (4)

Cache and persist - why and how
Improve execution speed by caching/persisting intermediate dataframes
2023-11-18 | 17 min | guide (12), performance (3)

CSV - read, set schema and write with pySpark
Load data, set a schema, and save data in various file formats using the DataFrameReader and DataFrameWriter APIs.
2023-11-16 | 17 min | guide (12), file-io (4)

Tag: guide (12)