Style guide - Chaining functions in Python

By Niraj Zade | 2023-11-16 | Tags: guide, style guide


Background

If you've worked with Java or Scala, you know that these languages use the function chaining pattern A LOT.

And since these languages rely on chaining so heavily, their syntax lets you split a chain across multiple lines without needing any escape characters.

Example:

// CHAINING LOOKS GOOD IN JAVA OR SCALA
// single-line
object.fn1().fn2().fn3().fn4()

// multi-line
object
    .fn1()
    .fn2()
    .fn3()
    .fn4()

Python suggestion

Python supports function chaining. However, we don't use the function chaining pattern unless we really need to. It simply isn't Pythonic.
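
For reference, chaining works anywhere a method returns an object. Here is a minimal sketch using Python's built-in string methods (no Spark involved):

# chaining built-in string methods - each call returns a new str
message = "  Hello, World!  "
cleaned = message.strip().lower().replace(",", "")
print(cleaned)  # prints: hello world!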

But since the pySpark API is essentially a wrapper around the Scala API, the function chaining pattern has leaked into pySpark.

Python doesn't have a dedicated syntax for multi-line chained calls. By default, you have to break the chain across lines with the line-continuation character \. This leads to very ugly multi-line code.

Example:

# single-line
object = object.fn1().fn2().fn3().fn4().fn5()

# multi-line (have to use escape characters)
object = object.fn1() \
        .fn2() \
        .fn3() \
        .fn4() \
        .fn5()

Instead of this, you can just wrap the entire expression in parentheses ( ... ) to get rid of the escape characters:

object = (
    object
    .fn1()
    .fn2()
    .fn3()
    .fn4()
    .fn5()
)

This leads to much cleaner codebases, and also makes the code faster to edit.
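
Another benefit of the parenthesized style: you can attach an inline comment to any step, or temporarily disable a step by commenting it out. A sketch with the same placeholder fn names as above:

object = (
    object
    .fn1()
    # .fn2()  # this step is disabled by commenting it out
    .fn3()  # inline comments are allowed here
    .fn4()
    .fn5()
)

Neither of these is possible with backslash continuations without rewriting the surrounding lines.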

pySpark examples

Example 1

Before

# before
# read csv data
sales_df = spark.read \
        .format("csv") \
        .option("header", "true") \
        .option("inferschema", "true") \ # optional option
        .load("/data/sales/")

After

# after
# read csv data
sales_df = (
    spark
    .read
    .format("csv")
    .option("header", "true")
    .option("inferschema", "true") # optional option
    .load("/data/sales/")
)

Example 2

Before

# before
# write partitioned dataframe
sales_df.write \
    .format("csv") \
    .option("header", "true") \
    .mode("overwrite") \
    .partitionBy("year", "date", "day") \
    .save("/data/sales") \

After

# after
# write partitioned dataframe
(
    sales_df
    .write
    .format("csv")
    .option("header", "true")
    .mode("overwrite")
    .partitionBy("year", "date", "day")
    .save("/data/sales")
)
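
The parenthesized style isn't limited to reads and writes; transformation chains look just as clean. A minimal sketch, assuming sales_df has region and amount columns (the names are illustrative):

# transformation chain in the same parenthesized style
# ("region" and "amount" are illustrative column names)
regional_totals = (
    sales_df
    .filter(sales_df.amount > 0)
    .groupBy("region")
    .agg({"amount": "sum"})
    .orderBy("region")
)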

That's it. Enjoy.

