
Spark Extension

API Documentation

laktory.spark.functions
laktory.spark.dataframe

Apache Spark is an open-source, distributed computing system designed for big data processing and analytics. It provides a fast and general-purpose cluster-computing framework and supports multiple programming languages, including Scala, Java, Python, and R.

To facilitate the transformation of your data, Laktory extends Spark's native functions and DataFrame methods under a laktory namespace.

Functions¤

The first extension is a library of functions that can be used to build columns from other columns or constants.

import laktory  # importing laktory registers the custom functions under F.laktory
import pandas as pd
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Build a simple DataFrame and derive a new column with a laktory function
df = spark.createDataFrame(pd.DataFrame({"x": [1, 2, 3]}))
df = df.withColumn("y", F.laktory.poly1("x", -1, 1.0))

Here, poly1 is a Laktory-specific function made available by the import laktory statement. All other custom functions are also accessible from the pyspark.sql.functions.laktory namespace.
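
For reference, a function like poly1 can be expressed with native Spark column operations. The sketch below is illustrative only and assumes poly1 evaluates a first-order polynomial a * x + b; the actual Laktory implementation may differ.

import pyspark.sql.functions as F
from pyspark.sql import Column

def poly1(x, a=1.0, b=0.0) -> Column:
    # Hypothetical sketch of a first-order polynomial column builder,
    # similar in spirit to laktory's poly1 (assumed to return a * x + b)
    x = F.col(x) if isinstance(x, str) else x
    return a * x + b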

Dataframe methods¤

In this case, the methods are designed to be applied directly to a Spark DataFrame.

import laktory  # importing laktory attaches the laktory namespace to DataFrame
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(pd.DataFrame({"x": [1, 2, 3]}))
df.laktory.has_column("x")  # True

Laktory monkey patches Spark's DataFrame class by assigning all the custom methods under the laktory namespace at runtime.
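
Conceptually, this namespace pattern can be reproduced with a small accessor class attached to DataFrame at runtime. The sketch below is a simplified illustration, not Laktory's actual code; the accessor class name is hypothetical and only has_column is shown.

from pyspark.sql import DataFrame

class LaktoryAccessor:
    # Hypothetical accessor illustrating the namespace pattern;
    # Laktory's internal implementation may differ.
    def __init__(self, df: DataFrame):
        self._df = df

    def has_column(self, name: str) -> bool:
        # Look up the column name in the DataFrame schema
        return name in self._df.columns

# Expose the accessor so that `df.laktory` returns a bound instance
DataFrame.laktory = property(lambda df: LaktoryAccessor(df))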

Some methods of interest are: