string_split

laktory.spark.functions.string_split

string_split(x, pattern, key)

Split x on the separator pattern and return the substring at index key.

PARAMETER DESCRIPTION
x

Input text column, or column name, to split

TYPE: COLUMN_OR_NAME

pattern

Regular expression to split on. Required: the signature defines no default. Escape special characters to match them literally.

TYPE: str

key

Zero-based index of the split item to return

TYPE: int

RETURNS DESCRIPTION
Column

Substring found at index key

Examples:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.range(1).withColumn("x", F.lit("price_close"))
df = df.withColumn("y", F.laktory.string_split("x", pattern="_", key=1))
print(df.laktory.show_string())
'''
+---+-----------+-----+
| id|          x|    y|
+---+-----------+-----+
|  0|price_close|close|
+---+-----------+-----+
'''
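To illustrate the zero-based index and the regular-expression handling without a Spark session, here is a minimal pure-Python sketch. The helper `string_split_local` is hypothetical and not part of laktory. It assumes patterns whose syntax is the same in Python's `re` module and in the Java regular expressions Spark uses. It returns `None` for an out-of-range index, whereas Spark's behaviour in that case may depend on its configuration (for example ANSI mode).

```python
import re
from typing import Optional


def string_split_local(text: str, pattern: str, key: int) -> Optional[str]:
    """Hypothetical pure-Python analogue of string_split, for illustration only."""
    # Split on a regular expression, as the Spark implementation does
    parts = re.split(pattern, text)
    # Assumption: return None when the index is out of range
    return parts[key] if 0 <= key < len(parts) else None


print(string_split_local("price_close", "_", 0))    # price
print(string_split_local("price_close", "_", 1))    # close
# A literal dot must be escaped, because "." is a regex metacharacter
print(string_split_local("price.close", r"\.", 1))  # close
```

An unescaped `"."` matches every character, so every split item is an empty string. This is why patterns containing metacharacters should be escaped.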
Source code in laktory/spark/functions/string.py
def string_split(
    x: COLUMN_OR_NAME,
    pattern: str,
    key: int,
) -> Column:
    """
    Split `x` on the separator `pattern` and return the substring at index `key`.

    Parameters
    ----------
    x:
        Input text column, or column name, to split
    pattern:
        Regular expression to split on. Required: the signature defines no
        default. Escape special characters to match them literally.
    key:
        Zero-based index of the split item to return

    Returns
    -------
    :
        Substring found at index `key`

    Examples
    --------
    ```py
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(1).withColumn("x", F.lit("price_close"))
    df = df.withColumn("y", F.laktory.string_split("x", pattern="_", key=1))
    print(df.laktory.show_string())
    '''
    +---+-----------+-----+
    | id|          x|    y|
    +---+-----------+-----+
    |  0|price_close|close|
    +---+-----------+-----+
    '''
    ```
    """
    return F.split(_col(x), pattern=pattern).getItem(key)