FileDataSource

laktory.models.datasources.FileDataSource

Bases: BaseDataSource

Data source that reads files from disk, such as event data (JSON/CSV) and DataFrames stored as Parquet. It is generally used in the context of a data pipeline.

ATTRIBUTE DESCRIPTION
format

Format of the data files

TYPE: Literal['CSV', 'PARQUET', 'DELTA', 'JSON', 'EXCEL', 'BINARYFILE']

header

If True, the first line of each CSV file is assumed to contain the column names.

TYPE: bool

multiline

If True, JSON files are parsed assuming that an object may be defined on multiple lines (as opposed to having a single object per line).

TYPE: bool

read_options

Other options passed to spark.read.options

TYPE: dict[str, str]

schema_location

Path for the files schema. If None, the parent directory of path is used.

TYPE: str
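The header and multiline attributes describe how file contents are laid out. The sketch below uses only the Python standard library to illustrate the two layouts each flag distinguishes; it is not how laktory or Spark parse files, and the sample records are made up for illustration.

```python
import csv
import io
import json

# Layout assumed when multiline is False: one JSON object per line.
# (Sample values are invented for this illustration.)
json_lines = (
    '{"symbol": "AAPL", "close": 189.5}\n'
    '{"symbol": "MSFT", "close": 410.2}\n'
)
rows_single = [json.loads(line) for line in json_lines.splitlines() if line.strip()]

# Layout that multiline=True is meant for: objects spread over several lines.
json_multi = """[
  {"symbol": "AAPL",
   "close": 189.5},
  {"symbol": "MSFT",
   "close": 410.2}
]"""
rows_multi = json.loads(json_multi)

# Both layouts carry the same records; only the line structure differs.
same_records = rows_single == rows_multi

# CSV with a header row: the first line supplies the column names.
csv_text = "symbol,close\nAAPL,189.5\nMSFT,410.2\n"
with_header = list(csv.DictReader(io.StringIO(csv_text)))

# Same text read without treating the first line as a header:
# the column names show up as an ordinary data row.
without_header = list(csv.reader(io.StringIO(csv_text)))
```

Reading a multi-line JSON file line by line would fail because no single line is a complete object, which is why the reader needs to be told which layout to expect.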

Examples:

from laktory import models

source = models.FileDataSource(
    path="/Volumes/sources/landing/events/yahoo-finance/stock_price",
    format="JSON",
    as_stream=False,
)
# df = source.read(spark)
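The documented default for schema_location (the parent directory of path when no value is given) can be sketched for the path used above. The helper default_schema_location is hypothetical and not part of laktory; it only mirrors the rule stated in the attribute description, and the library's actual path handling may differ.

```python
from pathlib import PurePosixPath
from typing import Optional


def default_schema_location(path: str, schema_location: Optional[str] = None) -> str:
    """Hypothetical helper: return schema_location, or the parent of path if None."""
    if schema_location is not None:
        return schema_location
    # PurePosixPath does pure string manipulation; nothing is read from disk.
    return str(PurePosixPath(path).parent)


path = "/Volumes/sources/landing/events/yahoo-finance/stock_price"
resolved = default_schema_location(path)
explicit = default_schema_location(path, schema_location="/Volumes/sources/schemas")
```

Here resolved points at the yahoo-finance directory, one level above stock_price, while explicit keeps the value that was passed in (the /Volumes/sources/schemas path is an invented example).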