BaseDataSource
laktory.models.datasources.basedatasource.BaseDataSource
¤
Bases: BaseModel
, PipelineChild
Base class for building data source
ATTRIBUTE | DESCRIPTION |
---|---|
as_stream |
If
TYPE:
|
broadcast |
If |
dataframe_backend |
Type of dataframe
TYPE:
|
drops |
List of columns to drop |
filter |
SQL expression used to select specific rows from the source table |
renames |
Mapping between the source table column names and new column names |
selects |
Columns to select from the source table. Can be specified as a list or as a dictionary to rename the source columns |
watermark |
Spark structured streaming watermark specifications |
METHOD | DESCRIPTION |
---|---|
read |
Read data with options specified in attributes. |
ATTRIBUTE | DESCRIPTION |
---|---|
is_orchestrator_dlt |
If
TYPE:
|
Attributes¤
is_orchestrator_dlt
property
¤
is_orchestrator_dlt
If True
, data source is used in the context of a DLT pipeline
Functions¤
read
¤
read(spark=None)
Read data with options specified in attributes.
PARAMETER | DESCRIPTION |
---|---|
spark
|
Spark context
DEFAULT:
|
RETURNS | DESCRIPTION |
---|---|
DataFrame
|
Resulting dataframe |
Source code in laktory/models/datasources/basedatasource.py
127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 |
|
laktory.models.datasources.basedatasource.Watermark
¤
Bases: BaseModel
Definition of a spark structured streaming watermark for joining data streams.
ATTRIBUTE | DESCRIPTION |
---|---|
column |
Event time column name
TYPE:
|
threshold |
How late, expressed in seconds, the data is expected to be with respect to event time.
TYPE:
|
References
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
laktory.models.datasources.DataSourcesUnion
module-attribute
¤
DataSourcesUnion = Union[FileDataSource, MemoryDataSource, PipelineNodeDataSource, TableDataSource]