BaseDataSink
laktory.models.datasinks.basedatasink.BaseDataSink
¤
Bases: BaseModel
, PipelineChild
Base class for building data sink
ATTRIBUTE | DESCRIPTION |
---|---|
is_primary |
A primary sink will be used to read data for downstream nodes when moving from stream to batch. Don't apply for quarantine sinks.
TYPE:
|
is_quarantine |
Sink used to store quarantined results from node expectations.
TYPE:
|
merge_cdc_options |
Merge options to handle input DataFrames that are Change Data Capture
(CDC). Only used when
TYPE:
|
mode |
Write mode. - overwrite: Overwrite existing data - append: Append contents of the dataframe to existing data - error: Throw and exception if data already exists - ignore: Silently ignore this operation if data already exists - complete: Overwrite for streaming dataframes - merge: Append, update and optionally delete records. Requires cdc specification.
TYPE:
|
write_options |
Other options passed to |
METHOD | DESCRIPTION |
---|---|
write |
Write dataframe into sink. |
purge |
Delete sink data and checkpoints |
read |
Read dataframe from sink. |
ATTRIBUTE | DESCRIPTION |
---|---|
dlt_apply_changes_kwargs |
Keyword arguments for dlt.apply_changes function |
Attributes¤
dlt_apply_changes_kwargs
property
¤
dlt_apply_changes_kwargs
Keyword arguments for dlt.apply_changes function
Functions¤
write
¤
write(df=None, mode=None, spark=None, view_definition=None)
Write dataframe into sink.
PARAMETER | DESCRIPTION |
---|---|
df
|
Input dataframe.
TYPE:
|
mode
|
Write mode overwrite of the sink default mode.
TYPE:
|
spark
|
Spark Session for creating a view
DEFAULT:
|
view_definition
|
View definition. Overwrites view definition defined in the sink.
TYPE:
|
Source code in laktory/models/datasinks/basedatasink.py
608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 |
|
purge
¤
purge()
Delete sink data and checkpoints
Source code in laktory/models/datasinks/basedatasink.py
724 725 726 727 728 |
|
read
¤
read(spark=None, as_stream=None)
Read dataframe from sink.
PARAMETER | DESCRIPTION |
---|---|
spark
|
Spark Session
DEFAULT:
|
as_stream
|
If
DEFAULT:
|
Source code in laktory/models/datasinks/basedatasink.py
737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 |
|
laktory.models.datasinks.basedatasink.DataSinkMergeCDCOptions
¤
Bases: BaseModel
Options for merging a change data capture (CDC).
They are also used to build the target using apply_changes
method when
using Databricks DLT.
ATTRIBUTE | DESCRIPTION |
---|---|
delete_where |
Specifies when a CDC event should be treated as a DELETE rather than an upsert.
TYPE:
|
end_at_column_name |
When using SCD type 2, name of the column storing the end time (or sequencing index) during which a row is active. This attribute is not used when using Databricks DLT which does not allow column rename.
TYPE:
|
exclude_columns |
A subset of columns to exclude in the target table. |
ignore_null_updates |
Allow ingesting updates containing a subset of the target columns.
When a CDC event matches an existing row and ignore_null_updates is
TYPE:
|
include_columns |
A subset of columns to include in the target table. Use
|
order_by |
The column name specifying the logical order of CDC events in the source data. Used to handle change events that arrive out of order.
TYPE:
|
primary_keys |
The column or combination of columns that uniquely identify a row in the source data. This is used to identify which CDC events apply to specific records in the target table. |
scd_type |
Whether to store records as SCD type 1 or SCD type 2.
TYPE:
|
start_at_column_name |
When using SCD type 2, name of the column storing the start time (or sequencing index) during which a row is active. This attribute is not used when using Databricks DLT which does not allow column rename.
TYPE:
|
METHOD | DESCRIPTION |
---|---|
execute |
Merge source into target delta from sink |
Functions¤
execute
¤
execute(source)
Merge source into target delta from sink
PARAMETER | DESCRIPTION |
---|---|
source
|
Source DataFrame to merge into target (sink).
TYPE:
|
Source code in laktory/models/datasinks/basedatasink.py
443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 |
|
--
laktory.models.datasinks.DataSinksUnion
module-attribute
¤
DataSinksUnion = Union[FileDataSink, TableDataSink]