MemoryDataSource

laktory.models.datasources.MemoryDataSource

Bases: BaseDataSource

Data source backed by an in-memory DataFrame, generally used within a data pipeline.

ATTRIBUTE   DESCRIPTION
data        Serialized data to build the input DataFrame. TYPE: Union[dict[str, list[Any]], list[dict[str, Any]]]
df          Input DataFrame. TYPE: Any

Examples:

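import polars as pl
from pyspark.sql import SparkSession

from laktory import models

# The Spark example below needs a session; reuse the active one or create it
spark = SparkSession.builder.getOrCreate()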

data = {
    "symbol": ["AAPL", "GOOGL"],
    "price": [200.0, 205.0],
    "tstamp": ["2023-09-01", "2023-09-01"],
}

# Spark from dict
source = models.MemoryDataSource(
    data=data,
    dataframe_backend="SPARK",
)
df = source.read(spark=spark)
print(df.laktory.show_string())
'''
+-----+------+----------+
|price|symbol|    tstamp|
+-----+------+----------+
|200.0|  AAPL|2023-09-01|
|205.0| GOOGL|2023-09-01|
+-----+------+----------+
'''

# Polars from df
source = models.MemoryDataSource(
    df=pl.DataFrame(data),
)
df = source.read()
print(df.to_pandas())
'''
  symbol  price      tstamp
0   AAPL  200.0  2023-09-01
1  GOOGL  205.0  2023-09-01
'''
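
The `data` attribute is typed to accept either a column-oriented dict of lists (the form used above) or a row-oriented list of dicts. The sketch below shows how the two shapes relate using plain Python only. `columns_to_rows` is a hypothetical helper written for illustration, not part of laktory, and it makes no claim about how laktory converts the data internally.

```python
from typing import Any, Dict, List

# Column-oriented form: dict[str, list[Any]], as in the examples above
columns: Dict[str, List[Any]] = {
    "symbol": ["AAPL", "GOOGL"],
    "price": [200.0, 205.0],
    "tstamp": ["2023-09-01", "2023-09-01"],
}


def columns_to_rows(data: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper (not a laktory function): pivot columns into rows."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(data)
    # zip(*columns) walks the columns in lockstep, yielding one tuple per row
    return [dict(zip(names, row)) for row in zip(*data.values())]


# Row-oriented form: list[dict[str, Any]]
rows = columns_to_rows(columns)
print(rows)
```

Going by the type annotation, either `columns` or `rows` should be usable as the `data` argument of `models.MemoryDataSource`. That has not been checked here against a running laktory installation.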