Cluster

laktory.models.resources.databricks.Cluster

Bases: BaseModel, PulumiResource, TerraformResource

Databricks cluster

ATTRIBUTE DESCRIPTION
access_controls

List of access controls

TYPE: list[AccessControl]

apply_policy_default_values

Whether to use policy default values for missing cluster attributes.

TYPE: bool

autoscale

Autoscale specifications

TYPE: ClusterAutoScale

autotermination_minutes

Automatically terminate the cluster after being inactive for this time in minutes.

TYPE: int

cluster_id

Cluster ID. Used mostly when assigning a cluster to a job task.

TYPE: str

custom_tags

Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated.

TYPE: dict[str, str]

data_security_mode

Select the security features of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION mode. If omitted, no security features are enabled. In the Databricks UI, this setting has recently been renamed Access Mode and USER_ISOLATION has been renamed Shared, but this field still uses the original terms.

TYPE: Literal['NONE', 'SINGLE_USER', 'USER_ISOLATION']

driver_instance_pool_id

Similar to instance_pool_id, but for driver node. If omitted, and instance_pool_id is specified, then the driver will be allocated from that pool.

TYPE: str

driver_node_type_id

The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id defined above.

TYPE: str

enable_elastic_disk

If you don’t want to allocate a fixed number of EBS volumes at cluster creation time, use autoscaling local storage. With autoscaling local storage, Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new EBS volume to the worker before it runs out of disk space. EBS volumes are attached up to a limit of 5 TB of total disk space per instance (including the instance’s local storage). To scale down EBS usage, make sure you have autotermination_minutes and autoscale attributes set.

TYPE: bool

enable_local_disk_encryption

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access.

TYPE: bool

idempotency_token

An optional token to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request does not create a new cluster but instead returns the ID of the existing running cluster. If you specify the idempotency token, you can retry upon failure until the request succeeds; the Databricks platform guarantees that exactly one cluster is launched with that token. The token should have at most 64 characters.

TYPE: str

init_scripts

List of init scripts specifications

TYPE: list[ClusterInitScript]

instance_pool_id

To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster.

TYPE: str

is_pinned

Boolean value specifying whether the cluster is pinned (not pinned by default). You must be a Databricks administrator to use this. The number of pinned clusters is limited to 100, so apply may fail if you have more than that (this limit may change over time; check the Databricks documentation for the current value).

TYPE: bool

libraries

List of libraries specifications

TYPE: list[ClusterLibrary]

lookup_existing

Specifications for looking up an existing resource. Other attributes will be ignored.

TYPE: ClusterLookup

name

Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

TYPE: str

node_type_id

Any supported databricks.getNodeType id. If instance_pool_id is specified, this field is not needed.

TYPE: str

num_workers

Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

TYPE: int

policy_id

Identifier of the cluster policy used to validate the cluster and apply default values.

TYPE: str

runtime_engine

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value.

TYPE: Literal['STANDARD', 'PHOTON']

single_user_name

The optional user name of the user to assign to an interactive cluster. This field is required when data_security_mode is set to SINGLE_USER or when using AAD Passthrough for Azure Data Lake Storage (ADLS) with a single-user cluster (i.e., not high-concurrency clusters).

TYPE: str

spark_conf

Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration.

TYPE: dict[str, str]

spark_env_vars

Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers.

TYPE: dict[str, str]

spark_version

Runtime version of the cluster. Any supported databricks.getSparkVersion id. We advise using Cluster Policies to restrict the list of versions for simplicity while maintaining enough control.

TYPE: str

ssh_public_keys

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. You can specify up to 10 keys.

TYPE: list[str]

Examples:

from laktory import models

cluster = models.resources.databricks.Cluster(
    name="default",
    spark_version="14.0.x-scala2.12",
    data_security_mode="USER_ISOLATION",
    node_type_id="Standard_DS3_v2",
    autoscale={
        "min_workers": 1,
        "max_workers": 4,
    },
    num_workers=0,
    autotermination_minutes=30,
    libraries=[{"pypi": {"package": "laktory==0.0.23"}}],
    access_controls=[
        {
            "group_name": "role-engineers",
            "permission_level": "CAN_RESTART",
        }
    ],
    is_pinned=True,
)
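
As a complement, a minimal sketch showing how spark_conf, spark_env_vars and custom_tags can be combined; all values below are illustrative, not defaults:

from laktory import models

cluster = models.resources.databricks.Cluster(
    name="tuned",
    spark_version="14.0.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=2,
    # Custom Spark configuration properties (illustrative)
    spark_conf={"spark.sql.shuffle.partitions": "200"},
    # Exported as ENV='dev' when launching the driver and workers
    spark_env_vars={"ENV": "dev"},
    # Propagated to cloud resources in addition to default_tags; a name
    # clash with a default tag gets an x_ prefix
    custom_tags={"team": "data-eng"},
)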

Attributes

additional_core_resources property

additional_core_resources
  • permissions

laktory.models.resources.databricks.cluster.ClusterAutoScale

Bases: BaseModel

Cluster Autoscale

ATTRIBUTE DESCRIPTION
min_workers

Minimum number of worker nodes

TYPE: int

max_workers

Maximum number of worker nodes

TYPE: int
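
A minimal sketch, assuming the submodule path shown in the heading is importable; the same values can be passed to Cluster.autoscale as a dict, as in the example above:

from laktory import models

autoscale = models.resources.databricks.cluster.ClusterAutoScale(
    min_workers=1,  # cluster never scales below one worker
    max_workers=4,  # and never above four
)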


laktory.models.resources.databricks.cluster.ClusterInitScriptVolumes

Bases: BaseModel

Cluster Init Script Volumes

ATTRIBUTE DESCRIPTION
destination

Volume filepath

TYPE: str


laktory.models.resources.databricks.cluster.ClusterInitScriptWorkspace

Bases: BaseModel

Cluster Init Script Workspace

ATTRIBUTE DESCRIPTION
destination

Workspace filepath

TYPE: str


laktory.models.resources.databricks.cluster.ClusterInitScript

Bases: BaseModel

Cluster Init Script

ATTRIBUTE DESCRIPTION
volumes

Volumes file specification

TYPE: ClusterInitScriptVolumes

workspace

Workspace file specifications

TYPE: ClusterInitScriptWorkspace
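
A minimal sketch of wiring init scripts into a cluster; the destination paths are hypothetical:

from laktory import models

cluster = models.resources.databricks.Cluster(
    name="default",
    spark_version="14.0.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=1,
    init_scripts=[
        # Script stored in a Unity Catalog volume (hypothetical path)
        {"volumes": {"destination": "/Volumes/main/scripts/install.sh"}},
        # Script stored in the workspace (hypothetical path)
        {"workspace": {"destination": "/Shared/scripts/setup.sh"}},
    ],
)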


laktory.models.resources.databricks.cluster.ClusterLibraryCran

Bases: BaseModel

Cluster Library Cran


laktory.models.resources.databricks.cluster.ClusterLibraryMaven

Bases: BaseModel

Cluster Library Maven


laktory.models.resources.databricks.cluster.ClusterLibraryPypi

Bases: BaseModel

Cluster Library Pypi

ATTRIBUTE DESCRIPTION
package

Package name

TYPE: str

repo

Package repository

TYPE: str


laktory.models.resources.databricks.cluster.ClusterLibrary

Bases: BaseModel

Cluster Library

ATTRIBUTE DESCRIPTION
cran

Cran library specifications

TYPE: ClusterLibraryCran

egg

Egg filepath

TYPE: str

jar

Jar filepath

TYPE: str

maven

Maven library specifications

TYPE: ClusterLibraryMaven

pypi

Pypi library specifications

TYPE: ClusterLibraryPypi

whl

Wheel filepath

TYPE: str
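
A minimal sketch combining library specifications; the wheel path is hypothetical:

from laktory import models

cluster = models.resources.databricks.Cluster(
    name="default",
    spark_version="14.0.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=1,
    libraries=[
        # PyPI package, as in the main example above
        {"pypi": {"package": "laktory==0.0.23"}},
        # Wheel filepath (hypothetical)
        {"whl": "/Volumes/main/libs/mypkg-0.1-py3-none-any.whl"},
    ],
)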


laktory.models.resources.databricks.cluster.ClusterLookup

Bases: ResourceLookup

ATTRIBUTE DESCRIPTION
cluster_id

The ID of the cluster

TYPE: str
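
A minimal sketch of pointing the model at an existing cluster rather than creating one; the cluster ID is hypothetical and, as noted under lookup_existing above, other attributes are ignored:

from laktory import models

cluster = models.resources.databricks.Cluster(
    name="existing",  # ignored when lookup_existing is set
    lookup_existing={"cluster_id": "0123-456789-abcdefgh"},  # hypothetical ID
)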