DagsterDocs
Quick search

GCP (dagster-gcp)

BigQuery

class dagster_gcp.BigQueryError[source]
dagster_gcp.bigquery_resource ResourceDefinition[source]
dagster_gcp.bq_create_dataset(*args, **kwargs)[source]

BigQuery Create Dataset.

This solid encapsulates creating a BigQuery dataset.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.

dagster_gcp.bq_delete_dataset(*args, **kwargs)[source]

BigQuery Delete Dataset.

This solid encapsulates deleting a BigQuery dataset.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.

dagster_gcp.bq_solid_for_queries(sql_queries)[source]

Executes BigQuery SQL queries.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.

dagster_gcp.import_df_to_bq(*args, **kwargs)[source]
dagster_gcp.import_file_to_bq(*args, **kwargs)[source]
dagster_gcp.import_gcs_paths_to_bq(*args, **kwargs)[source]

Dataproc

dagster_gcp.dataproc_solid(*args, **kwargs)[source]
dagster_gcp.dataproc_resource ResourceDefinition[source]

GCS

dagster_gcp.gcs.gcs_intermediate_storage IntermediateStorageDefinition[source]
dagster_gcp.gcs_resource ResourceDefinition[source]
class dagster_gcp.GCSFileHandle(gcs_bucket: str, gcs_key: str)[source]

A reference to a file on GCS.

property gcs_bucket

The name of the GCS bucket.

Type

str

property gcs_key

The GCS key.

Type

str

property gcs_path

The file’s GCS URL.

Type

str

property path_desc

The file’s GCS URL.

Type

str

dagster_gcp.gcs_file_manager ResourceDefinition[source]

FileManager that provides abstract access to GCS.

Implements the FileManager API.

dagster_gcp.gcs.gcs_pickle_io_manager IOManagerDefinition[source]

Persistent IO manager using GCS for storage.

Serializes objects via pickling. Suitable for objects storage for distributed executors, so long as each execution node has network connectivity and credentials for GCS and the backing bucket.

Attach this resource definition to a ModeDefinition in order to make it available to your pipeline:

pipeline_def = PipelineDefinition(
    mode_defs=[
        ModeDefinition(
            resource_defs={'io_manager': gcs_pickle_io_manager, 'gcs': gcs_resource, ...},
        ), ...
    ], ...
)

You may configure this storage as follows:

resources:
    io_manager:
        config:
            gcs_bucket: my-cool-bucket
            gcs_prefix: good/prefix-for-files-