GCP (dagster-gcp)

BigQuery

class dagster_gcp.BigQueryError

dagster_gcp.bigquery_resource ResourceDefinition

dagster_gcp.bq_create_dataset(*args, **kwargs)

BigQuery Create Dataset.

This solid encapsulates creating a BigQuery dataset.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.

dagster_gcp.bq_delete_dataset(*args, **kwargs)

BigQuery Delete Dataset.

This solid encapsulates deleting a BigQuery dataset.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.

dagster_gcp.bq_solid_for_queries(sql_queries)

Executes BigQuery SQL queries.

Expects a BQ client to be provisioned in resources as context.resources.bigquery.

dagster_gcp.import_df_to_bq(*args, **kwargs)

dagster_gcp.import_file_to_bq(*args, **kwargs)

dagster_gcp.import_gcs_paths_to_bq(*args, **kwargs)

Dataproc

dagster_gcp.dataproc_solid(*args, **kwargs)

dagster_gcp.dataproc_resource ResourceDefinition

GCS

dagster_gcp.gcs.gcs_intermediate_storage IntermediateStorageDefinition

dagster_gcp.gcs_resource ResourceDefinition

class dagster_gcp.GCSFileHandle(gcs_bucket: str, gcs_key: str)

A reference to a file on GCS.

property gcs_bucket

The name of the GCS bucket.

Type: str

property gcs_key

The GCS key.

Type: str

property gcs_path

The file’s GCS URL.

Type: str

property path_desc

The file’s GCS URL.

Type: str
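As a rough illustration of how the properties above relate to one another, the following stand-alone sketch mimics a GCS file handle without importing dagster_gcp. The class name SketchGCSFileHandle is hypothetical, and the gs://<bucket>/<key> URL form is an assumption based on the usual GCS URL convention; consult the library source for the actual behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchGCSFileHandle:
    """Hypothetical stand-in for dagster_gcp.GCSFileHandle (illustration only)."""

    gcs_bucket: str  # name of the GCS bucket
    gcs_key: str  # object key within the bucket

    @property
    def gcs_path(self) -> str:
        # Assumed URL form: gs://<bucket>/<key>
        return f"gs://{self.gcs_bucket}/{self.gcs_key}"

    @property
    def path_desc(self) -> str:
        # Documented above as the same GCS URL as gcs_path
        return self.gcs_path


# Example values are made up for illustration.
handle = SketchGCSFileHandle(gcs_bucket="my-cool-bucket", gcs_key="some/key.txt")
print(handle.gcs_path)
```

In this sketch, path_desc simply delegates to gcs_path, mirroring the identical descriptions of the two properties.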

dagster_gcp.gcs_file_manager ResourceDefinition

FileManager that provides abstract access to GCS.

Implements the FileManager API.

dagster_gcp.gcs.gcs_pickle_io_manager IOManagerDefinition

Persistent IO manager using GCS for storage.

Serializes objects via pickling. Suitable for object storage with distributed executors, as long as each execution node has network connectivity and credentials for GCS and the backing bucket.

Attach this resource definition to a ModeDefinition in order to make it available to your pipeline:

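from dagster import ModeDefinition, PipelineDefinition
from dagster_gcp import gcs_resource
from dagster_gcp.gcs import gcs_pickle_io_manager

pipeline_def = PipelineDefinition(
    mode_defs=[
        ModeDefinition(
            resource_defs={'io_manager': gcs_pickle_io_manager, 'gcs': gcs_resource, ...},
        ), ...
    ], ...
)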

You may configure this storage as follows:

resources:
    io_manager:
        config:
            gcs_bucket: my-cool-bucket
            gcs_prefix: good/prefix-for-files-
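
When run config is built in Python rather than written as YAML, the same settings can be expressed as a nested dictionary. The sketch below only constructs and inspects that dictionary; the variable names run_config and io_config are illustrative, and how the dictionary is handed to Dagster (for example, as a run_config argument when executing a pipeline) is an assumption that depends on the Dagster version in use.

```python
# Python equivalent of the YAML config shown above.
run_config = {
    "resources": {
        "io_manager": {
            "config": {
                "gcs_bucket": "my-cool-bucket",
                "gcs_prefix": "good/prefix-for-files-",
            }
        }
    }
}

# Pull out the IO manager section to check the values before launching a run.
io_config = run_config["resources"]["io_manager"]["config"]
print(io_config["gcs_bucket"], io_config["gcs_prefix"])
```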