# Resource Config Guide

GiGL Resource Config is a YAML file that is passed in at runtime and specifies the resource and environment configuration for each component in GiGL. The proto definition for the GiGL Resource Config can be seen here.
## Prerequisites

If you don't already have cloud assets set up (e.g., a GCP project), see the guide.
## Resource Config Breakdown

### Example: Full Template for Resource Config

See the Full Template Resource Config:
```yaml
shared_resource_config:
  resource_labels:
    cost_resource_group_tag: ""
    cost_resource_group: ""
  common_compute_config:
    project: "project_id"
    region: "gcp_region_here"
    temp_assets_bucket: "gs://"
    temp_regional_assets_bucket: "gs://"
    perm_assets_bucket: "gs://"
    temp_assets_bq_dataset_name: "bq_dataset_name_here"
    embedding_bq_dataset_name: "bq_dataset_name_here"
    gcp_service_account_email: "service_account_email_here"
    k8_service_account: "service_account_name_here"
    dataflow_worker_harness_image: "gcr.io/..."
    dataflow_runner: "" # DataflowRunner or DirectRunner
preprocessor_config:
  edge_preprocessor_config:
    num_workers: 1
    max_num_workers: 2
    machine_type: "" # e.g. n1-highmem-32
    disk_size_gb: 100
  node_preprocessor_config:
    num_workers: 1
    max_num_workers: 2
    machine_type: "" # e.g. n1-highmem-64
    disk_size_gb: 100
subgraph_sampler_config:
  machine_type: "" # e.g. n1-highmem-32
  num_local_ssds: 1
  num_replicas: 1
split_generator_config:
  machine_type: "" # e.g. n1-highmem-32
  num_local_ssds: 1
  num_replicas: 1
trainer_config:
  vertex_ai_trainer_config:
    machine_type: "" # e.g. n1-highmem-16
    gpu_type: "" # e.g. nvidia-tesla-p100
    gpu_limit: 1
    num_replicas: 1
inferencer_config:
  num_workers: 1
  max_num_workers: 2
  machine_type: "" # e.g. n1-highmem-16
  disk_size_gb: 100
```
### Shared Resource Config

The `shared_resource_config` field includes settings that apply across all GiGL components. You need to customize this section according to your GCP project specifics.

- **Resource Labels**: Resource labels help you manage costs and organize resources. Modify the `resource_labels` section to fit your project's labeling scheme.
- **Common Compute Config**: This section includes important project specifications. Fill out the fields with your project ID, region, asset buckets, and service account email.
```yaml
common_compute_config:
  project: "your-gcp-project-id"
  region: "your-region"
  temp_assets_bucket: "gs://your-temp-bucket"
  perm_assets_bucket: "gs://your-permanent-bucket"
  gcp_service_account_email: "your-service-account-email"
```
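Since a misconfigured `common_compute_config` only surfaces once a component launches, it can help to sanity-check the parsed YAML up front. The helper below is an illustrative sketch, not part of GiGL's API; it assumes the config has already been parsed into a dict (e.g., via `yaml.safe_load`).

```python
# Illustrative sketch: flag required common_compute_config fields that are
# empty or absent in a parsed resource config dict. Hypothetical helper,
# not part of GiGL.
REQUIRED_COMPUTE_FIELDS = (
    "project",
    "region",
    "temp_assets_bucket",
    "perm_assets_bucket",
    "gcp_service_account_email",
)

def missing_compute_fields(config: dict) -> list:
    """Return required common_compute_config fields that are empty or missing."""
    compute = config.get("shared_resource_config", {}).get("common_compute_config", {})
    return [f for f in REQUIRED_COMPUTE_FIELDS if not compute.get(f)]

config = {
    "shared_resource_config": {
        "common_compute_config": {
            "project": "my-project-id",
            "region": "us-central1",
            "temp_assets_bucket": "",  # empty string -> flagged as missing
        }
    }
}
print(missing_compute_fields(config))
# ['temp_assets_bucket', 'perm_assets_bucket', 'gcp_service_account_email']
```

Running this check before kicking off a pipeline gives a faster failure than waiting for a component to reject the config at launch time.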
### Preprocessor Config

The `preprocessor_config` specifies settings for the Dataflow preprocessor component, including the number of workers, machine type, and disk size. You must specify both the `node_preprocessor_config` and the `edge_preprocessor_config`. See example:
```yaml
preprocessor_config:
  edge_preprocessor_config:
    num_workers: 1
    max_num_workers: 2
    machine_type: "n1-highmem-32"
    disk_size_gb: 100
  node_preprocessor_config:
    num_workers: 1
    max_num_workers: 2
    machine_type: "n1-highmem-32"
    disk_size_gb: 100
```
### Subgraph Sampler Config

The `subgraph_sampler_config` specifies settings for the Spark subgraph sampler component, including the machine type, number of local SSDs, and number of replicas. See example:
```yaml
subgraph_sampler_config:
  machine_type: "n1-standard-4"
  num_local_ssds: 1
  num_replicas: 2
```
### Split Generator Config

The `split_generator_config` specifies settings for the Spark split generator component, including the machine type, number of local SSDs, and number of replicas. See example:
```yaml
split_generator_config:
  machine_type: "n1-standard-4"
  num_local_ssds: 1
  num_replicas: 2
```
### Trainer Config

The `trainer_config` specifies settings for the trainer component, currently supporting Vertex AI training or local training.

- **Vertex AI Trainer Config**: The `vertex_ai_trainer_config` field of the trainer config requires a machine type, GPU type, GPU limit, and number of replicas. See example:

```yaml
trainer_config:
  vertex_ai_trainer_config:
    machine_type: "n1-standard-8"
    gpu_type: "nvidia-tesla-t4"
    gpu_limit: 1
    num_replicas: 1
```
- **Local Trainer Config**: The `local_trainer_config` field of the trainer config just requires `num_workers`, which can be used for data loaders.
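For example, a minimal local training sketch might look like the following (the field names follow the description above; the worker count is an illustrative value):

```yaml
trainer_config:
  local_trainer_config:
    num_workers: 4
```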
### Inferencer Config

The `inferencer_config` specifies settings for the Dataflow inferencer component, including the number of workers, machine type, and disk size. See example:
```yaml
inferencer_config:
  num_workers: 1
  max_num_workers: 256
  machine_type: "c2-standard-16"
  disk_size_gb: 100
```