Quick Start Guide#

GiGL is a flexible framework that allows customization for many graph ML tasks in its components like data data pre-processing, training logic, inference.

This page outlines the steps needed to get up and running an end to end pipeline in different scenarios, starting from a simple local setup to more complex cloud-based operations.

Install GiGL#

Before proceeding, make sure you have correctly installed gigl by following the installation guide.

Quick Start: Local#

Note

Not yet supported

Running an end to end GiGL pipeline on your local machine is not yet supported. This would help provide a quick way to test functionalities without any cloud dependencies. Stay tuned for future releases!

Quick Start: Running Distributed on Cloud#

This section outlines the steps needed to get up and running an end to end pipeline (via GCP) on an in-built sample task: training and inference for transductive node-anchor based link prediction task on Cora (homogeneous). Running in the cloud enables the usage of services like Dataproc, VertexAI, and Dataflow to enable large-scale distributed/optimized performance.

Config Setup#

To run an end to end pipeline in GiGL, two config files are required. In this guide, some samples/templates are provided to get started but these would need to be modified as needed for custom tasks.

Resource Config:

The resource config contains GCP project specific information (service account, buckets, etc.) as well as GiGL Component resource allocation. To setup cloud resources for shared_resource_config.common_compute_config, see cloud setup guide.

Once you have setup your cloud project, you can populate the common_compute_config section of the template resource config provided in GiGL/docs/examples/template_resource_config.yaml. The remainder of the resource config is populated with some pre-defined values which should be suitable for this task.

For more information, see resource config guide.

Task Config:

The template task config is for populating custom class paths, custom arguments, and data configuations which will be passed into config populator. For task config usage/spec creation, see the task_config_guide.

In this guide, we will be using a built-in preprocessor config to run on one of GiGL’s supported mocked datasets (in this specific example cora_homogeneous_node_anchor_edge_features). The other supported mocked datasets can be seen in python/tests/test_assets/dataset_mocking/lib/mocked_dataset_artifact_metadata.json

The path for the template task config is: python/tests/test_assets/dataset_mocking/pipeline_test_assets/configs/e2e_node_anchor_based_link_prediction_template_gbml_config.yaml

Running an End To End GiGL Pipeline#

Now that we have our two config files setup, we can now kick off an end to end GiGL run.

GiGL supports various ways to orchestrate an end to end run such as Kubeflow Orchestration, GiGL Runner, and manual component import and running as needed. For more details see here

Below is an example on how we can use GiGL Runner and use the specs we created above.


from gigl.orchestration.local.runner import Runner, PipelineConfig
from gigl.common import UriFactory
from gigl.src.common.types import AppliedTaskIdentifier

pipeline_config = PipelineConfig(
    applied_task_identifier=AppliedTaskIdentifier("my-first-gigl-job"),
    template_task_config_uri=UriFactory.create_uri("python/tests/test_assets/dataset_mocking/pipeline_test_assets/configs/e2e_node_anchor_based_link_prediction_template_gbml_config.yaml"),
    resource_config_uri=UriFactory.create_uri("path-to-your-resource-config.yaml"),
)

runner = Runner(pipeline_config=pipeline_config)
runner.run()

Quick Start: Digging Deeper and Advanced Usage#

Now that you have an idea on how GiGL works, you may want to explore advanced customization options for your specific tasks. This section directs you to various guides that detail how to create and modify task specifications, use custom data, and general customization:

  • Task Spec Customization: For any custom logic needed at the component level, like pulling your own data, writing custom training/inference logic, or task specific arguments, see the task_config_guide.

  • Behind the Scenes: To better understand how each of GiGL’s components interact and operate, see the components page

  • Examples: For easy references and make your next steps easier, various example walkthroughs are available on the examples page. See here