gigl.src.training.v1.lib.data_loaders.RootedNodeNeighborhoodBatch

class gigl.src.training.v1.lib.data_loaders.rooted_node_neighborhood_data_loader.RootedNodeNeighborhoodBatch(graph: Data | HeteroData, condensed_node_type_to_root_node_indices_map: Dict[CondensedNodeType, LongTensor], root_nodes: List[Node], condensed_node_type_to_subgraph_id_to_global_node_id: Dict[CondensedNodeType, Dict[NodeId, NodeId]])

Bases: object

Methods

__init__

collate_pyg_rooted_node_neighborhood_minibatch: Coalesces the various sample subgraphs into a single unified neighborhood, which is used for message passing.

get_default_data_loader: We often want to set should_loop = True so that random negatives can be fetched on demand for each main-sample batch without this DataLoader "running out" of data.

preprocess_rooted_node_neighborhood_raw_sample_fn

preprocess_rooted_node_neighborhood_sample_fn

process_raw_pyg_samples_and_collate_fn

__eq__(other)

Return self==value.

__hash__ = None

__init__(graph: Data | HeteroData, condensed_node_type_to_root_node_indices_map: Dict[CondensedNodeType, LongTensor], root_nodes: List[Node], condensed_node_type_to_subgraph_id_to_global_node_id: Dict[CondensedNodeType, Dict[NodeId, NodeId]]) -> None

__repr__()

Return repr(self).

__weakref__

List of weak references to the object (if defined).

static collate_pyg_rooted_node_neighborhood_minibatch(builder: GraphBuilder, graph_metadata_pb_wrapper: GraphMetadataPbWrapper, preprocessed_metadata_pb_wrapper: PreprocessedMetadataPbWrapper, samples: List[Dict[NodeType, RootedNodeNeighborhoodSample]]) -> RootedNodeNeighborhoodBatch

We coalesce the various sample subgraphs to build a single unified neighborhood, which we use for message passing. By coalescing, overlaps between multiple samples' subgraphs are handled gracefully, and message passing is conducted over each shared edge only once. Without coalescing, an edge e that appears in k samples' subgraphs would result in a k-factor duplication of edges, edge features, and messages; likewise, a node n that appears in k samples' subgraphs would result in a k-factor duplication of node features.
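The coalescing idea above can be sketched in plain Python. This is an illustration of the concept only, not the actual GraphBuilder / PyG machinery: each sample subgraph contributes its edges, and duplicates across samples are merged so a shared edge is represented exactly once in the unified neighborhood.

```python
# Minimal sketch of coalescing overlapping sample subgraphs.
# `coalesce_subgraphs` is a hypothetical helper for illustration; the
# real batch uses a GraphBuilder over PyG Data/HeteroData objects.

from typing import List, Set, Tuple

Edge = Tuple[int, int]  # (source_node_id, destination_node_id)

def coalesce_subgraphs(subgraph_edge_lists: List[List[Edge]]) -> List[Edge]:
    """Union the edge lists of all sample subgraphs, dropping duplicates."""
    seen: Set[Edge] = set()
    unified: List[Edge] = []
    for edges in subgraph_edge_lists:
        for edge in edges:
            if edge not in seen:
                seen.add(edge)
                unified.append(edge)
    return unified

# Two rooted subgraphs that overlap on edge (1, 2):
sample_a = [(0, 1), (1, 2)]
sample_b = [(1, 2), (2, 3)]

unified = coalesce_subgraphs([sample_a, sample_b])
# Naive concatenation would carry 4 edges and duplicate the features and
# messages for (1, 2); the coalesced neighborhood keeps only 3 edges.
```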

static get_default_data_loader(gbml_config_pb_wrapper: GbmlConfigPbWrapper, graph_builder: GraphBuilder, config: DataloaderConfig) -> DataLoader

We often want to set should_loop = True so that random negatives can be fetched on demand for each main-sample batch without this DataLoader "running out" of data. If this dataset were not "loopy", we could run into a scenario where, e.g., the main-sample dataloader has 20 batches but the random-negative dataloader has only 10. This pacing mismatch would leave us unable to fetch random negatives for the last 10 main-sample batches.
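The should_loop behavior described above can be sketched with a simple cycling wrapper. This is an illustrative stand-in, not the actual GiGL DataLoader implementation; `cycle_batches` and the batch names are hypothetical.

```python
# Sketch of "loopy" iteration: the random-negative loader restarts from
# the beginning when exhausted, so it can never run out before the
# main-sample loader does.

import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def cycle_batches(batches: Iterable[T]) -> Iterator[T]:
    """Yield batches forever, restarting from the beginning on exhaustion."""
    return itertools.cycle(batches)

main_batches = [f"main_{i}" for i in range(20)]     # 20 main-sample batches
negative_batches = [f"neg_{i}" for i in range(10)]  # only 10 negative batches

negatives = cycle_batches(negative_batches)
paired = [(main, next(negatives)) for main in main_batches]
# Every main-sample batch gets a random-negative batch; main batches
# 10..19 reuse neg_0..neg_9 a second time instead of failing.
```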