gigl.src.common.graph_builder.PygGraphData#
- class gigl.src.common.graph_builder.pyg_graph_data.PygGraphData(**kwargs)#
Bases: HeteroData, GbmlGraphDataProtocol

Extends PyTorch Geometric graph data objects to provide support for additional functionality, e.g. equality checks between graphs.
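A minimal usage sketch (assuming the standard HeteroData construction semantics carry over unchanged to PygGraphData; the node and edge types below are purely illustrative):

```python
import torch

from gigl.src.common.graph_builder.pyg_graph_data import PygGraphData

# Build a small heterogeneous graph exactly as one would build a HeteroData object.
data = PygGraphData()
data["user"].x = torch.randn(3, 8)   # 3 "user" nodes with 8 features each
data["item"].x = torch.randn(2, 4)   # 2 "item" nodes with 4 features each
data["user", "clicks", "item"].edge_index = torch.tensor([[0, 1, 2], [0, 1, 1]])

print(data.node_types)  # ['user', 'item']
print(data.edge_types)  # [('user', 'clicks', 'item')]
```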
Methods

| Method | Description |
| --- | --- |
| apply | Applies the function func, either to all attributes or only the ones given in *args. |
| apply_ | Applies the in-place function func, either to all attributes or only the ones given in *args. |
| are_disjoint | Returns True if the two GbmlGraphDataProtocol objects do not share any edges. |
| are_same_graph | Returns True if both objects represent the same graph in the global space. |
| clone | Performs cloning of tensors, either for all attributes or only the ones given in *args. |
| coalesce | Sorts and removes duplicated entries from edge indices edge_index. |
| collect | Collects the attribute key from all node and edge types. |
| concat | Concatenates self with another data object. |
| contains_isolated_nodes |  |
| contains_self_loops |  |
| contiguous | Ensures a contiguous memory layout, either for all attributes or only the ones given in *args. |
| coo | Returns the edge indices in the GraphStore in COO format. |
| cpu | Copies attributes to CPU memory, either for all attributes or only the ones given in *args. |
| csc | Returns the edge indices in the GraphStore in CSC format. |
| csr | Returns the edge indices in the GraphStore in CSR format. |
| cuda | Copies attributes to CUDA memory, either for all attributes or only the ones given in *args. |
| debug |  |
| detach | Detaches attributes from the computation graph by creating a new tensor, either for all attributes or only the ones given in *args. |
| detach_ | Detaches attributes from the computation graph, either for all attributes or only the ones given in *args. |
| edge_attrs | Returns all edge-level tensor attribute names. |
| edge_items | Returns a list of edge type and edge storage pairs. |
| edge_subgraph | Returns the induced subgraph given by the edge indices in subset_dict for certain edge types. |
| edge_type_subgraph | Returns the subgraph induced by the given edge_types. |
| from_dict | Creates a HeteroData object from a dictionary. |
| from_hetero_data |  |
| generate_ids | Generates and sets n_id and e_id attributes to assign each node and edge to a continuously ascending and unique ID. |
| get_all_edge_attrs | Returns all registered edge attributes. |
| get_all_tensor_attrs | Returns all registered tensor attributes. |
| get_edge_index | Synchronously obtains an edge_index tuple from the GraphStore. |
| get_edge_store | Gets the EdgeStorage object of a particular edge type given by the tuple (src, rel, dst). |
| get_global_edge_features_dict | Computes and fetches a dictionary mapping the global edge to its relevant edge features. |
| get_global_node_features_dict | Computes and fetches a dictionary mapping the global node to its relevant node features. |
| get_node_store | Gets the NodeStorage object of a particular node type key. |
| get_tensor | Synchronously obtains a tensor from the FeatureStore. |
| get_tensor_size | Obtains the size of a tensor given its TensorAttr, or None if the tensor does not exist. |
| has_isolated_nodes | Returns True if the graph contains isolated nodes. |
| has_self_loops | Returns True if the graph contains self-loops. |
| is_coalesced | Returns True if edge indices edge_index are sorted and do not contain duplicate entries. |
| is_directed | Returns True if graph edges are directed. |
| is_sorted | Returns True if edge indices edge_index are sorted. |
| is_sorted_by_time | Returns True if time is sorted. |
| is_undirected | Returns True if graph edges are undirected. |
| keys | Returns a list of all graph attribute names. |
| metadata | Returns the heterogeneous meta-data, i.e. its node and edge types. |
| multi_get_tensor | Synchronously obtains a list of tensors from the FeatureStore for each tensor associated with the attributes in attrs. |
| node_attrs | Returns all node-level tensor attribute names. |
| node_items | Returns a list of node type and node storage pairs. |
| node_type_subgraph | Returns the subgraph induced by the given node_types. |
| pin_memory | Copies attributes to pinned memory, either for all attributes or only the ones given in *args. |
| put_edge_index | Synchronously adds an edge_index tuple to the GraphStore. |
| put_tensor | Synchronously adds a tensor to the FeatureStore. |
| record_stream | Ensures that the tensor memory is not reused for another tensor until all current work queued on stream has been completed, either for all attributes or only the ones given in *args. |
| remove_edge_index | Synchronously deletes an edge_index tuple from the GraphStore. |
| remove_tensor | Removes a tensor from the FeatureStore. |
| rename | Renames the node type name to new_name in-place. |
| requires_grad_ | Tracks gradient computation, either for all attributes or only the ones given in *args. |
| set_value_dict | Sets the values in the dictionary value_dict to the attribute with name key for all node/edge types present in the dictionary. |
| share_memory_ | Moves attributes to shared memory, either for all attributes or only the ones given in *args. |
| size | Returns the size of the adjacency matrix induced by the graph. |
| snapshot | Returns a snapshot of data to only hold events that occurred in period [start_time, end_time]. |
| sort | Sorts edge indices edge_index and their corresponding edge features. |
| sort_by_time | Sorts data associated with time according to time. |
| stores_as |  |
| subgraph | Returns the induced subgraph containing the node types and corresponding nodes in subset_dict. |
| to | Performs tensor device conversion, either for all attributes or only the ones given in *args. |
| to_dict | Returns a dictionary of stored key/value pairs. |
| to_hetero_data | Converts the PygGraphData object back to a PyG HeteroData object. |
| to_homogeneous | Converts a HeteroData object to a homogeneous Data object. |
| to_namedtuple | Returns a NamedTuple of stored key/value pairs. |
| up_to | Returns a snapshot of data to only hold events that occurred up to end_time (inclusive of edge_time). |
| update | Updates the data object with the elements from another data object. |
| update_tensor | Updates a tensor in the FeatureStore with a new value. |
| validate | Validates the correctness of the data. |
| view | Returns a view of the FeatureStore given a not yet fully-specified TensorAttr. |

- __cat_dim__(key: str, value: Any, store: NodeStorage | EdgeStorage | None = None, *args, **kwargs) Any#
Returns the dimension for which the value value of the attribute key will get concatenated when creating mini-batches using torch_geometric.loader.DataLoader.

Note
This method is for internal use only, and should only be overridden in case the mini-batch creation process is corrupted for a specific attribute.
- __contains__(key: str) bool#
Returns True if the attribute key is present in the data.
- __delattr__(key: str)#
Implement delattr(self, name).
- __delitem__(*args: str | Tuple[str, str, str] | Tuple[str, str])#
Supports
del store[tensor_attr].
- __eq__(other: object) bool#
Return self==value.
- __getitem__(*args: str | Tuple[str, str, str] | Tuple[str, str]) Any#
Supports pythonic indexing into the FeatureStore. In particular, the following rules are followed for indexing:

- A fully-specified key will produce a tensor output.
- A partially-specified key will produce an AttrView output, which is a view on the FeatureStore. If a view is called, it will produce a tensor output from the corresponding (partially specified) attributes.
- __hash__ = None#
- __inc__(key: str, value: Any, store: NodeStorage | EdgeStorage | None = None, *args, **kwargs) Any#
Returns the incremental count to cumulatively increase the value value of the attribute key when creating mini-batches using torch_geometric.loader.DataLoader.

Note
This method is for internal use only, and should only be overridden in case the mini-batch creation process is corrupted for a specific attribute.
- __init__(**kwargs) None#
- classmethod __init_subclass__(*args, **kwargs)#
This method is called when a class is subclassed.
The default implementation does nothing. It may be overridden to extend subclasses.
- __len__() int#
Returns the number of graph attributes.
- __repr__() str#
Return repr(self).
- __setattr__(key: str, value: Any)#
Overrides HeteroData's __setattr__, whose custom logic otherwise makes @property setters unusable.
- __setitem__(key: str, value: Any)#
Supports
store[tensor_attr] = tensor.
- __subclasshook__()#
Abstract classes can override this to customize issubclass().
This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).
- __weakref__#
list of weak references to the object (if defined)
- apply(func: Callable, *args: str)#
Applies the function func, either to all attributes or only the ones given in *args.
- apply_(func: Callable, *args: str)#
Applies the in-place function func, either to all attributes or only the ones given in *args.
- static are_disjoint(a: GbmlGraphDataProtocol, b: GbmlGraphDataProtocol) bool#
Returns True if the two GbmlGraphDataProtocol objects do not share any edges.
- static are_same_graph(a: GbmlGraphDataProtocol, b: GbmlGraphDataProtocol) bool#
- Args:
  - a (GbmlGraphDataProtocol)
  - b (GbmlGraphDataProtocol)
- Returns:
  bool: Returns True if both a and b (objects that implement GbmlGraphDataProtocol) represent the same graph in the global space, i.e. both have the same nodes with the same features and the same edges with the same features; a form of loose equality.

For example, a: PygGraphData and b: PygGraphData may both hold the same 3 nodes and 3 edges with the same features, but because each was built in a specific way (e.g. a particular order of edges and nodes), they may not be strictly equal: a != b. The two nevertheless represent the same "graph" in different ways; this function fills that gap.
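A sketch of that distinction (the build helper is hypothetical, and the printed results reflect the intended semantics described above, assuming directly constructed objects behave like those assembled via GraphBuilder):

```python
import torch

from gigl.src.common.graph_builder.pyg_graph_data import PygGraphData

def build(edge_index: torch.Tensor) -> PygGraphData:
    # Hypothetical helper: same nodes and features each time, only edge order varies.
    data = PygGraphData()
    data["user"].x = torch.arange(6, dtype=torch.float).reshape(3, 2)
    data["user", "follows", "user"].edge_index = edge_index
    return data

a = build(torch.tensor([[0, 1], [1, 2]]))  # edges 0->1 and 1->2
b = build(torch.tensor([[1, 0], [2, 1]]))  # same edges, inserted in reverse order

print(a == b)                              # expected False: strict equality is order-sensitive
print(PygGraphData.are_same_graph(a, b))   # expected True: same graph in the global space
print(PygGraphData.are_disjoint(a, b))     # expected False: the two share every edge
```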
- clone(*args: str)#
Performs cloning of tensors, either for all attributes or only the ones given in *args.
- coalesce() Self#
Sorts and removes duplicated entries from edge indices edge_index.
- collect(key: str, allow_empty: bool = False) Dict[str | Tuple[str, str, str], Any]#
Collects the attribute key from all node and edge types.

```python
data = HeteroData()
data['paper'].x = ...
data['author'].x = ...

print(data.collect('x'))
>>> { 'paper': ..., 'author': ...}
```

Note

This is equivalent to writing data.x_dict.

- Args:
  - key (str): The attribute to collect from all node and edge types.
  - allow_empty (bool, optional): If set to True, will not raise an error in case the attribute does not exist in any node or edge type. (default: False)
- concat(data: Self) Self#
Concatenates self with another data object. All values need to have matching shapes at non-concat dimensions.
- contiguous(*args: str)#
Ensures a contiguous memory layout, either for all attributes or only the ones given in *args.
- coo(edge_types: List[Any] | None = None, store: bool = False) Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor | None]]#
Returns the edge indices in the GraphStore in COO format.

- Args:
  - edge_types (List[Any], optional): The edge types of edge indices to obtain. If set to None, will return the edge indices of all existing edge types. (default: None)
  - store (bool, optional): Whether to store converted edge indices in the GraphStore. (default: False)
- cpu(*args: str)#
Copies attributes to CPU memory, either for all attributes or only the ones given in *args.
- csc(edge_types: List[Any] | None = None, store: bool = False) Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor | None]]#
Returns the edge indices in the GraphStore in CSC format.

- Args:
  - edge_types (List[Any], optional): The edge types of edge indices to obtain. If set to None, will return the edge indices of all existing edge types. (default: None)
  - store (bool, optional): Whether to store converted edge indices in the GraphStore. (default: False)
- csr(edge_types: List[Any] | None = None, store: bool = False) Tuple[Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor], Dict[Tuple[str, str, str], Tensor | None]]#
Returns the edge indices in the GraphStore in CSR format.

- Args:
  - edge_types (List[Any], optional): The edge types of edge indices to obtain. If set to None, will return the edge indices of all existing edge types. (default: None)
  - store (bool, optional): Whether to store converted edge indices in the GraphStore. (default: False)
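A brief sketch of pulling stored edges back out in the three layouts (edge type and values are illustrative; the exact contents of the compressed outputs follow PyG's GraphStore conventions):

```python
import torch

from gigl.src.common.graph_builder.pyg_graph_data import PygGraphData

data = PygGraphData()
data["paper"].num_nodes = 3
data["paper", "cites", "paper"].edge_index = torch.tensor([[0, 1, 1], [1, 2, 0]])

# coo() returns three dicts keyed by edge type: row indices, column indices,
# and an optional permutation applied during conversion.
row_dict, col_dict, perm_dict = data.coo()
print(row_dict[("paper", "cites", "paper")])  # tensor([0, 1, 1])

# csr() and csc() return the same edges in compressed row / column layouts.
csr_out = data.csr()
csc_out = data.csc()
```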
- cuda(device: int | str | None = None, *args: str, non_blocking: bool = False)#
Copies attributes to CUDA memory, either for all attributes or only the ones given in *args.
- detach(*args: str)#
Detaches attributes from the computation graph by creating a new tensor, either for all attributes or only the ones given in *args.
- detach_(*args: str)#
Detaches attributes from the computation graph, either for all attributes or only the ones given in *args.
- edge_attrs() List[str]#
Returns all edge-level tensor attribute names.
- edge_items() List[Tuple[Tuple[str, str, str], EdgeStorage]]#
Returns a list of edge type and edge storage pairs.
- property edge_stores: List[EdgeStorage]#
Returns a list of all edge storages of the graph.
- edge_subgraph(subset_dict: Dict[Tuple[str, str, str], Tensor]) Self#
Returns the induced subgraph given by the edge indices in subset_dict for certain edge types. Will currently preserve all the nodes in the graph, even if they are isolated after subgraph computation.

- Args:
  - subset_dict (Dict[Tuple[str, str, str], LongTensor or BoolTensor]): A dictionary holding the edges to keep for each edge type.
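For example (a minimal sketch; the kept edge indices are illustrative):

```python
import torch

from gigl.src.common.graph_builder.pyg_graph_data import PygGraphData

data = PygGraphData()
data["paper"].x = torch.randn(3, 4)
data["paper", "cites", "paper"].edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])

# Keep only the first two "cites" edges; all nodes are preserved.
sub = data.edge_subgraph({("paper", "cites", "paper"): torch.tensor([0, 1])})
print(sub["paper", "cites", "paper"].edge_index)  # tensor([[0, 1], [1, 2]])
print(sub["paper"].num_nodes)                     # 3
```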
- edge_type_subgraph(edge_types: List[Tuple[str, str, str]]) Self#
Returns the subgraph induced by the given edge_types, i.e. the returned HeteroData object only contains the edge types which are included in edge_types, and only contains the node types of the end points which are included in node_types.
- property edge_types: List[Tuple[str, str, str]]#
Returns a list of all edge types of the graph.
- property edge_types_to_be_registered: List[EdgeType]#
Maintains a list of EdgeTypes associated with this graph data.
Used in conjunction with GraphBuilder, to preserve EdgeTypes when combining multiple GbmlGraphDataProtocol objects together.
- Returns:
List[EdgeType]
- classmethod from_dict(mapping: Dict[str, Any]) Self#
Creates a HeteroData object from a dictionary.
- generate_ids()#
Generates and sets n_id and e_id attributes to assign each node and edge to a continuously ascending and unique ID.
- get_all_edge_attrs() List[EdgeAttr]#
Returns all registered edge attributes.
- get_all_tensor_attrs() List[TensorAttr]#
Returns all registered tensor attributes.
- get_edge_index(*args, **kwargs) Tuple[Tensor, Tensor]#
Synchronously obtains an edge_index tuple from the GraphStore.
- get_edge_store(src: str, rel: str, dst: str) EdgeStorage#
Gets the EdgeStorage object of a particular edge type given by the tuple (src, rel, dst). If the storage is not present yet, will create a new torch_geometric.data.storage.EdgeStorage object for the given edge type.

```python
data = HeteroData()
edge_storage = data.get_edge_store('author', 'writes', 'paper')
```
- get_global_edge_features_dict() FrozenDict[Edge, Tensor]#
Computes and fetches a dictionary mapping the global edge to its relevant edge features
- Returns:
FrozenDict[Edge, torch.Tensor]
- get_global_node_features_dict() FrozenDict[Node, Tensor]#
Computes and fetches a dictionary mapping the global node to its relevant node features
- Returns:
FrozenDict[Node, torch.Tensor]
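A sketch of how these two lookups are typically consumed (assuming data is a PygGraphData instance produced by a GraphBuilder, so the global-to-subgraph mappings are populated):

```python
# `data` is assumed to be a PygGraphData assembled by a GraphBuilder.
node_features = data.get_global_node_features_dict()
edge_features = data.get_global_edge_features_dict()

for node, features in node_features.items():
    print(node, features.shape)   # global Node identifier -> its feature tensor

for edge, features in edge_features.items():
    print(edge, features.shape)   # global Edge identifier -> its feature tensor
```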
- get_node_store(key: str) NodeStorage#
Gets the NodeStorage object of a particular node type key. If the storage is not present yet, will create a new torch_geometric.data.storage.NodeStorage object for the given node type.

```python
data = HeteroData()
node_storage = data.get_node_store('paper')
```
- get_tensor(*args, convert_type: bool = False, **kwargs) Tensor | ndarray#
Synchronously obtains a tensor from the FeatureStore.

- Args:
  - *args: Arguments passed to TensorAttr.
  - convert_type (bool, optional): Whether to convert the type of the output tensor to the type of the attribute index. (default: False)
  - **kwargs: Keyword arguments passed to TensorAttr.
- Raises:
  - ValueError: If the input TensorAttr is not fully specified.
  - KeyError: If the tensor corresponding to the input TensorAttr was not found.
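A sketch of this FeatureStore-style access (assuming PyG's usual (group_name, attr_name, index) TensorAttr convention for HeteroData, where index=None selects all rows):

```python
import torch

from gigl.src.common.graph_builder.pyg_graph_data import PygGraphData

data = PygGraphData()
data["author"].x = torch.randn(4, 16)

# Fully specified TensorAttr: node type, attribute name, and an index.
x_all = data.get_tensor("author", "x", None)
x_some = data.get_tensor("author", "x", torch.tensor([0, 2]))
print(x_all.shape, x_some.shape)  # torch.Size([4, 16]) torch.Size([2, 16])
```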
- get_tensor_size(*args, **kwargs) Tuple[int, ...] | None#
Obtains the size of a tensor given its TensorAttr, or None if the tensor does not exist.
- property global_node_to_subgraph_node_mapping: FrozenDict[Node, Node]#
Maintains the mapping from each original (global) Node to the mapped Node used in the underlying graph data format.
During creation of GBML data representations using graph libraries such as DGL and PyTorch Geometric, there may be occasions where nodes need to be remapped to contiguous node ids (0, 1, 2, ...), either as a requirement of the graph library or to keep the logic for formulating and working with these graph data formats simpler.
- Returns:
FrozenDict[Node, Node]
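A sketch of using this mapping together with its inverse, subgraph_node_to_global_node_mapping (documented below); data is assumed to have been assembled by a GraphBuilder so that both directions are populated:

```python
# `data` is assumed to be a PygGraphData assembled by a GraphBuilder.
to_local = data.global_node_to_subgraph_node_mapping
to_global = data.subgraph_node_to_global_node_mapping

for global_node, local_node in to_local.items():
    # Round trip: the inverse mapping recovers the original global node.
    assert to_global[local_node] == global_node
```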
- has_isolated_nodes() bool#
Returns True if the graph contains isolated nodes.
- has_self_loops() bool#
Returns True if the graph contains self-loops.
- is_coalesced() bool#
Returns True if edge indices edge_index are sorted and do not contain duplicate entries.
- property is_cuda: bool#
Returns True if any torch.Tensor attribute is stored on the GPU, False otherwise.
- is_directed() bool#
Returns True if graph edges are directed.
- is_sorted(sort_by_row: bool = True) bool#
Returns True if edge indices edge_index are sorted.

- Args:
  - sort_by_row (bool, optional): If set to False, will require column-wise (by destination node) order of edge_index. (default: True)
- is_sorted_by_time() bool#
Returns True if time is sorted.
- is_undirected() bool#
Returns True if graph edges are undirected.
- keys() List[str]#
Returns a list of all graph attribute names.
- metadata() Tuple[List[str], List[Tuple[str, str, str]]]#
Returns the heterogeneous meta-data, i.e. its node and edge types.
```python
data = HeteroData()
data['paper'].x = ...
data['author'].x = ...
data['author', 'writes', 'paper'].edge_index = ...

print(data.metadata())
>>> (['paper', 'author'], [('author', 'writes', 'paper')])
```
- multi_get_tensor(attrs: List[TensorAttr], convert_type: bool = False) List[Tensor | ndarray]#
Synchronously obtains a list of tensors from the FeatureStore for each tensor associated with the attributes in attrs.

Note

The default implementation simply iterates over all calls to get_tensor(). Implementor classes that can provide additional, more performant functionality are recommended to override this method.

- Args:
  - attrs (List[TensorAttr]): A list of input TensorAttr objects that identify the tensors to obtain.
  - convert_type (bool, optional): Whether to convert the type of the output tensor to the type of the attribute index. (default: False)
- Raises:
  - ValueError: If any input TensorAttr is not fully specified.
  - KeyError: If any of the tensors corresponding to the input TensorAttr was not found.
- node_attrs() List[str]#
Returns all node-level tensor attribute names.
- node_items() List[Tuple[str, NodeStorage]]#
Returns a list of node type and node storage pairs.
- property node_stores: List[NodeStorage]#
Returns a list of all node storages of the graph.
- node_type_subgraph(node_types: List[str]) Self#
Returns the subgraph induced by the given node_types, i.e. the returned HeteroData object only contains the node types which are included in node_types, and only contains the edge types where both end points are included in node_types.
- property node_types: List[str]#
Returns a list of all node types of the graph.
- property num_edge_features: Dict[Tuple[str, str, str], int]#
Returns the number of features per edge type in the graph.
- property num_edges: int#
Returns the number of edges in the graph. For undirected graphs, this will return the number of bi-directional edges, which is double the amount of unique edges.
- property num_features: Dict[str, int]#
Returns the number of features per node type in the graph. Alias for num_node_features.
- property num_node_features: Dict[str, int]#
Returns the number of features per node type in the graph.
- property num_nodes: int | None#
Returns the number of nodes in the graph.
- pin_memory(*args: str)#
Copies attributes to pinned memory, either for all attributes or only the ones given in *args.
- put_edge_index(edge_index: Tuple[Tensor, Tensor], *args, **kwargs) bool#
Synchronously adds an edge_index tuple to the GraphStore. Returns whether insertion was successful.
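A sketch of registering and reading back an edge set through the GraphStore interface (assuming PyG's EdgeAttr keyword arguments edge_type, layout, and size):

```python
import torch

from gigl.src.common.graph_builder.pyg_graph_data import PygGraphData

data = PygGraphData()

row = torch.tensor([0, 1, 2])
col = torch.tensor([1, 2, 0])
data.put_edge_index(
    (row, col),
    edge_type=("paper", "cites", "paper"),
    layout="coo",
    size=(3, 3),
)

out_row, out_col = data.get_edge_index(
    edge_type=("paper", "cites", "paper"), layout="coo"
)
print(out_row)  # tensor([0, 1, 2])
```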
- put_tensor(tensor: Tensor | ndarray, *args, **kwargs) bool#
Synchronously adds a tensor to the FeatureStore. Returns whether insertion was successful.
- record_stream(stream: Stream, *args: str)#
Ensures that the tensor memory is not reused for another tensor until all current work queued on stream has been completed, either for all attributes or only the ones given in *args.
- remove_edge_index(*args, **kwargs) bool#
Synchronously deletes an edge_index tuple from the GraphStore. Returns whether deletion was successful.
- remove_tensor(*args, **kwargs) bool#
Removes a tensor from the FeatureStore. Returns whether deletion was successful.
- rename(name: str, new_name: str) Self#
Renames the node type name to new_name in-place.
- requires_grad_(*args: str, requires_grad: bool = True)#
Tracks gradient computation, either for all attributes or only the ones given in *args.
- set_value_dict(key: str, value_dict: Dict[str, Any]) Self#
Sets the values in the dictionary value_dict to the attribute with name key for all node/edge types present in the dictionary.

```python
data = HeteroData()
data.set_value_dict('x', {
    'paper': torch.randn(4, 16),
    'author': torch.randn(8, 32),
})
print(data['paper'].x)
```

- share_memory_(*args: str)#
Moves attributes to shared memory, either for all attributes or only the ones given in *args.
- size(dim: int | None = None) Tuple[int | None, int | None] | int | None#
Returns the size of the adjacency matrix induced by the graph.
- snapshot(start_time: float | int, end_time: float | int) Self#
Returns a snapshot of data to only hold events that occurred in period [start_time, end_time].
- sort(sort_by_row: bool = True) Self#
Sorts edge indices edge_index and their corresponding edge features.

- Args:
  - sort_by_row (bool, optional): If set to False, will sort edge_index in column-wise order/by destination node. (default: True)
- sort_by_time() Self#
Sorts data associated with time according to time.
- property stores: List[BaseStorage]#
Returns a list of all storages of the graph.
- subgraph(subset_dict: Dict[str, Tensor]) Self#
Returns the induced subgraph containing the node types and corresponding nodes in subset_dict.

If a node type is not a key in subset_dict, then all nodes of that type remain in the graph.

```python
data = HeteroData()
data['paper'].x = ...
data['author'].x = ...
data['conference'].x = ...
data['paper', 'cites', 'paper'].edge_index = ...
data['author', 'paper'].edge_index = ...
data['paper', 'conference'].edge_index = ...

print(data)
>>> HeteroData(
    paper={ x=[10, 16] },
    author={ x=[5, 32] },
    conference={ x=[5, 8] },
    (paper, cites, paper)={ edge_index=[2, 50] },
    (author, to, paper)={ edge_index=[2, 30] },
    (paper, to, conference)={ edge_index=[2, 25] }
)

subset_dict = {
    'paper': torch.tensor([3, 4, 5, 6]),
    'author': torch.tensor([0, 2]),
}

print(data.subgraph(subset_dict))
>>> HeteroData(
    paper={ x=[4, 16] },
    author={ x=[2, 32] },
    conference={ x=[5, 8] },
    (paper, cites, paper)={ edge_index=[2, 24] },
    (author, to, paper)={ edge_index=[2, 5] },
    (paper, to, conference)={ edge_index=[2, 10] }
)
```

- Args:
  - subset_dict (Dict[str, LongTensor or BoolTensor]): A dictionary holding the nodes to keep for each node type.
- property subgraph_node_to_global_node_mapping: FrozenDict[Node, Node]#
Inverse mapping of global_node_to_subgraph_node_mapping
- Returns:
FrozenDict[Node, Node]:
- to(device: int | str, *args: str, non_blocking: bool = False)#
Performs tensor device conversion, either for all attributes or only the ones given in *args.
- to_dict() Dict[str, Any]#
Returns a dictionary of stored key/value pairs.
- to_hetero_data() HeteroData#
Converts the PygGraphData object back to a PyG HeteroData object.
- Returns:
HeteroData: The converted HeteroData object
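A short sketch of the conversion (node and edge contents are illustrative; the returned object is a plain torch_geometric.data.HeteroData):

```python
import torch
from torch_geometric.data import HeteroData

from gigl.src.common.graph_builder.pyg_graph_data import PygGraphData

data = PygGraphData()
data["user"].x = torch.randn(2, 8)
data["user", "follows", "user"].edge_index = torch.tensor([[0], [1]])

hetero: HeteroData = data.to_hetero_data()
print(type(hetero).__name__)   # HeteroData
print(hetero["user"].x.shape)  # torch.Size([2, 8])
```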
- to_homogeneous(node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, add_node_type: bool = True, add_edge_type: bool = True, dummy_values: bool = True) Data#
Converts a HeteroData object to a homogeneous Data object. By default, all features with the same feature dimensionality across different types will be merged into a single representation, unless otherwise specified via the node_attrs and edge_attrs arguments. Furthermore, attributes named node_type and edge_type will be added to the returned Data object, denoting node-level and edge-level vectors holding the node and edge type as integers, respectively.

- Args:
  - node_attrs (List[str], optional): The node features to combine across all node types. These node features need to be of the same feature dimensionality. If set to None, will automatically determine which node features to combine. (default: None)
  - edge_attrs (List[str], optional): The edge features to combine across all edge types. These edge features need to be of the same feature dimensionality. If set to None, will automatically determine which edge features to combine. (default: None)
  - add_node_type (bool, optional): If set to False, will not add the node-level vector node_type to the returned Data object. (default: True)
  - add_edge_type (bool, optional): If set to False, will not add the edge-level vector edge_type to the returned Data object. (default: True)
  - dummy_values (bool, optional): If set to True, will fill attributes of remaining types with dummy values. Dummy values are NaN for floating point attributes, False for booleans, and -1 for integers. (default: True)
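For instance, merging two node types that share a feature dimensionality (a sketch; values are illustrative):

```python
import torch

from gigl.src.common.graph_builder.pyg_graph_data import PygGraphData

data = PygGraphData()
data["user"].x = torch.randn(2, 8)
data["item"].x = torch.randn(3, 8)   # same feature dimensionality, so x can be merged
data["user", "buys", "item"].edge_index = torch.tensor([[0, 1], [0, 2]])

homo = data.to_homogeneous()
print(homo.x.shape)     # torch.Size([5, 8])
print(homo.node_type)   # tensor([0, 0, 1, 1, 1])
print(homo.edge_index)  # local indices shifted into the merged node numbering
```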
- to_namedtuple() NamedTuple#
Returns a NamedTuple of stored key/value pairs.
- up_to(end_time: float | int) Self#
Returns a snapshot of data to only hold events that occurred up to end_time (inclusive of edge_time).
- update(data: Self) Self#
Updates the data object with the elements from another data object. Added elements will override existing ones (in case of duplicates).
- update_tensor(tensor: Tensor | ndarray, *args, **kwargs) bool#
Updates a tensor in the FeatureStore with a new value. Returns whether the update was successful.

Note
Implementor classes can choose to define more efficient update methods; the default performs a removal and insertion.
- validate(raise_on_error: bool = True) bool#
Validates the correctness of the data.
- view(*args, **kwargs) AttrView#
Returns a view of the FeatureStore given a not yet fully-specified TensorAttr.