gigl.distributed.utils

Modules

gigl.distributed.utils.device

gigl.distributed.utils.init_neighbor_loader_worker(...)

Sets up processes and the torch device for initializing the GLT DistNeighborLoader, setting up RPC and worker groups to minimize memory overhead and CPU contention. Returns the torch device to which the current worker is assigned.

Args:

master_ip_address (str): Master IP address used to manage processes
local_process_rank (int): Process number on the current machine
local_process_world_size (int): Total number of processes on the current machine
rank (int): Rank of the current machine
world_size (int): Total number of machines
master_worker_port (int): Master port to use for communication between workers during training or inference
device (torch.device): The device to load the data onto, i.e. where your model lives
should_use_cpu_workers (bool): Whether to run CPU-based training or inference
num_cpu_threads (Optional[int]): Number of CPU threads PyTorch should use for CPU training or inference. Must be set if should_use_cpu_workers is True.
process_start_gap_seconds (float): Delay between each process when initializing the neighbor loader. At large scales, it is recommended to set this value between 60 and 120 seconds; otherwise multiple processes may attempt to initialize dataloaders at overlapping times, which can cause CPU memory OOM.

Returns:

torch.device: Device to which the current worker is assigned.
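A minimal usage sketch of calling this function from a single worker process, using the parameters documented above. The master address, port, rank values, and the two-machine/four-process topology are illustrative assumptions, not values prescribed by the library.

import torch

from gigl.distributed.utils import init_neighbor_loader_worker

# Assumed topology: 2 machines, 4 loader processes per machine.
device = init_neighbor_loader_worker(
    master_ip_address="10.0.0.1",    # assumed master address
    local_process_rank=0,            # first process on this machine
    local_process_world_size=4,
    rank=0,                          # this machine's rank
    world_size=2,
    master_worker_port=11234,        # assumed free port for worker communication
    device=torch.device("cuda:0"),   # where the model lives
    should_use_cpu_workers=False,
    num_cpu_threads=None,            # only required when should_use_cpu_workers is True
    process_start_gap_seconds=60.0,  # stagger process start-up at scale
)
# `device` is the torch.device this worker was assigned to.

Setting process_start_gap_seconds to 60.0 here follows the guidance above for large-scale runs, where overlapping dataloader initialization can cause CPU memory OOM.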

gigl.distributed.utils.partition_book

gigl.distributed.utils.serialized_graph_metadata_translator