gigl.src.post_process.utils.cosine_similarity

Functions

assert_cosine_similarity_stats

calculate_cosine_sim_between_embedding_tables

Returns: a pd.DataFrame with columns {DEFAULT_NODE_ID_FIELD, _emb_1, _emb_2, COSINE_SIM_FIELD}.
NOTE: Currently, the underlying query takes 17 min for n=100M.
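To make the shape of the returned rows concrete, here is a minimal in-memory sketch. It is not the library's implementation, which runs a query over embedding tables. The helper names (`cosine_similarity`, `join_and_score`), the literal keys `node_id` and `cosine_sim` (stand-ins for DEFAULT_NODE_ID_FIELD and COSINE_SIM_FIELD), and the inner-join behaviour are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector would make this division fail; this sketch does not handle it.
    return dot / (norm_u * norm_v)


def join_and_score(table_1, table_2):
    """Join two {node_id: embedding} mappings on node id and score each pair.

    Assumption: only node ids present in both tables are kept (inner join).
    """
    rows = []
    for node_id in sorted(table_1.keys() & table_2.keys()):
        emb_1, emb_2 = table_1[node_id], table_2[node_id]
        rows.append(
            {
                "node_id": node_id,  # hypothetical stand-in for DEFAULT_NODE_ID_FIELD
                "_emb_1": emb_1,
                "_emb_2": emb_2,
                "cosine_sim": cosine_similarity(emb_1, emb_2),  # stand-in for COSINE_SIM_FIELD
            }
        )
    return rows


rows = join_and_score(
    {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [3.0, 4.0]},
    {0: [2.0, 0.0], 1: [5.0, 0.0], 3: [1.0, 1.0]},
)
```

Here node 0 has parallel embeddings and node 1 has orthogonal ones; nodes 2 and 3 each appear in only one table and are dropped under the inner-join assumption.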

calculate_cosine_similarity_stats

Calculates statistics of cosine similarity.
Args: pd.DataFrame with columns {DEFAULT_NODE_ID_FIELD, _emb_1, _emb_2, COSINE_SIM_FIELD}
Returns: pd.DataFrame with columns {count, mean, std, min, 1%, 5%, 25%, 50%, 75%, 95%, 99%, max, dtype}
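The statistic names listed above match what pandas `describe` produces when given custom percentiles, so one plausible way to obtain a similar table is sketched below. This is an assumption about the approach, not the function's confirmed implementation; the column names `node_id` and `cosine_sim` are hypothetical stand-ins for the module's constants, and the sketch does not reproduce the `dtype` column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for DEFAULT_NODE_ID_FIELD and COSINE_SIM_FIELD.
NODE_ID_FIELD = "node_id"
COSINE_SIM_FIELD = "cosine_sim"


def cosine_sim(a, b):
    """Cosine similarity of two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy input with the column layout described above.
df = pd.DataFrame(
    {
        NODE_ID_FIELD: [0, 1, 2],
        "_emb_1": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        "_emb_2": [[1.0, 0.0], [1.0, 0.0], [-1.0, -1.0]],
    }
)
df[COSINE_SIM_FIELD] = [
    cosine_sim(a, b) for a, b in zip(df["_emb_1"], df["_emb_2"])
]

# describe() returns the statistics as rows; transpose so they become columns.
stats = (
    df[[COSINE_SIM_FIELD]]
    .describe(percentiles=[0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99])
    .T
)
```

The result has one row per described column (here only the similarity column) and one column per statistic, from `count` through `max`.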

get_table_paths_via_timedelta

Args: