gigl.src.post_process.utils.cosine_similarity
Functions
Returns: a pd.DataFrame with columns: {DEFAULT_NODE_ID_FIELD, _emb_1, _emb_2, COSINE_SIM_FIELD}.
NOTE: Currently, the query below takes 17 min for n=100M.
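As an illustration of the output shape described above, here is a minimal in-memory sketch of a row-wise cosine similarity over the `_emb_1` and `_emb_2` columns. It is not the module's implementation (the note above refers to a query, which this page does not show). The names `NODE_ID_COL`, `COSINE_SIM_COL`, and `add_cosine_similarity` are hypothetical stand-ins, since the actual values of `DEFAULT_NODE_ID_FIELD` and `COSINE_SIM_FIELD` are not given here.

```python
import numpy as np
import pandas as pd

# Hypothetical column names: the real values of DEFAULT_NODE_ID_FIELD and
# COSINE_SIM_FIELD are not shown on this page.
NODE_ID_COL = "node_id"
COSINE_SIM_COL = "cosine_sim"


def add_cosine_similarity(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a row-wise cosine similarity column.

    Assumes each cell of _emb_1 and _emb_2 holds a list of floats and
    that all embeddings have the same length.
    """
    a = np.array(df["_emb_1"].tolist(), dtype=float)
    b = np.array(df["_emb_2"].tolist(), dtype=float)

    # Row-wise dot product and product of L2 norms.
    dot = np.einsum("ij,ij->i", a, b)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)

    out = df.copy()
    # Zero-norm embeddings yield NaN rather than a division warning.
    with np.errstate(divide="ignore", invalid="ignore"):
        out[COSINE_SIM_COL] = np.where(norms > 0, dot / norms, np.nan)
    return out
```

This approach only suits data that fits in memory; at the n=100M scale mentioned above, a warehouse-side query is the more plausible route.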
Calculates statistics of cosine similarity.
Args: pd.DataFrame: with columns: {DEFAULT_NODE_ID_FIELD, _emb_1, _emb_2, COSINE_SIM_FIELD}
Returns: pd.DataFrame: with columns: {count, mean, std, min, 1%, 5%, 25%, 50%, 75%, 95%, 99%, max, dtype}
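The column set listed above matches what pandas `Series.describe` produces when given custom percentiles. The following is a hedged sketch of how such a summary could be built, not a copy of the module's code. `COSINE_SIM_COL` and `summarize_cosine_similarity` are hypothetical names, and storing the column's dtype as a string in the `dtype` field is an assumption.

```python
import pandas as pd

# Hypothetical column name standing in for COSINE_SIM_FIELD.
COSINE_SIM_COL = "cosine_sim"

# Percentiles matching the documented output columns.
PERCENTILES = [0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99]


def summarize_cosine_similarity(df: pd.DataFrame) -> pd.DataFrame:
    """Return a one-row DataFrame of summary statistics for the similarity column."""
    stats = df[COSINE_SIM_COL].describe(percentiles=PERCENTILES)
    # describe() returns a Series indexed by statistic name; transpose it
    # so that each statistic becomes a column.
    summary = stats.to_frame().T
    # Assumption: the dtype column records the source column's dtype as text.
    summary["dtype"] = str(df[COSINE_SIM_COL].dtype)
    return summary
```

Called on a frame with a float similarity column, this returns a single row whose columns are `count, mean, std, min, 1%, 5%, 25%, 50%, 75%, 95%, 99%, max, dtype`.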
Args: