gigl.src.common.models.layers.CountMinSketch#

class gigl.src.common.models.layers.count_min_sketch.CountMinSketch(width: int = 2000, depth: int = 10)#

Bases: object

A probabilistic data structure that can be used to estimate the frequency of an item in a stream. For full details, please refer to the paper: https://dsf.berkeley.edu/cs286/papers/countmin-latin2004.pdf

There is also a good blog post from Redis discussing its applications: https://redis.io/blog/count-min-sketch-the-art-and-science-of-estimating-stuff/

How accurate is the estimation?

Denote the total count by N, the width of the table by w, and the depth of the table by d. For each of the d rows, the hashed count exceeds the true count by less than 2N/w with probability at least 1/2 (by Markov's inequality, since the expected over-count in a row is at most N/w). Because the final estimate is the minimum over the d rows, it exceeds the true count by more than 2N/w with probability less than 1/2^d.
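A minimal back-of-the-envelope sketch of that bound with the constructor defaults (width=2000, depth=10); the total count N below is a hypothetical value chosen for illustration:

width, depth = 2000, 10
N = 1_000_000                       # hypothetical total count of items added
error_bound = 2 * N / width         # over-count exceeded only with small probability
failure_probability = 0.5 ** depth
print(error_bound)                  # 1000.0 -> estimates stay within +1000 of the true count ...
print(failure_probability)          # ~0.00098 -> ... except with probability below 0.1%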

Currently, this implementation is single-threaded; we may want to optimize it if we see performance issues.

Methods

__init__

add

Add an item to the sketch

add_torch_long_tensor

Add all items in a torch long tensor to the sketch

estimate

Return the estimated count of the item

estimate_torch_long_tensor

Return the estimated count of all items in a torch long tensor

get_table

Return the internal state of the table, for testing purposes

total

Return the total number of items seen so far

__init__(width: int = 2000, depth: int = 10)#
__weakref__#

list of weak references to the object (if defined)

add(item: Any, delta: int = 1) None#

Add an item to the sketch

add_torch_long_tensor(tensor: LongTensor) None#

Add all items in a torch long tensor to the sketch

estimate(item: Any) int#

Return the estimated count of the item

estimate_torch_long_tensor(tensor: LongTensor) LongTensor#

Return the estimated count of all items in a torch long tensor

get_table() ndarray#

Return the internal state of the table, for testing purposes

total() int#

Return the total number of items seen so far
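A brief usage sketch based on the methods documented above. Estimates may overcount slightly (never undercount), so the printed values are indicative; whether total() sums the deltas passed to add is an assumption noted in the comments.

import torch

from gigl.src.common.models.layers.count_min_sketch import CountMinSketch

sketch = CountMinSketch(width=2000, depth=10)

# Count individual items.
sketch.add("user_42")            # increment by the default delta of 1
sketch.add("user_42", delta=4)   # increment by an explicit delta

# Count every id in a torch long tensor.
node_ids = torch.tensor([7, 7, 7, 13], dtype=torch.long)
sketch.add_torch_long_tensor(node_ids)

print(sketch.estimate("user_42"))                   # >= 5; never less than the true count
print(sketch.estimate_torch_long_tensor(node_ids))  # per-item estimates, e.g. tensor([3, 3, 3, 1])
print(sketch.total())                               # 9 here, assuming total() sums all deltas added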