Class BloomFilter<T>
- Type Parameters:
T - the type of instances that the BloomFilter accepts
- All Implemented Interfaces:
com.google.common.base.Predicate<T>, Serializable, java.util.function.Predicate<T>
A Bloom filter for instances of T. A Bloom filter offers an approximate containment test
with one-sided error: if it claims that an element is contained in it, this might be in error,
but if it claims that an element is not contained in it, then this is definitely true.
If you are unfamiliar with Bloom filters, this nice tutorial may help you understand how they work.
The false positive probability (FPP) of a Bloom filter is defined as the probability
that mightContain(Object) will erroneously return true for an object that
has not actually been put in the BloomFilter.
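The one-sided error follows directly from how elements are stored. The following is a minimal sketch, not Guava's implementation. It assumes String elements, uses java.util.BitSet in place of the filter's own bit array, and derives k bit indexes by double hashing (h1 + i * h2) from String.hashCode() passed through MurmurHash3's 32-bit finalizer. The class name ToyBloomFilter and this hashing scheme are hypothetical. Because put sets every bit an element maps to and mightContain checks exactly those bits, an element that was put can never be reported absent. A false positive happens only when other elements have already set all k bits.

```java
import java.util.BitSet;

// Hypothetical toy Bloom filter for Strings; illustrates the idea only,
// it is not Guava's BloomFilter or its hashing strategy.
final class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashFunctions;

    ToyBloomFilter(int numBits, int numHashFunctions) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    // MurmurHash3 32-bit finalizer, used here only to spread String.hashCode() bits.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumed double-hashing scheme: index_i = (h1 + i * h2) mod numBits.
    private int index(String element, int i) {
        int h1 = mix(element.hashCode());
        int h2 = mix(h1 + 0x9e3779b9) | 1; // second hash, forced odd
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(String element) {
        for (int i = 0; i < numHashFunctions; i++) {
            bits.set(index(element, i));
        }
    }

    boolean mightContain(String element) {
        for (int i = 0; i < numHashFunctions; i++) {
            if (!bits.get(index(element, i))) {
                return false; // a clear bit proves the element was never put
            }
        }
        return true; // all k bits set: probably put, or a false positive
    }
}
```

Probing strings that were never inserted and counting how many return true gives an empirical estimate of the FPP as defined above; the exact figure depends on the toy hash.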
Bloom filters are serializable. They also support a more compact serial representation via the
writeTo(java.io.OutputStream) and readFrom(java.io.InputStream, com.google.common.hash.Funnel<? super T>) methods. Both serialized forms will continue to be
supported by future versions of this library. However, serial forms generated by newer versions
of the code may not be readable by older versions of the code (e.g., a serialized Bloom filter
generated today may not be readable by a binary that was compiled 6 months ago).
As of Guava 23.0, this class is thread-safe and lock-free. It internally uses atomics and compare-and-swap to ensure correctness when multiple threads are used to access it.
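A minimal sketch of the lock-free idea, assuming the bits live in a java.util.concurrent.atomic.AtomicLongArray of 64-bit words. The class AtomicBitSet is hypothetical and only modeled on the description above; it is not BloomFilterStrategies.LockFreeBitArray. Setting a bit is a read, OR, compareAndSet loop that retries if another thread changed the same word in between, so concurrent writers never lose each other's bits and no lock is taken.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical lock-free bit set; a sketch, not Guava's LockFreeBitArray.
final class AtomicBitSet {
    private final AtomicLongArray words;

    AtomicBitSet(long numBits) {
        // One 64-bit word per 64 bits, rounded up.
        this.words = new AtomicLongArray((int) ((numBits + 63) >>> 6));
    }

    /** Sets the bit; returns true only if this call changed it from 0 to 1. */
    boolean set(long bitIndex) {
        int wordIndex = (int) (bitIndex >>> 6);
        long mask = 1L << bitIndex; // long shifts use only the low 6 bits of the distance
        long oldValue;
        long newValue;
        do {
            oldValue = words.get(wordIndex);
            newValue = oldValue | mask;
            if (oldValue == newValue) {
                return false; // already set by someone: nothing to publish
            }
            // CAS fails if another thread changed this word; then re-read and retry.
        } while (!words.compareAndSet(wordIndex, oldValue, newValue));
        return true;
    }

    boolean get(long bitIndex) {
        return (words.get((int) (bitIndex >>> 6)) & (1L << bitIndex)) != 0;
    }

    long bitCount() {
        long count = 0;
        for (int i = 0; i < words.length(); i++) {
            count += Long.bitCount(words.get(i));
        }
        return count;
    }
}
```

Because set reports whether this particular call flipped the bit, a put built on top of it can report whether it changed any bit, even under contention.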
- Since:
- 11.0 (thread-safe since 23.0)
Nested Class Summary
- private static class
- (package private) static interface BloomFilter.Strategy: A strategy to translate T instances to numHashFunctions bit indexes.
Field Summary
- private final BloomFilterStrategies.LockFreeBitArray bits: The bit set of the BloomFilter (not necessarily power of 2!)
- private final Funnel<? super T> funnel: The funnel to translate Ts to bytes
- private final int numHashFunctions: Number of hashes per element
- private static final long serialVersionUID
- private final BloomFilter.Strategy strategy: The strategy we employ to map an element T to numHashFunctions bit indexes.
Constructor Summary
- private BloomFilter(BloomFilterStrategies.LockFreeBitArray bits, int numHashFunctions, Funnel<? super T> funnel, BloomFilter.Strategy strategy): Creates a BloomFilter.
Method Summary
- boolean apply(T input): Deprecated.
- long approximateElementCount(): Returns an estimate for the total number of distinct elements that have been added to this Bloom filter.
- (package private) long bitSize(): Returns the number of bits in the underlying bit array.
- BloomFilter<T> copy(): Creates a new BloomFilter that's a copy of this instance.
- static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions): Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.
- static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions, double fpp): Creates a BloomFilter with the expected number of insertions and expected false positive probability.
- static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions): Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.
- static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp): Creates a BloomFilter with the expected number of insertions and expected false positive probability.
- (package private) static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp, BloomFilter.Strategy strategy)
- boolean equals(Object object): Indicates whether another object is equal to this predicate.
- double expectedFpp(): Returns the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.
- int hashCode()
- boolean isCompatible(BloomFilter<T> that): Determines whether a given Bloom filter is compatible with this Bloom filter.
- boolean mightContain(T object): Returns true if the element might have been put in this Bloom filter, false if this is definitely not the case.
- (package private) static long optimalNumOfBits(long n, double p): Computes m (total bits of Bloom filter) which is expected to achieve, for the specified expected insertions, the required false positive probability.
- (package private) static int optimalNumOfHashFunctions(long n, long m): Computes the optimal k (number of hashes per element inserted in Bloom filter), given the expected insertions and total number of bits in the Bloom filter.
- boolean put(T object): Puts an element into this BloomFilter.
- void putAll(BloomFilter<T> that): Combines this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying data.
- static <T> BloomFilter<T> readFrom(InputStream in, Funnel<? super T> funnel): Reads a byte stream, which was written by writeTo(OutputStream), into a BloomFilter.
- private void readObject(ObjectInputStream stream)
- static <T> Collector<T, ?, BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel, long expectedInsertions): Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with false positive probability 3%.
- static <T> Collector<T, ?, BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel, long expectedInsertions, double fpp): Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with the specified expected false positive probability.
- private Object writeReplace()
- void writeTo(OutputStream out): Writes this BloomFilter to an output stream, with a custom format (not Java serialization).
-
Field Details
-
bits
The bit set of the BloomFilter (not necessarily power of 2!) -
numHashFunctions
private final int numHashFunctions
Number of hashes per element -
funnel
The funnel to translate Ts to bytes -
strategy
The strategy we employ to map an element T to numHashFunctions bit indexes. -
serialVersionUID
private static final long serialVersionUID
-
Constructor Details
-
BloomFilter
private BloomFilter(BloomFilterStrategies.LockFreeBitArray bits, int numHashFunctions, Funnel<? super T> funnel, BloomFilter.Strategy strategy)
Creates a BloomFilter.
-
-
Method Details
-
copy
Creates a new BloomFilter that's a copy of this instance. The new instance is equal to this instance but shares no mutable state.
- Since:
- 12.0
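One way to obtain a copy that shares no mutable state, sketched under the assumption that the bits are held in an AtomicLongArray: read every word into a fresh long[] and wrap it in a new AtomicLongArray. The class CopyableBits is hypothetical. If other threads write during the copy, this snapshot is taken word by word, not atomically as a whole.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical holder for a filter's state; sketch of a deep copy.
final class CopyableBits {
    final AtomicLongArray words;
    final int numHashFunctions;

    CopyableBits(AtomicLongArray words, int numHashFunctions) {
        this.words = words;
        this.numHashFunctions = numHashFunctions;
    }

    CopyableBits copy() {
        long[] snapshot = new long[words.length()];
        for (int i = 0; i < snapshot.length; i++) {
            snapshot[i] = words.get(i); // copy each word into an independent array
        }
        // Immutable fields (like numHashFunctions) can be shared as-is.
        return new CopyableBits(new AtomicLongArray(snapshot), numHashFunctions);
    }
}
```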
-
mightContain
Returns true if the element might have been put in this Bloom filter, false if this is definitely not the case. -
apply
Deprecated. Provided only to satisfy the Predicate interface; use mightContain(T) instead.
Description copied from interface: Predicate
Returns the result of applying this predicate to input (Java 8+ users, see notes in the class documentation above). This method is generally expected, but not absolutely required, to have the following properties:
- Its execution does not cause any observable side effects.
- The computation is consistent with equals; that is, Objects.equal(a, b) implies that predicate.apply(a) == predicate.apply(b).
-
put
Puts an element into this BloomFilter. Ensures that subsequent invocations of mightContain(Object) with the same element will always return true.
- Returns:
- true if the Bloom filter's bits changed as a result of this operation. If the bits changed, this is definitely the first time object has been added to the filter. If the bits haven't changed, this might be the first time object has been added to the filter. Note that put(t) always returns the opposite result to what mightContain(t) would have returned at the time it is called.
- Since:
- 12.0 (present in 11.0 with void return type)
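The return-value contract can be sketched with a toy filter that tracks whether any bit flipped from 0 to 1. This is hypothetical code (String elements, java.util.BitSet, an assumed double-hashing scheme), not Guava's implementation. In single-threaded use, put returns true exactly when at least one of the element's bits was clear, which is exactly when mightContain would have returned false.

```java
import java.util.BitSet;

// Hypothetical toy filter whose put() reports whether any bit changed.
final class ReportingBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashFunctions;

    ReportingBloomFilter(int numBits, int numHashFunctions) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    // MurmurHash3 32-bit finalizer, used to spread String.hashCode() bits.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumed double hashing: index_i = (h1 + i * h2) mod numBits.
    private int index(String element, int i) {
        int h1 = mix(element.hashCode());
        int h2 = mix(h1 + 0x9e3779b9) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Returns true if at least one of the element's bits went from 0 to 1. */
    boolean put(String element) {
        boolean changed = false;
        for (int i = 0; i < numHashFunctions; i++) {
            int bit = index(element, i);
            if (!bits.get(bit)) {
                bits.set(bit);
                changed = true;
            }
        }
        return changed;
    }

    boolean mightContain(String element) {
        for (int i = 0; i < numHashFunctions; i++) {
            if (!bits.get(index(element, i))) {
                return false;
            }
        }
        return true;
    }
}
```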
-
expectedFpp
public double expectedFpp()
Returns the probability that mightContain(Object) will erroneously return true for an object that has not actually been put in the BloomFilter.
Ideally, this number should be close to the fpp parameter passed in create(Funnel, int, double), or smaller. If it is significantly higher, it is usually the case that too many elements (more than expected) have been put in the BloomFilter, degenerating it.
- Since:
- 14.0 (since 11.0 as expectedFalsePositiveProbability())
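A common way to estimate this probability, used here as an assumption rather than something this page confirms, is to treat each of the k probed bits as independently set with probability equal to the fraction of set bits X/m, which gives (X/m)^k. As the filter saturates and X/m approaches 1, the estimate approaches 1, which is the degeneration described above. The class name FppEstimate is hypothetical.

```java
// Hypothetical helper: estimated FPP = (bitsSet / bitSize) ^ numHashFunctions.
final class FppEstimate {
    static double expectedFpp(long bitsSet, long bitSize, int numHashFunctions) {
        double fractionSet = (double) bitsSet / bitSize;
        // Each of the k probes for a non-member hits a set bit with
        // probability ~fractionSet; the probes are assumed independent.
        return Math.pow(fractionSet, numHashFunctions);
    }
}
```

For example, with half of the bits set and k = 5, the estimate is 0.5^5 = 0.03125, close to the 3% default.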
-
approximateElementCount
public long approximateElementCount()
Returns an estimate for the total number of distinct elements that have been added to this Bloom filter. This approximation is reasonably accurate if it does not exceed the value of expectedInsertions that was used when constructing the filter.
- Since:
- 22.0
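The estimate can be derived from the expected fill rate. After n distinct insertions with k hash functions into m bits, the expected fraction of set bits is about 1 - e^(-kn/m); solving for n gives n ≈ -(m/k) * ln(1 - X/m), where X is the number of set bits. The sketch below computes that formula; the class name and the rounding mode are assumptions. It also shows why accuracy drops past expectedInsertions: as X approaches m, the logarithm grows without bound, so small changes in X swing the estimate widely.

```java
// Hypothetical helper: n ~= -(m / k) * ln(1 - X / m), rounded to the nearest long.
final class ElementCountEstimate {
    static long approximateElementCount(long bitSize, int numHashFunctions, long bitsSet) {
        double fractionSet = (double) bitsSet / bitSize;
        // Math.log1p(-x) computes ln(1 - x) accurately when x is small.
        double estimate = -Math.log1p(-fractionSet) * bitSize / numHashFunctions;
        // Math.round(double) saturates to Long.MAX_VALUE if every bit is set.
        return Math.round(estimate);
    }
}
```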
-
bitSize
long bitSize()
Returns the number of bits in the underlying bit array. -
isCompatible
Determines whether a given Bloom filter is compatible with this Bloom filter. For two Bloom filters to be compatible, they must:
- not be the same instance
- have the same number of hash functions
- have the same bit size
- have the same strategy
- have equal funnels
- Parameters:
that - The Bloom filter to check for compatibility.
- Since:
- 15.0
-
putAll
Combines this Bloom filter with another Bloom filter by performing a bitwise OR of the underlying data. The mutations happen to this instance. Callers must ensure the Bloom filters are appropriately sized to avoid saturating them.
- Parameters:
that - The Bloom filter to combine this Bloom filter with. It is not mutated.
- Throws:
IllegalArgumentException - if isCompatible(that) == false
- Since:
- 15.0
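Because two compatible filters map every element to the same bit positions, OR-ing their bit arrays yields the bits that inserting both sets of elements into a single filter would have produced. A minimal sketch, assuming the bits are stored as a plain long[] of words; the class MergeableBloom is hypothetical, and it checks only the instance, hash-count, and size conditions listed above (strategy and funnel equality are left out).

```java
// Hypothetical sketch of merging Bloom filters by OR-ing their bit words.
final class MergeableBloom {
    final long[] words;
    final int numHashFunctions;

    MergeableBloom(int numWords, int numHashFunctions) {
        this.words = new long[numWords];
        this.numHashFunctions = numHashFunctions;
    }

    boolean isCompatible(MergeableBloom that) {
        // Subset of the documented checks: distinct instance, same k, same bit size.
        return this != that
                && this.numHashFunctions == that.numHashFunctions
                && this.words.length == that.words.length;
    }

    void putAll(MergeableBloom that) {
        if (!isCompatible(that)) {
            throw new IllegalArgumentException("Bloom filters are not compatible");
        }
        for (int i = 0; i < words.length; i++) {
            words[i] |= that.words[i]; // union: a bit set in either filter is set here
        }
    }
}
```

The merged filter now represents the union of both element sets, so its false positive probability reflects the combined count; that is why the filters must be sized for the total.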
-
equals
Description copied from interface: Predicate
Indicates whether another object is equal to this predicate.
Most implementations will have no reason to override the behavior of Object.equals(java.lang.Object). However, an implementation may also choose to return true whenever object is a Predicate that it considers interchangeable with this one. "Interchangeable" typically means that this.apply(t) == that.apply(t) for all t of type T. Note that a false result from this method does not imply that the predicates are known not to be interchangeable. -
hashCode
public int hashCode() -
toBloomFilter
public static <T> Collector<T, ?, BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel, long expectedInsertions)
Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with false positive probability 3%.
Note that if the Collector receives significantly more elements than specified, the resulting BloomFilter will suffer a sharp deterioration of its false positive probability.
The constructed BloomFilter will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
- Parameters:
funnel - the funnel of T's that the constructed BloomFilter will use
expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
- Returns:
- a Collector generating a BloomFilter of the received elements
- Since:
- 23.0
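A collector of this kind can be assembled with java.util.stream.Collector.of: the supplier creates an empty filter, the accumulator calls put, and the combiner merges partial results from parallel streams with a bitwise OR, as putAll does. The sketch below is hypothetical (String elements, java.util.BitSet, an assumed hashing scheme) and is not claimed to match how toBloomFilter is implemented. It declares UNORDERED because the resulting bits don't depend on encounter order, but not CONCURRENT, since BitSet is not thread-safe.

```java
import java.util.BitSet;
import java.util.stream.Collector;

// Hypothetical toy filter with a Collector factory; not Guava's toBloomFilter.
final class CollectingBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashFunctions;

    CollectingBloom(int numBits, int numHashFunctions) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    // Assumed hashing: MurmurHash3 finalizer over String.hashCode(), then double hashing.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    private int index(String element, int i) {
        int h1 = mix(element.hashCode());
        int h2 = mix(h1 + 0x9e3779b9) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(String element) {
        for (int i = 0; i < numHashFunctions; i++) {
            bits.set(index(element, i));
        }
    }

    boolean mightContain(String element) {
        for (int i = 0; i < numHashFunctions; i++) {
            if (!bits.get(index(element, i))) {
                return false;
            }
        }
        return true;
    }

    // Combiner: OR in another partial filter built with the same parameters.
    CollectingBloom putAll(CollectingBloom that) {
        bits.or(that.bits);
        return this;
    }

    static Collector<String, ?, CollectingBloom> toBloomFilter(int numBits, int numHashFunctions) {
        return Collector.of(
                () -> new CollectingBloom(numBits, numHashFunctions), // supplier: empty filter per partition
                CollectingBloom::put,                                   // accumulator: insert one element
                CollectingBloom::putAll,                                // combiner: bitwise OR of partials
                Collector.Characteristics.UNORDERED);
    }
}
```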
-
toBloomFilter
public static <T> Collector<T, ?, BloomFilter<T>> toBloomFilter(Funnel<? super T> funnel, long expectedInsertions, double fpp)
Returns a Collector expecting the specified number of insertions, and yielding a BloomFilter with the specified expected false positive probability.
Note that if the Collector receives significantly more elements than specified, the resulting BloomFilter will suffer a sharp deterioration of its false positive probability.
The constructed BloomFilter will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
- Parameters:
funnel - the funnel of T's that the constructed BloomFilter will use
expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
fpp - the desired false positive probability (must be positive and less than 1.0)
- Returns:
- a Collector generating a BloomFilter of the received elements
- Since:
- 23.0
-
create
public static <T> BloomFilter<T> create(Funnel<? super T> funnel, int expectedInsertions, double fpp)
Creates a BloomFilter with the expected number of insertions and expected false positive probability.
Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation, and a sharp deterioration of its false positive probability.
The constructed BloomFilter will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
- Parameters:
funnel - the funnel of T's that the constructed BloomFilter will use
expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
fpp - the desired false positive probability (must be positive and less than 1.0)
- Returns:
- a BloomFilter
-
create
public static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp)
Creates a BloomFilter with the expected number of insertions and expected false positive probability.
Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation, and a sharp deterioration of its false positive probability.
The constructed BloomFilter will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
- Parameters:
funnel - the funnel of T's that the constructed BloomFilter will use
expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
fpp - the desired false positive probability (must be positive and less than 1.0)
- Returns:
- a BloomFilter
- Since:
- 19.0
-
create
static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp, BloomFilter.Strategy strategy) -
create
Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.
Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation, and a sharp deterioration of its false positive probability.
The constructed BloomFilter will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
- Parameters:
funnel - the funnel of T's that the constructed BloomFilter will use
expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
- Returns:
- a BloomFilter
-
create
Creates a BloomFilter with the expected number of insertions and a default expected false positive probability of 3%.
Note that overflowing a BloomFilter with significantly more elements than specified will result in its saturation, and a sharp deterioration of its false positive probability.
The constructed BloomFilter will be serializable if the provided Funnel<T> is.
It is recommended that the funnel be implemented as a Java enum. This has the benefit of ensuring proper serialization and deserialization, which is important since equals(java.lang.Object) also relies on object identity of funnels.
- Parameters:
funnel - the funnel of T's that the constructed BloomFilter will use
expectedInsertions - the number of expected insertions to the constructed BloomFilter; must be positive
- Returns:
- a BloomFilter
- Since:
- 19.0
-
optimalNumOfHashFunctions
static int optimalNumOfHashFunctions(long n, long m)
Computes the optimal k (number of hashes per element inserted in Bloom filter), given the expected insertions and total number of bits in the Bloom filter.
See http://en.wikipedia.org/wiki/File:Bloom_filter_fp_probability.svg for the formula.
- Parameters:
n - expected insertions (must be positive)
m - total number of bits in Bloom filter (must be positive)
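For a fixed ratio of bits to elements, the false positive rate is minimized at k = (m/n) * ln 2. A sketch of that computation follows; rounding to the nearest integer and the lower bound of one hash function are assumptions, and the class name OptimalHashes is hypothetical.

```java
// Hypothetical helper: k = round((m / n) * ln 2), but never less than 1.
final class OptimalHashes {
    static int optimalNumOfHashFunctions(long n, long m) {
        // (m / n) * ln(2) minimizes the false positive rate for fixed m and n.
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

For n = 1,000 and m = 7,298 this gives 7.298 * 0.693 ≈ 5.06, so k = 5; a very small m/n (say 0.1) would round to 0, which the lower bound turns into 1.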
-
optimalNumOfBits
static long optimalNumOfBits(long n, double p)
Computes m (total bits of Bloom filter) which is expected to achieve, for the specified expected insertions, the required false positive probability.
See http://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives for the formula.
- Parameters:
n - expected insertions (must be positive)
p - false positive rate (must be 0 < p < 1)
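The bit count needed for n insertions at false positive rate p, when k is chosen optimally, is m = -n * ln p / (ln 2)^2. A minimal sketch, assuming truncation to a whole number of bits and explicit argument checks; the class name OptimalBits is hypothetical.

```java
// Hypothetical helper: m = -n * ln(p) / (ln 2)^2, truncated to a long.
final class OptimalBits {
    private static final double SQUARED_LOG_TWO = Math.log(2) * Math.log(2);

    static long optimalNumOfBits(long n, double p) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must satisfy 0 < p < 1: " + p);
        }
        return (long) (-n * Math.log(p) / SQUARED_LOG_TWO);
    }
}
```

For n = 1,000 and p = 0.03 (the 3% default), m = 1000 * 3.5066 / 0.4805 ≈ 7,298 bits, about 7.3 bits per element.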
-
writeReplace
-
readObject
- Throws:
InvalidObjectException
-
writeTo
Writes this BloomFilter to an output stream, with a custom format (not Java serialization). This has been measured to save at least 400 bytes compared to regular serialization.
Use readFrom(InputStream, Funnel) to reconstruct the written BloomFilter.
- Throws:
IOException
-
readFrom
public static <T> BloomFilter<T> readFrom(InputStream in, Funnel<? super T> funnel) throws IOException
Reads a byte stream, which was written by writeTo(OutputStream), into a BloomFilter.
The Funnel to be used is not encoded in the stream, so it must be provided here. Warning: the funnel provided must behave identically to the one used to populate the original Bloom filter!
- Throws:
IOException - if the InputStream throws an IOException, or if its data does not appear to be a BloomFilter serialized using the writeTo(OutputStream) method.
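A compact, non-Java-serialization layout for a round trip like writeTo/readFrom might look like the sketch below. It assumes the state to persist is a strategy identifier, the number of hash functions, and the bit array as 64-bit words. The layout and the class name CompactBloomFormat are hypothetical; this is not claimed to be byte-compatible with writeTo. As with readFrom, nothing about the funnel is written, so a reader has to supply it separately.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical layout (not Guava's wire format):
//   1 byte  strategy identifier
//   1 byte  number of hash functions (unsigned, so 1..255)
//   4 bytes number of 64-bit words
//   8 bytes per word of the bit array
final class CompactBloomFormat {
    final byte strategyId;
    final int numHashFunctions;
    final long[] words;

    CompactBloomFormat(byte strategyId, int numHashFunctions, long[] words) {
        if (numHashFunctions < 1 || numHashFunctions > 255) {
            throw new IllegalArgumentException("numHashFunctions must fit in one unsigned byte");
        }
        this.strategyId = strategyId;
        this.numHashFunctions = numHashFunctions;
        this.words = words;
    }

    void writeTo(OutputStream out) throws IOException {
        DataOutputStream dout = new DataOutputStream(out);
        dout.writeByte(strategyId);
        dout.writeByte(numHashFunctions);
        dout.writeInt(words.length);
        for (long word : words) {
            dout.writeLong(word);
        }
        dout.flush(); // flush but do not close: the caller owns the stream
    }

    static CompactBloomFormat readFrom(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        byte strategyId = din.readByte(); // EOFException (an IOException) on truncated input
        int numHashFunctions = din.readUnsignedByte();
        int wordCount = din.readInt();
        if (numHashFunctions < 1 || wordCount < 0) {
            throw new IOException("data does not look like a serialized filter");
        }
        long[] words = new long[wordCount];
        for (int i = 0; i < wordCount; i++) {
            words[i] = din.readLong();
        }
        return new CompactBloomFormat(strategyId, numHashFunctions, words);
    }
}
```

In this layout a filter of m bits costs a 6-byte header plus 8 bytes per 64-bit word.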
-