Class CountingBloomFilter

java.lang.Object
org.apache.hadoop.util.bloom.Filter
org.apache.hadoop.util.bloom.CountingBloomFilter
All Implemented Interfaces:
Writable

@Public @Stable public final class CountingBloomFilter extends Filter
Implements a counting Bloom filter, as defined by Fan et al. in a ToN 2000 paper.

A counting Bloom filter is an improvement to standard a Bloom filter as it allows dynamic additions and deletions of set membership information. This is achieved through the use of a counting vector instead of a bit vector.

Originally created by European Commission One-Lab Project 034819.

See Also:
  • Field Summary

    Fields inherited from class org.apache.hadoop.util.bloom.Filter

    hash, hashType, nbHash, vectorSize
  • Constructor Summary

    Constructors
    Constructor
    Description
    Default constructor - use with readFields
    CountingBloomFilter(int vectorSize, int nbHash, int hashType)
    Constructor
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    add(Key key)
    Adds a key to this filter.
    void
    and(Filter filter)
    Peforms a logical AND between this filter and a specified filter.
    int
    This method calculates an approximate count of the key, i.e. how many times the key was added to the filter.
    void
    delete(Key key)
    Removes a specified key from this counting Bloom filter.
    boolean
    Determines wether a specified key belongs to this filter.
    void
    not()
    Performs a logical NOT on this filter.
    void
    or(Filter filter)
    Peforms a logical OR between this filter and a specified filter.
    void
    Deserialize the fields of this object from in.
     
    void
    Serialize the fields of this object to out.
    void
    xor(Filter filter)
    Peforms a logical XOR between this filter and a specified filter.

    Methods inherited from class org.apache.hadoop.util.bloom.Filter

    add, add, add

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • CountingBloomFilter

      public CountingBloomFilter()
      Default constructor - use with readFields
    • CountingBloomFilter

      public CountingBloomFilter(int vectorSize, int nbHash, int hashType)
      Constructor
      Parameters:
      vectorSize - The vector size of this filter.
      nbHash - The number of hash function to consider.
      hashType - type of the hashing function (see Hash).
  • Method Details

    • add

      public void add(Key key)
      Description copied from class: Filter
      Adds a key to this filter.
      Specified by:
      add in class Filter
      Parameters:
      key - The key to add.
    • delete

      public void delete(Key key)
      Removes a specified key from this counting Bloom filter.

      Invariant: nothing happens if the specified key does not belong to this counter Bloom filter.

      Parameters:
      key - The key to remove.
    • and

      public void and(Filter filter)
      Description copied from class: Filter
      Peforms a logical AND between this filter and a specified filter.

      Invariant: The result is assigned to this filter.

      Specified by:
      and in class Filter
      Parameters:
      filter - The filter to AND with.
    • membershipTest

      public boolean membershipTest(Key key)
      Description copied from class: Filter
      Determines wether a specified key belongs to this filter.
      Specified by:
      membershipTest in class Filter
      Parameters:
      key - The key to test.
      Returns:
      boolean True if the specified key belongs to this filter. False otherwise.
    • approximateCount

      public int approximateCount(Key key)
      This method calculates an approximate count of the key, i.e. how many times the key was added to the filter. This allows the filter to be used as an approximate key -> count map.

      NOTE: due to the bucket size of this filter, inserting the same key more than 15 times will cause an overflow at all filter positions associated with this key, and it will significantly increase the error rate for this and other keys. For this reason the filter can only be used to store small count values 0 <= N << 15.

      Parameters:
      key - key to be tested
      Returns:
      0 if the key is not present. Otherwise, a positive value v will be returned such that v == count with probability equal to the error rate of this filter, and v > count otherwise. Additionally, if the filter experienced an underflow as a result of delete(Key) operation, the return value may be lower than the count with the probability of the false negative rate of such filter.
    • not

      public void not()
      Description copied from class: Filter
      Performs a logical NOT on this filter.

      The result is assigned to this filter.

      Specified by:
      not in class Filter
    • or

      public void or(Filter filter)
      Description copied from class: Filter
      Peforms a logical OR between this filter and a specified filter.

      Invariant: The result is assigned to this filter.

      Specified by:
      or in class Filter
      Parameters:
      filter - The filter to OR with.
    • xor

      public void xor(Filter filter)
      Description copied from class: Filter
      Peforms a logical XOR between this filter and a specified filter.

      Invariant: The result is assigned to this filter.

      Specified by:
      xor in class Filter
      Parameters:
      filter - The filter to XOR with.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • write

      public void write(DataOutput out) throws IOException
      Description copied from interface: Writable
      Serialize the fields of this object to out.
      Specified by:
      write in interface Writable
      Overrides:
      write in class Filter
      Parameters:
      out - DataOuput to serialize this object into.
      Throws:
      IOException - any other problem for write.
    • readFields

      public void readFields(DataInput in) throws IOException
      Description copied from interface: Writable
      Deserialize the fields of this object from in.

      For efficiency, implementations should attempt to re-use storage in the existing object where possible.

      Specified by:
      readFields in interface Writable
      Overrides:
      readFields in class Filter
      Parameters:
      in - DataInput to deseriablize this object from.
      Throws:
      IOException - any other problem for readFields.