Class SequenceFile.Sorter

java.lang.Object
org.apache.hadoop.io.SequenceFile.Sorter
Enclosing class:
SequenceFile

public static class SequenceFile.Sorter extends Object
Sorts key/value pairs in a sequence-format file.

For best performance, applications should make sure that the Writable.readFields(DataInput) implementation of their keys is very efficient. In particular, it should avoid allocating memory.
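As an illustration of the advice above, here is a minimal sketch of a key whose readFields reuses one internal buffer instead of allocating a new array per record. The class name ReusableBytesKey, its length-prefixed layout, and the helper reusesBuffer are hypothetical. The class only mirrors the shape of Writable.readFields(DataInput) and does not implement the Hadoop interface, so it compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical key type, not part of Hadoop.
class ReusableBytesKey {
    private byte[] buffer = new byte[16]; // reused across readFields calls
    private int length;                   // number of valid bytes in buffer

    // Reads a length-prefixed byte string, growing the buffer only when needed.
    public void readFields(DataInput in) throws IOException {
        length = in.readInt();
        if (length > buffer.length) {
            buffer = new byte[Math.max(length, buffer.length * 2)];
        }
        in.readFully(buffer, 0, length);
    }

    public int getLength() { return length; }

    public byte[] getBuffer() { return buffer; } // only [0, length) is valid

    // Demo helper: serializes two payloads, reads both into one key, and
    // reports whether the second read reused the first read's buffer.
    static boolean reusesBuffer(byte[] first, byte[] second) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(first.length);
            out.write(first);
            out.writeInt(second.length);
            out.write(second);
            out.flush();

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            ReusableBytesKey key = new ReusableBytesKey();
            key.readFields(in);
            byte[] bufferAfterFirst = key.getBuffer();
            key.readFields(in);
            return bufferAfterFirst == key.getBuffer();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reusesBuffer(new byte[] {1, 2, 3}, new byte[] {9, 8}));
    }
}
```

The buffer grows only when a record exceeds its current capacity, so a long run of similarly sized keys causes no further allocation.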

  • Constructor Details

    • Sorter

      public Sorter(FileSystem fs, Class<? extends WritableComparable> keyClass, Class valClass, Configuration conf)
      Sort and merge files containing the named classes.
      Parameters:
      fs - input FileSystem.
      keyClass - input keyClass.
      valClass - input valClass.
      conf - input Configuration.
    • Sorter

      public Sorter(FileSystem fs, RawComparator comparator, Class keyClass, Class valClass, Configuration conf)
      Sort and merge using an arbitrary RawComparator.
      Parameters:
      fs - input FileSystem.
      comparator - input RawComparator.
      keyClass - input keyClass.
      valClass - input valClass.
      conf - input Configuration.
    • Sorter

      public Sorter(FileSystem fs, RawComparator comparator, Class keyClass, Class valClass, Configuration conf, SequenceFile.Metadata metadata)
      Sort and merge using an arbitrary RawComparator.
      Parameters:
      fs - input FileSystem.
      comparator - input RawComparator.
      keyClass - input keyClass.
      valClass - input valClass.
      conf - input Configuration.
      metadata - input metadata.
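The second and third constructors accept a RawComparator, which compares keys in their serialized form. The following is a minimal sketch of that idea, assuming keys that order correctly as unsigned bytes. The class name RawBytesCompareSketch is hypothetical. The parameter layout follows RawComparator.compare(byte[], int, int, byte[], int, int), but the class does not implement the Hadoop interface, so it compiles with the JDK alone.

```java
// Hypothetical sketch, not part of Hadoop: orders two serialized keys
// lexicographically as unsigned bytes, without deserializing either one.
class RawBytesCompareSketch {

    // b1/b2 are the backing arrays, s1/s2 the start offsets, l1/l2 the lengths.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int common = Math.min(l1, l2);
        for (int i = 0; i < common; i++) {
            int a = b1[s1 + i] & 0xff; // mask to treat the byte as unsigned
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] left = {1, 2};
        byte[] right = {1, 3};
        System.out.println(compare(left, 0, left.length, right, 0, right.length));
    }
}
```

Comparing raw bytes avoids constructing key objects for every comparison, which is the main reason to supply a custom comparator to a sort.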
  • Method Details

    • setFactor

      public void setFactor(int factor)
      Set the number of streams to merge at once.
      Parameters:
      factor - the number of streams to merge at once.
    • getFactor

      public int getFactor()
      Returns:
      the number of streams to merge at once.
    • setMemory

      public void setMemory(int memory)
      Set the total amount of buffer memory, in bytes.
      Parameters:
      memory - the total amount of buffer memory, in bytes.
    • getMemory

      public int getMemory()
      Returns:
      the total amount of buffer memory, in bytes.
    • setProgressable

      public void setProgressable(Progressable progressable)
      Set the progressable object in order to report progress.
      Parameters:
      progressable - input Progressable.
    • sort

      public void sort(Path[] inFiles, Path outFile, boolean deleteInput) throws IOException
      Perform a file sort from a set of input files into an output file.
      Parameters:
      inFiles - the files to be sorted
      outFile - the sorted output file
      deleteInput - should the input files be deleted as they are read?
      Throws:
      IOException - raised on errors performing I/O.
    • sortAndIterate

      public SequenceFile.Sorter.RawKeyValueIterator sortAndIterate(Path[] inFiles, Path tempDir, boolean deleteInput) throws IOException
      Perform a file sort from a set of input files and return an iterator.
      Parameters:
      inFiles - the files to be sorted
      tempDir - the directory where temp files are created during sort
      deleteInput - should the input files be deleted as they are read?
      Returns:
      the RawKeyValueIterator over the sorted records
      Throws:
      IOException - raised on errors performing I/O.
    • sort

      public void sort(Path inFile, Path outFile) throws IOException
      The backwards compatible interface to sort.
      Parameters:
      inFile - the input file to sort.
      outFile - the sorted output file.
      Throws:
      IOException - raised on errors performing I/O.
    • merge

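      public SequenceFile.Sorter.RawKeyValueIterator merge(List<SequenceFile.Sorter.SegmentDescriptor> segments, Path tmpDir) throws IOException
      Merges the list of segments of type SegmentDescriptor.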
      Parameters:
      segments - the list of SegmentDescriptors
      tmpDir - the directory to write temporary files into
      Returns:
      RawKeyValueIterator
      Throws:
      IOException - raised on errors performing I/O.
    • merge

      public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, boolean deleteInputs, Path tmpDir) throws IOException
      Merges the contents of the files passed in Path[], using the merge factor that is already set.
      Parameters:
      inNames - the array of path names
      deleteInputs - true if the input files should be deleted when unnecessary
      tmpDir - the directory to write temporary files into
      Returns:
      a RawKeyValueIterator over the merged records
      Throws:
      IOException - raised on errors performing I/O.
    • merge

      public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, boolean deleteInputs, int factor, Path tmpDir) throws IOException
      Merges the contents of files passed in Path[]
      Parameters:
      inNames - the array of path names
      deleteInputs - true if the input files should be deleted when unnecessary
      factor - the factor that will be used as the maximum merge fan-in
      tmpDir - the directory to write temporary files into
      Returns:
      a RawKeyValueIterator over the merged records
      Throws:
      IOException - raised on errors performing I/O.
    • merge

      public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, Path tempDir, boolean deleteInputs) throws IOException
      Merges the contents of files passed in Path[]
      Parameters:
      inNames - the array of path names
      tempDir - the directory for creating temp files during merge
      deleteInputs - true if the input files should be deleted when unnecessary
      Returns:
      a RawKeyValueIterator over the merged records
      Throws:
      IOException - raised on errors performing I/O.
    • cloneFileAttributes

      public SequenceFile.Writer cloneFileAttributes(Path inputFile, Path outputFile, Progressable prog) throws IOException
      Clones the attributes (like compression) of the input file and creates a corresponding Writer.
      Parameters:
      inputFile - the path of the input file whose attributes should be cloned
      outputFile - the path of the output file
      prog - the Progressable to report status during the file write
      Returns:
      Writer
      Throws:
      IOException - raised on errors performing I/O.
    • writeFile

      public void writeFile(SequenceFile.Sorter.RawKeyValueIterator records, SequenceFile.Writer writer) throws IOException
      Writes records from RawKeyValueIterator into a file represented by the passed writer.
      Parameters:
      records - the RawKeyValueIterator
      writer - the Writer created earlier
      Throws:
      IOException - raised on errors performing I/O.
    • merge

      public void merge(Path[] inFiles, Path outFile) throws IOException
      Merge the provided files.
      Parameters:
      inFiles - the array of input path names
      outFile - the final output file
      Throws:
      IOException - raised on errors performing I/O.
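The merge methods above limit how many streams are merged at once (the factor). As a rough illustration of what a bounded fan-in means, here is a simplified in-memory sketch that merges sorted runs of integers, at most factor runs per step, until one run remains. The class BoundedFanInMergeSketch and its methods are hypothetical. It works on lists rather than SequenceFile segments and does not reproduce how Sorter actually plans its merge passes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not part of Hadoop.
class BoundedFanInMergeSketch {

    // Merges already-sorted runs in one pass using a min-heap whose entries
    // are {value, runIndex, positionInRun}.
    static List<Integer> mergeOnce(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] smallest = heap.poll();
            merged.add(smallest[0]);
            List<Integer> run = runs.get(smallest[1]);
            int next = smallest[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), smallest[1], next});
            }
        }
        return merged;
    }

    // Repeatedly merges groups of at most `factor` runs until one run remains.
    static List<Integer> mergeAll(List<List<Integer>> runs, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        List<List<Integer>> current = new ArrayList<>(runs);
        while (current.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += factor) {
                int end = Math.min(i + factor, current.size());
                next.add(mergeOnce(current.subList(i, end)));
            }
            current = next;
        }
        return current.isEmpty() ? new ArrayList<>() : current.get(0);
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 4), Arrays.asList(2, 5), Arrays.asList(3, 6));
        System.out.println(mergeAll(runs, 2));
    }
}
```

In this model, a larger factor means fewer passes over the data but more streams held open at the same time, which is the trade-off a merge fan-in setting controls.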