Class TaskPool

java.lang.Object
org.apache.hadoop.util.functional.TaskPool

@Private @Unstable public final class TaskPool extends Object
Utility class for parallel execution, takes closures for the various actions. There is no retry logic: it is expected to be handled by the closures. From org.apache.hadoop.fs.s3a.commit.Tasks which came from the Netflix committer patch. Apache Iceberg has its own version of this, with a common ancestor at some point in its history. A key difference with this class is that the iterator is always, internally, an RemoteIterator. This is to allow tasks to be scheduled while incremental operations such as paged directory listings are still collecting in results. While awaiting completion, this thread spins and sleeps a time of SLEEP_INTERVAL_AWAITING_COMPLETION, which, being a busy-wait, is inefficient. There's an implicit assumption that remote IO is being performed, and so this is not impacting throughput/performance. History: This class came with the Netflix contributions to the S3A committers in HADOOP-13786. It was moved into hadoop-common for use in the manifest committer and anywhere else it is needed, and renamed in the process as "Tasks" has too many meanings in the hadoop source. The iterator was then changed from a normal java iterable to a hadoop RemoteIterator. This allows a task pool to be supplied with incremental listings from object stores, scheduling work as pages of listing results come in, rather than blocking until the entire directory/directory tree etc has been enumerated. There is a variant of this in Apache Iceberg in org.apache.iceberg.util.Tasks That is not derived from any version in the hadoop codebase, it just shares a common ancestor somewhere in the Netflix codebase. It is the more sophisticated version.
  • Method Details

    • foreach

      public static <I> TaskPool.Builder<I> foreach(Iterable<I> items)
      Create a task builder for the iterable.
      Type Parameters:
      I - type of result.
      Parameters:
      items - item source.
      Returns:
      builder.
    • foreach

      public static <I> TaskPool.Builder<I> foreach(RemoteIterator<I> items)
      Create a task builder for the remote iterator.
      Type Parameters:
      I - type of result.
      Parameters:
      items - item source.
      Returns:
      builder.
    • foreach

      public static <I> TaskPool.Builder<I> foreach(I[] items)