Class TaskPool
java.lang.Object
org.apache.hadoop.util.functional.TaskPool
Utility class for parallel execution, takes closures for the various
actions.
There is no retry logic: it is expected to be handled by the closures.
From
org.apache.hadoop.fs.s3a.commit.Tasks which came from
the Netflix committer patch.
Apache Iceberg has its own version of this, with a common ancestor
at some point in its history.
A key difference with this class is that the iterator is always,
internally, an RemoteIterator.
This is to allow tasks to be scheduled while incremental operations
such as paged directory listings are still collecting in results.
While awaiting completion, this thread spins and sleeps a time of
SLEEP_INTERVAL_AWAITING_COMPLETION, which, being a
busy-wait, is inefficient.
There's an implicit assumption that remote IO is being performed, and
so this is not impacting throughput/performance.
History:
This class came with the Netflix contributions to the S3A committers
in HADOOP-13786.
It was moved into hadoop-common for use in the manifest committer and
anywhere else it is needed, and renamed in the process as
"Tasks" has too many meanings in the hadoop source.
The iterator was then changed from a normal java iterable
to a hadoop RemoteIterator.
This allows a task pool to be supplied with incremental listings
from object stores, scheduling work as pages of listing
results come in, rather than blocking until the entire
directory/directory tree etc has been enumerated.
There is a variant of this in Apache Iceberg in
org.apache.iceberg.util.Tasks
That is not derived from any version in the hadoop codebase, it
just shares a common ancestor somewhere in the Netflix codebase.
It is the more sophisticated version.-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic classBuilder for task execution.static interfaceTaskPool.FailureTask<I,E extends Exception> Callback invoked on a failure.static interfaceInterface to whatever lets us submit tasks.static interfaceTaskPool.Task<I,E extends Exception> Callback invoked to process an item. -
Method Summary
Modifier and TypeMethodDescriptionstatic <I> TaskPool.Builder<I>foreach(I[] items) static <I> TaskPool.Builder<I>Create a task builder for the iterable.static <I> TaskPool.Builder<I>foreach(RemoteIterator<I> items) Create a task builder for the remote iterator.
-
Method Details
-
foreach
Create a task builder for the iterable.- Type Parameters:
I- type of result.- Parameters:
items- item source.- Returns:
- builder.
-
foreach
Create a task builder for the remote iterator.- Type Parameters:
I- type of result.- Parameters:
items- item source.- Returns:
- builder.
-
foreach
-