Class FederationInterceptor

java.lang.Object
org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AbstractRequestInterceptor
org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.yarn.api.ApplicationMasterProtocol, org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol, RequestInterceptor

public class FederationInterceptor extends AbstractRequestInterceptor
Extends the AbstractRequestInterceptor and provides an implementation for federation of YARN RM and scaling an application across multiple YARN sub-clusters. All the federation specific implementation is encapsulated in this class. This is always the last interceptor in the chain.
  • Field Details

    • NMSS_CLASS_PREFIX

      public static final String NMSS_CLASS_PREFIX
      See Also:
    • NMSS_REG_REQUEST_KEY

      public static final String NMSS_REG_REQUEST_KEY
      See Also:
    • NMSS_REG_RESPONSE_KEY

      public static final String NMSS_REG_RESPONSE_KEY
      See Also:
    • NMSS_SECONDARY_SC_PREFIX

      public static final String NMSS_SECONDARY_SC_PREFIX
      When AMRMProxy HA is enabled, secondary AMRMTokens will be stored in Yarn Registry. Otherwise, if NM recovery is enabled, the UAM token are stored in local NMSS instead under this directory name.
      See Also:
    • STRING_TO_BYTE_FORMAT

      public static final String STRING_TO_BYTE_FORMAT
      See Also:
  • Constructor Details

    • FederationInterceptor

      public FederationInterceptor()
      Creates an instance of the FederationInterceptor class.
  • Method Details

    • init

      public void init(AMRMProxyApplicationContext appContext)
      Initializes the instance using specified context.
      Specified by:
      init in interface RequestInterceptor
      Overrides:
      init in class AbstractRequestInterceptor
      Parameters:
      appContext - AMRMProxy application context
    • recover

      public void recover(Map<String,byte[]> recoveredDataMap)
      Description copied from class: AbstractRequestInterceptor
      Recover RequestInterceptor state from store.
      Specified by:
      recover in interface RequestInterceptor
      Overrides:
      recover in class AbstractRequestInterceptor
      Parameters:
      recoveredDataMap - states for all interceptors recovered from NMSS
    • registerApplicationMaster

      public org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse registerApplicationMaster(org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest request) throws org.apache.hadoop.yarn.exceptions.YarnException, IOException
      Sends the application master's registration request to the home RM. Between AM and AMRMProxy, FederationInterceptor modifies the RM behavior, so that when AM registers more than once, it returns the same register success response instead of throwing InvalidApplicationMasterRequestException. Furthermore, we present to AM as if we are the RM that never fails over (except when AMRMProxy restarts). When actual RM fails over, we always re-register automatically. We did this because FederationInterceptor can receive concurrent register requests from AM because of timeout between AM and AMRMProxy, which is shorter than the timeout + failOver between FederationInterceptor (AMRMProxy) and RM. For the same reason, this method needs to be synchronized.
      Throws:
      org.apache.hadoop.yarn.exceptions.YarnException
      IOException
    • allocate

      public org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse allocate(org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest request) throws org.apache.hadoop.yarn.exceptions.YarnException, IOException
      Sends the heart beats to the home RM and the secondary sub-cluster RMs that are being used by the application.
      Throws:
      org.apache.hadoop.yarn.exceptions.YarnException
      IOException
    • finishApplicationMaster

      public org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterResponse finishApplicationMaster(org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest request) throws org.apache.hadoop.yarn.exceptions.YarnException, IOException
      Sends the finish application master request to all the resource managers used by the application.
      Throws:
      org.apache.hadoop.yarn.exceptions.YarnException
      IOException
    • setNextInterceptor

      public void setNextInterceptor(RequestInterceptor next)
      Description copied from class: AbstractRequestInterceptor
      Sets the RequestInterceptor in the chain.
      Specified by:
      setNextInterceptor in interface RequestInterceptor
      Overrides:
      setNextInterceptor in class AbstractRequestInterceptor
      Parameters:
      next - the next interceptor to set
    • shutdown

      public void shutdown()
      This is called when the application pipeline is being destroyed. We will release all the resources that we are holding in this call.
      Specified by:
      shutdown in interface RequestInterceptor
      Overrides:
      shutdown in class AbstractRequestInterceptor
    • cleanupRegistry

      @VisibleForTesting protected void cleanupRegistry()
      Only for unit test cleanup.
    • getRegistryClient

      @VisibleForTesting protected org.apache.hadoop.yarn.server.federation.utils.FederationRegistryClient getRegistryClient()
    • getAttemptId

      @VisibleForTesting protected org.apache.hadoop.yarn.api.records.ApplicationAttemptId getAttemptId()
    • getHomeHeartbeatHandler

      @VisibleForTesting protected org.apache.hadoop.yarn.server.AMHeartbeatRequestHandler getHomeHeartbeatHandler()
    • createUnmanagedAMPoolManager

      @VisibleForTesting protected org.apache.hadoop.yarn.server.uam.UnmanagedAMPoolManager createUnmanagedAMPoolManager(ExecutorService threadPool)
      Create the UAM pool manager for secondary sub-clusters. For unit test to override.
      Parameters:
      threadPool - the thread pool to use
      Returns:
      the UAM pool manager instance
    • createHomeHeartbeatHandler

      @VisibleForTesting protected org.apache.hadoop.yarn.server.AMHeartbeatRequestHandler createHomeHeartbeatHandler(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.yarn.api.records.ApplicationId appId, org.apache.hadoop.yarn.server.AMRMClientRelayer rmProxyRelayer)
    • createHomeRMProxy

      protected <T> T createHomeRMProxy(AMRMProxyApplicationContext appContext, Class<T> protocol, org.apache.hadoop.security.UserGroupInformation user)
      Create a proxy instance that is used to connect to the Home resource manager.
      Type Parameters:
      T - the type of the proxy
      Parameters:
      appContext - AMRMProxyApplicationContext
      protocol - the protocol class for the proxy
      user - the ugi for the proxy
      Returns:
      the proxy created
    • reAttachUAMAndMergeRegisterResponse

      protected void reAttachUAMAndMergeRegisterResponse(org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse homeResponse, org.apache.hadoop.yarn.api.records.ApplicationId appId)
      Try re-attach to all existing and running UAMs in secondary sub-clusters launched by previous application attempts if any. All running containers in the UAMs will be combined into the registerResponse. For the first attempt, the registry will be empty for this application and thus no-op here.
    • launchUAMAndRegisterApplicationMaster

      protected TokenAndRegisterResponse launchUAMAndRegisterApplicationMaster(org.apache.hadoop.yarn.conf.YarnConfiguration config, String subClusterId, org.apache.hadoop.yarn.api.records.ApplicationId applicationId) throws IOException, org.apache.hadoop.yarn.exceptions.YarnException
      Throws:
      IOException
      org.apache.hadoop.yarn.exceptions.YarnException
    • generateBaseAllocationResponse

      protected org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse generateBaseAllocationResponse()
      Prepare the base allocation response. Use lastSCResponse and lastHeartbeatTimeStamp to assemble entries about cluster-wide info, e.g. AvailableResource, NumClusterNodes.
    • mergeAllocateResponse

      @VisibleForTesting protected void mergeAllocateResponse(org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse homeResponse, org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse otherResponse, org.apache.hadoop.yarn.server.federation.store.records.SubClusterId otherRMAddress)
    • getTimedOutSCs

      protected Set<org.apache.hadoop.yarn.server.federation.store.records.SubClusterId> getTimedOutSCs(boolean verbose)
    • splitResourceRequests

      protected Map<org.apache.hadoop.yarn.server.federation.store.records.SubClusterId,List<org.apache.hadoop.yarn.api.records.ResourceRequest>> splitResourceRequests(List<org.apache.hadoop.yarn.api.records.ResourceRequest> askList) throws org.apache.hadoop.yarn.exceptions.YarnException
      Splits the specified request to send it to different sub clusters. The splitting algorithm is very simple. If the request does not have a node preference, the policy decides the sub cluster. If the request has a node preference and if locality is required, then it is sent to the sub cluster that contains the requested node. If node preference is specified and locality is not required, then the policy decides the sub cluster.
      Parameters:
      askList - the ask list to split
      Returns:
      the split asks
      Throws:
      org.apache.hadoop.yarn.exceptions.YarnException - if split fails
    • getUnmanagedAMPoolSize

      @VisibleForTesting protected int getUnmanagedAMPoolSize()
    • getUnmanagedAMPool

      @VisibleForTesting protected org.apache.hadoop.yarn.server.uam.UnmanagedAMPoolManager getUnmanagedAMPool()
    • getUamRegisterFutures

      @VisibleForTesting protected Map<org.apache.hadoop.yarn.server.federation.store.records.SubClusterId,Future<?>> getUamRegisterFutures()
    • getAsyncResponseSink

      @VisibleForTesting public Map<org.apache.hadoop.yarn.server.federation.store.records.SubClusterId,List<org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse>> getAsyncResponseSink()
    • isNullOrEmpty

      public static <T> boolean isNullOrEmpty(Collection<T> c)
      Utility method to check if the specified Collection is null or empty.
      Type Parameters:
      T - element type of the collection
      Parameters:
      c - the collection object
      Returns:
      whether is it is null or empty
    • isNullOrEmpty

      public static <T1, T2> boolean isNullOrEmpty(Map<T1,T2> c)
      Utility method to check if the specified Collection is null or empty.
      Type Parameters:
      T1 - key type of the map
      T2 - value type of the map
      Parameters:
      c - the map object
      Returns:
      whether is it is null or empty
    • cacheAllocatedContainersForSubClusterId

      @VisibleForTesting protected void cacheAllocatedContainersForSubClusterId(List<org.apache.hadoop.yarn.api.records.Container> containers, org.apache.hadoop.yarn.server.federation.store.records.SubClusterId subClusterId)
    • getContainerIdToSubClusterIdMap

      @VisibleForTesting protected Map<org.apache.hadoop.yarn.api.records.ContainerId,org.apache.hadoop.yarn.server.federation.store.records.SubClusterId> getContainerIdToSubClusterIdMap()