Class NetworkTopology

java.lang.Object
org.apache.hadoop.net.NetworkTopology
Direct Known Subclasses:
NetworkTopologyWithNodeGroup

@LimitedPrivate({"HDFS","MapReduce"}) @Unstable public class NetworkTopology extends Object
The class represents a cluster of computer with a tree hierarchical network topology. For example, a cluster may be consists of many data centers filled with racks of computers. In a network topology, leaves represent data nodes (computers) and inner nodes represent switches/routers that manage traffic in/out of data centers or racks.
  • Field Details

    • DEFAULT_RACK

      public static final String DEFAULT_RACK
      See Also:
    • LOG

      public static final org.slf4j.Logger LOG
    • numOfRacks

      protected int numOfRacks
      rack counter
    • netlock

      protected ReadWriteLock netlock
      the lock used to manage access
  • Constructor Details

    • NetworkTopology

      public NetworkTopology()
  • Method Details

    • getInstance

      public static NetworkTopology getInstance(Configuration conf)
      Get an instance of NetworkTopology based on the value of the configuration parameter net.topology.impl.
      Parameters:
      conf - the configuration to be used
      Returns:
      an instance of NetworkTopology
    • getInstance

      public static NetworkTopology getInstance(Configuration conf, InnerNode.Factory factory)
    • init

      protected NetworkTopology init(InnerNode.Factory factory)
    • add

      public void add(Node node)
      Add a leaf node Update node counter & rack counter if necessary
      Parameters:
      node - node to be added; can be null
      Throws:
      IllegalArgumentException - if add a node to a leave or node to be added is not a leaf
    • incrementRacks

      protected void incrementRacks()
    • getNodeForNetworkLocation

      protected Node getNodeForNetworkLocation(Node node)
      Return a reference to the node given its string representation. Default implementation delegates to getNode(String).

      To be overridden in subclasses for specific NetworkTopology implementations, as alternative to overriding the full add(Node) method.

      Parameters:
      node - The string representation of this node's network location is used to retrieve a Node object.
      Returns:
      a reference to the node; null if the node is not in the tree
      See Also:
    • getDatanodesInRack

      public List<Node> getDatanodesInRack(String loc)
      Given a string representation of a rack, return its children
      Parameters:
      loc - a path-like string representation of a rack
      Returns:
      a newly allocated list with all the node's children
    • remove

      public void remove(Node node)
      Remove a node Update node counter and rack counter if necessary
      Parameters:
      node - node to be removed; can be null
    • contains

      public boolean contains(Node node)
      Check if the tree contains node node
      Parameters:
      node - a node
      Returns:
      true if node is already in the tree; false otherwise
    • getNode

      public Node getNode(String loc)
      Given a string representation of a node, return its reference
      Parameters:
      loc - a path-like string representation of a node
      Returns:
      a reference to the node; null if the node is not in the tree
    • hasClusterEverBeenMultiRack

      public boolean hasClusterEverBeenMultiRack()
      Returns:
      true if this cluster has ever consisted of multiple racks, even if it is not now a multi-rack cluster.
    • getRack

      public String getRack(String loc)
      Given a string representation of a rack for a specific network location To be overridden in subclasses for specific NetworkTopology implementations, as alternative to overriding the full getRack(String) method.
      Parameters:
      loc - a path-like string representation of a network location
      Returns:
      a rack string
    • getNumOfRacks

      public int getNumOfRacks()
      Returns:
      the total number of racks
    • getNumOfLeaves

      public int getNumOfLeaves()
      Returns:
      the total number of leaf nodes
    • getDistance

      public int getDistance(Node node1, Node node2)
      Return the distance between two nodes It is assumed that the distance from one node to its parent is 1 The distance between two nodes is calculated by summing up their distances to their closest common ancestor.
      Parameters:
      node1 - one node
      node2 - another node
      Returns:
      the distance between node1 and node2 which is zero if they are the same or Integer.MAX_VALUE if node1 or node2 do not belong to the cluster
    • getDistanceByPath

      public static int getDistanceByPath(Node node1, Node node2)
      Return the distance between two nodes by comparing their network paths without checking if they belong to the same ancestor node by reference. It is assumed that the distance from one node to its parent is 1 The distance between two nodes is calculated by summing up their distances to their closest common ancestor.
      Parameters:
      node1 - one node
      node2 - another node
      Returns:
      the distance between node1 and node2
    • isOnSameRack

      public boolean isOnSameRack(Node node1, Node node2)
      Check if two nodes are on the same rack
      Parameters:
      node1 - one node (can be null)
      node2 - another node (can be null)
      Returns:
      true if node1 and node2 are on the same rack; false otherwise
      Throws:
      IllegalArgumentException - when either node1 or node2 is null, or node1 or node2 do not belong to the cluster
    • isNodeGroupAware

      public boolean isNodeGroupAware()
      Returns:
      Check if network topology is aware of NodeGroup.
    • isOnSameNodeGroup

      public boolean isOnSameNodeGroup(Node node1, Node node2)
      Parameters:
      node1 - input node1.
      node2 - input node2.
      Returns:
      Return false directly as not aware of NodeGroup, to be override in sub-class.
    • isSameParents

      protected boolean isSameParents(Node node1, Node node2)
      Compare the parents of each node for equality

      To be overridden in subclasses for specific NetworkTopology implementations, as alternative to overriding the full isOnSameRack(Node, Node) method.

      Parameters:
      node1 - the first node to compare
      node2 - the second node to compare
      Returns:
      true if their parents are equal, false otherwise
      See Also:
    • chooseRandom

      public Node chooseRandom(String scope)
      Randomly choose a node.
      Parameters:
      scope - range of nodes from which a node will be chosen
      Returns:
      the chosen node
      See Also:
    • chooseRandom

      public Node chooseRandom(String scope, Collection<Node> excludedNodes)
      Randomly choose one node from scope. If scope starts with ~, choose one from the all nodes except for the ones in scope; otherwise, choose one from scope. If excludedNodes is given, choose a node that's not in excludedNodes.
      Parameters:
      scope - range of nodes from which a node will be chosen
      excludedNodes - nodes to be excluded from
      Returns:
      the chosen node
    • chooseRandom

      protected Node chooseRandom(String scope, String excludedScope, Collection<Node> excludedNodes)
    • getLeaves

      public List<Node> getLeaves(String scope)
      return leaves in scope
      Parameters:
      scope - a path string
      Returns:
      leaves nodes under specific scope
    • countNumOfAvailableNodes

      @VisibleForTesting public int countNumOfAvailableNodes(String scope, Collection<Node> excludedNodes)
      return the number of leaves in scope but not in excludedNodes if scope starts with ~, return the number of nodes that are not in scope and excludedNodes;
      Parameters:
      scope - a path string that may start with ~
      excludedNodes - a list of nodes
      Returns:
      number of available nodes
    • toString

      public String toString()
      convert a network tree to a string.
      Overrides:
      toString in class Object
    • getFirstHalf

      public static String getFirstHalf(String networkLocation)
      Parameters:
      networkLocation - input networkLocation.
      Returns:
      Divide networklocation string into two parts by last separator, and get the first part here.
    • getLastHalf

      public static String getLastHalf(String networkLocation)
      Parameters:
      networkLocation - input networkLocation.
      Returns:
      Divide networklocation string into two parts by last separator, and get the second part here.
    • getWeight

      @VisibleForTesting protected int getWeight(Node reader, Node node)
      Returns an integer weight which specifies how far away {node} is away from {reader}. A lower value signifies that a node is closer.
      Parameters:
      reader - Node where data will be read
      node - Replica of data
      Returns:
      weight
    • getWeightUsingNetworkLocation

      @VisibleForTesting protected static int getWeightUsingNetworkLocation(Node reader, Node node)
      Returns an integer weight which specifies how far away node is from reader. A lower value signifies that a node is closer. It uses network location to calculate the weight
      Parameters:
      reader - Node where data will be read
      node - Replica of data
      Returns:
      weight
    • sortByDistance

      public void sortByDistance(Node reader, Node[] nodes, int activeLen)
      Sort nodes array by network distance to reader.

      In a three-level topology, a node can be either local, on the same rack, or on a different rack from the reader. Sorting the nodes based on network distance from the reader reduces network traffic and improves performance.

      As an additional twist, we also randomize the nodes at each network distance. This helps with load balancing when there is data skew.

      Parameters:
      reader - Node where data will be read
      nodes - Available replicas with the requested data
      activeLen - Number of active nodes at the front of the array
    • sortByDistance

      public <T extends Node> void sortByDistance(Node reader, T[] nodes, int activeLen, Consumer<List<T>> secondarySort)
      Sort nodes array by network distance to reader with secondary sort.

      In a three-level topology, a node can be either local, on the same rack, or on a different rack from the reader. Sorting the nodes based on network distance from the reader reduces network traffic and improves performance.

      As an additional twist, we also randomize the nodes at each network distance. This helps with load balancing when there is data skew.
      Type Parameters:
      T - Generics Type T
      Parameters:
      reader - Node where data will be read
      nodes - Available replicas with the requested data
      activeLen - Number of active nodes at the front of the array
      secondarySort - a secondary sorting strategy which can inject into that point from outside to help sort the same distance.
    • sortByDistanceUsingNetworkLocation

      public void sortByDistanceUsingNetworkLocation(Node reader, Node[] nodes, int activeLen)
      Sort nodes array by network distance to reader with secondary sort.

      using network location. This is used when the reader is not a datanode. Sorting the nodes based on network distance from the reader reduces network traffic and improves performance.

      Parameters:
      reader - Node where data will be read
      nodes - Available replicas with the requested data
      activeLen - Number of active nodes at the front of the array
    • sortByDistanceUsingNetworkLocation

      public <T extends Node> void sortByDistanceUsingNetworkLocation(Node reader, T[] nodes, int activeLen, Consumer<List<T>> secondarySort)
      Sort nodes array by network distance to reader.

      using network location. This is used when the reader is not a datanode. Sorting the nodes based on network distance from the reader reduces network traffic and improves performance.

      Type Parameters:
      T - Generics Type T.
      Parameters:
      reader - Node where data will be read
      nodes - Available replicas with the requested data
      activeLen - Number of active nodes at the front of the array
      secondarySort - a secondary sorting strategy which can inject into that point from outside to help sort the same distance.
    • isChildScope

      protected static boolean isChildScope(String parentScope, String childScope)
      Checks whether one scope is contained in the other scope.
      Parameters:
      parentScope - the parent scope to check
      childScope - the child scope which needs to be checked.
      Returns:
      true if childScope is contained within the parentScope
    • isNodeInScope

      protected static boolean isNodeInScope(Node node, String scope)
      Checks whether a node belongs to the scope.
      Parameters:
      node - the node to check.
      scope - scope to check.
      Returns:
      true if node lies within the scope
    • getNumOfNonEmptyRacks

      public int getNumOfNonEmptyRacks()
      Returns:
      the number of nonempty racks
    • recommissionNode

      public void recommissionNode(Node node)
      Update empty rack number when add a node like recommission.
      Parameters:
      node - node to be added; can be null
    • decommissionNode

      public void decommissionNode(Node node)
      Update empty rack number when remove a node like decommission.
      Parameters:
      node - node to be added; can be null
    • shuffle

      public void shuffle(Node[] nodes, int activeLen)
      Randomly permute the active nodes of the node array.
      Parameters:
      nodes - Available replicas with the requested data
      activeLen - Number of active nodes at the front of the array