Class DatanodeManager
java.lang.Object
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager
Manages datanodes, including decommissioning and other related activities.
-
Method Summary
void addSlowPeers(String dnUuid)
void clearPendingCachingCommands()
void clearPendingQueues()
  Clear any actions that are queued up to be sent to the DNs on their next heartbeats.
void fetchDatanodes(List<DatanodeDescriptor> live, List<DatanodeDescriptor> dead, boolean removeDecommissionNode)
  Fetch live and dead datanodes.
getAllSlowDataNodes()
int getBlockInvalidateLimit()
long getBlocksPerPostponedMisreplicatedBlocksRescan()
DatanodeDescriptor getDatanode(String datanodeUuid)
  Get a datanode descriptor given the corresponding datanode UUID.
DatanodeDescriptor getDatanode(org.apache.hadoop.hdfs.protocol.DatanodeID nodeID)
  Get data node by datanode ID.
getDatanodeAdminManager()
getDatanodeByHost(String host)
getDatanodeByHostName(String hostname)
getDatanodeByXferAddr(String host, int xferPort)
List<DatanodeDescriptor> getDatanodeListForReport(org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType type)
  For generating datanode reports.
getDatanodeMap()
getDatanodes()
DatanodeStorageInfo[] getDatanodeStorageInfos(org.apache.hadoop.hdfs.protocol.DatanodeID[] datanodeID, String[] storageIDs, String format, Object... args)
org.apache.hadoop.hdfs.server.protocol.DatanodeStorageReport[] getDatanodeStorageReport(org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType type)
  Generates datanode reports for the given report type.
getDatanodesSoftwareVersions()
getDatanodeStatistics()
getDecommissioningNodes()
boolean getEnableAvoidSlowDataNodesForRead()
getEnteringMaintenanceNodes()
getFSClusterStats()
long getHeartbeatInterval()
long getHeartbeatRecheckInterval()
org.apache.hadoop.hdfs.server.blockmanagement.Host2NodesMap getHost2DatanodeMap()
getHostConfigManager()
int getMaxSlowpeerCollectNodes()
org.apache.hadoop.net.NetworkTopology getNetworkTopology()
int getNumDeadDataNodes()
int getNumLiveDataNodes()
int getNumOfDataNodes()
int getNumStaleNodes()
int getNumStaleStorages()
  Get the number of content stale storages.
getSlowDisksReport()
  Retrieve information about slow disks as a JSON.
getSlowDiskTracker()
  Use only for testing.
getSlowNodesUuidSet()
  Returns the UUIDs of all tracked slow datanodes.
long getSlowPeerCollectionInterval()
getSlowPeersReport()
  Retrieve information about slow peers as a JSON.
getSlowPeersUuidSet()
  Returns all tracked slow peers.
getSlowPeerTracker()
  Use only for testing.
DatanodeCommand[] handleHeartbeat(DatanodeRegistration nodeReg, org.apache.hadoop.hdfs.server.protocol.StorageReport[] reports, String blockPoolId, long cacheCapacity, long cacheUsed, int xceiverCount, int xmitsInProgress, int failedVolumes, VolumeFailureSummary volumeFailureSummary, org.apache.hadoop.hdfs.server.protocol.SlowPeerReports slowPeers, org.apache.hadoop.hdfs.server.protocol.SlowDiskReports slowDisks)
  Handle heartbeat from datanodes.
void handleLifeline(DatanodeRegistration nodeReg, org.apache.hadoop.hdfs.server.protocol.StorageReport[] reports, long cacheCapacity, long cacheUsed, int xceiverCount, int failedVolumes, VolumeFailureSummary volumeFailureSummary)
  Handles a lifeline message sent by a DataNode.
void initSlowPeerTracker(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.util.Timer timer, boolean dataNodePeerStatsEnabled)
  Determines whether the slow peer tracker should be enabled.
boolean isSlowPeerCollectorInitialized()
void markAllDatanodesStaleAndSetKeyUpdateIfNeed()
void refreshNodes(org.apache.hadoop.conf.Configuration conf)
  Rereads conf to get hosts and exclude list file names.
void registerDatanode(DatanodeRegistration nodeReg)
  Register the given datanode with the namenode.
void removeDatanode(org.apache.hadoop.hdfs.protocol.DatanodeID node)
  Remove a datanode.
void resetLastCachingDirectiveSentTime()
  Reset the lastCachingDirectiveSentTimeMs field of all the DataNodes we know about.
resolveNetworkLocation(List<String> names)
  Resolve network locations for the specified hosts.
void restartSlowPeerCollector(long interval)
void setAvoidSlowDataNodesForReadEnabled(boolean enable)
void setBalancerBandwidth(long bandwidth)
  Tell all datanodes to use a new, non-persistent bandwidth value for dfs.datanode.balance.bandwidthPerSec.
void setBlockInvalidateLimit(int configuredBlockInvalidateLimit)
void setHeartbeatExpireInterval(long expiryMs)
void setHeartbeatInterval(long intervalSeconds)
void setHeartbeatRecheckInterval(int recheckInterval)
void setMaxSlowpeerCollectNodes(int maxNodes)
void setMaxSlowPeersToReport(int maxSlowPeersToReport)
void setShouldSendCachingCommands(boolean shouldSendCachingCommands)
boolean shouldAvoidStaleDataNodesForWrite()
  Whether stale datanodes should be avoided as targets on the write path.
void sortLocatedBlocks(String targetHost, List<org.apache.hadoop.hdfs.protocol.LocatedBlock> locatedBlocks)
  Sort the non-striped located blocks by their distance to the target host.
toString()
-
Method Details
-
initSlowPeerTracker
public void initSlowPeerTracker(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.util.Timer timer, boolean dataNodePeerStatsEnabled)
Determines whether the slow peer tracker should be enabled. If dataNodePeerStatsEnabled is true, the slow peer tracker is initialized.
- Parameters:
conf- The configuration to use while initializing slowPeerTracker.
timer- Timer object for slowPeerTracker.
dataNodePeerStatsEnabled- Whether slow peer tracking should be enabled.
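In a running NameNode the dataNodePeerStatsEnabled flag is normally derived from the dfs.datanode.peer.stats.enabled property rather than passed in by hand; a minimal hdfs-site.xml fragment enabling peer stats (and therefore slow peer tracking) might look like the following. The value shown is illustrative.

```xml
<!-- hdfs-site.xml: enable DataNode peer stats, which feed the slow peer tracker -->
<property>
  <name>dfs.datanode.peer.stats.enabled</name>
  <value>true</value>
</property>
```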
-
restartSlowPeerCollector
public void restartSlowPeerCollector(long interval) -
getNetworkTopology
public org.apache.hadoop.net.NetworkTopology getNetworkTopology()
- Returns:
- the network topology.
-
getDatanodeAdminManager
-
getHostConfigManager
-
setHeartbeatExpireInterval
@VisibleForTesting public void setHeartbeatExpireInterval(long expiryMs) -
getFSClusterStats
-
getBlockInvalidateLimit
@VisibleForTesting public int getBlockInvalidateLimit() -
getDatanodeStatistics
- Returns:
- the datanode statistics.
-
setAvoidSlowDataNodesForReadEnabled
public void setAvoidSlowDataNodesForReadEnabled(boolean enable) -
getEnableAvoidSlowDataNodesForRead
@VisibleForTesting public boolean getEnableAvoidSlowDataNodesForRead() -
setMaxSlowpeerCollectNodes
public void setMaxSlowpeerCollectNodes(int maxNodes) -
getMaxSlowpeerCollectNodes
@VisibleForTesting public int getMaxSlowpeerCollectNodes() -
sortLocatedBlocks
public void sortLocatedBlocks(String targetHost, List<org.apache.hadoop.hdfs.protocol.LocatedBlock> locatedBlocks)
Sort the non-striped located blocks by their distance to the target host. For striped blocks, only decommissioned/decommissioning/stale/slow nodes are moved to the bottom. For example, assume we have the storage list d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 mapping to block indices 0, 1, 2, 3, 4, 5, 6, 7, 8, 2. Here the internal block b2 is duplicated, located on both d2 and d9. If d2 is a decommissioning node, then d2 and d9 should be switched in the storage list. After sorting the locations, the corresponding block indices and block tokens are updated as well. -
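The core idea of the striped-block case above can be sketched in isolation. This is not the actual HDFS comparator (which also weighs network distance, staleness, and slowness); LocationSortSketch and moveFlaggedToBottom are illustrative names, and the sketch only shows the "push flagged nodes to the bottom while keeping the rest in order" step:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LocationSortSketch {
    // Stable pass over the storage list: healthy locations keep their
    // relative order; flagged (decommissioned/decommissioning/stale/slow)
    // locations are appended at the bottom.
    public static List<String> moveFlaggedToBottom(List<String> locations, Set<String> flagged) {
        List<String> healthy = new ArrayList<>();
        List<String> bad = new ArrayList<>();
        for (String loc : locations) {
            (flagged.contains(loc) ? bad : healthy).add(loc);
        }
        healthy.addAll(bad);
        return healthy;
    }

    public static void main(String[] args) {
        // d2 is decommissioning, so it moves behind its duplicate d9.
        System.out.println(moveFlaggedToBottom(List.of("d0", "d1", "d2", "d9"), Set.of("d2")));
        // prints [d0, d1, d9, d2]
    }
}
```

With the d2/d9 example from the description, a reader can see why the duplicate on the healthy node d9 ends up ahead of the decommissioning d2.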
getDatanodeByHost
- Returns:
- the datanode descriptor for the host.
-
getDatanodeByHostName
- Parameters:
hostname- hostname of the datanode- Returns:
- the datanode descriptor for the host.
-
getDatanodeByXferAddr
- Returns:
- the datanode descriptor for the host.
-
getDatanodes
- Returns:
- the datanode descriptors for all nodes.
-
getHost2DatanodeMap
public org.apache.hadoop.hdfs.server.blockmanagement.Host2NodesMap getHost2DatanodeMap()
- Returns:
- the Host2NodesMap
-
getDatanode
Get a datanode descriptor given the corresponding datanode UUID. -
getDatanode
public DatanodeDescriptor getDatanode(org.apache.hadoop.hdfs.protocol.DatanodeID nodeID) throws UnregisteredNodeException
Get data node by datanode ID.
- Parameters:
nodeID- datanode ID
- Returns:
- DatanodeDescriptor or null if the node is not found.
- Throws:
UnregisteredNodeException
-
getDatanodeStorageInfos
public DatanodeStorageInfo[] getDatanodeStorageInfos(org.apache.hadoop.hdfs.protocol.DatanodeID[] datanodeID, String[] storageIDs, String format, Object... args) throws UnregisteredNodeException - Throws:
UnregisteredNodeException
-
removeDatanode
public void removeDatanode(org.apache.hadoop.hdfs.protocol.DatanodeID node) throws UnregisteredNodeException
Remove a datanode.
- Throws:
UnregisteredNodeException
-
getDatanodesSoftwareVersions
-
resolveNetworkLocation
Resolve network locations for the specified hosts.
- Returns:
- network locations if available; otherwise returns null.
-
registerDatanode
public void registerDatanode(DatanodeRegistration nodeReg) throws DisallowedDatanodeException, UnresolvedTopologyException
Register the given datanode with the namenode. NB: the given registration is mutated and given back to the datanode.
- Parameters:
nodeReg- the datanode registration
- Throws:
DisallowedDatanodeException- if the registration request is denied because the datanode does not match includes/excludes
UnresolvedTopologyException- if the registration request is denied because resolving the datanode network location fails.
-
refreshNodes
Rereads conf to get hosts and exclude list file names. Rereads the files to update the hosts and exclude lists. It checks if any of the hosts have changed state.
- Throws:
IOException
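Operationally, refreshNodes is usually triggered through the dfsadmin CLI rather than invoked directly. A typical sequence, assuming the include/exclude files are wired up via the dfs.hosts and dfs.hosts.exclude properties, is (this is a cluster admin command, shown for illustration only):

```shell
# After editing the files referenced by dfs.hosts / dfs.hosts.exclude,
# ask the NameNode to re-read them (requires HDFS superuser privileges):
hdfs dfsadmin -refreshNodes
```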
-
getNumLiveDataNodes
public int getNumLiveDataNodes()
- Returns:
- the number of live datanodes.
-
getNumDeadDataNodes
public int getNumDeadDataNodes()
- Returns:
- the number of dead datanodes.
-
getNumOfDataNodes
public int getNumOfDataNodes()
- Returns:
- the number of datanodes.
-
getDecommissioningNodes
- Returns:
- list of datanodes where decommissioning is in progress.
-
getEnteringMaintenanceNodes
- Returns:
- list of datanodes that are entering maintenance.
-
shouldAvoidStaleDataNodesForWrite
public boolean shouldAvoidStaleDataNodesForWrite()
Whether stale datanodes should be avoided as targets on the write path. The result of this function may change if the number of stale datanodes eclipses a configurable threshold.
- Returns:
- whether stale datanodes should be avoided on the write path
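The behavior and the threshold mentioned above are driven by standard HDFS configuration; a hedged hdfs-site.xml sketch of the two relevant properties follows. The values shown are illustrative (the ratio is the fraction of stale datanodes beyond which stale-node avoidance is suspended so writes can still proceed):

```xml
<!-- hdfs-site.xml: avoid stale datanodes as write targets... -->
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
<!-- ...unless more than this fraction of datanodes is stale -->
<property>
  <name>dfs.namenode.write.stale.datanode.ratio</name>
  <value>0.5</value>
</property>
```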
-
getBlocksPerPostponedMisreplicatedBlocksRescan
public long getBlocksPerPostponedMisreplicatedBlocksRescan() -
getHeartbeatInterval
public long getHeartbeatInterval() -
getHeartbeatRecheckInterval
public long getHeartbeatRecheckInterval() -
getNumStaleNodes
public int getNumStaleNodes()
- Returns:
- the current number of stale DataNodes (detected by the HeartbeatManager).
-
getNumStaleStorages
public int getNumStaleStorages()
Get the number of content stale storages. -
fetchDatanodes
public void fetchDatanodes(List<DatanodeDescriptor> live, List<DatanodeDescriptor> dead, boolean removeDecommissionNode)
Fetch live and dead datanodes. -
getDatanodeListForReport
public List<DatanodeDescriptor> getDatanodeListForReport(org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType type)
For generating datanode reports. -
getAllSlowDataNodes
-
handleHeartbeat
public DatanodeCommand[] handleHeartbeat(DatanodeRegistration nodeReg, org.apache.hadoop.hdfs.server.protocol.StorageReport[] reports, String blockPoolId, long cacheCapacity, long cacheUsed, int xceiverCount, int xmitsInProgress, int failedVolumes, VolumeFailureSummary volumeFailureSummary, @Nonnull org.apache.hadoop.hdfs.server.protocol.SlowPeerReports slowPeers, @Nonnull org.apache.hadoop.hdfs.server.protocol.SlowDiskReports slowDisks) throws IOException
Handle heartbeat from datanodes.
- Throws:
IOException
-
handleLifeline
public void handleLifeline(DatanodeRegistration nodeReg, org.apache.hadoop.hdfs.server.protocol.StorageReport[] reports, long cacheCapacity, long cacheUsed, int xceiverCount, int failedVolumes, VolumeFailureSummary volumeFailureSummary) throws IOException
Handles a lifeline message sent by a DataNode.
- Parameters:
nodeReg- registration info for the DataNode sending the lifeline
reports- storage reports from the DataNode
cacheCapacity- cache capacity at the DataNode
cacheUsed- cache used at the DataNode
xceiverCount- estimated count of transfer threads running at the DataNode
failedVolumes- count of failed volumes at the DataNode
volumeFailureSummary- info on failed volumes at the DataNode
- Throws:
IOException- if there is an error
-
setBalancerBandwidth
Tell all datanodes to use a new, non-persistent bandwidth value for dfs.datanode.balance.bandwidthPerSec. A system administrator can tune the balancer bandwidth parameter (dfs.datanode.balance.bandwidthPerSec) dynamically by calling "dfsadmin -setBalancerBandwidth newbandwidth", at which point the 'bandwidth' variable is updated with the new value for each node. Once the heartbeat command is issued to update the value on the specified datanode, this value will be set back to 0.
- Parameters:
bandwidth- balancer bandwidth in bytes per second for all datanodes.
- Throws:
IOException
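The dfsadmin invocation mentioned above takes the bandwidth in bytes per second. An illustrative example (a cluster admin command, not runnable outside a live cluster):

```shell
# Temporarily cap balancer traffic at 100 MB/s (104857600 bytes/s) per datanode;
# the setting is non-persistent and does not survive a datanode restart.
hdfs dfsadmin -setBalancerBandwidth 104857600
```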
-
markAllDatanodesStaleAndSetKeyUpdateIfNeed
public void markAllDatanodesStaleAndSetKeyUpdateIfNeed() -
clearPendingQueues
public void clearPendingQueues()
Clear any actions that are queued up to be sent to the DNs on their next heartbeats. This includes block invalidations, recoveries, and replication requests. -
resetLastCachingDirectiveSentTime
public void resetLastCachingDirectiveSentTime()
Reset the lastCachingDirectiveSentTimeMs field of all the DataNodes we know about. -
toString
-
clearPendingCachingCommands
public void clearPendingCachingCommands() -
setShouldSendCachingCommands
public void setShouldSendCachingCommands(boolean shouldSendCachingCommands) -
setHeartbeatInterval
public void setHeartbeatInterval(long intervalSeconds) -
setHeartbeatRecheckInterval
public void setHeartbeatRecheckInterval(int recheckInterval) -
setBlockInvalidateLimit
public void setBlockInvalidateLimit(int configuredBlockInvalidateLimit) -
getSlowPeersReport
Retrieve information about slow peers as a JSON. Returns null if we are not tracking slow peers.
- Returns:
- slow peer information as a JSON string, or null if slow peers are not tracked.
-
getSlowPeersUuidSet
Returns all tracked slow peers.
- Returns:
- the set of UUIDs of the tracked slow peers.
-
getSlowNodesUuidSet
Returns the UUIDs of all tracked slow datanodes.
- Returns:
- the set of UUIDs of the tracked slow datanodes.
-
getSlowPeerTracker
Use only for testing. -
getSlowDiskTracker
Use only for testing. -
addSlowPeers
-
getSlowDisksReport
Retrieve information about slow disks as a JSON. Returns null if we are not tracking slow disks.
- Returns:
- slow disk information as a JSON string, or null if slow disks are not tracked.
-
getDatanodeStorageReport
public org.apache.hadoop.hdfs.server.protocol.DatanodeStorageReport[] getDatanodeStorageReport(org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType type)
Generates datanode reports for the given report type.
- Parameters:
type- type of the datanode report
- Returns:
- array of DatanodeStorageReports
-
getDatanodeMap
-
setMaxSlowPeersToReport
public void setMaxSlowPeersToReport(int maxSlowPeersToReport) -
isSlowPeerCollectorInitialized
@VisibleForTesting public boolean isSlowPeerCollectorInitialized() -
getSlowPeerCollectionInterval
@VisibleForTesting public long getSlowPeerCollectionInterval()
-