
MapReduce job: ipc.Client retrying to connect

I have set up my Hadoop cluster, which consists of 4 Docker containers:

  • DataNode
  • Secondary NameNode
  • NameNode
  • Resource Manager

When I submit a MapReduce job, I notice connection problems as soon as both map and reduce reach 100%. The client then retries up to the maximum number of attempts before printing a stack trace. The strange thing is that the job finishes and returns a result, yet the node manager web interface shows a failed job. None of the questions/answers I have found so far fix my particular problem.

All my machines expose the port range 50100-50200 to match the property yarn.app.mapreduce.am.job.client.port-range.
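
Concretely, "exposed" here means the containers publish that range to the host; a rough sketch of what that looks like when starting one of the containers (image and container names below are just placeholders, not my actual setup):

    # Publish the AM client port range 50100-50200 so the job client
    # can reach the MRAppMaster running inside the container.
    docker run -d \
      --name node-manager \
      --hostname node-manager \
      -p 50100-50200:50100-50200 \
      my-hadoop-nodemanager-image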

The job I submit is:

sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.1.jar pi 1 1 

This is the output:

Number of Maps = 1 
    Samples per Map = 1 
    Wrote input for Map #0 
    Starting Job 
    16/06/18 19:14:07 INFO client.RMProxy: Connecting to ResourceManager at resource-manager/172.19.0.2:8032 
    16/06/18 19:14:08 INFO input.FileInputFormat: Total input paths to process : 1 
    16/06/18 19:14:08 INFO mapreduce.JobSubmitter: number of splits:1 
    16/06/18 19:14:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1466277178029_0001 
    16/06/18 19:14:08 INFO impl.YarnClientImpl: Submitted application application_1466277178029_0001 
    16/06/18 19:14:08 INFO mapreduce.Job: The url to track the job: http://resource-manager:8088/proxy/application_1466277178029_0001/ 
    16/06/18 19:14:08 INFO mapreduce.Job: Running job: job_1466277178029_0001 
    16/06/18 19:14:15 INFO mapreduce.Job: Job job_1466277178029_0001 running in uber mode : false 
    16/06/18 19:14:15 INFO mapreduce.Job: map 0% reduce 0% 
    16/06/18 19:14:19 INFO mapreduce.Job: map 100% reduce 0% 
    16/06/18 19:14:26 INFO mapreduce.Job: map 100% reduce 100% 
    16/06/18 19:14:32 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 
    16/06/18 19:14:33 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 
    16/06/18 19:14:34 INFO ipc.Client: Retrying connect to server: 01d3c03f829a/172.19.0.4:50100. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) 
    16/06/18 19:14:36 INFO mapreduce.Job: map 0% reduce 0% 
    16/06/18 19:14:36 INFO mapreduce.Job: Job job_1466277178029_0001 failed with state FAILED due to: Application application_1466277178029_0001 failed 2 times due to AM Container for appattempt_1466277178029_0001_000002 exited with exitCode: 1 
    For more detailed output, check application tracking page:http://resource-manager:8088/proxy/application_1466277178029_0001/AThen, click on links to logs of each attempt. 
    Diagnostics: Exception from container-launch. 
    Container id: container_1466277178029_0001_02_000001 
    Exit code: 1 
    Stack trace: ExitCodeException exitCode=1: 
     at org.apache.hadoop.util.Shell.runCommand(Shell.java:561) 
     at org.apache.hadoop.util.Shell.run(Shell.java:478) 
     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738) 
     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) 
     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 


    Container exited with a non-zero exit code 1 
    Failing this attempt. Failing the application. 
    16/06/18 19:14:36 INFO mapreduce.Job: Counters: 0 
    Job Finished in 28.862 seconds 
    Estimated value of Pi is 4.00000000000000000000 

The container log contains the following:

2016-06-18 19:14:32,273 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1466277178029_0001_000002 
    2016-06-18 19:14:32,443 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
    2016-06-18 19:14:32,475 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens: 
    2016-06-18 19:14:32,477 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: ([email protected]) 
    2016-06-18 19:14:32,515 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter. 
    2016-06-18 19:14:33,060 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Attempt num: 2 is last retry: true because a commit was started. 
    2016-06-18 19:14:33,061 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$NoopEventHandler 
    2016-06-18 19:14:33,067 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler 
    2016-06-18 19:14:33,068 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter 
    2016-06-18 19:14:33,118 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring 
    2016-06-18 19:14:33,141 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring 
    2016-06-18 19:14:33,162 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring 
    2016-06-18 19:14:33,183 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Emitting job history data to the timeline server is not enabled 
    2016-06-18 19:14:33,185 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Will not try to recover. recoveryEnabled: true recoverySupportedByCommitter: false numReduceTasks: 1 shuffleKeyValidForRecovery: true ApplicationAttemptID: 2 
    2016-06-18 19:14:33,210 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring 
    2016-06-18 19:14:33,212 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_1.jhist 
    2016-06-18 19:14:33,621 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler 
    2016-06-18 19:14:33,640 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-mrappmaster.properties,hadoop-metrics2.properties 
    2016-06-18 19:14:33,689 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 
    2016-06-18 19:14:33,689 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system started 
    2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true 
    2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3 
    2016-06-18 19:14:33,708 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33 
    2016-06-18 19:14:33,739 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at resource-manager/172.19.0.2:8030 
    2016-06-18 19:14:33,814 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: <memory:4096, vCores:4> 
    2016-06-18 19:14:33,814 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.hdfs 
    2016-06-18 19:14:33,837 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system is set solely by core-default.xml therefore - ignoring 
    2016-06-18 19:14:33,840 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryCopyService: History file is at hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_1.jhist 
    2016-06-18 19:14:33,894 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1466277178029_0001, File: hdfs://namenode:9000/user/hdfs/.staging/job_1466277178029_0001/job_1466277178029_0001_2.jhist 
    2016-06-18 19:14:33,959 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Was asked to shut down. 
    2016-06-18 19:14:33,959 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster 
    java.io.IOException: Was asked to shut down. 
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1546) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:422) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) 
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1540) 
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1473) 
    2016-06-18 19:14:33,962 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 
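
(With log aggregation configured in the yarn-site.xml below, the same container log can also be pulled from the command line; a rough example, assuming aggregation is actually working for this application:)

    # Fetch the aggregated container logs for the failed application above.
    sudo -u hdfs yarn logs -applicationId application_1466277178029_0001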

A few times it says 'Cannot locate configuration' or 'Default file system is set solely by core-default.xml'. Is that important? In case it matters, I am using the Cloudera repo to install the various Hadoop services, rather than unpacking a .tar.gz.

My config files are:

core-site.xml

<configuration> 
     <property> 
     <name>fs.defaultFS</name> 
     <value>hdfs://namenode:9000</value> 
     </property> 
     <property> 
     <name>hadoop.proxyuser.mapred.groups</name> 
     <value>*</value> 
     </property> 
     <property> 
     <name>hadoop.proxyuser.mapred.hosts</name> 
     <value>*</value> 
     </property> 
    </configuration> 

yarn-site.xml

<configuration> 
     <property> 
     <name>yarn.resourcemanager.hostname</name> 
     <value>resource-manager</value> 
     </property> 
     <property> 
     <name>yarn.resourcemanager.address</name> 
     <value>resource-manager:8032</value> 
     </property> 
     <property> 
     <name>yarn.resourcemanager.scheduler.address</name> 
     <value>resource-manager:8030</value> 
     </property> 
     <property> 
     <description>Classpath for typical applications.</description> 
     <name>yarn.application.classpath</name> 
     <value> 
      $HADOOP_CONF_DIR, 
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*, 
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*, 
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*, 
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/* 
     </value> 
     </property> 
     <property> 
     <name>yarn.nodemanager.aux-services</name> 
     <value>mapreduce_shuffle</value> 
     </property> 
     <property> 
     <name>yarn.nodemanager.local-dirs</name> 
     <value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value> 
     </property> 
     <property> 
     <name>yarn.nodemanager.log-dirs</name> 
     <value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value> 
     </property> 
     <property> 
     <name>yarn.log.aggregation-enable</name> 
     <value>true</value> 
     </property> 
     <property> 
     <description>Where to aggregate logs</description> 
     <name>yarn.nodemanager.remote-app-log-dir</name> 
     <value>hdfs://namenode:8020/var/log/hadoop-yarn/apps</value> 
     </property> 
     <property> 
     <name>yarn.resourcemanager.webapp.address</name> 
     <value>resource-manager:8088</value> 
     </property> 
     <property> 
     <name>yarn.resourcemanager.resource-tracker.address</name> 
     <value>resource-manager:8031</value> 
     </property> 
     <property> 
     <name>yarn.resourcemanager.admin.address</name> 
     <value>resource-manager:8033</value> 
     </property> 
     <property> 
     <name>yarn.nodemanager.delete.debug-delay-sec</name> 
     <value>600</value> 
     </property> 
     <property> 
     <name>yarn.nodemanager.resource.memory-mb</name> 
     <value>4096</value> 
     <description>Amount of physical memory, in MB, that can be allocated for containers.</description> 
     </property> 
     <property> 
     <name>yarn.scheduler.minimum-allocation-mb</name> 
     <value>1000</value> 
     </property> 
    </configuration> 

mapred-site.xml

<configuration> 
     <property> 
     <name>mapreduce.framework.name</name> 
     <value>yarn</value> 
     </property> 
     <property> 
     <name>mapred.job.tracker</name> 
     <value>namenode:8021</value> 
     </property> 
     <property> 
     <name>yarn.app.mapreduce.am.staging-dir</name> 
     <value>/user</value> 
     </property> 
     <property> 
     <name>mapreduce.jobhistory.address</name> 
     <value>history-server:10020</value> 
     <description>Enter your JobHistoryServer hostname.</description> 
     </property> 
     <property> 
     <name>mapreduce.jobhistory.webapp.address</name> 
     <value>history-server:19888</value> 
     <description>Enter your JobHistoryServer hostname.</description> 
     </property> 
     <property> 
     <name>yarn.app.mapreduce.am.job.client.port-range</name> 
     <value>50100-50200</value> 
     </property> 
    </configuration> 

hdfs-site.xml

<configuration> 
     <property> 
     <name>dfs.permissions.superusergroup</name> 
     <value>hadoop</value> 
     </property> 
     <property> 
     <name>dfs.name.dir or dfs.namenode.name.dir</name> 
     <value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value> 
     </property> 
     <property> 
     <name>dfs.data.dir or dfs.datanode.data.dir</name> 
     <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value> 
     </property> 
     <property> 
     <name>dfs.namenode.http-address</name> 
     <value>namenode:50070</value> 
     <description> 
     The address and the base port on which the dfs NameNode Web UI will listen. 
     </description> 
     </property> 
     <property> 
     <name>dfs.webhdfs.enabled</name> 
     <value>true</value> 
     </property> 
    </configuration> 

Thanks for reading.

Answer


For anyone who runs into the same problem, the solution is to add the following to hdfs-site.xml:

<property> 
    <name>dfs.safemode.threshold.pct</name> 
    <value>0</value> 
</property>
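
As a quick sanity check you can also query the NameNode's safemode state directly, for example:

    # Show whether the NameNode is currently in safemode.
    sudo -u hdfs hdfs dfsadmin -safemode get

On newer Hadoop releases the same setting is usually spelled dfs.namenode.safemode.threshold-pct; as far as I know the old name above is kept as a deprecated alias, so either should work here.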