2016-05-03 4 views
Hadoop streaming job failed (not successful) in Python

My scripts work perfectly when I run: cat England.txt | ./mapperEngl.py | sort | ./reducerEngl.py

However, when I run:

/shared/hadoop/current/bin/hadoop jar /shared/hadoop/cur/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -file /home/hadoop/mapperEngl.py -mapper /home/hadoop/mapperEngl.py -file /home/hadoop/reducerEngl.py -reducer /home/hadoop/reducerEngl.py -input /datadir/England.txt -output /outputdir/climateresults3.txt

I get the following error:

16/05/03 09:27:15 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead. 
16/05/03 09:27:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
packageJobJar: [/home/hadoop/mapperEngl.py, /home/hadoop/reducerEngl.py, /tmp/hadoop-unjar6814867016081507297/] [] /tmp/streamjob1585723008278678599.jar tmpDir=null 
16/05/03 09:27:16 INFO client.RMProxy: Connecting to ResourceManager at mgmt-florida-poly-eth0/10.200.209.10:8032 
16/05/03 09:27:16 INFO client.RMProxy: Connecting to ResourceManager at mgmt-florida-poly-eth0/10.200.209.10:8032 
16/05/03 09:27:17 INFO mapred.FileInputFormat: Total input paths to process : 1 
16/05/03 09:27:17 INFO mapreduce.JobSubmitter: number of splits:2 
16/05/03 09:27:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459438007195_0006 
16/05/03 09:27:17 INFO impl.YarnClientImpl: Submitted application application_1459438007195_0006 
16/05/03 09:27:17 INFO mapreduce.Job: The url to track the job: http://mgmt-florida-poly-eth0:8088/proxy/application_1459438007195_0006/ 
16/05/03 09:27:17 INFO mapreduce.Job: Running job: job_1459438007195_0006 
16/05/03 09:27:25 INFO mapreduce.Job: Job job_1459438007195_0006 running in uber mode : false 
16/05/03 09:27:25 INFO mapreduce.Job: map 0% reduce 0% 
16/05/03 09:27:31 INFO mapreduce.Job: map 50% reduce 0% 
16/05/03 09:27:32 INFO mapreduce.Job: map 100% reduce 0% 
16/05/03 09:27:38 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_0, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134) 
     at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237) 
     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459) 
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) 
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

16/05/03 09:27:45 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_1, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134) 
     at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237) 
     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459) 
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) 
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

16/05/03 09:27:51 INFO mapreduce.Job: Task Id : attempt_1459438007195_0006_r_000000_2, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134) 
     at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237) 
     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459) 
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) 
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

16/05/03 09:27:58 INFO mapreduce.Job: map 100% reduce 100% 
16/05/03 09:27:58 INFO mapreduce.Job: Job job_1459438007195_0006 failed with state FAILED due to: Task failed task_1459438007195_0006_r_000000 
Job failed as tasks failed. failedMaps:0 failedReduces:1 

16/05/03 09:27:58 INFO mapreduce.Job: Counters: 37 
     File System Counters 
       FILE: Number of bytes read=0 
       FILE: Number of bytes written=228560 
       FILE: Number of read operations=0 
       FILE: Number of large read operations=0 
       FILE: Number of write operations=0 
       HDFS: Number of bytes read=29265 
       HDFS: Number of bytes written=0 
       HDFS: Number of read operations=6 
       HDFS: Number of large read operations=0 
       HDFS: Number of write operations=0 
     Job Counters 
       Failed reduce tasks=4 
       Launched map tasks=2 
       Launched reduce tasks=4 
       Rack-local map tasks=2 
       Total time spent by all maps in occupied slots (ms)=134880 
       Total time spent by all reduces in occupied slots (ms)=242432 
       Total time spent by all map tasks (ms)=8430 
       Total time spent by all reduce tasks (ms)=15152 
       Total vcore-seconds taken by all map tasks=8430 
       Total vcore-seconds taken by all reduce tasks=15152 
       Total megabyte-seconds taken by all map tasks=17264640 
       Total megabyte-seconds taken by all reduce tasks=31031296 
     Map-Reduce Framework 
       Map input records=107 
       Map output records=223 
       Map output bytes=9014 
       Map output materialized bytes=9472 
       Input split bytes=202 
       Combine input records=0 
       Spilled Records=223 
       Failed Shuffles=0 
       Merged Map outputs=0 
       GC time elapsed (ms)=0 
       CPU time spent (ms)=1540 
       Physical memory (bytes) snapshot=1305165824 
       Virtual memory (bytes) snapshot=5482422272 
       Total committed heap usage (bytes)=2022440960 
     File Input Format Counters 
       Bytes Read=29063 
16/05/03 09:27:58 ERROR streaming.StreamJob: Job not successful! 
Streaming Command Failed! 

I have tried solutions from other questions, but nothing seems to work.

So I'm just completely stuck here.


I haven't worked with Hadoop in a while, but I think there should be a way to get the stderr of the failing tasks; that should help you understand why the job is unhappy. –


What is the input format, and what do the mapper/reducer do? – Akarsh

Answer


Crazy as it sounds, I fixed it by using #!/usr/bin/python instead of #!/usr/bin/python3.

I think it's a problem with our Hadoop cluster config.
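To illustrate the fix, here is a minimal streaming-mapper sketch (hypothetical; the actual mapperEngl.py logic isn't shown in the question). The important part is the first line: the task nodes could resolve /usr/bin/python but apparently not /usr/bin/python3, so the subprocess died with exit code 1 before emitting anything.

```python
#!/usr/bin/python
# Hypothetical Hadoop Streaming mapper sketch. The shebang above is the fix:
# the cluster nodes had /usr/bin/python but not /usr/bin/python3.
import sys

def map_lines(lines):
    """Emit one tab-separated key/value pair per whitespace token."""
    for line in lines:
        for token in line.strip().split():
            yield "%s\t1" % token

if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin, one per line.
    for pair in map_lines(sys.stdin):
        print(pair)
```

You can also avoid shebang mismatches entirely by invoking the interpreter explicitly, e.g. -mapper "python mapperEngl.py".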


Apparently I also had problems with mixing spaces and tabs for indentation. – user7790450
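That comment is consistent with the error above: mixed tabs and spaces raise an IndentationError/TabError at startup, which Hadoop Streaming reports as "subprocess failed with code 1". A hypothetical reducer skeleton (not the real reducerEngl.py) indented with spaces only:

```python
#!/usr/bin/python
# Hypothetical Hadoop Streaming reducer sketch: sums tab-separated
# (key, count) pairs, relying on the shuffle having sorted input by key.
# Indentation is spaces-only throughout; mixing tabs and spaces would
# crash the script and surface as "subprocess failed with code 1".
import sys

def reduce_pairs(lines):
    """Yield one 'key<TAB>total' line per key; input must be key-sorted."""
    current_key, total = None, 0
    for line in lines:
        key, _, value = line.strip().partition("\t")
        if key != current_key:
            if current_key is not None:
                yield "%s\t%d" % (current_key, total)
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield "%s\t%d" % (current_key, total)

if __name__ == "__main__":
    for pair in reduce_pairs(sys.stdin):
        print(pair)
```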