Spark step on EMR cannot be run; I would appreciate it if you could shed some light on this
I had trouble running a word-count MapReduce job on Amazon EMR as a Spark step. However, I was able to SSH into the master node and run the same word-count logic in spark-shell without any problem.
The step fails with an error saying that __spark_conf__xx.zip does not exist on the master's HDFS, even though the log shows no error while copying it:
16/04/05 07:20:21 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b/__spark_conf__9006968814682693730.zip -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip
The full log is as follows:
16/04/05 07:20:16 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-26-247.ap-northeast-1.compute.internal/172.31.26.247:8032
16/04/05 07:20:16 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
16/04/05 07:20:16 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
16/04/05 07:20:16 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/05 07:20:16 INFO yarn.Client: Setting up container launch context for our AM
16/04/05 07:20:16 INFO yarn.Client: Setting up the launch environment for our AM container
16/04/05 07:20:16 INFO yarn.Client: Preparing resources for our AM container
16/04/05 07:20:17 INFO yarn.Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.1-hadoop2.7.2-amzn-0.jar -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/spark-assembly-1.6.1-hadoop2.7.2-amzn-0.jar
16/04/05 07:20:18 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1459839695291
16/04/05 07:20:18 INFO metrics.MetricsSaver: Created MetricsSaver j-3AZL0AH5ALBBL:i-96753119:SparkSubmit:11699 period:60 /mnt/var/em/raw/i-96753119_20160405_SparkSubmit_11699_raw.bin
16/04/05 07:20:19 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 2327 raw values into 1 aggregated values, total 1
16/04/05 07:20:20 INFO fs.EmrFileSystem: Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
16/04/05 07:20:20 INFO yarn.Client: Uploading resource s3://gda-test/logic/wordCount.jar -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/wordCount.jar
16/04/05 07:20:20 INFO s3n.S3NativeFileSystem: Opening 's3://gda-test/logic/wordCount.jar' for reading
16/04/05 07:20:20 INFO metrics.MetricsSaver: Thread 1 created MetricsLockFreeSaver 1
16/04/05 07:20:21 INFO metrics.MetricsSaver: 1 MetricsLockFreeSaver 1 comitted 33 matured S3ReadDelay values
16/04/05 07:20:21 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b/__spark_conf__9006968814682693730.zip -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip
16/04/05 07:20:21 INFO spark.SecurityManager: Changing view acls to: hadoop
16/04/05 07:20:21 INFO spark.SecurityManager: Changing modify acls to: hadoop
16/04/05 07:20:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/04/05 07:20:21 INFO yarn.Client: Submitting application 1 to ResourceManager
16/04/05 07:20:21 INFO impl.YarnClientImpl: Submitted application application_1459839685827_0001
16/04/05 07:20:22 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:22 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1459840821323
final status: UNDEFINED
tracking URL: http://ip-172-31-26-247.ap-northeast-1.compute.internal:20888/proxy/application_1459839685827_0001/
user: hadoop
16/04/05 07:20:23 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:24 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:25 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:26 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:27 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:28 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:29 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:30 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED)
16/04/05 07:20:31 INFO yarn.Client: Application report for application_1459839685827_0001 (state: FAILED)
16/04/05 07:20:31 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1459839685827_0001 failed 2 times due to AM Container for appattempt_1459839685827_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://ip-172-31-26-247.ap-northeast-1.compute.internal:8088/cluster/app/application_1459839685827_0001Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1459840821323
final status: FAILED
tracking URL: http://ip-172-31-26-247.ap-northeast-1.compute.internal:8088/cluster/app/application_1459839685827_0001
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1459839685827_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/05 07:20:31 INFO util.ShutdownHookManager: Shutdown hook called
16/04/05 07:20:31 INFO util.ShutdownHookManager: Deleting directory /mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b
Command exiting with ret '1'
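
To double-check that the file YARN reports as missing really is the one the client logged as uploaded, the two paths in the log can be compared mechanically. Below is a minimal sketch in Python, assuming the step log is available as plain text. The function name `check_staging` and both regular expressions are my own helpers for illustration, not part of Spark, YARN, or EMR; the two sample lines are copied from the log above.

```python
import re

# Matches yarn.Client upload lines of the form "Uploading resource <src> -> <dst>".
UPLOAD_RE = re.compile(r"Uploading resource (\S+) -> (\S+)")
# Matches the diagnostics / exception lines that name the missing file.
MISSING_RE = re.compile(r"File does not exist: (\S+)")


def check_staging(log_text):
    """Return (uploaded destinations, missing paths, paths that are in both)."""
    uploaded = {dst for _src, dst in UPLOAD_RE.findall(log_text)}
    missing = set(MISSING_RE.findall(log_text))
    return uploaded, missing, uploaded & missing


# Two lines taken verbatim from the log in this post.
sample_log = (
    "16/04/05 07:20:21 INFO yarn.Client: Uploading resource "
    "file:/mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b/"
    "__spark_conf__9006968814682693730.zip -> "
    "hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/"
    ".sparkStaging/application_1459839685827_0001/"
    "__spark_conf__9006968814682693730.zip\n"
    "Diagnostics: File does not exist: "
    "hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/"
    ".sparkStaging/application_1459839685827_0001/"
    "__spark_conf__9006968814682693730.zip\n"
)

uploaded, missing, both = check_staging(sample_log)
for path in sorted(both):
    print("logged as uploaded but reported missing:", path)
```

For this log the overlap is non-empty: the upload was logged without error, yet the file was not found when YARN tried to download it for the ApplicationMaster container (the `FSDownload` frames in the stack trace). That seems to point away from the copy itself and toward how the staging path is resolved or cleaned up, though this is only my reading of the log.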