2016-04-05

Spark step in EMR cannot be executed. I'd appreciate it if you could shed some light on this.

I had trouble running a word-count MapReduce job as a Spark step on Amazon EMR. However, I was able to SSH into the master node and run the same word-count logic in spark-shell without any problem.
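For reference, the step was submitted roughly like this (a sketch: the main class name `WordCount` is a placeholder, while the cluster id and jar path are the ones that appear in the log below):

```shell
aws emr add-steps \
  --cluster-id j-3AZL0AH5ALBBL \
  --steps Type=Spark,Name="WordCount",ActionOnFailure=CONTINUE,\
Args=[--class,WordCount,s3://gda-test/logic/wordCount.jar]
```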

It complains that __spark_conf_xx.zip does not exist on the master's HDFS, even though it was uploaded without any error:

16/04/05 07:20:21 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b/__spark_conf__9006968814682693730.zip -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip 

The full log reads as follows:

16/04/05 07:20:16 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-26-247.ap-northeast-1.compute.internal/172.31.26.247:8032 
16/04/05 07:20:16 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers 
16/04/05 07:20:16 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container) 
16/04/05 07:20:16 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead 
16/04/05 07:20:16 INFO yarn.Client: Setting up container launch context for our AM 
16/04/05 07:20:16 INFO yarn.Client: Setting up the launch environment for our AM container 
16/04/05 07:20:16 INFO yarn.Client: Preparing resources for our AM container 
16/04/05 07:20:17 INFO yarn.Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.1-hadoop2.7.2-amzn-0.jar -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/spark-assembly-1.6.1-hadoop2.7.2-amzn-0.jar 
16/04/05 07:20:18 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1459839695291 
16/04/05 07:20:18 INFO metrics.MetricsSaver: Created MetricsSaver j-3AZL0AH5ALBBL:i-96753119:SparkSubmit:11699 period:60 /mnt/var/em/raw/i-96753119_20160405_SparkSubmit_11699_raw.bin 
16/04/05 07:20:19 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 2327 raw values into 1 aggregated values, total 1 
16/04/05 07:20:20 INFO fs.EmrFileSystem: Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation 
16/04/05 07:20:20 INFO yarn.Client: Uploading resource s3://gda-test/logic/wordCount.jar -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/wordCount.jar 
16/04/05 07:20:20 INFO s3n.S3NativeFileSystem: Opening 's3://gda-test/logic/wordCount.jar' for reading 
16/04/05 07:20:20 INFO metrics.MetricsSaver: Thread 1 created MetricsLockFreeSaver 1 
16/04/05 07:20:21 INFO metrics.MetricsSaver: 1 MetricsLockFreeSaver 1 comitted 33 matured S3ReadDelay values 
16/04/05 07:20:21 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b/__spark_conf__9006968814682693730.zip -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip 
16/04/05 07:20:21 INFO spark.SecurityManager: Changing view acls to: hadoop 
16/04/05 07:20:21 INFO spark.SecurityManager: Changing modify acls to: hadoop 
16/04/05 07:20:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop) 
16/04/05 07:20:21 INFO yarn.Client: Submitting application 1 to ResourceManager 
16/04/05 07:20:21 INFO impl.YarnClientImpl: Submitted application application_1459839685827_0001 
16/04/05 07:20:22 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:22 INFO yarn.Client: 
    client token: N/A 
    diagnostics: N/A 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1459840821323 
    final status: UNDEFINED 
    tracking URL: http://ip-172-31-26-247.ap-northeast-1.compute.internal:20888/proxy/application_1459839685827_0001/ 
    user: hadoop 
16/04/05 07:20:23 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:24 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:25 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:26 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:27 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:28 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:29 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:30 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:31 INFO yarn.Client: Application report for application_1459839685827_0001 (state: FAILED) 
16/04/05 07:20:31 INFO yarn.Client: 
    client token: N/A 
    diagnostics: Application application_1459839685827_0001 failed 2 times due to AM Container for appattempt_1459839685827_0001_000002 exited with exitCode: -1000 
For more detailed output, check application tracking page:http://ip-172-31-26-247.ap-northeast-1.compute.internal:8088/cluster/app/application_1459839685827_0001Then, click on links to logs of each attempt. 
Diagnostics: File does not exist: hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip 
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip 
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309) 
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301) 
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301) 
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253) 
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) 
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) 
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358) 
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 

Failing this attempt. Failing the application. 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1459840821323 
    final status: FAILED 
    tracking URL: http://ip-172-31-26-247.ap-northeast-1.compute.internal:8088/cluster/app/application_1459839685827_0001 
    user: hadoop 
Exception in thread "main" org.apache.spark.SparkException: Application application_1459839685827_0001 finished with failed status 
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034) 
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081) 
    at org.apache.spark.deploy.yarn.Client.main(Client.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
16/04/05 07:20:31 INFO util.ShutdownHookManager: Shutdown hook called 
16/04/05 07:20:31 INFO util.ShutdownHookManager: Deleting directory /mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b 
Command exiting with ret '1' 

Answer


I found the solution.

It was caused by a Java version mismatch: the logic and the jar were built with Java 8, while the EMR cluster uses Java 7 by default.
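An alternative, if upgrading the cluster's Java is not an option, would be to compile the jar against Java 7 instead. A sketch, assuming a Maven build:

```xml
<!-- In pom.xml: target Java 7 bytecode so the jar runs on the cluster's default JVM -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.7</source>
    <target>1.7</target>
  </configuration>
</plugin>
```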

In my case (Spark & Hadoop), I had to adjust the environment as follows, using the Advanced Options when creating the cluster: http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-configure-apps.html#configuring-java8
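Concretely, following that AWS guide, the cluster can be created with a configuration like the one below, which sets JAVA_HOME to Java 8 for both the Hadoop and Spark daemons (the path shown is the one documented for EMR 4.x; treat this as a sketch for your release):

```json
[
  {
    "Classification": "hadoop-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
        }
      }
    ]
  },
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "JAVA_HOME": "/usr/lib/jvm/java-1.8.0"
        }
      }
    ]
  }
]
```

This JSON can be pasted into the "Edit software settings" box under Advanced Options, or passed to the CLI via `--configurations`.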

I hope this information is useful to anyone who runs into the same problem.