PySpark SparkContext NameError 'sc' in Jupyter
I am new to PySpark and want to use it with IPython Notebook on my Ubuntu 12.04 machine. Below is my configuration for PySpark and IPython Notebook.
$ echo $JAVA_HOME
/usr/lib/jvm/java-8-oracle
# Path for Spark
$ ls /home/sparkuser/spark/
bin CHANGES.txt data examples LICENSE NOTICE R RELEASE scala-2.11.6.deb
build conf ec2 lib licenses python README.md sbin spark-1.5.2-bin-hadoop2.6.tgz
I installed Anaconda2 4.0.0. The Anaconda path:
$ ls anaconda2/
bin conda-meta envs etc Examples imports include lib LICENSE.txt mkspecs pkgs plugins share ssl tests
Create a PySpark profile for IPython:
ipython profile create pyspark
$ cat .bashrc
export SPARK_HOME="$HOME/spark"
export PYSPARK_SUBMIT_ARGS="--master local[2]"
# added by Anaconda2 4.0.0 installer
export PATH="/home/sparkuser/anaconda2/bin:$PATH"
Create a file named ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:
$ cat .ipython/profile_pyspark/startup/00-pyspark-setup.py
import os
import sys
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.5.2" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if not "pyspark-shell" in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
        os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
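The last step of the startup script, appending "pyspark-shell" to PYSPARK_SUBMIT_ARGS, can be isolated and checked on its own. The following is only a minimal sketch: the helper name with_pyspark_shell is hypothetical and is not part of Spark or IPython, and unlike the script above it also strips a leading space when the variable is empty.

```python
def with_pyspark_shell(submit_args):
    """Return submit_args with 'pyspark-shell' appended if it is missing.

    Hypothetical helper for illustration only; it mirrors the
    PYSPARK_SUBMIT_ARGS handling in the startup script above.
    """
    if "pyspark-shell" not in submit_args:
        submit_args = (submit_args + " pyspark-shell").strip()
    return submit_args


# Example using the value exported in .bashrc above.
print(with_pyspark_shell("--master local[2]"))
```

Calling the helper a second time on its own result leaves the string unchanged, so it is safe to run from a startup file that may be executed more than once.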
Starting pyspark in the terminal:
$ ~/spark/bin/pyspark
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/04/22 21:06:55 INFO SparkContext: Running Spark version 1.5.2
16/04/22 21:07:27 INFO BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
Using Python version 2.7.11 (default, Dec 6 2015 18:08:32)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sc
<pyspark.context.SparkContext object at 0x7facb75b50d0>
>>>
When I run the following command, Jupyter opens in a browser window.
$ ipython notebook --profile=pyspark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook`... continue in 5 sec. Press Ctrl-C to quit now.
[W 21:32:08.070 NotebookApp] Unrecognized alias: '--profile=pyspark', it will probably have no effect.
[I 21:32:08.111 NotebookApp] Serving notebooks from local directory: /home/sparkuser
[I 21:32:08.111 NotebookApp] 0 active kernels
[I 21:32:08.111 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 21:32:08.111 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Created new window in existing browser session.
When I type the following command in the browser, a NameError is raised.
In [ ]: print sc
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-ee8101b8fe58> in <module>()
----> 1 print sc
NameError: name 'sc' is not defined
When I run the above command in the pyspark terminal, it prints the expected output, but when I run the same command in Jupyter, it throws the error shown above.
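To compare the two environments, a cell like the following could be run both in the pyspark terminal and in the notebook. It is only a diagnostic sketch: the function name describe_spark_env is hypothetical, and it reports only whether sc exists and which Spark-related settings the current interpreter sees. It does not start Spark.

```python
import os
import sys


def describe_spark_env(namespace):
    """Collect a few facts about the current interpreter's Spark setup.

    `namespace` is the dict to search for `sc`, e.g. globals().
    Hypothetical helper for illustration only.
    """
    spark_home = os.environ.get("SPARK_HOME")
    spark_python = os.path.join(spark_home, "python") if spark_home else None
    return {
        # True if a name `sc` is defined in the given namespace.
        "sc_defined": "sc" in namespace,
        # Value of SPARK_HOME as seen by this process (None if unset).
        "spark_home": spark_home,
        # True if some sys.path entry lies under $SPARK_HOME/python.
        "spark_python_on_path": bool(spark_python) and any(
            str(p).startswith(spark_python) for p in sys.path
        ),
        # Value of PYSPARK_SUBMIT_ARGS ("" if unset).
        "submit_args": os.environ.get("PYSPARK_SUBMIT_ARGS", ""),
    }


for key, value in sorted(describe_spark_env(globals()).items()):
    print("{0}: {1}".format(key, value))
```

If sc_defined and spark_python_on_path differ between the terminal and the notebook, that would suggest the startup file in profile_pyspark is not being executed by the notebook kernel.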
Above are my configuration settings for PySpark and IPython. How do I configure PySpark to work with Jupyter?