PySpark: Python API for Spark

Stoney

11 years ago

53,373 views

Comments:

Inzululwazi Inzululwazi - 13.11.2017 11:40

Great talk.

kidexp - 22.01.2015 23:12

It's great. I'm just wondering whether we can use PySpark without installing Spark. For example, can I install PySpark on my local machine (without Spark installed) and use it to connect to a remote Spark cluster?
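
For what it's worth, pointing PySpark at a remote cluster comes down to the master URL. A minimal sketch, assuming a standalone master at a hypothetical spark://cluster-host:7077 (the driver machine still needs the Spark/PySpark libraries on its path):

    from pyspark import SparkConf, SparkContext

    # "spark://cluster-host:7077" is a hypothetical standalone master URL.
    conf = SparkConf().setMaster("spark://cluster-host:7077").setAppName("remote-test")
    sc = SparkContext(conf=conf)

    # A trivial job whose tasks run on the remote cluster's executors.
    print(sc.parallelize(range(100)).sum())
    sc.stop()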

Liguo Kong - 23.06.2014 03:46

Great stuff.

Glenn Strycker - 25.04.2014 22:06

Is there a place to download (or re-create) Josh's example data file "wikipedia-100"? I'd like to follow along with his tutorial during the video.

Josh Rosen - 26.09.2013 09:13

NumPy will have to be present on the workers' Python import paths. PySpark has a SparkContext.addPyFile() mechanism for shipping library dependencies with jobs. I'm not sure whether NumPy binaries can be packaged as .egg or .zip files for that, though. Another option is to install NumPy somewhere and add its installation path to PYTHONPATH in spark-env.sh (on each worker) so that it's set in each worker's environment when they launch their Python processes.
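
A minimal sketch of the addPyFile() approach, assuming a pure-Python dependency packaged as a .zip at a hypothetical path (whether NumPy's compiled extensions can be shipped this way is the open question above; the package name mylib and its transform() function are made up for illustration):

    from pyspark import SparkContext

    sc = SparkContext(appName="ship-deps")

    # Ship a packaged dependency to every worker; path and package are
    # hypothetical. Workers add the file to their Python import path
    # before running tasks.
    sc.addPyFile("/path/to/mylib-0.1.zip")

    def use_dep(x):
        import mylib  # resolved on the worker via the shipped .zip
        return mylib.transform(x)

    print(sc.parallelize(range(10)).map(use_dep).collect())
    sc.stop()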

Trung Huynh - 05.09.2013 19:59

Great presentation. I'm wondering: if I use NumPy, do I have to install NumPy on the Spark workers?
