Comments:
Great talk.
It's great. I just wonder whether we can use PySpark without installing Spark. For example, can I install PySpark on my local machine (without Spark installed) and use it to connect to a remote Spark cluster?
Great stuff.
Is there a place to download (or re-create) Josh's example data file "wikipedia-100"? I'd like to follow along with his tutorial during the video.
NumPy will have to be present on the workers' Python import paths. PySpark has a SparkContext.addPyFile() mechanism for shipping library dependencies with jobs; I'm not sure whether NumPy binaries can be packaged as .egg or .zip files for that, though. Another option is to install NumPy somewhere and add its installation path to PYTHONPATH in spark-env.sh (on each worker), so that it's set in each worker's environment when the worker launches its Python processes.
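The second option described above (a PYTHONPATH entry in spark-env.sh on each worker) might look something like the fragment below. The path /opt/pylibs is a hypothetical install location, not something from the original comment; substitute wherever NumPy was actually installed.

```shell
# spark-env.sh (on each worker)
# Make a locally installed NumPy visible to the Python processes
# that PySpark workers launch.
# NOTE: /opt/pylibs is a hypothetical install prefix -- replace it
# with the directory where NumPy was actually installed
# (e.g. via `pip install --target=/opt/pylibs numpy`).
export PYTHONPATH="/opt/pylibs:$PYTHONPATH"
```

Because this is per-worker environment configuration, it has to be applied (and kept in sync) on every node in the cluster, unlike addPyFile(), which ships the dependency with the job itself.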
Great presentation. I am wondering: if I use NumPy, do I have to install NumPy on the Spark workers?