Vectorized UDF: Scalable Analysis with Python and PySpark - Li Jin

Databricks

5 years ago

5,906 views


Comments:

@dannykaplun8081 - 03.04.2023 19:31

One of the best videos on this topic that I have seen. Thanks!

@Gerald-iz7mv - 05.03.2023 19:03

Aren't pandas UDFs working on multiple rows? Please correct me if I'm wrong.

@haneulkim4902 - 08.11.2022 10:14

Thanks for the great talk! When using a grouped map pandas UDF for model training, I am assuming that your group_column=id is unique. Doesn't this hurt performance, since it needs to group by unique items?

@SatyaKomatineni - 30.09.2019 16:26

An essential, great talk.

@vinothpsg - 17.12.2018 23:12

Great video. Thanks so much. I am using pandas UDFs as a solution where I ran into severe memory issues because of the serialization involved with Python objects. I am using a grouped map pandas UDF, and it expects a static return type, which poses a challenge in writing generic functions that can be decorated with a pandas UDF. Is there a way I can infer the return type at run time while using a pandas UDF?
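
One possible workaround for the static return type, sketched here under assumptions rather than taken from the talk: run the generic function once on a small pandas sample on the driver, read the dtypes of the result, and turn them into a schema string before registering the UDF. The helper name `ddl_from_dtypes` and its dtype mapping are hypothetical and cover only a few common types; they are not part of PySpark.

```python
# Hypothetical helper (not a PySpark API): maps pandas dtype names to
# Spark SQL DDL type names. Only a few common dtypes are covered.
_PANDAS_TO_SPARK = {
    "int32": "int",
    "int64": "bigint",
    "float32": "float",
    "float64": "double",
    "bool": "boolean",
    "object": "string",
    "datetime64[ns]": "timestamp",
}


def ddl_from_dtypes(dtypes):
    """Build a DDL schema string from a {column name: pandas dtype name} dict.

    With pandas, such a dict can be obtained from a sample result via
    sample_result.dtypes.astype(str).to_dict().
    """
    fields = []
    for name, dtype in dtypes.items():
        if dtype not in _PANDAS_TO_SPARK:
            # Fail loudly instead of guessing a Spark type.
            raise ValueError(f"no Spark type mapping for dtype {dtype!r} (column {name!r})")
        fields.append(f"{name} {_PANDAS_TO_SPARK[dtype]}")
    return ", ".join(fields)


# Example with made-up column names; real names come from the sample result.
schema = ddl_from_dtypes({"id": "int64", "score": "float64", "label": "object"})
print(schema)
```

The resulting string can then be passed as the schema of the grouped map function, for example to `df.groupby("id").applyInPandas(fn, schema=schema)` in Spark 3.0 and later. This assumes the sample is representative of every group's output and that column names need no quoting; the schema is still fixed before the job runs, only computed instead of hand-written.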
