
Pyspark and Pyarrow version issue

11-06-2023 06:06 AM
solar_man
New Contributor

Hello,

I am trying to set up PySpark in a cloned ArcGIS Pro environment. To create the clone, I use the Package Manager in ArcGIS Pro to clone the default arcgispro-py3 environment. Once that is set up, I install Spyder, my preferred IDE, and then pip install PySpark. I am currently converting several scripts from pandas to the pandas API in PySpark, but when I run the code I get an error telling me that my version of PyArrow is too low. From what I can tell, the cloned environment comes with PyArrow, just at a lower version than PySpark requires.

I have tried upgrading PyArrow from conda-forge, but that ends up breaking something in the NumPy install.
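For anyone reproducing this, a quick way to see what the cloned environment actually contains is to list the installed versions of the packages involved. This is only a minimal sketch using the standard library; the helper name `installed_versions` and the package list are illustrative choices, not anything ArcGIS Pro or PySpark provides.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package list is an assumption: the ones mentioned in this thread.
    report = installed_versions(["pyspark", "pyarrow", "numpy", "pandas"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the clone (for example from the Spyder console) before and after any install attempt shows whether PyArrow or NumPy changed.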

Has anyone else had success in running Pyspark in a cloned environment? Or does anyone have any advice as to how I should try using this package within a cloned env?

Any help is greatly appreciated. 

Thanks!

1 Reply
DanPatterson
MVP Esteemed Contributor

You might be better off installing a lower version of PySpark to avoid creating conflicts with what you already have, letting the installed Arrow version guide which PySpark version you need.
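As a rough sketch of letting the Arrow version guide the choice, you can compare the PyArrow version in the clone against the minimum named in the PySpark error message. The helpers `version_tuple` and `meets_minimum` are hypothetical names, and the version strings in the example are placeholders only; substitute the real values from your environment and from the error text.

```python
import re


def version_tuple(version):
    """Turn '12.0.1' into (12, 0, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '4.0' and '4.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Placeholder values, not real requirements of any PySpark release.
    installed_pyarrow = "1.0.0"
    required_by_pyspark = "4.0.0"
    print(meets_minimum(installed_pyarrow, required_by_pyspark))
```

If the check fails for the PySpark release you installed, try an older PySpark release and repeat the comparison with the minimum it reports, rather than upgrading PyArrow in the clone.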

See if any of the versions available on conda work:

Files :: Anaconda.org

... sort of retired...