Hello,
I am trying to set up PySpark in a cloned ArcGIS Pro environment. To clone the environment, I use the package manager in ArcGIS Pro to clone the standard arcgispro-py3 environment. Once that is set up, I install Spyder, my preferred IDE, and then pip install PySpark. I am currently trying to convert several scripts from pandas to the pandas API in PySpark, but when I run the code I get an error telling me that my version of PyArrow is too low. From what I can tell, the cloned environment comes with PyArrow, just at a lower version than what PySpark requires.
I have tried using conda-forge to upgrade the package, but that ends up breaking something in the NumPy install.
Has anyone else had success running PySpark in a cloned environment? Or does anyone have advice on how I should go about using this package within a cloned env?
Any help is greatly appreciated.
Thanks!
You might be better off installing a lower version of PySpark to avoid creating conflicts with what you already have, letting the installed PyArrow version guide which PySpark version you need.
Also see whether any of the PySpark versions available through conda work.
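Before picking a PySpark version, it may help to see exactly what the cloned environment ships with. Here is a rough sketch that prints the installed versions and compares PyArrow against a minimum. The helper names (installed_version, version_tuple, meets_minimum) are made up for illustration, and the minimum you pass in is an assumption: take the real number from the error message or from the release notes of the PySpark version you are considering.

```python
from importlib import metadata
import re


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '12.0.1' into (12, 0, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Report what the cloned environment currently has installed.
    for name in ("pyarrow", "numpy", "pandas", "pyspark"):
        print(name, installed_version(name))

    # Hypothetical minimum: replace with the version named in your error message.
    required = "4.0.0"
    found = installed_version("pyarrow")
    if found is not None:
        print("pyarrow meets minimum:", meets_minimum(found, required))
```

Run it from the cloned environment (for example inside Spyder) so it reports that environment's packages rather than the base install.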