how can i find points with same x,y data and different z value

KONPETROV · ‎01-19-2015

I am reposting this question which was deleted by my mistake.(but a bit different)

I have a shp with over 13.000.000 points with x.y.z data. My data are decimals (333456.765432)and i want them to be integer (333456). After that i want to keep only data with same x,y. I want to find points which are one under the other (vertical). I managed to do that exporting it in csv and writing a script in python 3.4.7. 64bit. However i am using ArcGis Desctop 10.2.2. and this version uses python 2.7. 32 bit I didn't manage to write it at the other version because i am running out of memory ( Richard Fairhurst), although my computer has 8 GB Ram, free space on hard disc and at least good processor.

Answering to your questions

! I made them round to make comparing them easier ( Joshua Bixby ), i think it won't be a problem not having accuracy (from .1 to .9), am i wrong?

! how can i make it run without exporting it in csv.

! i cannot figure it out why i cannot write such script in Python 2.7 which my gis is using.

! if i find a solution at that i want to sort them by x,y data and only by pairs which differ 100meter (vertical). If a point is common between two pairs (because it may differ with another point also 100 meters) i want to be written again, as in the line 7 of x y z table. For example

i x y z

1 33 42 600

2 33 42 500

3 33 42 200

4 33 42 100

5 56 12 900

6 56 12 800

7 56 12 800

8 56 12 700

9 56 12 200

10 56 12 100

I am posting the code.

import math

f=open("C:\\Users\\Desktop\\\\DEM\\Sik\\points_3m.csv","r")

#g=open("C:\\Users\\Desktop\\test.csv","w")

data={}

for line in f:

line=line.split(";")

x=math.trunc(float(line[0]))

y=math.trunc(float(line[1]))

z=math.trunc(float(line[2]))

if (x,y) in data:

if data[(x,y)]['max']<z:

data[(x,y)]['max']=z

if data[(x,y)]['min']>z:

data[(x,y)]['min']=z

else:

data[(x,y)]={'max':z, 'min':z}

#print("finished reading\n")

f.close()

i=0

for key in data:

zmin=data[key]['min']

zmax=data[key]['max']

while zmin<zmax:

i+=1

#g.write("OID,X,Y,Z\n")

#g.write(str(i) + "," + str(key[0]) + "," + str(key[1]) + "," + str(zmin) + "\n")

print("OID,X,Y,Z\n")

print(str(i) + "," + str(key[0]) + "," + str(key[1]) + "," + str(zmin) + "\n")

zmin+=1

print(str(key[0]) + " " + str(key[1]) + " " + str(zmin) + "\n")

#print("finished writing\n")

#g.close()

XanderBakker · ‎01-20-2015

There are several ways to do this. The nice thing about having spatial information is that you can treat it as such. Since you are evaluating difference in height for the "same" location you could process parts of the file (selected by location). Now this will be a little less slow, however creating a spatial selection might be slow anyway when processing 13 million points (in a shapefile which is I believe slower than a file geodatabase that is properly indexed).

Now what I wonder is this; the 13 million points sound as LiDAR data. If you have a LAS file you can create a LAS dataset and use the LAS Point Statistics as Raster tool (ArcGIS Help (10.2, 10.2.1, and 10.2.2) ) which has an option to extract the Z_RANGE.

If you don't have the LAS file, you can create one using LAStools converting a text file to a LAS file.

View solution in original post

JoshuaBixby · ‎01-20-2015

Haven't had time to re-review the post yet, but I did notice the restricted visibility. Opening it up might get more feedback, maybe better feedback.

XanderBakker · ‎01-20-2015

There are several ways to do this. The nice thing about having spatial information is that you can treat it as such. Since you are evaluating difference in height for the "same" location you could process parts of the file (selected by location). Now this will be a little less slow, however creating a spatial selection might be slow anyway when processing 13 million points (in a shapefile which is I believe slower than a file geodatabase that is properly indexed).

Now what I wonder is this; the 13 million points sound as LiDAR data. If you have a LAS file you can create a LAS dataset and use the LAS Point Statistics as Raster tool (ArcGIS Help (10.2, 10.2.1, and 10.2.2) ) which has an option to extract the Z_RANGE.

If you don't have the LAS file, you can create one using LAStools converting a text file to a LAS file.

KONPETROV · ‎01-21-2015

Mr Xander Bakker i don't have lidar data, i have only a dem. With a LAS file i can do that without extracting points as csv in desctop? Simple inside from gis's LAStools? After i find the z-range will i have my data as x, y ,z and sorted with x ,y?

XanderBakker · ‎01-21-2015

LAStools (LAStools | rapidlasso GmbH ) is a product which is not created by Esri (there used to be a part that could be used freely (LAStools: converting, filtering, viewing, processing, and compressing LIDAR data in LAS format ), not sure if that is still the case).

I used it in the past to convert a text file (containing X,Y Z and RGB) into a LAS file. See my short presentation here: Playing around with LiDAR, AHN2, Aerial Photography and LAStools in A…

The analysis tool that I mentioned (LAS Point Statistics as Raster) will convert LAS dataset statistics into a raster. Your result is a raster file not vector. If you use a raster resolution of 1x1 meter you are obtaining information on the range of Z values registered in each grid cell of 1x1 meter. This can be used to depict cliffs or areas with errors. It could be converted back to vector (points)

JoshuaBixby · ‎01-20-2015

If you have 13.000.000 points in total, what percentage do you think will be dropped? If you are working with upper millions or more than 10.000.000 unique points, then you are likely running into a memory error related to using 32-bit Python. Quoting from Windows Dev Center:

Virtual Address Space
....
The virtual address space for 32-bit Windows is 4 gigabytes (GB) in size and divided into two partitions: one for use by the process [2 GB] and the other reserved for use by the system [2 GB].
....
If 4-gigabyte tuning (4GT) is enabled, ... a process that has the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set in its image header will have access to the additional 1 GB of memory above the low 2 GB.

Even if your operating system is 64-bit and you have lots of RAM, 32-bit applications are still bound by the 4 GB address space. If 32-bit applications are compiled as large-address aware, they can take advantage of more (3 GB for 32-bit OS or 4 GB for 64-bit OS) address space than the normal 2 GB. If you want to dive into the weeds, check out the Memory Limits for Windows and Windows Server Releases.

Although Esri has compiled the primary ArcGIS Desktop binaries (arcmap.exe, arccatalog.exe, etc...) as large-address aware, the Python executables that get installed don't seem to be large-address aware. I am not sure how the memory limits play out since ArcMap is LAA but Python.exe isn't. When running a standalone 32-bit Python script, you start to see memory errors once the process gets above 1.5 GB of address space.

One way to potentially get around this is to use 64-bit Background Geoprocessing. Installing that software installs 64-bit versions of Python and ArcPy libraries. That said, Python code from the interactive Python window will not be executed as 64-bit so your script would have to be run as a standalone script outside of ArcGIS Desktop, but at least you could still use ArcPy.

KONPETROV · ‎01-21-2015

For God. Thanks a lot for your answer Mr. Bixby that was really helpful to understand, although I have 86, 64-bit Windows and 8 GB Ram. I can use ArcPy do i need to enable 64-Bit Background? or you mean somethign diefferent.

JoshuaBixby · ‎01-22-2015

If you install 64-bit Background Geoprocessing, background processing in ArcGIS Desktop only uses the 64-bit version. Now, you can disable background processing and keep everything in-process, which would be 32-bit, but you can't do 32-bit background processing when the 64-bit product is installed.

Installing the 64-bit Background Geoprocessing won't change ad-hoc Python code in the interactive Python window, that will still run 32-bit. That said, since ArcGIS Desktop is compiled to be Large Address Aware, it seems 4 GB of address space is available on 64-bit Windows instead of the regular 2 GB.

If you wrap your code into a Python Toolbox and run it as a tool, I am not sure whether that executes in the background as 64-bit or would still be in-process and 32-bit. I know I have tinkered with this before, but my memory fails me at the moment.

For me, the most straightforward way to use 64-bit Python and ArcPy is to develop the code and execute it outside of ArcGIS Desktop, a standalone script some might say. I develop in PyCharm usually, so I just configure PyCharm to use the 64-bit Python interpreter and ArcPy instead of the 32-bit. When I install 64-bit Background Geoprocessing, it usually automatically updates the IDLE (Python GUI) shortcut in All Programs to use 64-bit instead of 32-bit.