Handling "out of memory" error from Python script

BrianVan_Nostrand · ‎09-20-2019

Hi all,

I'm working on a validation script to check the URLs of each tile of a cached map service. The REST endpoint URL of a tile is formatted as "https://<map service URL>/tile/<level>/<row>/<column>", and the service's REST page lists the row and column of the start and end tile for each cached level of detail. Using these values, I use nested while loops to iterate through each cell in the grid defined by those rows and columns, and will ultimately build a URL for each tile and request it to check that it doesn't respond with an error.
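As a rough sketch of that iteration (not the attached script), the grid can be walked lazily with a generator so that only one URL string exists at a time. The names `count_tiles` and `iter_tile_urls` and the example service URL are hypothetical, and the start and end tiles are assumed to be inclusive.

```python
def count_tiles(start_row, start_col, end_row, end_col):
    """Number of tiles in an inclusive row/column range."""
    return (end_row - start_row + 1) * (end_col - start_col + 1)


def iter_tile_urls(service_url, level, start_row, start_col, end_row, end_col):
    """Yield one tile URL at a time: <service>/tile/<level>/<row>/<column>."""
    for row in range(start_row, end_row + 1):
        for col in range(start_col, end_col + 1):
            yield f"{service_url}/tile/{level}/{row}/{col}"


# Placeholder service URL and a tiny 2 x 2 grid, purely for illustration.
base = "https://example.com/arcgis/rest/services/Demo/MapServer"
for url in iter_tile_urls(base, 10, 354, 160, 355, 161):
    print(url)
```

Because the generator never builds a list, memory use stays flat no matter how many tiles the level contains.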

The cache for the largest scale contains 52,398,060 individual tiles, and PyScripter throws an "out of memory" error when I try to iterate through that range. The attached code shows what I'm talking about.

Is there a better way to write this or some kind of memory release I could do to make this thing work?

Thanks

BrianVan_Nostrand · ‎09-23-2019

All,

I've tried running the script in VS Code and it runs straight through without issue. It looks like PyScripter has a memory management problem. I won't be using that anymore!

Brian

JoeBorgione · ‎09-20-2019

I'd like to take a look at your code, but I have to maintain a healthy relationship with our security team. In other words, if you could post your code using the syntax highlighter, those of us who suffer from zip file paranoia would be willing to take a look....

That should just about do it....

BrianVan_Nostrand · ‎09-20-2019

from lxml.etree import tostring
from lxml import etree as et
from lxml import html
from itertools import *
from tkinter import *
import requests
import gc
Cache_Service_URL = "https://data.wsdot.wa.gov/arcgis/rest/services/Shared/WebBaseMapWebMercator/MapServer"
#http://hqolymgis38d.wsdot.loc:6080/arcgis/rest/services/ILT/ILT_Basemap_2019090/MapServer
LODsList = []
LODs = []
r = requests.get(Cache_Service_URL)
doc = html.fromstring(r.content)
#parse rest endpoint to get levels of detail and their grid definitions
# URL format: https://<map service URL>/tile/<level>/<row>/<column>
#4365270
for tag in doc.iter():
    if tag.text == "Level ID:":
        tileDefinitions = []
        parent = tag.getparent()
        children = parent.getchildren()
        for child in children:
            if "Level ID" in tostring(child):
                LODValue = tostring(child).split(" ")[2].split("&")[0]
                tileDefinitions.append(LODValue)
            elif "Start Tile" in tostring(child):
                startTile = []
                startRow = tostring(child).split(" ")[1].split("/")[9]
                startColumn = tostring(child).split(" ")[1].split("/")[10].split('"')[0]
                startTile.append("StartTile")
                startTile.append(startRow)
                startTile.append(startColumn)
                tileDefinitions.append(startTile)
            elif "End Tile" in tostring(child):
                endTile = []
                endRow = tostring(child).split(" ")[1].split("/")[9]
                endColumn = tostring(child).split(" ")[1].split("/")[10].split('"')[0]
                endTile.append("EndTile")
                endTile.append(endRow)
                endTile.append(endColumn)
                tileDefinitions.append(endTile)
            else:
                tkMessageBox.showinfo(title="ALERT", message="Unknown HTML Node: "+ tostring(child))
        tilesCount = (int(endRow)-int(startRow)+1)*(int(endColumn)-int(startColumn)+1)
        tileDefinitions.append(tilesCount)
        LODs.append(tileDefinitions)
#query each tile
#http://hqolymgis38d.wsdot.loc:6080/arcgis/rest/services/ILT/ILT_Basemap_2019090/MapServer/tile/0/42/16
totalTilesCount = []
for LOD in LODs:
    print (LOD)
for LOD in LODs:
    if LOD[0] == '10':
        LODValue = int(LOD[0])
        startTileRow = int(LOD[1][1])
        startTileColumn = int(LOD[1][2])
        endTileRow = int(LOD[2][1])
        endTileColumn = int(LOD[2][2])
        i = startTileRow
        while i < (endTileRow+1):
            j = startTileColumn
            while j < (endTileColumn+1):  # include the end column, matching the row loop and tilesCount
                print(str(LODValue) + " Row " + str(i) + " Column " + str(j))
                url = Cache_Service_URL+"/tile/"+str(LODValue)+"/"+str(i)+"/"+str(j)
                j+=1
                '''del url
                if int(i)%1000 == 0:
                    gc.collect()'''
            i += 1

Ooh, I didn't know that existed. Let me know if you have any questions; it's a little messy.

DanPatterson_Retired · ‎09-20-2019

/blogs/dan_patterson/2016/08/14/script-formatting

You appear to be saving everything to memory and processing what is collected.

At what size (tile-wise) does it break down? And how much memory do you have (assuming you are running anything else)?

BrianVan_Nostrand · ‎09-20-2019

It's strange, I'm not seeing memory usage spike in Task Manager or anything. Where am I saving everything to memory? I tried not to do that. The only place I can think of where I create anything that persists is the line where the url variable is defined, and I have been using del url to (I thought) get rid of it shortly thereafter in each loop. I've never had to think about memory usage before, so this is a new one for me. The breakdown happens after about 20 million tiles are processed successfully. I have 16 GB of RAM in this machine.
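One thing that may be worth ruling out, and this is an assumption rather than anything confirmed in this thread: an IDE's output pane can keep every printed line, so printing once per tile could grow the IDE's memory even when the script itself holds almost nothing. In CPython, rebinding `url` on each pass already releases the previous string, so `del url` should not be needed. A minimal sketch that reports progress only occasionally, using the hypothetical names `walk_tiles` and `report_every`:

```python
def walk_tiles(level, start_row, start_col, end_row, end_col,
               report_every=1_000_000, report=print):
    """Visit every tile in an inclusive range, reporting progress sparsely."""
    visited = 0
    for row in range(start_row, end_row + 1):
        for col in range(start_col, end_col + 1):
            # Rebinding `url` drops the previous string, so no `del` is needed.
            url = f"/tile/{level}/{row}/{col}"
            visited += 1
            if visited % report_every == 0:
                report(f"{visited} tiles visited, last: {url}")
    return visited
```

With the default `report_every`, a level of roughly 52 million tiles would produce about 52 lines of output instead of one line per tile.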

JoshuaBixby · ‎09-21-2019

I see where you are building the URL, on Line #64, but what code are you using to test/check the URL?

BrianVan_Nostrand · ‎09-23-2019

I'm not yet. The URL will be fed into a requests.get and, if an error is returned, it will be written to a CSV file used to identify missing tiles. PyScripter is running out of memory just iterating through the stack of tiles, without any HTTP calls being made.
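A minimal sketch of what that planned check might look like, under stated assumptions: any status other than 200 is treated as a missing tile, the third-party `requests` package is installed, and the names `http_status` and `log_failed_tiles` are hypothetical. The status lookup is passed in as a function so the logging logic can be tried without touching the network.

```python
import csv


def http_status(url):
    """Default checker; assumes the third-party `requests` package is installed."""
    import requests  # imported lazily so the rest of the sketch has no dependency
    return requests.get(url, timeout=10).status_code


def log_failed_tiles(urls, csv_file, get_status=http_status):
    """Write a CSV row for each URL whose status is not 200; return the count."""
    writer = csv.writer(csv_file)
    writer.writerow(["url", "status"])
    failures = 0
    for url in urls:
        try:
            status = get_status(url)
        except Exception as exc:  # network errors are recorded, not raised
            status = f"error: {exc}"
        if status != 200:  # assumption: anything but 200 means a bad tile
            writer.writerow([url, status])
            failures += 1
    return failures
```

When writing to a real file, opening it with `open("missing_tiles.csv", "w", newline="")` follows the `csv` module's recommendation; the file name here is only an example.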

JoeBorgione · ‎09-23-2019

Give Spyder a try....

That should just about do it....