Looking for help to optimize a SQL query in ORACLE. What this does is
Take two polygon layers, overlay/intersect them, and where they intersect, summarize the attributes of the overlay into an aggregate list if the area of intersection is more than 15% of the source polygon.
SELECT
  hashid,
  --aggregate function that sorts and groups overlaid floodzones by parcel
  LISTAGG (TO_CHAR (fld_zone), ',') WITHIN GROUP (ORDER BY fld_zone) FLOODZONE
FROM MCGIS.FEMA_2014_FLOOD_ZONES e,
     parcel.parcels_mv p
WHERE
  --get matching polygons
  st_intersects (p.shape, e.shape) = 1
  --use polygon if more than 15% overlaid
  AND st_area (st_intersection (p.shape, e.shape)) / st_area (p.shape) > 0.15
--summarize by hashid
GROUP BY hashid
It works perfectly, but takes about 3 hours, and I have 30 of these to do every night. Anything I can do to speed it up? Both layers have spatial indexes, and there are attribute indexes on the HASHID and FLD_ZONE fields.
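For anyone trying to reason about the 15% predicate outside the database, the overlap-fraction test can be sketched in plain Python. This uses hypothetical axis-aligned boxes so the intersection math is trivial; real parcel geometries would need a geometry library such as Shapely or the database's own spatial functions:

```python
def rect_area(r):
    # r = (xmin, ymin, xmax, ymax); empty boxes get area 0
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def rect_intersection(a, b):
    # Overlap box of two axis-aligned rectangles (may be empty).
    return (max(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), min(a[3], b[3]))

def overlaps_enough(parcel, zone, threshold=0.15):
    # Mirrors the SQL predicate:
    #   st_area(st_intersection(p.shape, e.shape)) / st_area(p.shape) > 0.15
    inter = rect_intersection(parcel, zone)
    return rect_area(inter) / rect_area(parcel) > threshold

parcel = (0, 0, 10, 10)   # area 100
zone   = (8, 0, 20, 10)   # overlap is 2 x 10 = 20, i.e. 20% of the parcel
print(overlaps_enough(parcel, zone))  # True
```

The point of the sketch is that the fraction is computed per parcel/zone pair, which is why the intersection geometry has to be built for every candidate pair the spatial index returns.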
This produces results like
HASHID FLOODZONE
124345 AE,X
124346 X
124347 AE,B,X
instead of
HASHID FLOODZONE
124345 AE
124345 X
124346 X
124347 AE
124347 B
124347 X
My guess is the run time is related to the number of spatial functions that have to run to calculate what is needed.
How many FLOODZONE and PARCEL polys are we talking about?
As for the end result, if you remove the "GROUP BY hashid", would it give you the output format that you are looking for?
Just as a comparison, have you created a Python script to do the same process, to see how long that takes to run?
Thanks for the reply
It's a county. 230k parcels, 7,832 flood zones.
The GROUP BY hashid is necessary for LISTAGG to group the results.
I have not done it in Python, as I am unaware of an equivalent Python capability.
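For what it's worth, LISTAGG's sort-and-concatenate behavior does have a straightforward Python equivalent. A minimal sketch, using made-up (hashid, fld_zone) rows shaped like what the raw spatial join would return:

```python
from collections import defaultdict

# Hypothetical (hashid, fld_zone) rows from the join, one per overlap.
rows = [(124345, 'X'), (124345, 'AE'), (124346, 'X'),
        (124347, 'AE'), (124347, 'B'), (124347, 'X')]

# Equivalent of GROUP BY hashid with
# LISTAGG(fld_zone, ',') WITHIN GROUP (ORDER BY fld_zone):
zones = defaultdict(set)
for hashid, zone in rows:
    zones[hashid].add(zone)

agg = {h: ','.join(sorted(z)) for h, z in zones.items()}
for hashid in sorted(agg):
    print(hashid, agg[hashid])
# 124345 AE,X
# 124346 X
# 124347 AE,B,X
```

So the aggregation itself is cheap in any language; the expensive part either way is computing the intersections that produce the rows.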
I helped develop a tool that did something like the FEMA Changes Since Last FIRM (at a previous job), and it would take at least 45 minutes to an hour to run, and we did not have that many parcels or the 15% overlap calculation.
Maybe someone has a trick up their sleeve.
Is there a reason that you have to run it every night?
Can you remove the unshaded X areas and only use the SFHA and X-shaded (0.2%)?
Take two polygon layers, overlay/intersect them, and where they intersect, summarize the attributes of the overlay into an aggregate list if the area of intersection is more than 15% of the source polygon.
You have already laid out the overall geoprocessing steps; you just need to script them out. The most expensive step will be the intersection at the start. Knowing how long it takes to run Intersect (Intersect—Help | ArcGIS Desktop) on the two layers will tell you whether it is worth pursuing scripting it out vs. using SQL. So, how long does it take?
This is just one of 30 overlays we need to do. Parcels update nightly from the PAO, so we are trying to update the overlays along with them. Right now we do those once a month, but new parcels do not pick up the overlays until that runs, so we were trying to do it in sync with the nightly parcel updates.
I thought that might be the case. That is a lot of processing to run.
What do the Oracle DB resources look like when the process is running?
Where is the process kicked off from?
Do you put the data into a new feature class?
I have tried to write as little as possible, so it's actually just a two-column table, no shape.
ORACLE is pinned. Linux box is smokin'
Process currently kicked off from TOAD.
That could be part of the reason for the time it takes to complete, especially if the Linux box is smokin'.
Any way to get more resources to that box, or move the data to another box that has more resources available? Long shot, I know.