Spatial View so slow

07-05-2013 01:33 AM
HaniuHokkaido
New Contributor III
Dear experts,

I have a spatial view with 25 columns in the database server that looks like this:

SELECT a.OBJECTID, a.Shape, a.KEYID, a.ROTATION, a.LAND_COVER, a.LAND_USE, a.COMPANY,
       b.Species, b.SpeciesGroup, c.Tipe
FROM   sde.LANDUSE AS a
       LEFT OUTER JOIN gdb.dbo.StandReg AS b ON a.KEYID = b.KEYID
       LEFT OUTER JOIN
           (SELECT Dist, created_by, updated_by
            FROM   gdb.dbo.Petak_PSDS
            WHERE  Dist = 'District 1') AS c ON a.KEYID = c.cKeyId


(the query has been truncated, not all columns are mentioned)

This view is built from 3 tables and has about 7,600 rows in it. When I open it on my PC in ArcMap / ArcCatalog, it takes around 4 minutes to load. I have run Analyze and update_dbms_stats and it is still the same. There are also no spatial errors.
How can I make it load faster?

BTW, is a view replicable using One-Way replication?

Thanks

[arcsde 10.0, sql server 2005]
16 Replies
VinceAngelo
Esri Esteemed Contributor
Since you're using SQL-Server 2005, the view has five tables.  I doubt there's
any way, beyond making sure all the indexes are present, to improve that query.
It might be much faster with SQL-Server 2008 and a native geometry type, but
that's probably something to take up with Microsoft (if they're even still supporting
2005 at this late date).
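
For what it's worth, a rough sketch of the kind of indexes I mean, using the table and key
names from the posted (truncated) query -- the index names are placeholders, and the actual
join columns should be confirmed first:

-- Sketch only: nonclustered indexes on the join keys used by the view
CREATE NONCLUSTERED INDEX IX_StandReg_KEYID
    ON gdb.dbo.StandReg (KEYID);

CREATE NONCLUSTERED INDEX IX_Petak_PSDS_cKeyId
    ON gdb.dbo.Petak_PSDS (cKeyId)
    INCLUDE (Dist);   -- covers the Dist filter in the subquery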

- V

BTW: If you do upgrade to a modern database, be sure to follow best
practice and load the data tables using some owner other than SDE -- the
SDE user should be reserved for instance administration.
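
A minimal sketch of that separation in SQL Server (login, user, and schema names and the
password are placeholders; as far as I recall, ArcGIS expects a data owner's schema name
to match the user name):

-- Sketch only: a dedicated data-owner login/user so the SDE user stays administrative
CREATE LOGIN gisdata WITH PASSWORD = 'ChangeMe_1234';
GO
USE gdb;
CREATE USER gisdata FOR LOGIN gisdata WITH DEFAULT_SCHEMA = gisdata;
CREATE SCHEMA gisdata AUTHORIZATION gisdata;
GO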
MarcoBoeringa
MVP Regular Contributor
Dear experts,

I have a spatial view with 25 columns in the database server that looks like this:
...
BTW, is a view replicable using One-Way replication?
[arcsde 10.0, sql server 2005]


A view is a view; that is, it doesn't contain data by itself, it only references existing data in other tables. If you replicate the Feature Classes and Tables used in your view, and copy the view definition to the child geodatabase, you should have the most up-to-date "view" of the database after a synchronization.
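
As a rough illustration, with the names from the posted (truncated) query -- the view name
is just an example, and in the child geodatabase the definition would have to point at the
replicated copies of the tables there:

-- Sketch only: re-create the same definition in the child geodatabase so it
-- reflects the replicated tables after each synchronization
CREATE VIEW dbo.LANDUSE_STAND_VIEW AS
SELECT a.OBJECTID, a.Shape, a.KEYID, b.Species, b.SpeciesGroup
FROM   sde.LANDUSE AS a
       LEFT OUTER JOIN gdb.dbo.StandReg AS b ON a.KEYID = b.KEYID;

(If I remember right, at ArcSDE 10.0 a spatial view would normally be created with sdetable -o create_view rather than plain SQL, so that ArcSDE registers the shape column for the view.)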

There are some specific options for, and things to be aware of with, Relationship Classes and replication; see the Help for that:

Replicating related data
Synchronizing with filters and related data
KeithAdams
New Contributor
Hi-

We are finding that storing the shape data in SQL Server GEOMETRY format has a HUGE negative impact on performance.  (I just replied to your thread about that migration process.)  If you are doing that, that would be one place to look.  Did the views exist prior to your conversion to GEOMETRY?  If so, how was the performance then?

For a polygon feature class that has about 25 attribute columns, we are getting data copy rates of only 7 to 9 features a second whether we use Feature Class to Feature Class, Copy Features, or Append.  The polygons are based on 1:100,000 scale data at the county-level size (average of about 2,100 vertices / feature) to give you a sense of the level of detail.

It does seem to us, at least anecdotally, that virtually ALL database operations are slower at 10.1 than they were at 10.0.  We sit and sit and sit just waiting for ArcCatalog to connect to the geodatabase before we try to do anything else.  It also seemed, again only at the anecdotal evidence level, as if spatial views actually drew faster than the underlying spatial table (again, using tables that store feature data in SQL Server GEOMETRY as opposed to SDE_BINARY format).

A couple of other threads I have read suggest that there is a patch for SQL Server 2012, but we're still only at 2008R2.  They also say that the "solution" is to go back to SDE_BINARY instead of using GEOMETRY, but that creates other problems for us (our spatial views inexplicably stopped working correctly, which was why we started down the GEOMETRY road in the first place) so it's not a viable alternative.

Of course there is the basic issue of whether you have the business key data on which you are doing the SQL joins indexed properly - that can have an enormous impact on performance, particularly when you have thousands or tens of thousands of features involved.
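
For reference, a metadata query along these lines (the table name, borrowed from the original post, is just an example) shows which indexes already exist on one of the join tables and which columns they cover:

-- Sketch: list existing indexes and their key columns for a join table
SELECT i.name AS index_name, c.name AS column_name, ic.key_ordinal
FROM   sys.indexes AS i
       JOIN sys.index_columns AS ic
            ON ic.object_id = i.object_id AND ic.index_id = i.index_id
       JOIN sys.columns AS c
            ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE  i.object_id = OBJECT_ID('dbo.StandReg')   -- example table
ORDER BY i.name, ic.key_ordinal;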
MarcoBoeringa
MVP Regular Contributor
For a polygon feature class that has about 25 attribute columns, we are getting data copy rates of only 7 to 9 features a second whether we use Feature Class to Feature Class, Copy Features, or Append.  The polygons are based on 1:100,000 scale data at the county-level size (average of about 2,100 vertices / feature) to give you a sense of the level of detail.

...

Of course there is the basic issue of whether you have the business key data on which you are doing the SQL joins indexed properly - that can have an enormous impact on performance, particularly when you have thousands or tens of thousands of features involved.


So you're saying you have 10K+ features comprising an average of 2100 vertices per feature at scale 1:100.000??? :eek:

Is this data stream-digitized from imagery, with never a proper weeding out of unnecessary vertices? I can't imagine any polygon / line needing up to 2100 individual vertices to define its form at scale 1:100.000. Even at scale 1:1000, it would be highly undesirable to have such huge amounts of vertices per feature...

For a highly detailed and high quality 1:1000 base map here in the Netherlands of the national highway system, where I was involved in the re-design of the photogrammetry based workflow and database migration some ten years ago, a test dataset had an average of about 27 vertices per feature..., minimum 4, maximum 1026, but that was a rare exception.

Writing to an old, but dedicated and otherwise unused Sun Sparc Ultra-2 single 100 MHz processor Unix server, and storing in SDE_BINARY, resulted in data load speeds of about 120 features / second, so about 27*120 = 3240 vertices / second (Oracle 8).

Again, we're talking >10 years ago here...

Your data loads at 14700 to 18900 vertices per second. I leave it to others to comment if that is a normal speed right now with the configuration details you posted... but you really may need to reconsider your workflow for collecting and storing this data...
VinceAngelo
Esri Esteemed Contributor
Vertices per second is a very misleading metric.  I wouldn't trust it for much.

Databases do work in transactions.  Each ROW is processed in total, which includes
LOBs and strings.  A one-vertex GeoNames point could have 8-10k of data in a single row,
while a dense 2k vertex polygon without attributes could easily use less.  It is critical to make
sure all comparisons are for roughly equal row contents, otherwise the difference in real work
(I/O) processing each row will warp your expectations.

There are many other factors at play during loading as well -- disk contention in file groups,
number and complexity of indexes, the number of fields with data in them, high-availability
synchronization overhead, replication, other databases in the same server competing for
compute and IO cycles... The list is endless.  Rather than comparing a system against another
one, I always compare it against itself.

- V
MarcoBoeringa
MVP Regular Contributor
Vertices per second is a very misleading metric.  I wouldn't trust it for much.

Databases do work in transactions.  Each ROW is processed in total, which includes
LOBs and strings.  A one-vertex GeoNames point could have 8-10k of data in a single row,
while a dense 2k vertex polygon without attributes could easily use less.  It is critical to make
sure all comparisons are for roughly equal row contents, otherwise the difference in real work
(I/O) processing each row will warp your expectations.

- V


I fully realized that, but rows / features per second by itself is also a bad metric, as your results show:

1-vertex point features load over 15x faster per feature than huge 2100-vertex pseudo random circles...

Leaving aside the rare possibility of users storing whole images, Office documents or whatever as LOBs in the row of a geographic feature, that is a significant difference not to be ignored either...

Saying you have bad loading speeds of just a few features per second, while omitting the fact that each polygon has thousands of vertices, therefore doesn't give a very complete picture either.

Like you say, it is a mixture of a whole bunch of factors at play, including feature count and size of features in terms of number of vertices.

I really think Keith still needs to answer the question of why they have polygons averaging 2100 vertices in their datasets... Unless this data comes from the extensive boundaries of big counties, actually represents 1:1000 or 1:5000 detail in reality, and in practice must be combined in geographic overlays and other geoprocessing with similarly high quality, large scale data, I see no reason to maintain such huge features at scale 1:100.000. The data could be generalized.

I don't think having 2100-vertex polygons is by itself impossible or bad practice; there may be good reasons to maintain it like that, but it sure isn't standard, especially not in the context of datasets having "millions" of polygons.

I just ran some statistics again against a sample of the large scale photogrammetry based datasets I wrote about in the other thread. This is well maintained real world data:

Polygons: 18968

Vertex count
Total: 535334
Average: 28.22
Min: 4
Max: 2106
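
A summary like this can also be pulled straight from a feature class stored as SQL Server GEOMETRY with a query along these lines (table and geometry column names are placeholders):

-- Sketch: vertex-count statistics for a polygon feature class stored as GEOMETRY
SELECT COUNT(*)                                AS polygons,
       SUM(Shape.STNumPoints())                AS total_vertices,
       AVG(CAST(Shape.STNumPoints() AS float)) AS avg_vertices,
       MIN(Shape.STNumPoints())                AS min_vertices,
       MAX(Shape.STNumPoints())                AS max_vertices
FROM   dbo.MyPolygons;   -- placeholder table name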

If you look at the graph below, most of the polygons have <100 vertices (the vertical axis is "Frequency"). The other image shows an example of the data, including some extra line layers, to give some impression of the level of detail.

[Attachments: frequency graph of vertex counts per polygon; sample image of the dataset with extra line layers]
MarcoBoeringa
MVP Regular Contributor
Just for the sake of it, I decided to do a little test. I densified the dataset I had, using the Densify tool, to create a dataset with a vertex every 0.5 meter. I guessed this would get me close to Keith's dataset, but I still lag behind with an average of 540 vertices per feature:

Results for densified layer

Polygons: 18968

Vertex count
Total: 10247488
Average: 540.25
Min: 4
Max: 18956

I will also copy the original dataset, which had the properties as listed in the previous post:

Original dataset

Polygons: 18968

Vertex count
Total: 535334
Average: 28.22
Min: 4
Max: 2106

Once the densification process itself was finished, I used ArcGIS 10.1 and the Copy Features tool to create a new Feature Class from the existing densified one, curious to know how much time that would take and how it would compare with the writing of the data during the densification step and with the copying of the original dataset:

Copying of the original non-densified dataset:
18968 polygons copied in 18 sec, so 18968 / 18 = 1053.8 features written per second, and 535334 / 18 = 29741 vertices written per second.

Results for Densification step:
18968 polygons densified in 9 min 2 sec, so 18968 / 542 = 35.0 features written per second, and 10247488 / 542 = 18906 vertices written per second.

Results for Copy Features step of densified dataset:
18968 polygons copied in 1 min 54 sec, so 18968 / 114 = 166.4 features written per second, and 10247488 / 114 = 89890 vertices written per second.

The test configuration was:
Acer Veriton desktop:
- Core i5 2320 3.00GHz quad core processor
- 6 GB RAM (never used completely during the process)
- 1 TB hard drive capable of up to 180 MB/s, connected via SATA II; read/write I/O never or rarely exceeded 10 MB/s during the process
- ArcGIS Desktop 10.1 SP1
- SQL Server Express 2012 SP1
- Dataset using SQL Server GEOMETRY storage
- No versioning on the datasets written!

"CONCLUSIONS" (take with a pinch of salt ;)):
This is definitely not server hardware, but it isn't a low spec laptop either, nor is it a shared server bogged down by other processes running on it... I am surprised to see how close the Densification step is to Keith's results:

- Keith had 7-9 "2100-vertex" polygons written per second, with 14700 to 18900 vertices per second.
- The Densification step is 35.0 "540-vertex" features written per second, and 18906 vertices per second
- The Copy Features step is 166.4 "540-vertex" features written per second, and 89890 vertices per second
- Original dataset copy is 1053.8 "28-vertex" features written per second, and 29741 vertices written per second
- Vince achieved 113 "2100-vertex" pseudo random circles written per second, and +/- 240K vertices per second

Of course, the simple "Copy Features" step outperformed the Densification by a factor of 4-5, but it still shows there is a real penalty in writing big polygons with many vertices, especially if additional computational or IO steps are involved, and it may actually be that Keith's hardware is simply running at its max when importing this data...

Vince's results on his laptop, writing his pseudo random "2100-vertex" circles, were even better than the Copy Features results on my desktop (in terms of vertices, not in terms of records / features), but that was using the se_toolkit. It may be that the overhead of ArcGIS is the problem here; in addition, my real world dataset contains a couple of extra non-system text fields with data in them that need to be written too.

PS:
1) Writing to SDEBINARY using the Copy Features tool gave the following results:
18968 polygons copied in 1 min 18 sec, so 18968 / 78 = 243.2 features written per second, and 10247488 / 78 = +/- 131K vertices written per second.

2) I now also attempted writing to GEOMETRY using Copy Features, to copy to a Feature Dataset using another Coordinate System and even another Datum, thus forcing re-projection.
Results are:
18968 polygons copied in 3 min 51 sec, so 18968 / 231 = 82.1 features written per second, and 10247488 / 231 = 44361 vertices written per second.
As you can see, the re-projection slows the copying down to about half the original speed without re-projection (166.4 features / 89890 vertices per second). This again makes it likely that the figures Keith supplied may actually not be that abnormal... although still on the low side of things.
KeithAdams
New Contributor
Hi-

Perhaps what I was doing was exposing my naivete and relative inexperience when dealing with spatial data in "hard core" fashion.  Here's how I got my numbers:

I queried SDE_LAYERS to get the layer_id for the feature class I wanted to examine, then I simply queried the corresponding f table to take the average of the numofpts column.  That is the number I reported as being the average number of vertices for a feature in that class.  Did I misunderstand the content of the f table?
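
Roughly, the queries looked like this (reconstructed sketch for SDEBINARY storage; the layer id, schema, and table names are only examples):

-- Sketch: find the layer_id, then average numofpts in the matching f<layer_id> table
SELECT layer_id, owner, table_name
FROM   sde.SDE_layers
WHERE  table_name = 'MY_FEATURE_CLASS';   -- example name

SELECT AVG(CAST(numofpts AS float)) AS avg_vertices,
       MIN(numofpts)                AS min_vertices,
       MAX(numofpts)                AS max_vertices
FROM   dbo.f285;   -- example: feature table for layer_id 285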

I have the following stats for the layer in question:

366k features
numofpts: avg 2127, min 13, max 144,969 (!!!), standard deviation about 5086, median 1023.

The feature with 145k vertices is Monroe County, FL.

All of our data are vector, no raster.

-= Keith Adams =-

For what it's worth, this is a feature class built using a merge and dissolve process.  In addition to the business data, the three feature class inputs to the merge are what were represented to me as 1:100,000 scale county, census tract, and county subdivision boundaries.

We run this process every day, generating something on the order of +/- 4,150 final perimeters, and append the results to a growing "history" table that we can use for trending and historical change analysis.  (I don't want to go further down the road of the hows, whys, and possible alternate strategies for doing this right now - that's neither here nor there in the context of this discussion, because at the end of the day the 90+ days of historical data I have in the feature class are what has to get transferred and converted.  I can thin the data some by archiving some of the days of data, and am in the process of doing so, but even so there's a big gob that has to get transferred.)
KeithAdams
New Contributor
So you're saying you have 10K+ features comprising an average of 2100 vertices per feature at scale 1:100.000??? :eek:

Is this data stream-digitized from imagery, with never a proper weeding out of unnecessary vertices? I can't imagine any polygon / line needing up to 2100 individual vertices to define its form at scale 1:100.000. Even at scale 1:1000, it would be highly undesirable to have such huge amounts of vertices per feature...

For a highly detailed and high quality 1:1000 base map here in the Netherlands of the national highway system, where I was involved in the re-design of the photogrammetry based workflow and database migration some ten years ago, a test dataset had an average of about 27 vertices per feature..., minimum 4, maximum 1026, but that was a rare exception.

Writing to an old, but dedicated and otherwise unused Sun Sparc Ultra-2 single 100 MHz processor Unix server, and storing in SDE_BINARY, resulted in data load speeds of about 120 features / second, so about 27*120 = 3240 vertices / second (Oracle 8).

Again, we're talking >10 years ago here...

Your data loads at 14700 to 18900 vertices per second. I leave it to others to comment if that is a normal speed right now with the configuration details you posted... but you really may need to reconsider your workflow for collecting and storing this data...


Hi-

This is initially a project to carry out a one-time data conversion / migration exercise of existing data.  What happens after that remains to be seen.

We have built the data over the years, and rebuild / reload big chunks of it every night.  The data are not versioned.  We are in the early planning and analysis stages of redesigning the data management process, but for now that is beside the point.

At this point, the data are what they are, and the sole issue is how to get them from the old beat-up DBs that have gone through three version upgrades of both the DBMS and ArcGIS into brand shiny fresh new ones.  At the end of the day, the new and old need to be indistinguishable from one another except that the new will be in GEOMETRY where the old was SDE_BINARY.

The databases are on virtual servers.  The destination server is configured with two Intel E5-2260 @ 2.2 GHz CPUs, Windows Server 2008 Standard Edition, 64 bit, and 16 GB RAM.

Incidentally, I checked and verified that the county, census tract, and county subdivision boundary data from which we are building the feature class came from Esri and the Census Bureau.  We're just consuming them without having compiled them.

They get used for spatial analyses, so they need to be at as fine a level of detail as we can manage.

Thanks-

-= Keith Adams =-