|
POST
|
Ah,... yes. Well, this is more a religious issue than a technical one. I am not among those who value precision because it is possible, rather than needed. I do see significant performance benefits from my somewhat heretical techniques, though not as much in native RDBMS geometry storage, because non-Esri geometry' doesn't see the same storage reduction found in file geodatabase or ST_GEOMETRY. If your data is only accurate to 1-inch, then 1/0.01' or 1/0.005' could be reasonable XY scalefactors, The general tools don't make it easy to be consistent in non-default coordinate references, but I've found that using a "Transfer" FeatureDataset with the desired coordref constants as an intermediary for creation, then immediately dragging the newly created FCs out of that FDS to be effective. I usually script my creation utilities, which allows for more flexibility in SpatialReference definition and application, but the FDS technique works quite well for ad-hoc geoprocessing outputs. - V
... View more
Monday
|
1
|
0
|
31
|
|
POST
|
Okay, so it's a bit more complicated to drive from the joiner table... Because some of the tables have compound keys, I have to isolate only changed rows by joining the joiner back to the table in the subquery, returning table objectids: WHERE objectid in (
SELECT t.objectid
FROM schema.table_j1 j
JOIN schema.table_t1 t USING (keycol1,...,keycolN)
) And if it's versioned, then you have to join to the versioned view: WHERE objectid in (
SELECT t.objectid
FROM schema.table_j1 j
JOIN schema.table_t1_evw t USING (keycol1,...,keycolN)
) - V
... View more
2 weeks ago
|
1
|
0
|
293
|
|
POST
|
I've been using PostgreSQL for so long I've forgotten how to spell SQL Server. My joiner tables are regular tables with the same base name as the primary table (business table "source.table_name_t1", digest table "source.table_name_h1", joiner table "source.table_name_j1"). They all share the primary key column(s), but t1 has the row values, h1 has the row digest, and j1 has the u_or_d column. I name the oid columns differently so the joined result is unambiguous. INSERTs are executed immediately by an arcpy.da.InsertCursor. There really isn't likely to be much performance gain by writing to an extra table, then appending from that, though I suppose it depends on the number of indexes in the primary table. I have a further complication of needing to do this to a replica parent geodatabase, and using "INSERT INTO ... SELECT ..." with versioning would add unnecessary complexity. Performance-wise, the query from a source DBMS is the long pole in my tent, and SHA-1 hashing is somewhat expensive, especially when less than 1% of each table is modified, so there's large input I/O and compute costs, actually writing new rows is somewhat rare. Generally speaking, I don't permit empty strings to kludge around NOT NULL constraints, but I'm not always able to enforce that. For purposes of hashing, I'd recommend treating an empty field in a CSV as equivalent to a NULL. I have a funky table or two where I have to backtrack to enforce NOT NULL on strings by replacing None with '' before INSERT, but doing it that way makes me feel dirty. FWIW, I make extensive use of function dictionaries, so my encoder functions are placed in a dictionary by target datatype, and then they can massage oddities without harming the data stream. The code below handles empty string and NULL differently, (a single 0 byte vs a single ETX control character byte) def pack_str(v,n):
if (v == None): return(b'\x00')
ba = bytearray()
if (type(v) == datetime.datetime):
if VERBOSE_ENABLED:
print(" {:8d}: Expected str, got {:s}!".format(n,str(type)))
v = v.strftime(isoFmt)
elif (type(v) != str):
if VERBOSE_ENABLED:
print(" {:8d}: Expected str, got {:s}!".format(n,str(type)))
v = str(v)
vt = v.rstrip(' ').encode('UTF-8')
ba.extend(vt)
ba.extend(b'\x03') # ASCII End-Of-Text (ETX)
return ba
def pack_i2(v,n):
if (v == None): return(b'\x00')
return struct.pack('h',v)
def pack_i4(v,n):
if (v == None): return(b'\x00')
return struct.pack('i',v)
...
packer_by_type = { # Lookup via native 'udt' value
'bool' : pack_i2,
'bpchar' : pack_str,
'bytea' : None, # BLOBs manifest as memoryview (unsed)
'char' : pack_str,
'date' : pack_date,
'float4' : pack_f8,
'float8' : pack_f8,
'geography' : pack_geom,
'geometry' : pack_geom,
'int2' : pack_i2,
'int4' : pack_i4,
'int8' : pack_i8,
'numeric' : pack_f8,
'timestamp' : pack_date,
'text' : pack_str,
'uuid' : pack_str,
'varchar' : pack_str,
}
# Note: When object is instantiated, the packing function is added
# to a "packers" list
...
def getHash(self,qRow):
self.rowCount += 1
ba1 = bytearray()
for i,packer in enumerate(self.packers,start=self.keyCount):
ba1 += packer(qRow[i],self.rowCount)
return hashlib.new('sha1',ba1,usedforsecurity=False).hexdigest()
... View more
2 weeks ago
|
1
|
0
|
375
|
|
POST
|
I've done this several times now, once with hundreds of thousands of "changed" records twice daily (where only 10% are actually new), and once with 28 million records in each delivery (full dataset). What I settled on was storing SHA-1 hashes of the data rows in a parallel table in the database. The loader walks the tables, loading all the digest records first, then hashing the input stream with the same algorithm. If the key column(s) are not present in the dictionary of hashes, it's an INSERT; if the key exists, and the SHA-1 the same, that's a No-Op, and I delete the key from the dictionary; if the key exists, but the SHA-1 is different, I tag it as an update, then also delete the dictionary key. Once I've processed all rows, any keys that are left are my delete candidates. Initially, I just walked the UPDATE and DELETE lists, but that sometimes made for hundreds of thousands of UpdateCursor calls, and running a full scan on 20+ million rows to change 200K wasn't cost-effective. So I tweaked the initial solution to write records to a "joiner" table, with the key and 'U' or 'D', then ran one UpdateCursor on the table, with a "WHERE keycol in (SELECT keycol FROM joiner_tab)" query. Then I could join the joiner to the primary table and the digest table to manifest the changes in two DA cursor calls (one for each table). Using this technique I preserve the map service integrity during modifications (one COMMIT per table), without the outage caused by a TRUNCATE/re-load. Yeah, there might be occasional digest collisions that produce a false positive, but with similar data, collisions are exceedingly rare. This methodology is also highly reliable -- One of these systems has been running with zero maintenance (beyond database and ArcGIS Pro upgrades) for over five years. In theory, the API for Python could do this through the Portal, but in practice, using an UpdateCursor in ArcPy is literally a thousand times faster (680ms v. 12 minutes for 39K rows) than having to process each row as a series of XML web requests through the Portal and on to the AGS host, then finally to the database. I did this all with plain vanilla ArcPy. The only tricky part is dealing with values coming in as database rows and as ASCII text, since type preservation can be a challenge. I've also created parallel tables to insert all rows into, letting the database manage the conditional updates as a trigger, but that's better suited to a fixed number of records (states, countries, counties, zip codes,...). Good luck! - V
... View more
2 weeks ago
|
2
|
4
|
436
|
|
POST
|
I'm not a big fan of coordsys offsets with large numbers of significant digits. 0.003281 would be an *awful* precision to use. I would choose 0.001 or 0.002, since the conversion to integer is "times 1000" or "times 500" (and not "times 304.78512648582749161840902163974". Precision to a thousandth of a foot is gratuitous overkill for all but precision amoeba farming, but it's a nice round number. I try to avoid the default values, because they were chosen to maximize precision, at the cost of compressibility, and I'd rather transfer a BLOB with 278 bytes than one with 40K (your mileage may vary, but you should take your most complex geometries and try them at different scalefactors, to determine what the exact compression loss might be). - V
... View more
3 weeks ago
|
2
|
0
|
460
|
|
POST
|
Copy/delete/rename is far from the most efficient methodology here. I have tables with tens of millions of rows in them; doing this would cost me hours and could result in tables with columns slightly different from the datatypes I originally declared them to be. I pretty much avoid Copy for this reason. - V
... View more
4 weeks ago
|
0
|
0
|
507
|
|
POST
|
Brian -- There are potential issues to managing records in a registered table in this manner. The easiest solution is to just leave the feature class unregistered, and access it as a Query Layer. But make sure you recluster the table, since doing this sort of "load, then update geometry" process thoroughly fragments the table. Truncate and reload is not a very efficient process, and leaves any service looking at the table during that transition somewhat bereft. I prefer to do change detection, and only INSERT new records, UPDATE changed records, and DELETE removed ones. I'm doing this daily with tens of millions of rows and hundreds of thousands of potential changes, in just minutes over the time it takes to read the new records. - V
... View more
a month ago
|
0
|
1
|
344
|
|
POST
|
Alicia -- The RDBMS (or file geodatabase handler) is not aware of feature datasets, so even though you see it "in" a feature dataset, it isn't really, and you can't have duplicate-named feature classes, so you can't do a straight copy. I just drag the feature class out of the feature dataset to be standalone, then drag it back in to a different feature dataset, but I just tested, and it's possible to just drag from dataset1 to dataset2 without copy. So long as the Spatial Reference is the same, it's no big deal. If the spatial references are different, then you have to rename to "_old" then use FeatureClassToFeatureClass to change the SpatialReference. But you might want to re-evaluate whether you need to have feature classes in a feature dataset at all. If they're not being used for "feature-dataset-ish" tasks like topologies, then having more standalone feature classes would reduce the overhead of locking all the feature dataset members whenever a lock is placed on any feature class in the dataset. - V
... View more
a month ago
|
0
|
2
|
610
|
|
POST
|
Barbara -- There's probably a two-billion-ish limit on the number of concurrent states, but if you were to attempt it, your server would likely fail from old age before it completed. Keep in mind that a "version" is a named reference to a state in the state tree, and that you're unlikely to have 2^31-1 versions, because each edit is usually its own state, and there's usually more than one edit in any edit session. I doubt you'd be so lucky that some posts would succeed in such a situation, since it's a multi-pass process. If you want to find effective limits, I suggest you conduct a benchmark of sorts, with multiple edits per version, with increments of 500 versions at a time. You should be prepared for this to take an inordinate amount of time, and for it to eventually corrupt your instance, so I wouldn't suggest you do this on a production system. - Vince
... View more
04-15-2026
02:17 PM
|
0
|
0
|
465
|
|
POST
|
Paige - Cool. But don't forget to check if there are other, multi-part types in the table: SELECT ST_GeometryType(geomcol), count(*) as nfeats
FROM owner.geomtable
GROUP BY ST_GeometryType(geomcol)
ORDER BY 2 DESC Note that the database name isn't required, because PostgreSQL doesn't support cross-database queries. - V
... View more
04-01-2026
07:34 PM
|
0
|
0
|
800
|
|
POST
|
You need to put a WHERE constraint on the QueryLayer definition to restrict by either GeometryType(geomcol) or ST_GeometryType(geomcol), then specify the desired type in the layer definition Those two functions return different values (e.g. "POINT" and "ST_POINT"), and you may need to use an IN operator, because multi-part linestring and multi-part polygon geometeries are represented as different types in PostGIS. You may want to use ST_Dimension(geomcol) instead, since that just sorts POINT, LINESTRING, and POLYGON data into 0, 1, or 2 (but be careful, because MULTIPOINT also returns 0, and it's a different type as well, though not as common). In a perfect world, the various geometry types wouldn't be smushed together like that (it's an antipattern, but some folks love to go down that route). You'd get better performance on large tables (>100k features) if the dimensionality were also an attribute property capable of being indexed (e.g. 'WALL', 'WELL','PATIO') though I suppose a covering index on one of the differentiator functions could be effective (but only if you can get the data owners to construct that index). - V
... View more
03-27-2026
12:04 PM
|
2
|
2
|
944
|
|
POST
|
I've got a couple of read-only RDS replicas exposing PostGIS geodata via Query Layers. Works great. Selection sets would likely be a problem from an EGDB, since the database is read-only. EGDB replication would likely be better than an RDBMS replica for true geodatabase functionality. - V
... View more
02-25-2026
07:30 PM
|
1
|
0
|
418
|
|
POST
|
According to Stack Exchange, you want Latin1_General_100_..._UTF8 - V
... View more
11-12-2025
02:23 PM
|
0
|
0
|
1693
|
|
POST
|
No, latin1 is NOT UTF-8, it's latin1. You need to use a UTF-8 character set. - V
... View more
11-12-2025
12:48 PM
|
0
|
0
|
1697
|
| Title | Kudos | Posted |
|---|---|---|
| 1 | Monday | |
| 1 | 2 weeks ago | |
| 1 | 2 weeks ago | |
| 2 | 2 weeks ago | |
| 2 | 3 weeks ago |