
Data Scrape and Add Features (need attachments)

07-26-2024 10:37 AM
kapalczynski
Frequent Contributor

I have some code that scrapes a service and, based on a where clause, copies specific records to another dataset. It scrapes the service and creates a JSON file for every 1000 records, then reads those JSON files and uses the code below to add the features to the target service. This is working great, but I need to modify it to move attachments as well, and this is where I am confused.

Because I am writing the feature to JSON and then using that to add features I am not sure how to include the attachments for each of those features in the JSON file.
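A layer query response carries only attributes and geometry, not attachment files, so one option is to copy attachments in a second pass after the features are added. The sketch below assumes the ArcGIS API for Python attachment manager (`layer.attachments` with `get_list`, `download`, and `add`) behaves as documented for your version, and that attachments are enabled on the target. `copy_attachments`, `oid_map`, and `work_dir` are hypothetical names used for illustration; `oid_map` is a dict you would build yourself, mapping each source OBJECTID to the OBJECTID of its copy in the target.

```python
import os
import tempfile


def copy_attachments(source_layer, target_layer, oid_map, work_dir=None):
    """Copy attachments from source features to already-added target features.

    oid_map maps each source OBJECTID to the OBJECTID of its copy in the target.
    Returns the number of attachments uploaded.
    """
    work_dir = work_dir or tempfile.mkdtemp(prefix="attachments_")
    copied = 0
    for source_oid, target_oid in oid_map.items():
        # One entry per attachment on the source feature (assumed to hold an "id" key)
        for info in source_layer.attachments.get_list(oid=source_oid):
            # Separate folder per attachment so equal file names cannot collide
            save_dir = os.path.join(work_dir, str(source_oid), str(info["id"]))
            os.makedirs(save_dir, exist_ok=True)
            paths = source_layer.attachments.download(
                oid=source_oid, attachment_id=info["id"], save_path=save_dir
            )
            # download() is assumed to return a list of paths; accept a single str too
            if isinstance(paths, str):
                paths = [paths]
            for file_path in paths:
                target_layer.attachments.add(target_oid, file_path)
                copied += 1
    return copied
```

Usage would look something like `copy_attachments(source_layer, ports_layer, oid_map)`, where `source_layer` is the layer or table the records were scraped from. This makes one download and one upload per attachment, so it may be slow for large volumes.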

Any thoughts are much appreciated.

I am doing the add at the service level with 'edit_features':

add_result = ports_layer.edit_features(adds = featureAddingAdd)

 

# SNIP 

portal_item = gis.content.get('73xxxxxxxxxxxxxxxxxxxxxxxxx5')
ports_layer = portal_item.tables[0]

class DataScraper():
    def __init__(self):
        # URL to map service you want to extract data from
        self.service_url = s123URL
    def getServiceProperties(self, url):
        URL = url
        PARAMS = {'f' : 'json'}
        r = requests.get(url = URL, params = PARAMS)
        service_props = r.json()
        return service_props
    def getLayerIds(self, url, query=None):
        URL = url + '/query'
        print(URL)
        PARAMS = {'f':'json', 'returnIdsOnly': True, 'where' : "Imported = 'No'"}

        if query:
            PARAMS['where'] = "ST = '{}'".format(query)
        r = requests.get(url = URL, params = PARAMS)
        data = r.json()
        
        return data['objectIds']
    def getLayerDataByIds(self, url, ids):
        # ids parameter should be a list of object ids
        URL = url + '/query'
        field = 'OBJECTID'
        value = ', '.join([str(i) for i in ids])
        PARAMS = {'f': 'json', 'where': '{} IN ({})'.format(field, value), 'returnIdsOnly': False, 'returnCountOnly': False,
                  'returnGeometry': True, 'outFields': '*'}
        r = requests.post(url=URL, data=PARAMS)
        layer_data = r.json()
        return layer_data
    def chunks(self, lst, n):
        # Yield successive n-sized chunks from list
        for i in range(0, len(lst), n):
            yield lst[i:i + n]
            
def scrapeData():
    try:
        service_props = ds.getServiceProperties(ds.service_url)
        max_record_count = service_props['maxRecordCount']
        layer_ids = ds.getLayerIds(ds.service_url)
        
        id_groups = list(ds.chunks(layer_ids, max_record_count))
        
        for i, id_group in enumerate(id_groups):
            print('  group {} of {}'.format(i+1, len(id_groups)))
            layer_data = ds.getLayerDataByIds(ds.service_url, id_group)
            level = str(i)
            outjsonpath = outputVariable + level + ".json"

            layer_data_final = layer_data
            print('Writing JSON file...')
            with open(outjsonpath, 'w') as out_json_file:
                json.dump(layer_data_final, out_json_file)
                
    except Exception:
        # Handle errors accordingly...this is generic
        tb = sys.exc_info()[2]
        tb_info = traceback.format_tb(tb)[0]
        # f-strings so the {placeholders} are actually filled in
        pymsg = f'PYTHON ERRORS:\n\tTraceback info:\t{tb_info}\n\tError Info:\t{sys.exc_info()[1]}\n'
        msgs = f'ArcPy ERRORS:\t{arcpy.GetMessages(2)}\n'
        print(pymsg)
        print(msgs)

def addAAHData():
    try:
        for x in os.listdir(path):
            if x.startswith("output"):
                filetoImport = os.path.join(path, x)
                print("Appending: " + x)
                # Context manager closes the file even if json.load fails
                with open(filetoImport) as f:
                    data = json.load(f)
                featureAddingAdd = data['features']

                add_result = ports_layer.edit_features(adds = featureAddingAdd)
                
    except Exception:
        # Handle errors accordingly...this is generic
        tb = sys.exc_info()[2]
        tb_info = traceback.format_tb(tb)[0]
        # f-strings so the {placeholders} are actually filled in
        pymsg = f'PYTHON ERRORS:\n\tTraceback info:\t{tb_info}\n\tError Info:\t{sys.exc_info()[1]}\n'
        msgs = f'ArcPy ERRORS:\t{arcpy.GetMessages(2)}\n'
        print(pymsg)
        print(msgs)
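
To carry attachments over afterwards, the new OBJECTIDs assigned by the target service need to be matched back to the source records. A minimal sketch, assuming `edit_features` returns a dict with an `addResults` list (each entry holding `objectId` and `success`) in the same order as the features passed in `adds`; that ordering is an assumption worth verifying on your service. `map_source_to_target_oids` is a hypothetical helper, and the `oid_field` name may differ on your data (for example lower-case `objectid`).

```python
def map_source_to_target_oids(features, add_result, oid_field="OBJECTID"):
    """Pair each source OBJECTID with the OBJECTID created in the target.

    features   -- the list passed as `adds` (Esri JSON features with 'attributes')
    add_result -- the dict returned by edit_features
    """
    results = add_result.get("addResults", [])
    if len(results) != len(features):
        raise ValueError("addResults length does not match the features sent")
    oid_map = {}
    for feature, result in zip(features, results):
        # Skip records the service rejected; they have no target feature
        if result.get("success"):
            oid_map[feature["attributes"][oid_field]] = result["objectId"]
    return oid_map


# Illustrative data only, not taken from a real service
sample_features = [{"attributes": {"OBJECTID": 7}}, {"attributes": {"OBJECTID": 9}}]
sample_result = {"addResults": [{"objectId": 101, "success": True},
                                {"objectId": -1, "success": False}]}
sample_map = map_source_to_target_oids(sample_features, sample_result)
```

In `addAAHData` this would be called right after `edit_features`, with `featureAddingAdd` and `add_result` as the two arguments.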

 

0 Kudos
14 Replies
kapalczynski
Frequent Contributor

If I run with layerQueries as below, I get this error:

    replica1 = aah_flc.replicas.create(replica_name = 'JaysTEST',
                                      layers='0,1',
                                      layerQueries = {"1":{"queryOption": "all"}},
                                      return_attachments=True,
                                      attachments_sync_direction="bidirectional",
                                      sync_model='none', # none, perReplica
                                      target_type='server',
                                      data_format='filegdb', 
                                      out_path=r'C:\Users\PROD\exports')

 

<FeatureLayer url:"https://vdotgisportal.vdot.virginia.gov/hosting/rest/services/Hosted/Adopt_A_Highway_UAT/FeatureServ...">

True

{
"supportsRegisteringExistingData": true,
"supportsSyncDirectionControl": true,
"supportsPerLayerSync": true,
"supportsPerReplicaSync": false,
"supportsRollbackOnFailure": false,
"supportsAsync": true,
"supportsSyncModelNone": true,
"supportsAttachmentsSyncDirection": true
}

8

Create,Editing,Uploads,Query,Update,Sync,Extract
Create,Editing,Uploads,Query,Update,Sync,Extract

-9283683.186,4375456.993,-8391917.256,4776746.944

Traceback (most recent call last):
File "C:\Users\SYNC_2.py", line 72, in <module>
out_path=r'C:\Users\PROD\exports')
TypeError: create() got an unexpected keyword argument 'layerQueries'
kapalczynski
Frequent Contributor

The attachments are now working, at least for the feature layer, but still NOT for the table. The table is downloaded but contains no records.

I am not too concerned with the layer query, as I can query records out in my next step, but WHY are there no records coming across for the TABLE?

It seems this might have been an issue for two years now?

https://community.esri.com/t5/arcgis-field-maps-questions/standalone-table-replica-not-populating-wi...

 

Side Note....

This seems typical of Esri documentation. ALL the documentation says 'return_Attachments', which causes an error, when it should be 'return_attachments' with a LOWER CASE a. It raises the question of how many more errors are being caused by this: some names are upper case, some are not, with no consistency.

It turns out it's 'layer_queries' and not 'layerQueries' as well... holy smokes, the documentation is useless.

All the docs say true and false, but in Python they need to be True and False.
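
On the true/false point: the REST documentation shows JSON literals, while a Python script needs Python's `True` and `False`. A small illustration using only the standard library (the parameter names in the dict are just examples):

```python
import json

# Python booleans are capitalized...
rest_params = {"returnAttachments": True, "syncModel": "none"}

# ...and become JSON's lower-case true/false when serialized
encoded = json.dumps(rest_params)
print(encoded)
```

Writing lower-case `true` in Python source raises a `NameError`, because `true` is not a Python name.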

kapalczynski
Frequent Contributor

I was able to stitch this together, not relying on the documentation too much as it's not reliable.

The ONLY thing I cannot get to work now is layer_queries.

As you can see below, the documentation uses queryOption and useFilter as well as a where clause, but I error out when I put those in my script.

NOTHING works except queryOption = all, which then negates ANY use of a geometry or where clause.

*** I really need the WHERE CLAUSE because I have attachments etc. and don't want to download any unnecessary data.

 

[Image: kapalczynski_0-1722456551072.png, screenshot of the layerQueries documentation]

I am trying to use something like this, but I get nothing but ERRORS. It will not accept the queryOption value 'useFilter' shown in the documentation above:

layer_queries = {'0':{'queryOption': 'all', 'includeRelated': True},
                '1':{'queryOption': 'useFilter', 'useGeometry': False, 'includeRelated': True, 'where': 'IMPORTED = Yes'}},

    replica1 = aah_flc.replicas.create(replica_name = 'JaysTEST',
              layers=[0,1],
              #layer_queries = {"0":{"queryOption": "all", 'includeRelated': True, "where": "IMPORTED = No" }}, 
              #layer_queries = {'0':{'queryOption': 'all', 'includeRelated': True}, '1':{'queryOption': 'useFilter', 'useGeometry': False, 'includeRelated': True, 'where': 'IMPORTED = Yes'}},
              layer_queries = {'0':{'queryOption': 'all', 'includeRelated': True}, '1':{'queryOption': 'all', 'includeRelated': True}},
              return_attachments=True,
              attachments_sync_direction="bidirectional",
              sync_model='none', # none, perReplica
              target_type='server',
              data_format='filegdb', 
              out_path=r'C:\Users\PROD\exports'
              )

 

 

Does anyone have any thoughts on using a WHERE clause?

EarlMedina
Esri Regular Contributor

Sorry, with respect to documentation: the REST API documentation IS NOT 1-to-1 with the ArcGIS API for Python documentation. REST API parameters follow camel case, while the ArcGIS API for Python follows snake case (as is convention for Python). Notwithstanding, as the Python API is a wrapper for the REST API, it is useful to refer to the REST API documentation in cases where more details are needed on a parameter's usage.

My apologies if that wasn't clear before. You do have to do a bit of translation yourself or else have access to a good IDE that will point out such problems.
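
As a rough aid for that translation, here is a small sketch that converts a camel-case REST parameter name to snake case. `camel_to_snake` is a hypothetical helper, and the result is only a first guess: a Python API argument is not guaranteed to be a mechanical translation of the REST name, so check the method signature. Judging by this thread, keys inside the `layer_queries` dict (such as `queryOption`) stay in camel case.

```python
import re


def camel_to_snake(name):
    """Insert an underscore before each interior capital, then lower-case."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


for rest_name in ("layerQueries", "returnAttachments", "attachmentsSyncDirection"):
    print(rest_name, "->", camel_to_snake(rest_name))
```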

 

As for your layer query, I might have missed what the problem was earlier. You don't even have to include queryOption in your case if all you need is a where clause. You can just do:

{"0":{"where": "IMPORTED = No"}}

kapalczynski
Frequent Contributor

Thanks... Still working through this... 

As of now this is the only syntax that seems to work

{"1":{"where": "IMPORTED = 'No'"}}
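
The likely reason the quoted form works is that in a SQL where clause a bare `No` reads as a column name, while `'No'` is a string literal. A minimal sketch for building such a dict with the quoting handled in one place; `sql_literal` and `build_layer_queries` are hypothetical helpers, and the simple `field = value` filter shape is an assumption for illustration.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal: numbers bare, text single-quoted."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    # Double any embedded single quote so the literal stays well-formed
    return "'" + str(value).replace("'", "''") + "'"


def build_layer_queries(filters):
    """filters maps a layer id to a (field, value) pair for an equality filter."""
    return {
        str(layer_id): {"where": "{} = {}".format(field, sql_literal(value))}
        for layer_id, (field, value) in filters.items()
    }


layer_queries = build_layer_queries({1: ("IMPORTED", "No")})
print(layer_queries)
```

The resulting dict could then be passed as the `layer_queries` argument shown earlier in the thread.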