Data Scrape and Add Features (need attachments)

kapalczynski · 07-26-2024

I have some code that I use to scrape a service and, based on a where clause, copy specific records to another dataset. It scrapes the service and writes a JSON file for every 1000 records, then reads the JSON files and uses the code below to add them to the target service. This is working great, but I need to modify it to move attachments as well, and this is where I am confused.

Because I write the features to JSON and then use those files to add features, I am not sure how to include the attachments for each of those features in the JSON file.

Any thoughts are very much appreciated.
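Since the JSON files written by scrapeData() only hold attributes and geometry, one option is to keep the attachment metadata in a separate "sidecar" file keyed by the source OBJECTID. This is a sketch that assumes the source layer exposes the standard REST attachment infos resource (<layer url>/<objectId>/attachments?f=json, which returns an attachmentInfos list). The function names and the sidecar layout are hypothetical, and the HTTP call is passed in as get_json so it can be backed by requests like the rest of the script.

```python
import json
from typing import Callable, Dict, List


def attachment_infos_url(layer_url: str, oid: int) -> str:
    # REST attachment infos resource for a single feature or table row
    return "{}/{}/attachments".format(layer_url.rstrip("/"), oid)


def collect_attachment_infos(
    layer_data: dict,
    layer_url: str,
    get_json: Callable[[str, dict], dict],
    oid_field: str = "OBJECTID",
) -> Dict[str, List[dict]]:
    # Map each source OBJECTID (as a string, because JSON keys are strings)
    # to the attachmentInfos list reported by the service
    sidecar = {}
    for feature in layer_data.get("features", []):
        oid = feature["attributes"][oid_field]
        info = get_json(attachment_infos_url(layer_url, oid), {"f": "json"})
        sidecar[str(oid)] = info.get("attachmentInfos", [])
    return sidecar


def write_sidecar(sidecar: dict, path: str) -> None:
    # Intended to be written next to the outputN.json file for the same chunk
    with open(path, "w") as f:
        json.dump(sidecar, f)
```

With requests, get_json could be lambda url, params: requests.get(url, params=params).json(). The bytes of each attachment can then be fetched from <layer url>/<objectId>/attachments/<attachment id> when it is time to add them to the target.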

I am doing the add at the service level with 'edit_features':

add_result = ports_layer.edit_features(adds = featureAddingAdd)

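import json
import os
import sys
import traceback

import arcpy  # only needed for arcpy.GetMessages() in the error handlers
import requests

# SNIP (gis, s123URL, outputVariable and path are set up in the omitted code)

portal_item = gis.content.get('73xxxxxxxxxxxxxxxxxxxxxxxxx5')
ports_layer = portal_item.tables[0]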

class DataScraper():
    def __init__(self):
        # URL to map service you want to extract data from
        self.service_url = s123URL
    def getServiceProperties(self, url):
        URL = url
        PARAMS = {'f' : 'json'}
        r = requests.get(url = URL, params = PARAMS)
        service_props = r.json()
        return service_props
    def getLayerIds(self, url, query=None):
        URL = url + '/query'
        print(URL)
        PARAMS = {'f':'json', 'returnIdsOnly': True, 'where' : "Imported = 'No'"}

        if query:
            PARAMS['where'] = "ST = '{}'".format(query)
        r = requests.get(url = URL, params = PARAMS)
        data = r.json()
        
        return data['objectIds']
    def getLayerDataByIds(self, url, ids):
        # ids parameter should be a list of object ids
        URL = url + '/query'
        field = 'OBJECTID'
        value = ', '.join([str(i) for i in ids])
        PARAMS = {'f': 'json', 'where': '{} IN ({})'.format(field, value), 'returnIdsOnly': False, 'returnCountOnly': False,
                  'returnGeometry': True, 'outFields': '*'}
        r = requests.post(url=URL, data=PARAMS)
        layer_data = r.json()
        return layer_data
    def chunks(self, lst, n):
        # Yield successive n-sized chunks from list
        for i in range(0, len(lst), n):
            yield lst[i:i + n]

# Module-level scraper instance used by scrapeData()
ds = DataScraper()

def scrapeData():
    try:
        service_props = ds.getServiceProperties(ds.service_url)
        max_record_count = service_props['maxRecordCount']
        layer_ids = ds.getLayerIds(ds.service_url)
        
        id_groups = list(ds.chunks(layer_ids, max_record_count))
        
        for i, id_group in enumerate(id_groups):
            print('  group {} of {}'.format(i+1, len(id_groups)))
            layer_data = ds.getLayerDataByIds(ds.service_url, id_group)
            level = str(i)
            outjsonpath = outputVariable + level + ".json"

            layer_data_final = layer_data
            print('Writing JSON file...')
            with open(outjsonpath, 'w') as out_json_file:
                json.dump(layer_data_final, out_json_file)
                
    except Exception:
        # Handle errors accordingly...this is generic
        tb = sys.exc_info()[2]
        tb_info = traceback.format_tb(tb)[0]
        # f-strings so the placeholders are actually filled in
        pymsg = f'PYTHON ERRORS:\n\tTraceback info:\t{tb_info}\n\tError Info:\t{str(sys.exc_info()[1])}\n'
        msgs = f'ArcPy ERRORS:\t{arcpy.GetMessages(2)}\n'
        print(pymsg)
        print(msgs)

def addAAHData():
    try:
        for x in os.listdir(path):
            if x.startswith("output"):
                filetoImport = os.path.join(path, x)
                print("Appending: " + x)
                # Context manager closes each file after it is read
                with open(filetoImport) as f:
                    data = json.load(f)
                featureAddingAdd = data['features']

                add_result = ports_layer.edit_features(adds = featureAddingAdd)
                
    except Exception:
        # Handle errors accordingly...this is generic
        tb = sys.exc_info()[2]
        tb_info = traceback.format_tb(tb)[0]
        # f-strings so the placeholders are actually filled in
        pymsg = f'PYTHON ERRORS:\n\tTraceback info:\t{tb_info}\n\tError Info:\t{str(sys.exc_info()[1])}\n'
        msgs = f'ArcPy ERRORS:\t{arcpy.GetMessages(2)}\n'
        print(pymsg)
        print(msgs)
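For the second half, addAAHData() could copy attachments after each edit_features call. This sketch assumes that the entries in the result's 'addResults' list come back in the same order as the adds, so each new objectId can be paired with the OBJECTID of the source feature, and that both the source and target layers expose the ArcGIS API for Python attachment manager (layer.attachments with get_list, download and add). The source would then have to be opened as a layer object rather than queried with requests. map_new_oids and copy_attachments are hypothetical helpers, not part of the original script.

```python
import os
import tempfile
from typing import Dict, List


def map_new_oids(adds: List[dict], add_result: dict, oid_field: str = "OBJECTID") -> Dict[int, int]:
    # Pair each submitted feature with its result (assumes addResults keeps the order of adds)
    mapping = {}
    for feature, result in zip(adds, add_result.get("addResults", [])):
        if result.get("success"):
            mapping[feature["attributes"][oid_field]] = result["objectId"]
    return mapping


def copy_attachments(source_layer, target_layer, oid_map: Dict[int, int]) -> int:
    # Download each attachment of a source row and re-add it to the matching new row
    copied = 0
    with tempfile.TemporaryDirectory() as tmp:
        for old_oid, new_oid in oid_map.items():
            for info in source_layer.attachments.get_list(old_oid):
                # One folder per attachment so identical file names do not collide
                folder = os.path.join(tmp, str(old_oid), str(info["id"]))
                os.makedirs(folder)
                paths = source_layer.attachments.download(
                    oid=old_oid, attachment_id=info["id"], save_path=folder
                )
                # Treat the return value as a list of paths, but accept a bare string too
                if isinstance(paths, str):
                    paths = [paths]
                for path in paths:
                    target_layer.attachments.add(new_oid, path)
                    copied += 1
    return copied
```

Inside the loop this would look like oid_map = map_new_oids(featureAddingAdd, add_result) followed by copy_attachments(source_layer, ports_layer, oid_map). Downloading into a temporary folder keeps the local disk clean once the copy finishes.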

kapalczynski · 07-31-2024

If I run it with layerQueries as below, I get this error:

    replica1 = aah_flc.replicas.create(replica_name = 'JaysTEST',
                                      layers='0,1',
                                      layerQueries = {"1":{"queryOption": "all"}},
                                      return_attachments=True,
                                      attachments_sync_direction="bidirectional",
                                      sync_model='none', # none, perReplica
                                      target_type='server',
                                      data_format='filegdb', 
                                      out_path=r'C:\Users\PROD\exports')

True

{
"supportsRegisteringExistingData": true,
"supportsSyncDirectionControl": true,
"supportsPerLayerSync": true,
"supportsPerReplicaSync": false,
"supportsRollbackOnFailure": false,
"supportsAsync": true,
"supportsSyncModelNone": true,
"supportsAttachmentsSyncDirection": true
}

8

Create,Editing,Uploads,Query,Update,Sync,Extract
Create,Editing,Uploads,Query,Update,Sync,Extract

-9283683.186,4375456.993,-8391917.256,4776746.944

Traceback (most recent call last):
File "C:\Users\SYNC_2.py", line 72, in <module>
out_path=r'C:\Users\PROD\exports')
TypeError: create() got an unexpected keyword argument 'layerQueries'
>>>
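A quick way to catch a keyword typo like layerQueries before a long-running call is to bind the arguments against the function's signature with Python's standard inspect module. check_kwargs is a hypothetical helper and create below is only a stand-in, not the real replicas.create; the check is also only useful when the target function does not accept **kwargs.

```python
import inspect


def check_kwargs(func, **kwargs):
    # Raises TypeError for unknown keyword names without actually calling func
    inspect.signature(func).bind_partial(**kwargs)


def create(replica_name, layers=None, layer_queries=None, return_attachments=False):
    # Stand-in with snake_case parameters, mimicking a Python API wrapper
    return replica_name


check_kwargs(create, replica_name="JaysTEST", layer_queries={})  # passes silently
try:
    check_kwargs(create, replica_name="JaysTEST", layerQueries={})
except TypeError as err:
    print(err)  # reports the unexpected keyword argument 'layerQueries'
```

Pointing check_kwargs at the real method first would surface the same TypeError immediately, before any data is requested.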

kapalczynski · 07-31-2024

The attachments are now working, at least for the feature layer, but still NOT for the table: the table is downloaded but has no records.

I am not too concerned about the layer query, as I can query the records out in my next step, but WHY are there no records coming across for the TABLE?

It seems this might have been an issue for two years now:

https://community.esri.com/t5/arcgis-field-maps-questions/standalone-table-replica-not-populating-wi...

Side note:

This seems typical of ESRI documentation. ALL the documentation says 'return_Attachments', which causes an error, when it should be 'return_attachments' with a LOWERCASE a. It makes me wonder how many more errors this is causing: some parameters are upper case, some are not, and there is no consistency. Ugh.

It turns out it's also 'layer_queries' and not 'layerQueries'. Holy smokes, the documentation is useless.

All the docs say true and false, but they need to be True and False. Wow.
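The true/false difference comes from the two layers involved: the REST documentation shows JSON, where booleans are lowercase, while Python code uses True and False. When a Python dict such as a layer_queries value is serialized with the standard json module, Python's True becomes JSON true, as this small check shows.

```python
import json

# A layer_queries-style dict written with Python booleans
layer_queries = {"0": {"queryOption": "all", "includeRelated": True}}

# json.dumps converts Python True/False to JSON true/false
payload = json.dumps(layer_queries)
print(payload)  # {"0": {"queryOption": "all", "includeRelated": true}}

# Going the other way, json.loads turns JSON true back into Python True
assert json.loads('{"includeRelated": true}') == {"includeRelated": True}
```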

kapalczynski · 07-31-2024

I was able to stitch this together without relying on the documentation too much, as it's not reliable.

The ONLY thing I cannot get to work now is layer_queries.

The documentation uses queryOption and useFilter as well as a where clause, but my script errors out when I put those in.

NOTHING works except queryOption = all, which then negates ANY use of a geometry or where clause. Ugh.

*** I really need the WHERE CLAUSE because I have attachments etc. and don't want to download any unnecessary data.

I am trying to use something like this but get nothing but ERRORS. It will not accept a queryOption value of 'useFilter', as shown in the documentation:

layer_queries = {'0':{'queryOption': 'all', 'includeRelated': True},
                '1':{'queryOption': 'useFilter', 'useGeometry': False, 'includeRelated': True, 'where': 'IMPORTED = Yes'}},

    replica1 = aah_flc.replicas.create(replica_name = 'JaysTEST',
              layers=[0,1],
              #layer_queries = {"0":{"queryOption": "all", 'includeRelated': True, "where": "IMPORTED = No" }}, 
              #layer_queries = {'0':{'queryOption': 'all', 'includeRelated': True}, '1':{'queryOption': 'useFilter', 'useGeometry': False, 'includeRelated': True, 'where': 'IMPORTED = Yes'}},
              layer_queries = {'0':{'queryOption': 'all', 'includeRelated': True}, '1':{'queryOption': 'all', 'includeRelated': True}},
              return_attachments=True,
              attachments_sync_direction="bidirectional",
              sync_model='none', # none, perReplica
              target_type='server',
              data_format='filegdb', 
              out_path=r'C:\Users\PROD\exports'
              )

Does anyone have any thoughts on using a WHERE CLAUSE?

EarlMedina · 08-01-2024

Sorry, with respect to documentation: the REST API documentation IS NOT 1-to-1 with the ArcGIS API for Python documentation. REST API parameters follow camel case, while the ArcGIS API for Python follows snake case (as is the convention for Python). That said, because the Python API is a wrapper for the REST API, it is useful to refer to the REST API documentation when you need more detail on a parameter's usage.

My apologies if that wasn't clear before. You do have to do a bit of translation yourself or else have access to a good IDE that will point out such problems.
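For most parameters the camelCase to snake_case translation is mechanical: put an underscore before each capital letter and lowercase the result. camel_to_snake is a hypothetical convenience, not part of either API, and it is only a guide; some Python parameters (out_path, for example) have no REST counterpart, so the Python API reference stays the authority.

```python
import re


def camel_to_snake(name: str) -> str:
    # Insert "_" before each capital letter (except at the start), then lowercase
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for rest_name in ["layerQueries", "returnAttachments", "attachmentsSyncDirection", "syncModel", "dataFormat"]:
    print(rest_name, "->", camel_to_snake(rest_name))
```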

As for your layer query, I might have missed what the problem was earlier. You don't even have to include queryOption in your case if all you need is a where clause. You can just do:

{"0":{"where": "IMPORTED = No"}}

kapalczynski · 08-02-2024

Thanks. Still working through this.

As of now, this is the only syntax that seems to work (note the single quotes around 'No'):

{"1":{"where": "IMPORTED = 'No'"}}
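The quoting matters because in standard SQL IMPORTED = No treats No as a column name, while IMPORTED = 'No' compares against a text value. A small helper that quotes string values (doubling any embedded single quotes, as standard SQL requires) can build the layer_queries dict without hand-typing the quotes. sql_literal and build_layer_queries are hypothetical; the layer ID "1" and the IMPORTED field are taken from the post above.

```python
from typing import Dict, Union


def sql_literal(value: Union[str, int, float]) -> str:
    # Numbers pass through; everything else is quoted with ' escaped as ''
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_layer_queries(filters: Dict[str, Dict[str, Union[str, int, float]]]) -> dict:
    # filters maps a layer id to {field: value}; conditions are joined with AND
    queries = {}
    for layer_id, conditions in filters.items():
        where = " AND ".join(
            "{} = {}".format(field, sql_literal(value)) for field, value in conditions.items()
        )
        queries[str(layer_id)] = {"where": where}
    return queries


print(build_layer_queries({"1": {"IMPORTED": "No"}}))
# {'1': {'where': "IMPORTED = 'No'"}}
```

The result can be passed as layer_queries=build_layer_queries({"1": {"IMPORTED": "No"}}) in the replicas.create call.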