Downloading Feature Layer Attachments via the ArcGIS API for Python

08-14-2017 08:15 AM

I have commonly encountered questions regarding the downloading of attachments (e.g. pictures and documents) from ArcGIS Online Feature Layers. The aim of this post is to provide an alternative to the solutions currently on offer.

Firstly, the only way attachments are included when exporting data via ArcGIS Online is when exporting as a File Geodatabase. The reason for this is that the File Geodatabase is the only export format that supports related records. Therefore, the most common answer to accessing Feature Layer attachments outside of ArcGIS Online is to export the Feature Layer as a File Geodatabase, and extract the data using a script like this through ArcGIS Desktop. There is also a sync script available which can sync a local File Geodatabase with a hosted Feature Layer. These work well, but they do rely on ArcGIS Desktop, and in the case of option one, also require you to download the File Geodatabase (which can grow in size as you add data to it) in advance of running the script.

For this reason, I created a script using the new ArcGIS API for Python which works independently from ArcGIS Desktop. In fact it works directly with a Web GIS, so you don't even have to download a File Geodatabase in order to access your attachments. It can be re-run on a regular basis, and only downloads new attachments to disk if they have not been previously downloaded.

So what does this script do?

  1. Creates a folder in which feature attachments are stored (e.g. Attachment Downloads)
  2. Within this folder creates a sub-folder for each layer in the specified Feature Layer
    1. If the AttachmentStorage variable is set to 'GroupedFolder', attachments are stored in the format ObjectId-AttachmentId-OriginalFileName in one single folder
    2. If the AttachmentStorage variable is set to 'IndividualFolder', a new folder is created for every feature with an attachment (named using the feature's Object ID), and the attachments are stored within it in the format AttachmentId-OriginalFileName
  3. If the attachment already exists on disk it will not be downloaded again
  4. Summary of downloads (total number of downloads and total attachments size) provided as final console output
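
The folder and file naming described in the list above can be sketched as follows. This is a minimal illustration rather than the script's actual code: the function names `attachment_path` and `needs_download` are hypothetical, and only the two storage modes and the name formats come from the description above.

```python
import os


def attachment_path(root, layer_name, object_id, attachment_id, file_name,
                    storage="GroupedFolder"):
    """Return the target path for one attachment (hypothetical helper)."""
    layer_folder = os.path.join(root, layer_name)
    if storage == "GroupedFolder":
        # One folder per layer: ObjectId-AttachmentId-OriginalFileName
        name = "{}-{}-{}".format(object_id, attachment_id, file_name)
        return os.path.join(layer_folder, name)
    if storage == "IndividualFolder":
        # One folder per feature: AttachmentId-OriginalFileName
        name = "{}-{}".format(attachment_id, file_name)
        return os.path.join(layer_folder, str(object_id), name)
    raise ValueError("Unknown storage mode: {}".format(storage))


def needs_download(path):
    """An attachment is only downloaded if it is not already on disk."""
    return not os.path.isfile(path)
```

Because the target path is deterministic, checking it before each download is what allows the script to be re-run without fetching the same files again.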

How do you get set up?

  1. Follow the ArcGIS API for Python Install and set up guide
    1. If using Anaconda, you can use the .yml file to create an appropriate environment
  2. If using the Startup.bat file, edit it (in a text editor) to reference the appropriate folders
  3. Once you get a Jupyter Notebook open, save the attached .zip file (below) to an accessible location and unzip it
  4. Open the DownloadAttachments.ipynb file via Jupyter Notebooks and run it (click into the code and press Ctrl+Enter)
  5. The script will run on the specified public Feature Layer and download the attachments to the specified Downloads folder
  6. An associated log file is created in the Logging folder every time you run the script

Notes:

  • Update the FeatureLayerId variable to run on your own Feature Layer
  • Specify your PortalUserName, PortalPassword if the Feature Layer is secured
  • The names of folders are filtered to show only 0-9 and a-Z characters
  • This script has not been tested against ArcGIS Enterprise (only Feature Layers hosted in ArcGIS Online), but it should still work
  • Code also available through the Esri developer-support GitHub repository
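
As a rough illustration of the folder name filtering mentioned in the notes, a regular expression can strip everything except digits and ASCII letters. The function name `clean_folder_name` is hypothetical, and the script itself may implement the filter differently.

```python
import re


def clean_folder_name(name):
    """Keep only the characters 0-9, a-z and A-Z (hypothetical helper)."""
    return re.sub(r"[^0-9a-zA-Z]", "", name)


print(clean_folder_name("Tree Survey (2017)!"))
```

Note that a filter like this removes all non-ASCII characters, so a layer name written entirely in another script would be reduced to an empty string.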
Comments

very cool! i'd love to see this land in https://github.com/Esri/developer-support

Hi John - more than happy to push this into a GitHub repo! I'll PM you to progress.

Script has now been merged into the Esri developer-support GitHub repository!

LOVE this script, it has huge potential to streamline my workflow. One question though- how would I need to modify the script to change the naming convention of individual attachments to include a value from a field rather than the objectID? 

In my case, each feature has a SerialNumber field, and I'd love the exported attachments to be named (SerialNumber, AttachmentName) rather than (attachmentId, attachmentName)

Hi Josh,

Glad you find the script useful! The names of the files are set on line 117:

fileName = '{}-{}'.format(attachmentId, attachmentName)

and 131 (depending on whether you are using individual or grouped folders):

fileName = '{}-{}-{}'.format(currentObjectId, attachmentId, attachmentName)

However, in order to include attribution in the file name the code would need a decent amount of rejigging. The reason for this is that only Object IDs are returned from the Feature Layer by the query on line 92 (shown below). A subsequent query would need to be made to get the attribution.

featureObjectIds = featureLayer.query(where='1=1', return_ids_only=True)

The reason it's written this way is to ensure the script can deal with a large dataset. It's a good suggestion so I'll consider enhancing the script if I get a chance. But it's unlikely to happen anytime soon. Please feel free to contribute to the script in GitHub if you manage to figure it out!

Mikie
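
One possible shape for the "subsequent query" described in this reply is sketched below. It is not part of the script: the helper names are hypothetical, `SerialNumber` is the field from the question, and `OBJECTID` is assumed to be the Object ID field name. The only API assumption is that `featureLayer.query` accepts `where`, `out_fields` and `return_geometry`, and returns an object whose `features` each carry an `attributes` dictionary.

```python
def build_name_lookup(feature_layer, field, oid_field="OBJECTID", where="1=1"):
    """Map each Object ID to the value of `field` using one extra query."""
    feature_set = feature_layer.query(
        where=where,
        out_fields="{},{}".format(oid_field, field),
        return_geometry=False,  # attributes only, to keep the response small
    )
    return {f.attributes[oid_field]: f.attributes[field]
            for f in feature_set.features}


def attachment_file_name(lookup, object_id, attachment_name):
    """Prefix the file name with the field value, falling back to the Object ID."""
    value = lookup.get(object_id, object_id)
    return "{}-{}".format(value, attachment_name)
```

A lookup built once per layer, for example `build_name_lookup(featureLayer, 'SerialNumber')`, could then be consulted where `fileName` is set. For very large layers the extra query may itself need to be run in batches, and field values used in file names may need the same character filtering as folder names.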

This script is very helpful.  Is there a similar example for adding attachments to a hosted feature layer?

Hi Robert - not that I'm aware of. Ultimately you would have to structure the attachments so they can be linked to specific features in the dataset (like how they are downloaded through this script). There is documentation on adding attachments below, and some logic from this script could probably be used:

Layer Attachments - ArcGIS API for Python 

arcgis.features.managers Module

Hey Michael I was able to get a script running to add attachments from a folder to a hosted feature layer based on an attribute value.  I used your configuration and then used the dataframe object to match files etc.  Very basic but it works for me!  Thanks for pointing me in the right direction.

AddAttachments with API for Python

Very nice Robert - thanks for sharing. Definitely one to add to my bookmarks!

Hi Michael Kelly

Thanks a lot for the script, it's highly appreciated, as the save as "FileGDB", "Replica" and download from Survey123 interface doesn't work for the concerned service.

Yet we still run into issues.

1. The "GroupedFolder" as AttachmentStorage doesn't seem to work (individual folders are created, and no objectid attached to the filename).

-> Not so important, we just added the objectid to the filename in the "IndividualFolder" storage part as well.

2. The feature Service contains about 5000 points with about 10000 pictures. The script runs fine up to ~1000 pictures (in ~400 folders), but then stops.

Here are the errors:


RuntimeError Traceback (most recent call last)
<ipython-input-1-8bd11ca8f5f2> in <module>()
117 if not os.path.isfile(newAttachmentPath):
118 logger.info('The size of the current attachment being downloaded is {}MB'.format((attachmentSize/1000000)))
--> 119 currentAttachmentPath = featureLayer.attachments.download(oid=currentObjectId, attachment_id=attachmentId, save_path=currentFolder)
120 #Rename to ensure file name is unique
121 renameFile(currentAttachmentPath, newAttachmentPath)

~\Anaconda3\lib\site-packages\arcgis\features\managers.py in download(self, oid, attachment_id, save_path)
38 desired_att = [att for att in att_list if att['id']== attachment_id]
39 if len(desired_att) == 0: #bad attachment id
---> 40 raise RuntimeError
41 else:
42 att_name = desired_att[0]['name']

RuntimeError:

Do you have an idea? Before that error, we ran also into timeouts. So we are not quite sure if it's a connection issue. Or are there simply too many features? Is there a way to tell the script to do the first 1000, pause, resume where it stopped for another 1000 etc until it's finished?

Your help is highly appreciated,

Annina

#survey123 download

Hi Annina,

I haven't tested for a dataset with that many features, and it could potentially cause problems. A quick fix might be to alter line 92.

featureObjectIds = featureLayer.query(where='1=1', return_ids_only=True)

This is where the original query to the feature layer is made and returns all the relevant features. When the where clause is set to 1=1, all features are returned. You could alter this to something like below to run the script for parts of the dataset:

featureObjectIds = featureLayer.query(where='OBJECTID < 1000', return_ids_only=True)

Given that the Feature Layer won't export, something may be corrupted internally. Judging by the error message above, the script has a problem with a 'non-existent' attachment ID that was returned by the attachment query. If you could share the Feature Layer with me, I may be able to identify issues with the script and/or the Feature Layer in question. If this is possible, my ArcGIS Online username is mikie.kelly and you could share it with me via a group without making it public.

Kind regards,

Mikie
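
To process a large layer in parts, as suggested in this reply, the where clause can be generated for successive Object ID ranges. The sketch below is an assumption-laden illustration: the generator name `batch_where_clauses` is hypothetical, `OBJECTID` is assumed to be the Object ID field name, and the highest Object ID has to be known or estimated in advance.

```python
def batch_where_clauses(max_object_id, batch_size=1000, oid_field="OBJECTID"):
    """Yield where clauses covering Object IDs 1..max_object_id in batches."""
    for start in range(1, max_object_id + 1, batch_size):
        end = start + batch_size
        yield "{0} >= {1} AND {0} < {2}".format(oid_field, start, end)


for clause in batch_where_clauses(2500):
    print(clause)
```

Each clause could be passed as the `where` argument of the query on line 92. Because attachments already on disk are skipped, re-running a batch after a timeout should pick up where the previous run stopped.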

Good stuff, thanks for making our lives a little bit easier.  Here's my question though - my colleague has shared a feature class with our organization and is asking me to complete this task for her.  However, when I put my user credentials in the script and the feature name I can't seem to find the feature (probably because it belongs to her user credentials?).  Is there a way to adapt this script for this type of enterprise level task, or do I have to own the feature class to run this script?

Hi Daniel,

Once the Feature Layer has been shared with you it should work. Can you access the Feature Layer by inserting the Item ID to the URL below and logging in?

https://www.arcgis.com/home/item.html?id=InsertItemIdHere

Is the database you are accessing hosted in ArcGIS Online, ArcGIS Enterprise (Data Store) or in an Enterprise Geodatabase? I have not tested this script with Enterprise Geodatabases.

Kind regards,

Mikie

Thank you for the quick response Michael!

I did figure it out - I was trying to use the name of the feature layer (ArcGIS Online). However, I was able to find the layer using the ID in the HTTP address (a long hexadecimal ID). Is that normal, and is there a place to find that ID anywhere in the layer properties, or only in the address bar? I've highlighted these in the attached PDF.

Second, we’d like to name the pictures with a field from the actual feature layer (rather than the ATTACH table). It looks like the script accesses the feature layer properties through the ‘gis’ and the ‘.content’ and ‘.get’ methods. Can you direct me to the resources to better understand this approach, and how I might name the folders & images with a field of our choice?

Thank you kindly,

Daniel

Hi Daniel,

Yes the URL is typically where you source the unique Item ID. An Item Name is not always unique so cannot be used for finding a specific item.

See my comment above in relation to attribution. In short, it is not currently possible with the script in its current state, and would need some further development to get this working. It's something I'll consider as an enhancement, but unlikely to happen in the short term.

Kind regards,

Mikie
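
Since the Item ID is usually copied from the address bar, a small helper can pull it out of the item page URL. This is a sketch using only the Python standard library, and the function name `item_id_from_url` is hypothetical. It also drops any trailing fragment such as `#data`, which is not part of the Item ID.

```python
from urllib.parse import urlparse, parse_qs


def item_id_from_url(url):
    """Return the value of the 'id' query parameter of an item page URL."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("No 'id' parameter found in {}".format(url))
    return ids[0]


print(item_id_from_url(
    "https://www.arcgis.com/home/item.html?id=092d075f4b3a40f78cf1329b20b0d5e7#data"))
```

The returned value is what the FeatureLayerId variable expects.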

Thanks again Michael,

I’m going to have to figure this out… I’ll forward a solution to you!


Thanks Daniel - feel free to push any enhancements to the GitHub repository! Also, I fixed the hyperlink in my previous comment.

Mikie

Hi Michael,

That worked! We managed to download all the attachments. It helped also to identify that "non existen"-object. Thanks a lot for your help.

Annina

PS: Possibly that layer was corrupt. That's also why we needed the attachments to download, store and republish properly 🙂

Glad to hear it Annina 

Anonymous User

Thanks Michael Kelly‌ for this workaround. This has been so useful 

This may sound really stupid, but is it possible to schedule the script to run every 10 minutes or so?

Thanks so much,

Gee

Hi Gee,

Yes it can definitely be run on a scheduled basis, you can check out the following blog which goes into further details: Scheduling a Python script or model to run at a prescribed time

Mikie

Anonymous User

Thanks MKellyesri-ireland-ie-esridist

I'm not sure how I can set this up in python as I'm using jupyter notebook to run this.

So, is it possible to create an execution schedule inside jupyter notebook? Or, could you please shed some light into how I could run this entire process within python?

Thanks so much,

Gee

Hi Gee,

You can download as a .py file via Jupyter notebooks as below:

Download from Jupyter Notebooks as a Python File

Once you do this, it can be used within Task Scheduler. If you are using the Python environment installed with ArcGIS Pro, the Python path will be something like:

C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe

Mikie
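
For the "every 10 minutes" case, one option on Windows is to register the exported .py file with Task Scheduler through the `schtasks` command line tool. The sketch below only builds the command, and a second function runs it. The helper names, the task name and the script path `C:\Scripts\DownloadAttachments.py` are hypothetical, and the quoting of the `/TR` value may need adjusting for your paths.

```python
import subprocess


def schtasks_command(python_exe, script_path,
                     task_name="DownloadAttachments", every_minutes=10):
    """Build a schtasks command that runs the script at a fixed interval."""
    # Quote both paths because they may contain spaces.
    task_run = '"{}" "{}"'.format(python_exe, script_path)
    return ["schtasks", "/Create",
            "/SC", "MINUTE", "/MO", str(every_minutes),
            "/TN", task_name,
            "/TR", task_run]


def register_task(command):
    """Create the scheduled task (Windows only)."""
    subprocess.run(command, check=True)


command = schtasks_command(
    r"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe",
    r"C:\Scripts\DownloadAttachments.py",  # hypothetical location of the exported script
)
print(command)
```

Calling `register_task(command)` from a Windows command prompt session with sufficient permissions would create the task. The same task can also be created through the Task Scheduler user interface.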

Anonymous User

Hi MKellyesri-ireland-ie-esridist‌,

I get the following error when I run it in python. Any thoughts?

Thanks again

Really really appreciate the help....

Traceback (most recent call last):
  File "C:\Users\Geethaka\Downloads\Download Attachments\DownloadAttachments.py", line 59, in <module>
    fileHandler = logging.handlers.RotatingFileHandler('{}/{}.log'.format(SaveLogsTo, logFileName), maxBytes=100000, backupCount=5)
AttributeError: module 'logging' has no attribute 'handlers'

Anonymous User

I commented out the following lines of code to get it working

Pretty sure these only affect the creation of the log files

"""
fileHandler = logging.handlers.RotatingFileHandler('{}/{}.log'.format(SaveLogsTo, logFileName), maxBytes=100000, backupCount=5)
formatter = logging.Formatter('%(asctime)s %(levelname)s %(relativeCreated)d \n%(filename)s %(module)s %(funcName)s %(lineno)d \n%(message)s\n')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)
"""

Glad to hear you got it working.

In answer to your question, it might just be that you have to import logging.handlers explicitly: Why do Python modules sometimes not import their sub-modules? - Stack Overflow
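
The AttributeError in the traceback above refers to `logging.handlers`, which is a submodule that `import logging` on its own does not load. A minimal sketch of the logging setup with the explicit import is shown below. The function name `build_logger` and the shortened format string are illustrative, while `maxBytes` and `backupCount` are taken from the line in the traceback.

```python
import logging
import logging.handlers  # the submodule must be imported explicitly
import os


def build_logger(save_logs_to, log_file_name):
    """Create a logger that writes to a rotating log file."""
    os.makedirs(save_logs_to, exist_ok=True)
    logger = logging.getLogger("DownloadAttachments")
    logger.setLevel(logging.INFO)
    file_handler = logging.handlers.RotatingFileHandler(
        os.path.join(save_logs_to, "{}.log".format(log_file_name)),
        maxBytes=100000, backupCount=5)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(file_handler)
    return logger
```

The script may work in a Jupyter Notebook without the extra import because another package has already loaded `logging.handlers` there. When it runs as a standalone .py file, that is not guaranteed.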

Anonymous User

Yep, that was it

Thanks so so much for the help

Anonymous User

Hi Robert Weber‌,

The link seems to be broken. Could you please share this again.

This would be super helpful 

Thanks,

Gee

Hi Gee,

The page has been moved - see this link. In case this happens again, you can go to the root of Robert's GitHub account here.

Mikie

Thank you for posting this, it seems like after an update something broke. I believe the method featurelayer.attachments.download is returning a list instead of a string.  

TypeError Traceback (most recent call last)
D:\spotted_lantern_fly.py in <module>()
134 currentAttachmentPath = featureLayer.attachments.download(oid=currentObjectId, attachment_id=attachmentId, save_path=featureLayerFolder)
135 #Rename to ensure file name is unique
--> 136 renameFile(currentAttachmentPath, newAttachmentPath)
137 downloadCounter += 1
138 downloadSizeCounter += attachmentSize

D:\spotted_lantern_fly.py in renameFile(currentAttachmentPath, newAttachmentPath)
43 #Rename file - ensure new attachment path does not exist already
44 if not os.path.exists(newAttachmentPath):
---> 45 os.replace(currentAttachmentPath, newAttachmentPath)
46 logger.info('{} being renamed as {}'.format(currentAttachmentPath, newAttachmentPath))
47 else:

TypeError: replace: src should be string, bytes or os.PathLike, not list 

I've done some research but can't figure out how to provide a string. I've tried replace, rename and renames. Is this something anyone else is experiencing, and are there any ideas on how to fix it?

Hi Holly - have you ever got this script working before? I have not replicated this behaviour with the version of the Python API included with ArcGIS Pro 2.1.2 (v1.2.5). Can you test with the FeatureLayerId = '092d075f4b3a40f78cf1329b20b0d5e7' - this is a public layer so you should be able to access.

I initially wrote this using v1.2.0 of the ArcGIS API for Python - I have not tested it with later versions.

Hi Michael, thanks for this post...very useful!

I am trying to download a feature layer created by a Survey123 application that allows users to capture multiple photos per record. To do so we had to build it as a relationship. Here is the URL: https:[host]/portal/home/item.html?id=f0dcf24f2bb9440d9fa2c8ec64141fe1#data. When viewing the Data tab from the Portal I can select the "(0) Add" text in the "Photos and Files" field to access the photos. I tried plugging f0dcf24f2bb9440d9fa2c8ec64141fe1#data into the FeatureLayerID =, but it returns no files. Any idea what I may be missing?

Hi Eric,

If the photos are stored in a table with no spatial attributes, it will appear as a table through the API (as opposed to a layer). Therefore the 'layers' text in lines 81/82/96 has to be changed to 'tables'.

#Lines 81/82
for i in range(len(itemObject.tables)):
    featureLayer = itemObject.tables[i]

...

#Line 96
logger.info('Currently looping through feature attachments in layer {} of {}: storing in folder named "{}"'.format(str(i + 1), str(len(itemObject.tables)), featureLayerName))

This should loop through any associated tables (as opposed to layers) and return the appropriate results.

Kind regards,

Mikie
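
If an item contains both spatial layers and non-spatial tables, the two collections can be combined so that a single loop covers both. This is a sketch rather than a change made in the script: the helper name `layers_and_tables` is hypothetical, and it assumes the item object exposes `layers` and `tables` attributes, either of which may be empty or missing.

```python
def layers_and_tables(item):
    """Return the item's layers followed by its tables as one list."""
    layers = getattr(item, "layers", None) or []
    tables = getattr(item, "tables", None) or []
    return list(layers) + list(tables)
```

The loop on lines 81/82 could then iterate over `layers_and_tables(itemObject)` instead of `itemObject.layers` or `itemObject.tables`.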

That was exactly what I needed! Thanks so much!

Hey Mike,

The script is running as expected, but I am trying to get it to place the attachments in a folder named after a value specified by a specific field ("Subject") in the related layer. Do you have any pointers for how to configure that?

Thanks!

Eric

Hi Eric - a similar question came through in one of the comments above. You would have to alter the query to send back the appropriate attributes, and then use the returned attributes whilst specifying the associated file name.

I can't run this script because the following error occurs. Please tell me the solution.

My feature layer contains Japanese text, which may be what causes this error.

Does this script work with layers that contain Japanese?

File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1117, in putrequest
self._output(request.encode('ascii'))

UnicodeEncodeError: 'ascii' codec can't encode characters in position 43-45: ordinal not in range(128)

Hi Holly,

I was getting the same error as you. I fixed it by adding [0] to the end of the currentAttachmentPath variable so that it gets the first and only attachment path in that list and uses it. I had to add it to this part: renameFile(currentAttachmentPath[0], newAttachmentPath) under both GroupedFolder and IndividualFolder. I tested it on Michael's public layer, and was able to download the attachments.
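
A version-tolerant variant of this fix is to normalise whatever `attachments.download` returns before renaming, so the same code copes with both a single path and a list of paths. The helper name `first_path` is hypothetical.

```python
def first_path(download_result):
    """Return a single path whether download gave a string or a list."""
    if isinstance(download_result, (list, tuple)):
        if not download_result:
            raise ValueError("The download returned no file paths")
        return download_result[0]
    return download_result
```

With this in place, the call would read `renameFile(first_path(currentAttachmentPath), newAttachmentPath)` under both GroupedFolder and IndividualFolder.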

Thank you very much.

I'll try it.


Hi 竹元 貴彦,

Did you get the script working with the pre-set public layer? I have not tested this against layers with Japanese. Does the layer itself (layer name, field names, etc.) contain Japanese, or is there Japanese only in the attributes? Is the layer available publically, or could you share it with me through a group? My ArcGIS Online username is mikie.kelly.

Kind regards,

Mikie

Anonymous User

Mikie,

This is an awesome script and I was able to get it working quickly despite limited Python experience.  I am wondering how line 92 might be modified to fetch GUIDs instead of OIDs.

Thanks!

I was working with a private layer, so I can't share it immediately. I will create or find another layer that contains Japanese to show you the problem as soon as possible.

My private layer has Japanese in both the field names and the attributes.

Hi Bob,

Global IDs are returned with the attachments. You can access them by adding to lines 108-111 as below (both the attachment Global ID and the attachment's parent Global ID are accessible):

for k in range(len(currentObjectIdAttachments)):
     attachmentGlobalId = currentObjectIdAttachments[k]['globalId']
     attachmentParentGlobalId = currentObjectIdAttachments[k]['parentGlobalId']
     attachmentId = currentObjectIdAttachments[k]['id']
     attachmentName = currentObjectIdAttachments[k]['name']
     attachmentSize = currentObjectIdAttachments[k]['size']

Then you can reference them when naming the file on line 117:

fileName = '{}-{}-{}'.format(attachmentGlobalId, attachmentParentGlobalId, attachmentName)

Mikie

Anonymous User

Works like a charm. Thanks.

Great script. I would also appreciate having the attachment files named based on a field in the associated attribute table. You mentioned this is possible but would require some rejigging -- have you or anyone else developed script for this?

Thanks!

Annalise

Hi Annalise,

I'm afraid not, at least not that I'm aware of.

Mikie

I work with the U.S. Fish and Wildlife Service and use an Enterprise Account login. Can this script function with an Enterprise login? 

#What are your ArcGIS Enterprise/ArcGIS Online credentials? This is case sensitive.
PortalUserName = 'email here'
PortalPassword = 'password here'
PortalUrl = 'https://fws.maps.arcgis.com'

I changed the PortalURL to direct to the FWS ArcGIS URL but I keep getting an error message --  [ERROR:arcgis._impl.connection:Invalid username or password.]

I've confirmed that the username and password I entered are correct. 

Thanks!

Annalise

Hi Annalise,

https://fws.maps.arcgis.com is an address for ArcGIS Online as opposed to ArcGIS Enterprise. All ArcGIS Online accounts can simply use https://www.arcgis.com for the PortalUrl. An ArcGIS Enterprise address is likely to look like this: https://portalcomputer.domain.com/portal/home.

PortalUserName = 'MKelly@MyDomain'

PortalPassword = 'MyPassword'

PortalUrl = 'https://portalcomputer.domain.com/portal/home'

Further information can be found here. One thing to note is that the username can be case sensitive - you need to replicate how it appears in the Portal organisation page.

Mikie

Hi Michael

Great script, quick question: Is it possible to get a file-like object from the attachments manager without downloading the file locally first?

So instead of downloading the file with attachments.download i would want to have the attachment as an in-memory object...

Currently my script first downloads the attachment, opens it as a binary file-like object, and then deletes the local file when I am done.

Any help will be appreciated.

Hi Deon,

I haven't noticed anything like that in the AttachmentManager class - is there a problem with the approach you are currently taking?

get_list includes the URL of any attachments (from what I remember) if that's any use?

Mikie
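
If the attachment URL is available, the file can be read straight into memory without touching disk. The sketch below uses only the Python standard library and makes several assumptions: the helper names are hypothetical, the URL follows the feature service REST pattern `<layer URL>/<object ID>/attachments/<attachment ID>`, and a secured layer accepts a `token` query parameter.

```python
import io
from urllib.parse import urlencode
from urllib.request import urlopen


def attachment_url(layer_url, object_id, attachment_id, token=None):
    """Build the REST URL of one attachment (assumed URL pattern)."""
    url = "{}/{}/attachments/{}".format(
        layer_url.rstrip("/"), object_id, attachment_id)
    if token:
        url += "?" + urlencode({"token": token})
    return url


def fetch_attachment(url):
    """Download the attachment and return it as an in-memory binary file."""
    with urlopen(url) as response:
        return io.BytesIO(response.read())
```

The object returned by `fetch_attachment` supports `read()` and `seek()` like an open binary file, so it can usually be passed to libraries that expect a file-like object. Large attachments are held fully in memory with this approach.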
