Downloading Feature Layer Attachments via the ArcGIS API for Python

08-14-2017 08:15 AM


I have commonly encountered questions about downloading attachments (e.g. pictures and documents) from ArcGIS Online Feature Layers. The aim of this post is to provide an alternative to the solutions currently available.

Firstly, when exporting data from ArcGIS Online, attachments are only included if you export as a File Geodatabase, because the File Geodatabase is the only export format that supports related records. Therefore, the most common answer to accessing Feature Layer attachments outside of ArcGIS Online is to export the Feature Layer as a File Geodatabase and extract the attachments using a script like this through ArcGIS Desktop. There is also a sync script available that can sync a local File Geodatabase with a hosted Feature Layer. These work well, but they rely on ArcGIS Desktop, and the first option also requires you to download the File Geodatabase (which grows as you add data to it) before running the script.

For this reason, I created a script using the new ArcGIS API for Python that works independently of ArcGIS Desktop. In fact, it works directly with a Web GIS, so you don't even have to download a File Geodatabase to access your attachments. It can be re-run on a regular basis, and it only downloads attachments that have not previously been downloaded.

So what does this script do?

  1. Creates a folder in which feature attachments are stored (e.g. Attachment Downloads)
  2. Within this folder, creates a sub-folder for each layer in the specified Feature Layer
    1. If the AttachmentStorage variable is set to 'GroupedFolder', attachments are stored in a single folder using the format ObjectId-AttachmentId-OriginalFileName
    2. If the AttachmentStorage variable is set to 'IndividualFolder', a new folder is created for every feature with an attachment (named using the feature's Object ID), and the attachments are stored within it using the format AttachmentId-OriginalFileName
  3. If an attachment already exists on disk, it is not downloaded again
  4. Provides a summary of downloads (total number of downloads and total attachment size) as the final console output
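The folder layout described in the list above can be sketched as follows. This is a minimal, stdlib-only sketch assuming the naming rules exactly as listed; the helper names `attachment_target_path` and `needs_download` are hypothetical and not taken from the original script.

```python
from pathlib import Path


def attachment_target_path(download_root, layer_folder, object_id,
                           attachment_id, file_name, storage):
    """Return where one attachment would be saved (hypothetical helper).

    `storage` mirrors the AttachmentStorage variable described in the post:
    either 'GroupedFolder' or 'IndividualFolder'.
    """
    layer_dir = Path(download_root) / layer_folder
    if storage == "GroupedFolder":
        # Single folder per layer: ObjectId-AttachmentId-OriginalFileName
        return layer_dir / f"{object_id}-{attachment_id}-{file_name}"
    if storage == "IndividualFolder":
        # One sub-folder per feature, named by Object ID:
        # AttachmentId-OriginalFileName
        return layer_dir / str(object_id) / f"{attachment_id}-{file_name}"
    raise ValueError(f"Unknown AttachmentStorage value: {storage!r}")


def needs_download(target):
    """True only if the attachment is not already on disk (skip re-downloads)."""
    return not Path(target).exists()
```

For example, `attachment_target_path("Attachment Downloads", "0-foto", 103, 7, "photo.jpg", "IndividualFolder")` gives `Attachment Downloads/0-foto/103/7-photo.jpg`; checking `needs_download()` before each download is what makes re-running the script cheap.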

How do you get set up?

  1. Follow the ArcGIS API for Python Install and set up guide
    1. If using Anaconda, you can use the .yml file to create an appropriate environment
  2. If using the Startup.bat file, edit it (in a text editor) to reference the appropriate folders
  3. Once you have a Jupyter Notebook open, save the attached .zip file below to an accessible location and unzip it
  4. Open the DownloadAttachments.ipynb file in Jupyter Notebook and run it (click into the code and press Ctrl+Enter)
  5. The script will run on the specified public Feature Layer and download the attachments to the specified Downloads folder
  6. An associated log file is created in the Logging folder every time you run the script

Notes:

  • Update the FeatureLayerId variable to run on your own Feature Layer
  • Specify your PortalUserName and PortalPassword if the Feature Layer is secured
  • Folder names are filtered to contain only 0-9, a-z and A-Z characters
  • This script has only been tested against Feature Layers hosted in ArcGIS Online (not ArcGIS Enterprise), but it should still work
  • The code is also available through the Esri developer-support GitHub repository
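The folder-name filter mentioned in the notes can be approximated with a regular expression. This is an illustrative sketch; `clean_folder_name` and the `<index>-<name>` pattern are assumptions (the pattern is inferred from folder names such as "0-foto" that appear in a log later in this thread), not necessarily what the original script does.

```python
import re


def clean_folder_name(name):
    """Keep only 0-9, a-z and A-Z characters (illustrative filter)."""
    return re.sub(r"[^0-9a-zA-Z]", "", name)


def layer_folder_name(layer_index, layer_name):
    # Hypothetical scheme: "<layer index>-<cleaned layer name>"
    return f"{layer_index}-{clean_folder_name(layer_name)}"
```

Prefixing the layer index keeps folder names unique even if two layers clean down to the same text.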
Attachments
Comments
Anonymous User

Hi Michael Kelly,

A couple of days ago our machines were upgraded to ArcGIS Pro 2.2.1 (Python 3.6.5).

Since then, the attachment file names no longer include "OBJECTID - ATTACHMENTID" (only the attachment names).

Can you think of a reason for this off the top of your head? Otherwise, I'm more than happy to take a further look.

But I thought I'd ask you first.

Thanks,

Gee

Anonymous User

All sorted.

It was the same error Holly Tran was experiencing, and I managed to get it working using Chelsea Rozek's suggestion.

Thanks guys,

Gee

I'm getting this error with ArcGIS Enterprise. Does anyone have a possible solution?

Error

This sounds like Holly's error above; see my previous response for a solution: add [0] to the end of the currentAttachmentPath variable.

I did that step and this error popped up. 

Error

Is the feature layer you're running this on publicly accessible? If yes, what's the url?

This worked for me too: I just added [0] on lines 123 and 137, so that the line reads

renameFile(currentAttachmentPath[0], newAttachmentPath)

instead of

renameFile(currentAttachmentPath, newAttachmentPath)

I am running Python 3.7.2, so it seems to be an issue that comes up with later versions of Python.
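A version-tolerant alternative to hard-coding `[0]`, as a hedged sketch: the symptoms in this thread suggest that newer ArcGIS API for Python releases return a list of saved paths from `attachments.download()` rather than a single path string, so the hypothetical helper below accepts either form.

```python
import os


def first_download_path(result):
    """Return one file path from an attachments.download() result.

    Assumption: depending on the ArcGIS API for Python version, download()
    may return either a single path or a list of paths.
    """
    if isinstance(result, (str, os.PathLike)):
        return os.fspath(result)
    paths = list(result)
    if not paths:
        raise ValueError("download() returned no file paths")
    return os.fspath(paths[0])
```

With it, the rename line becomes `renameFile(first_download_path(currentAttachmentPath), newAttachmentPath)` and works whichever form is returned.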

Hello Michael Kelly. Thank you for putting this script together and sharing it with the Community. I forked your project on GitHub. I'm working on adding the capabilities the Community is requesting here and will submit a commit soon. I am also adding the ability to choose whether the portal is ArcGIS Enterprise or ArcGIS Online (AGO), plus some other automation.

As for saving each attachment with a file name based on an attribute value, I am running into an issue with the FeatureSet object returned from query(). I am passing in currentObjectId to obtain the field containing the pole number, esupportstructure_facilityid:

            currentObjectId = featureObjectIds['objectIds'][j]
            facilityID = featureLayer.query(objectids=currentObjectId,
                         outFields='esupportstructure_facilityid')

The documentation suggests it returns a JSON object, so I also tried treating it as such. When I do that, the following error appears:

'FeatureSet' object is not subscriptable

I have not found any ArcGIS API for Python examples that work with the FeatureSet object and query(). Can you point me to some samples or provide a snippet?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-bc6c83cbc278> in <module>
    202             currentObjectId = featureObjectIds['objectIds'][j]
    203             facilityID = featureLayer.query(objectids=currentObjectId, outFields='esupportstructure_facilityid')
--> 204             poleNumber = facilityID['eSupportStructure_FACILITYID']
    205             logger.info(f'{poleNumber}')
    206             currentObjectIdAttachments = featureLayer.attachments.get_list(oid=currentObjectId)

TypeError: 'FeatureSet' object is not subscriptable

JSON response from query():

INFO:__main__:{"features": [{"geometry": {"x": 2678790.012921214, "y": 236090.84788563848}, "attributes": {"eSupportStructure_FACILITYID": "47185"}}], "objectIdFieldName": "OBJECTID", "globalIdFieldName": "GlobalID", "spatialReference": {"wkid": 102660, "latestWkid": 2238}, "geometryType": "esriGeometryPoint", "fields": [{"name": "eSupportStructure_FACILITYID", "alias": "Pole Number", "type": "esriFieldTypeString", "length": 20}]}
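The query result is a FeatureSet object, so it cannot be indexed by field name like a dictionary. In the ArcGIS API for Python, `FeatureSet.features` is a list of `Feature` objects, each with an `attributes` dictionary, so `facilityID.features[0].attributes['eSupportStructure_FACILITYID']` should return the pole number. The sketch below wraps that in a hypothetical helper, `get_first_attribute`, which also accepts the raw JSON logged above so it can be tried without a portal connection.

```python
import json


def get_first_attribute(query_result, field):
    """Return `field` from the first feature of a query result, or None.

    Accepts a FeatureSet-like object (.features -> items with .attributes)
    or the JSON dict/string that the query response contains.
    """
    if isinstance(query_result, str):
        query_result = json.loads(query_result)
    if isinstance(query_result, dict):
        features = query_result.get("features") or []
        if not features:
            return None
        attributes = features[0].get("attributes", {})
    else:
        if not query_result.features:
            return None
        attributes = query_result.features[0].attributes
    return attributes.get(field)
```

Note that the attribute key follows the service's field casing (`eSupportStructure_FACILITYID`), not the lower-case name passed in the query. The documented keyword names for `query()` are `object_ids` and `out_fields`.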

Michael Kelly, thanks for this script! I'm using it for a few applications and it works great. Recently I implemented a Survey123 app that captures pictures in a repeat, so the images are attached to a related table. Can this script be modified to download attachments from related tables, or does it need to be a feature layer?

Thanks!

Hi, everybody.
I followed the post to download the photos attached to a survey I created (the photos are stored in a table).
I have also modified the script because I had the same problem as @Chelsea Rozek.


I'm running the script inside an ArcGIS Online Notebook.
It doesn't give me any errors and it seems to download the content. However, when I go to the path where the downloaded photos are supposed to be, the folder doesn't exist, even though the script says the photos were downloaded.

INFO:__main__:Time: 2020-06-08 10:44:57.796204
INFO:__main__:Currently looping through feature attachments in layer 1 of 2: storing in folder named "0-foto"
INFO:__main__:There are 386 features to iterate in this layer
INFO:__main__:The size of the current attachment being downloaded is 1.17647MB
----------

INFO:__main__:C:\ScriptDownloads\\0-foto\\103/fotos-20200525-132307.jpg being renamed as C:\ScriptDownloads\\0-foto\\103\103-fotos-20200525-132307.jpg
INFO:__main__:The size of the current attachment being downloaded is 1.69949MB


Any suggestions? Thanks
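One thing worth checking: an ArcGIS Online Notebook runs on Esri-hosted servers rather than on your own computer, so a Windows path such as C:\ScriptDownloads will not appear on your local drive; any files are written inside the notebook environment instead (the log above also mixes `\\` and `/` separators). A minimal sketch, assuming you build paths with pathlib and print the absolute location where the files actually land. The base folder is an assumption; in ArcGIS Notebooks the user workspace is typically /arcgis/home, which is worth confirming in your own notebook.

```python
from pathlib import Path


def download_folder(base, *parts):
    """Build a download folder path, create it, and report where it is.

    `base` is an assumption: in a hosted notebook use a folder inside the
    notebook's own file system rather than a Windows drive letter.
    """
    folder = Path(base).joinpath(*parts)
    folder.mkdir(parents=True, exist_ok=True)
    # Print the absolute path so you can see where files really go.
    print(f"Saving attachments to: {folder.resolve()}")
    return folder
```

For example, `download_folder("/arcgis/home", "ScriptDownloads", "0-foto")` creates the folder with the separator that is correct for the machine the code runs on.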

This script looks promising, but I'm not sure I'm set up to run it properly. Currently I right-click the original "DownloadAttachments.py", choose "Edit with IDLE (ArcGIS Pro)", then Run > Run Module. This is what I get:

Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 14:00:49) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
================== RESTART: C:\Temp\DownloadAttachments.py ==================
INFO:__main__:Script Starting at 2020-06-09 18:11:08.674435
ERROR:arcgis._impl.connection:Item does not exist or is inaccessible.
Traceback (most recent call last):
  File "C:\Temp\DownloadAttachments.py", line 77, in <module>
    logger.info('Iterating through layers in Feature Layer "{}"'.format(itemObject.name))
AttributeError: 'NoneType' object has no attribute 'name'

I have left all of the configuration stuff alone and was trying to just see if I could get the test data to work...

Any suggestions for a novice?

You may not be calling the feature correctly; the message says the object you're using has no name attribute:

AttributeError: 'NoneType' object has no attribute 'name'

The weird thing is that I am running the script exactly as I downloaded it, so I'm wondering if the service that was set up for testing has since been taken down. Have you gotten the script to run recently? Maybe other changes have happened that are causing the script to fail?

Try using a service Item ID that you are familiar with, such as one in your own AGOL organization; do not use the default Item ID. The error you are getting basically means the script cannot find a feature service with the ID you are giving it.
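The traceback shows that `itemObject` is `None`: `gis.content.get()` returned nothing because the item ID could not be found or accessed (hence the earlier "Item does not exist or is inaccessible." log line). A minimal sketch of failing early with a clearer message; `get_item_or_fail` is a hypothetical helper, and `gis` is assumed to be an `arcgis.gis.GIS` connection.

```python
def get_item_or_fail(gis, item_id):
    """Fetch a portal item, or raise a readable error if it is missing.

    gis.content.get() returns None when the item does not exist or is not
    accessible to the signed-in (or anonymous) user.
    """
    item = gis.content.get(item_id)
    if item is None:
        raise LookupError(
            f"Item {item_id!r} was not found or is not shared with this "
            "account. Check the FeatureLayerId and your credentials."
        )
    return item
```

For example: `itemObject = get_item_or_fail(GIS("https://www.arcgis.com", PortalUserName, PortalPassword), FeatureLayerId)`, with `GIS` imported from `arcgis.gis`.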

Is there a way to download only some attachments based on an attribute field? I want to enter a specific project name and download only the attachments from that project (so I'm downloading 20 attachments instead of 1,500). I tried this for the query (ProjectName is an attribute in my feature layer):

featureObjectIds = featureLayer.query(where=ProjectName == '9858C', return_ids_only=True)

But then I got this error:

NameError: name 'ProjectName' is not defined

Is there any way to do this?

Yes! I was able to do this by using the highlighted query. 
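The `NameError` happens because `ProjectName == '9858C'` is evaluated as Python code before `query()` is ever called; the `where` parameter expects a SQL expression passed as a string, e.g. `where="ProjectName = '9858C'"`. A minimal sketch (not necessarily the query in the highlighted screenshot, which is not reproduced here); `project_where_clause` is a hypothetical helper that also doubles any single quotes in the value.

```python
def project_where_clause(field, value):
    """Build a SQL where clause string such as "ProjectName = '9858C'"."""
    escaped = str(value).replace("'", "''")  # escape single quotes for SQL
    return f"{field} = '{escaped}'"
```

Usage: `featureObjectIds = featureLayer.query(where=project_where_clause("ProjectName", "9858C"), return_ids_only=True)`.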

You are a lifesaver!!!!! That worked perfectly. I had been trying to figure that out for hours.... Thank you so much!

Hi @MichaelKelly,
Thanks for this, it's great!

I want to build the attachment table at the same time to preserve the relationships, so I need the parentGlobalID. However, mine doesn't have that available:

for k in range(len(currentObjectIdAttachments)):
    attachmentGlobalId = currentObjectIdAttachments[k]['globalId']
    attachmentParentGlobalId = currentObjectIdAttachments[k]['parentGlobalId']
    attachmentId = currentObjectIdAttachments[k]['id']
    attachmentName = currentObjectIdAttachments[k]['name']
    attachmentSize = currentObjectIdAttachments[k]['size']

Also, the Global ID field is 'globalid' rather than 'globalID'.
Any pointers on how to get it from the feature as we iterate through?

Cheers

Hi,

So I got the Jupyter Notebook open, saved the attachment, and went in to customize the code to my needs. When I ran the code, I got this error:

NameError Traceback (most recent call last)
<ipython-input-2-17b343b15353> in <module>
9 "start_time": "2017-08-16T09:06:05.091046Z"
10 },
---> 11 "scrolled": false
12 },
13 "outputs": [

NameError: name 'false' is not defined

 

I'm trying to figure this out in the meantime, but please let me know if you can help.

@MichaelKelly 

Hi @KellyMeehan 

Looks like 'false' in your code is not in quotes as a string, so Python is trying to refer to a variable that has not been defined. In Python, the Boolean keywords are capitalized: True and False.

Hi all, I thought I'd post the solution to my question above.

I've gotten my parent global ids using this line:

featureGuIds = featureLayer.query(where=featureFilter, out_fields=['globalid', 'objectid'], as_df=True, return_geometry=False)

And then I could get it when I iterate through to get the attachments as well as the object id:

currentObjectId = featureGuIds['objectid'][j]
parentGlobalId = featureGuIds['globalid'][j]
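Building on that solution, a hedged sketch that pairs each Object ID with its Global ID and carries the parent Global ID into each attachment record. `attachment_rows` is hypothetical; `get_attachments` is expected to wrap `featureLayer.attachments.get_list(oid=...)`. Zipping the two columns avoids positional `[j]` lookups, which can misbehave if the DataFrame index is not 0..n-1.

```python
def attachment_rows(features, get_attachments):
    """Build attachment-table rows that include each parent's Global ID.

    features: a mapping or DataFrame with 'objectid' and 'globalid' columns
        (for example the as_df=True query result above).
    get_attachments: callable taking an Object ID and returning a list of
        attachment dicts with 'id', 'name' and 'size' keys, e.g.
        lambda oid: featureLayer.attachments.get_list(oid=oid)
    """
    rows = []
    for object_id, parent_global_id in zip(features["objectid"], features["globalid"]):
        for att in get_attachments(object_id):
            rows.append({
                "parentGlobalId": parent_global_id,
                "parentObjectId": object_id,
                "attachmentId": att["id"],
                "name": att["name"],
                "size": att["size"],
            })
    return rows
```

The resulting list of dictionaries can be written out with `csv.DictWriter` or turned into a DataFrame for the attachment table.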

 

@Felicitychun Thank you for your response! It seems that your suggestion resolved that error.

Now when I run the script, it just seems to print the code back to me, and the folder I pointed the attachments to is empty.


I really appreciate any response! I'm new to Python and trying to improve.

@MichaelKelly 

Hello, 

Great script! I am just wondering if it is at all possible to modify the name of each individual folder based on an attribute within the layer?

 

Thanks!
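One way to do this, as a sketch under assumptions: read the attribute for each feature (for example from `featureLayer.query(...).features[0].attributes`), clean it, and fall back to the Object ID when it is empty. `feature_folder_name` and the example field `PoleNo` are hypothetical.

```python
import re


def feature_folder_name(attributes, field, object_id):
    """Name a feature's folder after an attribute value (hypothetical).

    Strips characters other than 0-9, a-z and A-Z, and appends the Object ID
    so two features with the same attribute value don't share a folder.
    Falls back to the Object ID alone when the attribute is empty.
    """
    raw = attributes.get(field)
    cleaned = re.sub(r"[^0-9a-zA-Z]", "", str(raw)) if raw is not None else ""
    return f"{cleaned}-{object_id}" if cleaned else str(object_id)
```

If the attribute is guaranteed to be unique, the Object ID suffix can be dropped.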

Hello @Felicitychun,

Did you improve the script to download attachments from related tables too? If so, could you share it with us?

I'm looking for a way to export all attachments from the survey feature layer and its repeat tables in a single script.
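A possible starting point, assuming the Survey123 repeat photos sit in a related table published in the same feature service: iterate over the item's tables as well as its layers, since the ArcGIS API for Python exposes them as separate `Item.layers` and `Item.tables` lists. This is an untested sketch; `download_all` and `save_root` are hypothetical, and it checks each layer's `hasAttachments` property before querying.

```python
import re
from pathlib import Path


def download_all(item, save_root):
    """Download attachments from every layer and table in a hosted item.

    Untested sketch. Assumes item.layers / item.tables lists, each entry
    exposing properties.hasAttachments, query(return_ids_only=True),
    attachments.get_list(oid=...) and attachments.download(...).
    Returns the number of attachments requested for download.
    """
    total = 0
    sources = list(item.layers or []) + list(item.tables or [])
    for index, layer in enumerate(sources):
        props = layer.properties
        if not getattr(props, "hasAttachments", False):
            continue  # layer or table without attachments enabled
        name = re.sub(r"[^0-9a-zA-Z]", "", str(getattr(props, "name", "")))
        folder = Path(save_root) / f"{index}-{name}"
        folder.mkdir(parents=True, exist_ok=True)
        result = layer.query(where="1=1", return_ids_only=True)
        for oid in result.get("objectIds") or []:
            for att in layer.attachments.get_list(oid=oid):
                layer.attachments.download(
                    oid=oid, attachment_id=att["id"], save_path=str(folder)
                )
                total += 1
    return total
```

Each layer and table gets its own numbered folder, so photos from the repeat table stay separate from those on the main survey layer.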
