I'm wondering how I can use the Python API to query a feature layer and retrieve the feature storage and the attachment storage separately.
The feature layer's 'size' property returns the combined feature and attachment size. I'd like to be able to break out the feature and attachment storage separately to better align with the reports generated by ArcGIS Online (AGO).
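For context, this is roughly how I'm reading the combined figure today (a minimal sketch; the portal URL and item ID are placeholders, and I'm assuming the item-level size property reports bytes):
import arcgis

# connect to the portal and get the hosted feature layer item (placeholders)
p = arcgis.GIS('...')
item = p.content.get('...')

# combined feature + attachment storage, in bytes
print(f'Combined size (MB): {item.size / 1024.0 / 1024.0:,.2f}')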
Hey, I saw your other comment on this post, which discusses using the Attachment Manager to find attachment sizes. I would recommend that approach, but from your comment it seems the API does not return a very reliable figure.
I gave this a quick test with an uploaded layer whose attachment size is reported as 1.316 MB, and my notebook returns a value of 1,381,120 bytes, which looks about right. Can I just confirm whether you did your testing a few hours after uploading any data? I have seen a few cases where the file size reported in ArcGIS Online takes a few hours to reflect its true value after edits are made.
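A quick conversion backs that up (just the byte-to-MB arithmetic, nothing API-specific):
# 1,381,120 bytes converted to MB (1 MB = 1024 * 1024 bytes)
print(1381120 / 1024 / 1024)  # ~1.317, close to the 1.316 MB shown on the item page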
David
The hosted feature layers I've been testing have not been edited since April 2022, so I don't believe that's an issue here.
I did some digging with a few of our hosted feature layers that have a considerable number of attachments. It seems like the "search" method on the Attachment Manager only returns up to the max record count for the service. It doesn't seem to automatically paginate if the max record count is hit.
Using the feature layer above as an example, I should expect a return of around 3 GB of attachments.
If I use the code below, only the first 2,000 attachments are returned, which is the max record count for the hosted feature layer service. Even if I pass in a value of 9,999 for the "max_records" parameter used by the "search" method, only the first 2,000 attachments are returned.
import arcgis
itemId = '...'
# connect to portal
p = arcgis.GIS('...')
# set item object
item = p.content.get(itemId)
# get item layers
layers = item.layers
# set place holders
attachmentSize = 0
attachmentCount = 0
# iterate over item layers
for l in layers:
    # check if layer supports attachments
    attachmentSupport = l.properties.hasAttachments
    if attachmentSupport:
        attachments = l.attachments.search(where='1=1')
        # iterate over attachments
        for a in attachments:
            attachmentCount += 1
            s = a.get('SIZE')
            attachmentSize += s
# convert attachment size to MB
attachmentSizeMB = (float(attachmentSize) / 1024.0) / 1024.0
print(f'Attachment Count: {attachmentCount:,}')
print(f'Attachment Size: {attachmentSize:,}')
print(f'Attachment Size (MB): {attachmentSizeMB:,.2f}')
It seems that in order to retrieve ALL attachments you need to account for pagination/offset when using the Attachment Manager. With the script below, which uses the "max_records" and "offset" parameters, I was finally able to return all attachments and get a number similar to what the item page displays.
import arcgis
itemId = '...'
# connect to portal
p = arcgis.GIS('...')
# set item object
item = p.content.get(itemId)
# get item layers
layers = item.layers
# set place holders
attachmentLoops = 0
attachmentSize = 0
attachmentCount = 0
# iterate over item layers
for l in layers:
    # check if layer supports attachments
    attachmentSupport = l.properties.hasAttachments
    if attachmentSupport:
        # set query values
        attachmentMax = 1000
        attachmentOffset = 0
        continueQuery = True
        # get attachments
        while continueQuery:
            attachmentLoops += 1
            attachments = l.attachments.search(where='1=1', max_records=attachmentMax, offset=attachmentOffset)
            # check for attachments
            if attachments:
                # increment offset
                attachmentOffset += attachmentMax
                # iterate over attachments
                for a in attachments:
                    attachmentCount += 1
                    s = a.get('SIZE')
                    attachmentSize += s
            else:
                continueQuery = False
# convert attachment size to MB
attachmentSizeMB = (float(attachmentSize) / 1024.0) / 1024.0
# don't count the final query that came back empty
if attachmentLoops == 0:
    attachmentLoops = 0
else:
    attachmentLoops = attachmentLoops - 1
print(f'Number of Loops: {attachmentLoops:,}')
print(f'Attachment Count: {attachmentCount:,}')
print(f'Attachment Size: {attachmentSize:,}')
print(f'Attachment Size (MB): {attachmentSizeMB:,.2f}')
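For anyone who wants to reuse this, the same pagination pattern can also be wrapped in a small helper. This is only a sketch built on the same attachments.search call with the "max_records" and "offset" parameters used above:
def iter_attachments(layer, page_size=1000):
    # yield attachment records one page at a time until search returns an empty list
    offset = 0
    while True:
        page = layer.attachments.search(where='1=1', max_records=page_size, offset=offset)
        if not page:
            break
        for record in page:
            yield record
        offset += page_size

# usage, with layers = item.layers as in the script above
totalBytes = sum(
    a.get('SIZE', 0)
    for lyr in layers
    if lyr.properties.hasAttachments
    for a in iter_attachments(lyr)
)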
Ah, that explains it! In my test layers the record count was below 2,000 records.
Thank you for pasting your code. That should be really useful for anyone else encountering this.
David
I created a script to calculate the file size of hosted feature layers excluding attachments, for credit management, based on the code from this post.
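The core of that kind of script can be sketched roughly like this, assuming (as described in the original question) that the item-level size property reports the combined feature and attachment storage in bytes; the portal URL and item ID are placeholders:
import arcgis

p = arcgis.GIS('...')
item = p.content.get('...')

# sum attachment sizes across all layers, paging through results
attachmentBytes = 0
for lyr in item.layers:
    if lyr.properties.hasAttachments:
        offset = 0
        while True:
            page = lyr.attachments.search(where='1=1', max_records=1000, offset=offset)
            if not page:
                break
            attachmentBytes += sum(a.get('SIZE', 0) for a in page)
            offset += 1000

# item.size is assumed to combine feature and attachment storage, so subtract to get features only
featureBytes = item.size - attachmentBytes
print(f'Feature storage (MB): {featureBytes / 1024.0 / 1024.0:,.2f}')
print(f'Attachment storage (MB): {attachmentBytes / 1024.0 / 1024.0:,.2f}')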