I'm posting the second half of a question I asked earlier, since it's still unanswered and causing me issues. 🙂
Since the December 6 update to Survey123, I've noticed some oddities when testing a survey that triggers a webhook. After submitting the form, users sometimes (not every time) see an error message.
If you select Show details, it shows the URL of the webhook processing server and reports that the request timed out after five seconds. However, most of the time the server processed the webhook just fine. In fact, if I watch Azure's Log Stream, I can see the connection occurring quickly.
Additionally, the webhook now seems to be triggered twice per entry. My assumption is that because Survey123 thinks the server handling the webhook is not responding, it sends the data again, but that's just a guess on my part.
Any ideas what may be causing these?
With the 3.16 December update, logic was added to the web app to retry submitting a webhook if the first attempt fails. Your assumption is likely correct: the first submission is taking longer than the web app expects, so the retry logic runs and sends a second request to the webhook payload URL. We'll try to reproduce the issue on our end and get it logged.
Can you please provide some more detail on how your webhook works? I tested with a Power Automate webhook that simply listens for a survey submission, waits 30 seconds, then sends an email. In each of my tests the request to the payload URL took 350 ms and never triggered the retry logic. We are curious how your webhook is set up such that the request to the payload URL takes longer than 5 seconds.
I take the Survey123 response and use it to create a task in the Wrike task management product. I removed the duplicate-testing section of the code, ran a test to see how long that takes, and was a little surprised:
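Because a retry can deliver the same submission twice, one defensive option on the receiving side is to make the handler idempotent, skipping records it has already handled. The sketch below is a minimal illustration under stated assumptions: it assumes the payload carries a stable identifier at `payload["feature"]["attributes"]["globalid"]` (check your own payload and adjust), and `DedupingHandler` is a hypothetical helper, not part of any Esri or Azure API. The in-memory set only works within a single process; with several function instances or restarts you would need a shared store instead.

```python
import threading


class DedupingHandler:
    """Hypothetical wrapper that skips webhook deliveries already processed.

    Assumes each payload has a unique record id at
    payload["feature"]["attributes"]["globalid"] (an assumption; adjust
    the lookup to whatever unique field your survey actually sends).
    """

    def __init__(self, process):
        self._process = process        # callable doing the real work, e.g. creating the Wrike task
        self._seen = set()             # in-memory only: lost on restart, not shared across instances
        self._lock = threading.Lock()  # a retry may arrive while the first call is still running

    def handle(self, payload: dict) -> bool:
        """Return True if the payload was processed, False if it was a duplicate."""
        key = payload["feature"]["attributes"]["globalid"]
        with self._lock:
            if key in self._seen:
                return False
            # Mark before processing so a concurrent retry is rejected.
            self._seen.add(key)
        try:
            self._process(payload)
        except Exception:
            with self._lock:
                self._seen.discard(key)  # allow a later delivery to try again
            raise
        return True


if __name__ == "__main__":
    created = []
    handler = DedupingHandler(created.append)
    payload = {"feature": {"attributes": {"globalid": "{ABC}"}}}
    print(handler.handle(payload))  # first delivery
    print(handler.handle(payload))  # retry of the same submission
    print(len(created))
```

Marking the key before doing the slow work is the design choice that matters here: the retry in this thread arrives after about five seconds, while the first call is typically still running.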
Executed 'Functions.applyNewSurvey123' (Succeeded, Id=d6e5cb6f-6975-4b48-86f5-ee64b5f29443, Duration=10061ms)
Edit: And dealing with this kind of thing from the Azure Function server...
2022-12-14T22:04:58Z [Verbose] [HostMonitor] Worker status: ID=04626ca9-7269-4155-9f36-8f943b15847d, Latency=6ms
2022-12-14T22:05:09Z [Verbose] [HostMonitor] Worker status: ID=e25ef008-e6a9-4864-8bc5-0c588a0986c1, Latency=11240ms
I decided to break my app into chunks and step through it bit by bit to see how long things were taking. Times below are cumulative (inclusive of everything above them).
1. Webhook triggered, JSON read, and data assigned to variables = 36 - 60ms
2. Look up x,y coordinates for a best-guess physical address via geocode.arcgis.com REST services = 290 - 400ms
3. Choose the category it will ultimately end up in within Wrike (via dictionary lookup) = 200 - 450ms [15973ms outlier]
4. Prep variables for insertion into Wrike (create a dictionary of necessary variables), then retrieve the email address of the responsible person from an ArcGIS Online table lookup = 2500 - 3200ms [18329ms outlier]
5. First call to Wrike: based on the email address above, look up the user ID = 2600 - 4100ms
6. Build the ticket and insert it into Wrike = 3100 - 4000ms [13952ms outlier]
7. With data returned from Wrike, update the Survey123 table with the WrikeID and permalink = 8000 - 10000ms
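For this kind of step-by-step profiling, a small timer that records both per-step and cumulative durations makes it easy to spot which call dominates. This is a minimal sketch using only the standard library (`time.perf_counter` and `contextlib.contextmanager`); `StepTimer` and the step names are hypothetical and only illustrate how the cumulative figures above could be collected in one run.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Hypothetical helper recording per-step and cumulative times in ms."""

    def __init__(self):
        self._start = time.perf_counter()
        self.steps = []  # list of (name, step_ms, cumulative_ms)

    @contextmanager
    def step(self, name):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            # Record even if the step raises, so slow failures show up too.
            t1 = time.perf_counter()
            self.steps.append((name, (t1 - t0) * 1000, (t1 - self._start) * 1000))

    def report(self) -> str:
        return "\n".join(
            f"{name}: step={step:.0f}ms cumulative={cum:.0f}ms"
            for name, step, cum in self.steps
        )


if __name__ == "__main__":
    timer = StepTimer()
    with timer.step("parse payload"):
        time.sleep(0.01)   # stand-in for reading the JSON
    with timer.step("update Survey123 layer"):
        time.sleep(0.02)   # stand-in for the edit_features call
    print(timer.report())
```

Reporting the per-step column alongside the cumulative one avoids having to subtract adjacent rows by hand, which is where a list like the one above can become hard to read.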
So, outliers aside, step 7 is consistently causing the issue. The WrikeID and permalink are two fields I added to the survey table in ArcGIS Online. They are updated by sending an array of changes to the edit_features() method of the relevant layer. Perhaps there is a more efficient way of doing that?
Alternatively, I have to admit that every tutorial on handling webhooks has the return statement as the final line of the responding code. In my case it's:
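One way to keep the update to a single round trip is to send only the object ID plus the two changed attributes, in one `edit_features(updates=...)` call, and then check `updateResults` for failures. The sketch below assumes the object ID is already known from the webhook payload (so no query is needed first), and the field names `objectid`, `wrike_id` and `wrike_permalink` are hypothetical; substitute your layer's actual names. The layer is passed in, so the helpers work with an `arcgis.features.FeatureLayer` or a stand-in. This may shave time but will not by itself guarantee a reply within five seconds.

```python
def build_wrike_update(object_id: int, wrike_id: str, permalink: str,
                       oid_field: str = "objectid") -> dict:
    """Build one feature update holding only the OID and the two Wrike fields.

    No geometry is included, so only these attributes are changed.
    Field names are hypothetical placeholders.
    """
    return {"attributes": {oid_field: object_id,
                           "wrike_id": wrike_id,
                           "wrike_permalink": permalink}}


def push_wrike_updates(layer, updates: list) -> list:
    """Send all updates in a single edit_features call; return failed results."""
    result = layer.edit_features(updates=updates)
    return [r for r in result.get("updateResults", []) if not r.get("success")]


# Example with the ArcGIS API for Python (not executed here):
# from arcgis.gis import GIS
# from arcgis.features import FeatureLayer
# layer = FeatureLayer(layer_url, gis=GIS(...))
# failed = push_wrike_updates(layer, [build_wrike_update(oid, task_id, link)])
```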
Edit: Rereading this, it's fairly obvious that the big time sinks are interactions with ArcGIS Online, either searches or updates. However, to keep this relevant to Survey123 Questions, I should point out that these errors did not occur until the latest update.
Edit #2: After optimizing some of the code that updates AGOL fields, I've sped up the response time, but it still consistently comes in just over five seconds. 😞
Hi @RogerAsbury,
I responded to your post https://community.esri.com/t5/arcgis-survey123-questions/new-errors-since-survey123-update/m-p/12403... regarding what is occurring: the webhook refires after 5 seconds. The recommended design for a function that responds to a webhook is to receive the data, respond to the webhook as quickly as possible, and then process the data. As an example, Microsoft describes a sample approach of passing the message into a queue and then having a function....
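The "acknowledge first, process later" shape can be sketched with the standard library alone. In this minimal illustration, an in-process `queue.Queue` and a worker thread stand in for the durable queue and second function that the Microsoft approach uses; that substitution is for illustration only. On a serverless host, background work is not guaranteed to keep running after the HTTP response is sent, which is exactly why a real deployment would put the message on a storage queue (or a task service, as in the Google Cloud case below) instead. `handle_webhook` and `worker` are hypothetical names.

```python
import json
import queue
import threading


def handle_webhook(body: bytes, q: queue.Queue) -> tuple:
    """Acknowledge the webhook immediately and defer the slow work.

    Only parsing and enqueueing happen here, so the (status, message)
    reply goes back well inside Survey123's five-second window.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400, "invalid JSON"
    q.put(payload)
    return 200, "accepted"


def worker(q: queue.Queue, process) -> None:
    """Drain the queue and run the slow steps (geocode, Wrike, edit_features)."""
    while True:
        payload = q.get()
        if payload is None:  # sentinel used to stop the worker
            q.task_done()
            break
        try:
            process(payload)
        except Exception as exc:  # keep the worker alive if one record fails
            print(f"processing failed: {exc!r}")
        finally:
            q.task_done()


if __name__ == "__main__":
    done = []
    q = queue.Queue()
    t = threading.Thread(target=worker, args=(q, done.append), daemon=True)
    t.start()
    print(handle_webhook(b'{"feature": {"attributes": {}}}', q))  # (200, 'accepted')
    q.put(None)
    t.join()
    print(len(done))  # 1
```

Returning a status before any ArcGIS Online or Wrike call is made is the point of the pattern: the slow steps from the timing list then no longer count against the webhook timeout.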
I was having a similar issue with my Google Cloud Functions taking too long and being timed out. It usually happened when the webhook was triggered after a period of no use. When I resent the s123 answer from my web form, it usually worked fine. It seems my function is taking too long to process, so I'm now migrating all my data processing to another function that will be called as a Task. Thanks!