Hello, I have a data pipeline using Apache Airflow that kicks off our ingestion process on a geoprocessing server. For the most part things work well; however, when the GP server gets bogged down with too many jobs in the 'submitted' state, they no longer move to 'executing' when a previous job completes. These are asynchronous jobs, so my question is two-fold:
1) What mechanism triggers a job's state to move from 'submitted' to 'executing'?
2) How is the number of asynchronous jobs executing simultaneously determined?
Right now I have 8 different jobs sitting in 'submitted' with 0 executing, 0 waiting, and 0 failed, and they have been stuck there for over 12 hours. I have to go in and stop the GP server, then start it back up again, to get things flowing. Why is this?
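For context, this is roughly how our Airflow task submits and polls the jobs. This is a minimal sketch against the GP service's REST endpoint; the server URL and task name are placeholders, and in our DAG this runs inside a PythonOperator:

```python
import time
import requests

# Hypothetical task URL; substitute your own server, folder, and task names.
GP_TASK_URL = "https://gis.example.com/arcgis/rest/services/Ingest/GPServer/IngestTask"

def submit_and_wait(params, poll_seconds=30, timeout_seconds=3600):
    """Submit an async GP job and poll its status until it finishes."""
    # submitJob returns a jobId right away; the job starts in
    # esriJobSubmitted and should move to esriJobExecuting on its own.
    resp = requests.get(f"{GP_TASK_URL}/submitJob", params={**params, "f": "json"})
    resp.raise_for_status()
    job_id = resp.json()["jobId"]

    job_status = "esriJobSubmitted"
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        info = requests.get(f"{GP_TASK_URL}/jobs/{job_id}", params={"f": "json"}).json()
        job_status = info.get("jobStatus", job_status)
        if job_status in ("esriJobSucceeded", "esriJobFailed", "esriJobCancelled"):
            return info
        # This is the transition that stalls for us: the status stays
        # esriJobSubmitted and never becomes esriJobExecuting.
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} still {job_status} after {timeout_seconds}s")
```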
Hi there! Same issue over here, even with only a couple of jobs submitted. The task eventually ends up cancelling itself when it hits the service's timeout. Did you ever find out any information regarding this problem?
Unfortunately, support was unable to explain these mechanisms to me. What I ended up doing was increasing the size of the EC2 instance our web server sits on. That helped in the short term; longer term, we developed a new SQL-oriented method that ingests data directly into our tables, which we then register as a dataset in the database and publish our services from.
Once the table has been registered, as long as the schema does not change (at least outside of Arc), the service updates on its own. Our long-running submissions were being clogged by ingesting several 10 million+ record datasets. Moving that load into the data pipeline took the pressure off the GP service we had been automating our ingest with.
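For anyone looking to do something similar, the gist of our SQL-side load looks roughly like this. This is a sketch only, assuming a Postgres-backed geodatabase and psycopg2; the connection string, table name, and input file are all made up:

```python
import psycopg2

# Hypothetical connection string and staging table; adjust for your database.
DSN = "host=db.example.com dbname=gis user=ingest"

def bulk_load(csv_path, table="staging.observations"):
    """Append rows with COPY instead of pushing them through a GP job."""
    with psycopg2.connect(DSN) as conn:
        with conn.cursor() as cur, open(csv_path) as f:
            # COPY streams the whole file in one shot, so 10 million+ row
            # loads never touch the GP service; the registered table (and
            # the service published from it) just sees the new rows.
            cur.copy_expert(
                f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)",
                f,
            )
```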
Thanks @JVig
I'll see if I can dig deeper into the issue, but the lack of documentation makes it tough to debug. I'll come back here if I ever find out anything else about it.