I really like this idea, though it's admittedly a big ask for the S123 developers. Bird surveys are notorious for massive amounts of data that need to be entered in a ridiculously short span of time, ideally without taking your eyes off the action.
In an ideal world the app would work something like this:
surveyor says "Collect" <form opens>
"species, red-tailed hawk" <species select_one question populates with 'Red-tailed hawk'>
"quantity, two" <quantity integer field populates with '2'>
"nest present, no" <nest present select_one toggles to 'no'>
"save to Outbox" <form saves to Outbox, chime sounds to notify user that task accomplished>
Of course, to be useful in most field conditions, the voice recognition needs to work without an active internet connection (possibly via a user-programmable vocabulary, so that "black-capped vireo" doesn't get transcribed as "dance the rio").
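To make the workflow above a bit more concrete, here is a rough Python sketch of the command handling. It is only an illustration under some big assumptions: speech has already been turned into text by some offline recognizer (not shown), and every name in it (`FORM`, `snap_to_choice`, `run_session`, the species list) is made up for the example and has nothing to do with how Survey123 actually works. The vocabulary step uses the standard library's `difflib`, which compares spelling, so it would tidy up near-misses like "red tailed hawk" but probably not a true mishearing like "dance the rio"; that would likely need phonetic matching or a recognizer restricted to the vocabulary.

```python
from difflib import get_close_matches

# Everything below is hypothetical and for illustration only;
# none of these names come from Survey123.
NUMBER_WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# Illustrative form: field name -> (question type, allowed choices or None)
FORM = {
    "species": ("select_one",
                ["Red-tailed hawk", "Red-shouldered hawk", "Black-capped vireo"]),
    "quantity": ("integer", None),
    "nest present": ("select_one", ["yes", "no"]),
}


def snap_to_choice(spoken, choices, cutoff=0.6):
    """Return the choice spelled most like the transcript, or None if nothing is close."""
    by_lower = {choice.lower(): choice for choice in choices}
    hits = get_close_matches(spoken.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


def parse_value(field, spoken):
    """Convert the spoken value to whatever the field's question type expects."""
    kind, choices = FORM[field]
    spoken = spoken.strip()
    if kind == "integer":
        if spoken.isdigit():
            return int(spoken)
        return NUMBER_WORDS.get(spoken.lower())  # None for unknown number words
    return snap_to_choice(spoken, choices)


def run_session(utterances):
    """Apply already-transcribed utterances in order; return the saved records."""
    record, outbox = None, []
    for utterance in utterances:
        text = utterance.strip()
        command = text.lower()
        if command == "collect":
            record = {}  # open a fresh form
        elif command == "save to outbox":
            if record is not None:
                outbox.append(record)
            record = None
        elif record is not None and "," in text:
            # "field, value" utterances fill in one question each
            field, _, spoken = text.partition(",")
            field = field.strip().lower()
            if field in FORM:
                value = parse_value(field, spoken)
                if value is not None:
                    record[field] = value
    return outbox


saved = run_session([
    "Collect",
    "species, red tailed hawk",
    "quantity, two",
    "nest present, no",
    "save to Outbox",
])
print(saved)
```

Unrecognized fields or values are simply skipped here; in a real app you would presumably want an audible "didn't catch that" instead, so the surveyor knows to repeat the entry without looking down.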