In response to Peter's advice, I checked the mosaic dataset and found some issues left over from my previous efforts to build overviews, so I decided to rebuild the combined mosaic dataset. Before I did, I reviewed and optimized the MaxPS and MinPS values of the source mosaic datasets. After recreating the combined mosaic dataset in State Plane NAD83, I checked its MaxPS and MinPS values; they had indeed changed, so I optimized them in the new combined mosaic dataset as well. I then timed how long it took for the photos to appear the first time in my Flex application. It took around 12 seconds for the imagery to appear.
In response to jbswain, I checked the properties of several individual TIFFs in ArcCatalog. Compression always read 'None' and pyramids read 'level: 5, resampling: Bilinear'. The exception was the overviews, where compression read 'LZW' and pyramids read 'absent'. I may still try building more overviews later.
First I experimented with the coordinate system. I created a new mosaic dataset using the UTM coordinate system of the NAIP imagery and overviews. I also created a new map service in UTM for my vector data. I then published a copy of my Flex application configured to use these UTM services rather than the State Plane services I used earlier. When I timed how long it took to see the photos the first time, it was 4 seconds. Initial load was about 3x faster than when I used the coordinate system of the vector data (State Plane, around 12 seconds).
I also got a performance boost when I changed the maximum number of rasters per mosaic in the Image Service from the default of 20 to 5. Performance was 2x better on the initial load, going from around 12 seconds to around 6 seconds.
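For anyone checking my math, the speedup figures are just ratios of the approximate first-load times I measured. Here is a minimal Python sketch of that arithmetic; the function name `speedup` is my own for illustration, and the timings are the rough values reported in this post, not precise benchmarks.

```python
def speedup(baseline_s, new_s):
    """Return how many times faster new_s is compared with baseline_s."""
    if new_s <= 0:
        raise ValueError("new_s must be a positive number of seconds")
    return baseline_s / new_s


# Approximate first-load times from this post, in seconds.
state_plane_default = 12.0  # State Plane, max rasters per mosaic = 20
utm_mosaic = 4.0            # mosaic dataset and services in UTM
max_rasters_5 = 6.0         # State Plane, max rasters per mosaic = 5

print("UTM vs State Plane:", speedup(state_plane_default, utm_mosaic))
print("Max rasters 5 vs 20:", speedup(state_plane_default, max_rasters_5))
```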
However, these performance results sometimes vary. I am guessing this depends on whether the requested images are already in the arcgisserver output directory, on network traffic, and on other factors that affect performance.
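One way to separate the first (possibly uncached) load from later loads is to repeat the same request several times and compare the first timing with the median of the rest. This is only a rough sketch using the Python standard library, not how I actually took the numbers above (those were timed in the Flex application). The helper `time_loads` and the `fetch` callable are hypothetical; `fetch` stands for whatever issues one request to the image service.

```python
import statistics
import time


def time_loads(fetch, runs=5):
    """Call fetch() `runs` times and return (first_s, median_of_rest_s).

    The first call approximates a cold load; the remaining calls
    approximate loads that may be served from a cache.
    """
    if runs < 2:
        raise ValueError("runs must be at least 2")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations[0], statistics.median(durations[1:])
```

To use it, pass in a function that performs a single request against your own service (for example with `urllib.request.urlopen` and your own service URL) and compare the two returned values across configurations.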
Other things I am considering experimenting with: reprojecting the NAIP and local imagery to NAD83 State Plane, adding LZW compression to the native images if possible, and suppressing pyramids in favor of more overviews.