Mody:
It will be tough to get performance as good as the Visibility tool. Visibility does not run independent sightlines to each raster cell; it uses an approximate expanding rasterized horizon, so there are more interpolation errors. It also assumes that straight lines on the map are straight on the ground. If that's good enough for your purposes, then use Visibility (you should compare results between Visibility and Geodesic Viewshed, along with some ground checks if possible).
Here are a couple of things to check and some additional suggestions.
What percentage of the time is being spent building the grid index for the gpu test? Look for this progress bar message while running the tool interactively:
You'll have to eyeball the time, since this isn't a gp message and doesn't have a time stamp associated with it. If you have lots of observers, there may be multiple observer groups, each requiring a different grid index.
Grid index building is not gpu accelerated. The difference between total time and index build time is what's actually being spent on the gpu. Grid index generation does use multiple cpu cores though (all of them by default, or whatever's specified by the Parallel Processing Factor (PPF) gp environment setting). We're looking at ways to cache that work so it doesn't have to be recreated every time. For now, the best way to reduce that time is to keep the outer radius as small as possible, and also use an angular restriction if possible (anything to reduce the number of DEM cells that need to be processed).
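To get a feel for how much the outer radius and an angular restriction shrink the work, here is a rough back-of-the-envelope sketch. It only counts DEM cells inside a circular sector; the function name, the 10 m cell size, and the zero inner radius are assumptions for illustration, and it says nothing about how the tool actually builds its index.

```python
import math

def estimate_cells(outer_radius_m, cell_size_m, angle_span_deg=360.0):
    """Approximate number of DEM cells inside a circular sector.

    Hypothetical helper: assumes square cells, an inner radius of zero,
    and a flat circle around a single observer.
    """
    if not 0 < angle_span_deg <= 360:
        raise ValueError("angle_span_deg must be in (0, 360]")
    sector_area = math.pi * outer_radius_m ** 2 * (angle_span_deg / 360.0)
    return sector_area / cell_size_m ** 2

full = estimate_cells(20_000, 10)                             # 20 km radius, full circle
half_radius = estimate_cells(10_000, 10)                      # halve the radius
quarter_sector = estimate_cells(20_000, 10, angle_span_deg=90)  # 90 degree wedge

print(f"full circle:    {full:,.0f} cells")
print(f"half radius:    {half_radius / full:.2f} of the cells")
print(f"90 deg sector:  {quarter_sector / full:.2f} of the cells")
```

The cell count scales with the square of the radius, so halving the outer radius cuts the cells to roughly a quarter, about the same saving as restricting the view to a 90 degree wedge.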
Make sure your TMP/TEMP variables are pointing at a large fast SSD. That is where the grid index will be created.
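A quick way to see where those variables point and how much room is there is a few lines of standard-library Python. This is only a sketch: it reports what Python resolves as the temp directory, which I'm assuming matches what the tool uses, and it can't tell you whether the drive is an SSD.

```python
import os
import shutil
import tempfile

def report_temp_location():
    """Print TMP/TEMP and return (temp_dir, free_gib) as Python resolves them."""
    for name in ("TMP", "TEMP"):
        print(f"{name} = {os.environ.get(name, '<not set>')}")
    # tempfile checks TMPDIR, TEMP and TMP before falling back to platform defaults.
    temp_dir = tempfile.gettempdir()
    usage = shutil.disk_usage(temp_dir)
    free_gib = usage.free / 1024 ** 3
    print(f"resolved temp dir: {temp_dir} ({free_gib:.1f} GiB free)")
    return temp_dir, free_gib

temp_dir, free_gib = report_temp_location()
```

Run it from the same Python environment and user account you run the tool under, since the variables can differ between sessions.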
You might try the 'perimeter sightlines' method instead of the 'all sightlines' method. The former only runs sightlines to raster cells on the boundary of the area being processed (the edge of the circle defined by the outer radius, for example). It's faster but more approximate (it's still totally geodesic though). However, the perimeter sightlines method uses the grid index more, so the TMP/TEMP guidance above will be even more important.
An Ampere-class gpu would be faster for sightline processing, but I suspect most of the ~2 minutes you're seeing for the gpu test are being spent building the grid index, so switching gpus won't help much in that case.
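As a rough illustration of why the perimeter method runs fewer sightlines, the sketch below compares a simple geometric count of cells inside a circle with cells along its edge. The function name and the numbers are made up for illustration, and the real tool's sightline counts will differ; this only shows the scaling.

```python
import math

def sightline_counts(outer_radius_m, cell_size_m):
    """Return (all_cells, perimeter_cells) for a full circle.

    Hypothetical geometric estimate: one sightline per target cell,
    square cells, single observer at the center.
    """
    all_cells = math.pi * outer_radius_m ** 2 / cell_size_m ** 2
    perimeter_cells = 2 * math.pi * outer_radius_m / cell_size_m
    return all_cells, perimeter_cells

all_cells, perimeter_cells = sightline_counts(10_000, 10)
ratio = all_cells / perimeter_cells  # simplifies to radius / (2 * cell size)

print(f"all sightlines:       ~{all_cells:,.0f}")
print(f"perimeter sightlines: ~{perimeter_cells:,.0f}")
print(f"ratio:                ~{ratio:,.0f}x fewer sightlines")
```

The gap widens as the radius grows relative to the cell size, which is also why each perimeter sightline has to cover many more cells along its path.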
-jt