I don't think the firewall is the only issue here. We have a dev environment set up where the firewall is turned off on both of the machines we cluster, and the difference in performance while performing operations is pretty striking. For example, if I go to overwrite an existing service while the machines are clustered, the service selection step in the dialog won't show for a good 15 minutes. Unclustered, I'll get a response back in seconds. To Esri's credit, once the cluster is up and doing its thing, it ticks like a clock.
If this is a configuration problem, then where is the documentation that tells us how to avoid it?
As it stands now, we only cluster 2 machines. If we do operations in off-peak hours, then it's not a big deal to uncluster, do the housekeeping, and then re-cluster. Running more than 2 machines, however, with the additional load they're expected to carry, could get messy.