GWR bandwidth vs. AIC calculated kernel

Using a dataset of ~1500 locations, I found substantial spatial autocorrelation and nonstationarity (using Moran's I) for a single set of dependent and independent variables. Using Incremental Spatial Autocorrelation (ISA) in ArcGIS 10, I calculated the distance at which spatial clustering peaks and used this distance as the bandwidth for both Hot Spot analysis and Geographically Weighted Regression (GWR).

When I run the GWR using the ISA-derived bandwidth, the trends in my coefficient distributions have greater biological significance, yet the model has a higher AICc and a lower adjusted R-squared than when I simply run the GWR with an adaptive kernel whose bandwidth is chosen by AIC within ArcGIS 10. When I test the GWR residuals using Moran's I, some of the ISA-bandwidth models still show significant spatial autocorrelation, which is absent in the AIC adaptive-kernel models.
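
For anyone who wants to reproduce the residual check outside ArcGIS, here is a minimal Python sketch of global Moran's I. It is not the ArcGIS implementation: it assumes binary fixed-distance weights (every pair of points within `bandwidth` counts as neighbors), no row standardization, and no significance test. The function name `morans_i`, its parameters, and the 5 x 5 grid data are hypothetical and only for illustration.

```python
import numpy as np

def morans_i(values, coords, bandwidth):
    """Global Moran's I with binary fixed-distance weights (illustrative assumption)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Pairwise Euclidean distances between all locations.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # w_ij = 1 if j lies within the bandwidth of i, else 0; no self-neighbors.
    w = (dist <= bandwidth).astype(float)
    np.fill_diagonal(w, 0.0)

    s0 = w.sum()
    if s0 == 0:
        raise ValueError("no neighbor pairs within the given bandwidth")

    # I = (n / S0) * (z' W z) / (z' z), with z the mean-centered values.
    z = values - values.mean()
    n = len(values)
    return (n / s0) * (z @ w @ z) / (z @ z)

# Made-up example: a 5 x 5 unit grid standing in for residuals at point locations.
xs, ys = np.meshgrid(np.arange(5), np.arange(5))
coords = np.column_stack([xs.ravel(), ys.ravel()])
gradient = xs.ravel().astype(float)              # smooth trend across the grid
checker = ((xs + ys) % 2).ravel().astype(float)  # alternating pattern

print(morans_i(gradient, coords, 1.1))  # positive: similar values cluster
print(morans_i(checker, coords, 1.1))   # negative: neighbors are dissimilar
```

Passing each model's residuals with the same `bandwidth` would put the ISA-bandwidth and adaptive-kernel models on a common footing, although the value of I on its own says nothing about significance.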

I feel that the ISA-derived bandwidth models are the 'correct' ones, because the bandwidth comes from a separate ISA calculation of the distance of maximum clustering, but I cannot ignore that the model fit appears better with the AIC adaptive kernel.

Any suggestions or insights would be very appreciated!
