Forest-based tool: create AUC diagnostic

patriciacdale

We are writing a research paper that compares the results of our forest-based model with other approaches (logistic regression, SMOTE, cost-estimate). The literature we reviewed consistently treats AUC (area under the curve) as an important metric for evaluating the performance of classification models. AUC is the area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) as the classification threshold varies.
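To make the definition concrete, here is a minimal, stdlib-only Python sketch. It is not part of ArcGIS Pro, and the function names `roc_curve_points` and `auc_trapezoid` are hypothetical. It sweeps the threshold over a model's predicted probabilities, records one (false positive rate, true positive rate) point per distinct score, and integrates the resulting curve with the trapezoidal rule. The labels and scores are made-up illustration values.

```python
from typing import List, Sequence, Tuple


def roc_curve_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the threshold from the highest score down.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability (or any score) for the positive class.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Sort cases by descending score; each distinct score acts as a threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Every case with score >= threshold is now predicted positive;
        # tied scores are consumed together so they form a single step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))  # (1 - specificity, sensitivity)
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve, via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example: 3 positives, 3 negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve_points(labels, scores)
print(points)
print(auc_trapezoid(points))  # 8/9, about 0.889
```

Grouping tied scores into one step keeps the curve from depending on the order in which tied cases happen to be sorted.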

AUC is a comprehensive measure of model performance. It provides a single scalar value that summarizes the performance of the model across all possible classification thresholds, rather than relying on one specific threshold. This gives a more holistic view of the model’s ability to discriminate between positive and negative classes.

AUC does not depend on a specific decision threshold, unlike accuracy and other metrics that can vary significantly with the choice of threshold. This makes AUC a robust metric for comparing models: it reflects how well a model ranks positive cases above negative cases, regardless of where the decision boundary is set.
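The threshold independence follows from an equivalent definition: AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting as one half. The sketch below, using hypothetical helper names (`auc_rank`, `accuracy_at`) and made-up data, computes AUC that way and contrasts it with accuracy at a fixed 0.5 threshold. Halving every score keeps the ranking intact, so AUC is unchanged, while accuracy at 0.5 drops.

```python
from itertools import product
from typing import Sequence


def auc_rank(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every (positive, negative) pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Share of cases classified correctly when score >= threshold means positive."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
halved = [s / 2 for s in scores]  # monotonic rescaling: same ranking, different values

print(auc_rank(labels, scores), auc_rank(labels, halved))  # same value for both (8/9)
print(accuracy_at(labels, scores, 0.5), accuracy_at(labels, halved, 0.5))  # about 0.833 vs 0.5
```

The pairwise loop grows with (number of positives) × (number of negatives); it is written for clarity rather than speed.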

When classes are heavily imbalanced, metrics such as accuracy can be misleading: a model that predicts the majority class for every case can still achieve high accuracy. AUC is more informative in this setting because it is built from sensitivity and specificity, each of which is computed within a single class, so it gives a more balanced view of model performance even when the classes are imbalanced.
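A small made-up example illustrates the imbalance point. With 5 positives and 95 negatives, two hypothetical score sets reach the same 95% accuracy at a 0.5 threshold, because both predict every case as negative. Only the second ranks the positives above the negatives, and AUC detects that difference. The helper names are hypothetical and the data are illustrative, not results from any real model.

```python
from typing import Sequence


def auc_rank(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Share of cases classified correctly when score >= threshold means positive."""
    return sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels)) / len(labels)


# Hypothetical imbalanced sample: 5 positives followed by 95 negatives.
labels = [1] * 5 + [0] * 95

# Score set A: the same low score for every case (no discrimination at all).
scores_a = [0.1] * 100
# Score set B: positives score higher than negatives, but still below 0.5.
scores_b = [0.4] * 5 + [0.1] * 95

print(accuracy_at(labels, scores_a, 0.5), auc_rank(labels, scores_a))  # 0.95 0.5
print(accuracy_at(labels, scores_b, 0.5), auc_rank(labels, scores_b))  # 0.95 1.0
```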

For these and many other reasons, I strongly suggest that the spatial statistics team consider adding AUC as a model performance diagnostic when running the forest-based tool in ArcGIS Pro.