
Answer by augustin4 for Why is accuracy not the best measure for assessing classification models?

One problem with accuracy is that it ignores the intrinsic difficulty of the data-generating mechanism. Here, difficulty refers to the uncertainty of the label, which can be measured by its variance. The point is that when assessing a classifier, a misclassification may be due to high uncertainty in the data-generating mechanism rather than a flaw in the model.

For instance, suppose we predict $Y\in\{0,1\}$ from $X\in \mathcal X$. Assume there is an unknown data-generating probability function $P(Y=1\mid X=x)$, $x\in\mathcal X$. The conditional variance $$\operatorname{Var}(Y\mid X=x)=P(Y=1\mid X=x)\bigl(1-P(Y=1\mid X=x)\bigr)$$ is maximized when $P(Y=1\mid X=x)=0.5$ and minimized when $P(Y=1\mid X=x)=0$ or $1$. Suppose there exist $x_1,x_2\in\mathcal X$ such that$$P(Y=1\mid X=x)=\begin{cases}0.5&\ \text{when } x=x_1,\\0.99&\ \text{when } x=x_2.\end{cases}$$When evaluating a classifier, a misclassification at $x_1$ should be considered less severe than one at $x_2$, because at $x_1$ the label is essentially a coin flip. However, this heteroscedasticity is not reflected in accuracy. Moreover, the best possible accuracy depends on $P(Y=1\mid X=x)$ and can be very different across data sets.
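As a small numerical illustration (a minimal sketch; the equal weighting of the two points is my own assumption, not part of the argument above), the Bayes-optimal classifier predicts the majority label at each $x$, yet both its per-point accuracy and the label variance differ sharply between $x_1$ and $x_2$:

```python
# Conditional label variance and best achievable (Bayes) accuracy
# at the two points x1 and x2 from the example above.

def label_variance(p):
    """Var(Y | X=x) for a Bernoulli label with P(Y=1 | X=x) = p."""
    return p * (1 - p)

def bayes_accuracy(p):
    """Accuracy of the Bayes-optimal rule, which predicts the majority label."""
    return max(p, 1 - p)

probs = {"x1": 0.5, "x2": 0.99}
for x, p in probs.items():
    print(f"{x}: variance={label_variance(p):.4f}, "
          f"best accuracy={bayes_accuracy(p):.2f}")

# If x1 and x2 are equally likely, even a perfect model tops out at:
best_overall = 0.5 * (bayes_accuracy(0.5) + bayes_accuracy(0.99))
print(f"best overall accuracy: {best_overall:.3f}")  # 0.745
```

This makes the last point concrete: the ceiling of 0.745 is a property of the data-generating mechanism, not of any model, so raw accuracy cannot distinguish a poor classifier from a hard problem.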

To take the unknown $P(Y=1\mid X=x)$ into account and assess how well the classifier performs relative to the best possible performance, we may apply goodness-of-fit tests, e.g., Pearson's chi-squared test, the residual deviance test, and the Hosmer–Lemeshow test. Although most goodness-of-fit tests apply only to parametric models, the recent work Is a Classification Procedure Good Enough?—A Goodness-of-Fit Assessment Tool for Classification Learning addresses the evaluation problem for general classifiers.
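As one concrete option, the Hosmer–Lemeshow statistic can be sketched as follows (a minimal illustration using the standard decile-of-risk construction, not the method of the cited paper; the function name, the choice of $g=10$ bins, and the simulated data are my own):

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, p_hat, g=10):
    """Hosmer-Lemeshow goodness-of-fit statistic with g probability bins.

    Sorts observations by predicted probability, splits them into g groups,
    and compares observed vs. expected event counts in each group.
    """
    order = np.argsort(p_hat)
    y = np.asarray(y_true)[order]
    p = np.asarray(p_hat)[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(y)), g):
        n_g = len(idx)
        obs = y[idx].sum()      # observed events in the group
        exp = p[idx].sum()      # expected events under the model
        p_bar = exp / n_g       # mean predicted probability in the group
        stat += (obs - exp) ** 2 / (n_g * p_bar * (1 - p_bar))
    # Conventional degrees of freedom: g - 2.
    return stat, chi2.sf(stat, df=g - 2)

# Usage: a well-calibrated model should typically not be rejected.
rng = np.random.default_rng(0)
p_hat = rng.uniform(0.05, 0.95, size=2000)
y = rng.binomial(1, p_hat)
stat, pval = hosmer_lemeshow(y, p_hat)
print(f"HL statistic = {stat:.2f}, p-value = {pval:.3f}")
```

A small p-value suggests the predicted probabilities are poorly calibrated against the unknown $P(Y=1\mid X=x)$, which is exactly the information that accuracy alone discards.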

