A short post on scoring rules and their connection to a statistical divergence.
A scoring rule is a metric that characterises the quality of a probabilistic forecast. If we are interested in forecasting rainfall, then we let the random variable $Y$ denote a future event and $\mathcal{Y}$ the set of all possible values that $Y$ could take. In this example, we would define $\mathcal{Y} = \mathbb{R}_{\geq 0}$, as it would not make sense to have negative rainfall. Our model is probabilistic and therefore outputs a probability distribution. We use $\mathcal{P}$ to denote the set of all valid probability distributions with support on $\mathcal{Y}$. When computing a scoring rule, we seek to compare our model's forecasted distribution $P_t \in \mathcal{P}$ at time $t$ against a true observation $y_t \in \mathcal{Y}$. In the context of our precipitation example, $P_t$ could be an exponential distribution with a rate parameter of 2, and $y_t$ would be a real value, such as 2.2mm, corresponding to the true rainfall amount at time $t$.
A scoring rule is then a function $S : \mathcal{P} \times \mathcal{Y} \to \mathbb{R}$. Lower scores are indicative of a higher quality forecast. If we let $Q$ be the true probability distribution of rainfall, then for any $P \in \mathcal{P}$ we have the expected score

$$S(P, Q) = \mathbb{E}_{y \sim Q}\left[S(P, y)\right]. \tag{1}$$
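To make this concrete, here is a minimal sketch in Python (assuming `numpy` and `scipy` are available) that scores the exponential forecast from the example above against the 2.2mm observation using the log score $-\log p(y)$, which we return to below, and estimates the expected score in (1) by Monte Carlo under a hypothetical true distribution.

```python
import numpy as np
from scipy import stats

# Forecast P: exponential distribution with rate 2 (scipy uses scale = 1/rate).
forecast = stats.expon(scale=1 / 2)

# The log score S(P, y) = -log p(y) is one concrete choice of scoring rule.
def log_score(dist, y):
    return -dist.logpdf(y)

# Score the forecast against the observed rainfall of 2.2mm.
y_obs = 2.2
print("S(P, y):", log_score(forecast, y_obs))

# Monte Carlo estimate of the expected score S(P, Q) in (1),
# assuming (hypothetically) that the true rainfall distribution Q
# is exponential with rate 1.
rng = np.random.default_rng(0)
true_dist = stats.expon(scale=1.0)
samples = true_dist.rvs(size=100_000, random_state=rng)
print("S(P, Q) ~", log_score(forecast, samples).mean())
```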
A scoring rule is defined as a proper scoring rule if, for all $P, Q \in \mathcal{P}$,

$$S(Q, Q) \leq S(P, Q). \tag{2}$$
Similarly, a proper scoring rule is strictly proper if the inequality in (2) is strict whenever $P \neq Q$. In this case, (2) achieves equality only if $P = Q$.
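As an illustration of properness, the following sketch uses the closed-form expected log score between two exponential distributions (an assumption made for this example, not something derived in the post) and checks that it is minimised exactly when the forecast rate matches the true rate.

```python
import numpy as np

# Expected log score E_{y~Q}[-log p(y)] when both P and Q are exponential:
# p(y) = lam * exp(-lam * y), Q has rate mu, so the expectation is
# -log(lam) + lam / mu (using E_Q[y] = 1 / mu).
def expected_log_score(lam, mu):
    return -np.log(lam) + lam / mu

mu = 2.0                          # true rate of Q (hypothetical)
rates = np.linspace(0.5, 5.0, 451)  # candidate forecast rates
scores = expected_log_score(rates, mu)

best = rates[np.argmin(scores)]
print(f"expected score is minimised at rate {best:.2f} (true rate {mu})")
# The minimum sits at the true rate, and it is unique, which is exactly
# the (strict) properness property in (2).
```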
Scoring rules are a broad family of functions and there are many connections to statistical divergences. One example is the equivalence of the log-scoring rule $S_{\log}(P, y) = -\log p(y)$, where $p$ is the density of $P$, and the Kullback-Leibler divergence (KLD) $\operatorname{KL}(Q, P)$. Note that the order of arguments in the KL-divergence operator matters, as the divergence is asymmetric. To see this connection, we can write

$$
\begin{aligned}
S_{\log}(P, Q) - S_{\log}(Q, Q) &= \mathbb{E}_{y \sim Q}\left[-\log p(y)\right] - \mathbb{E}_{y \sim Q}\left[-\log q(y)\right] \\
&= \mathbb{E}_{y \sim Q}\left[\log \frac{q(y)}{p(y)}\right] = \operatorname{KL}(Q, P),
\end{aligned}
$$

where $q$ is the density of $Q$.
We can see that starting from the log-scoring rule, in just two lines we've arrived at the definition of the KLD.
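Here is a quick numerical check of this identity, again a sketch under the assumption that both $Q$ and $P$ are exponential with hypothetical rates of 2 and 0.7: the Monte Carlo estimate of $S_{\log}(P, Q) - S_{\log}(Q, Q)$ should agree with the closed-form $\operatorname{KL}(Q, P)$ for exponential distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

mu, lam = 2.0, 0.7              # rates of the true Q and the forecast P
Q = stats.expon(scale=1 / mu)
P = stats.expon(scale=1 / lam)

# Monte Carlo estimates of the expected log scores S(P, Q) and S(Q, Q).
y = Q.rvs(size=500_000, random_state=rng)
score_P = -P.logpdf(y).mean()
score_Q = -Q.logpdf(y).mean()

# Closed-form KL(Q, P) between two exponential distributions.
kl = np.log(mu / lam) + lam / mu - 1

print(f"S(P, Q) - S(Q, Q) ~ {score_P - score_Q:.4f}")
print(f"KL(Q, P)          = {kl:.4f}")
```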