What are the variance and the coefficient of variation?
In probability theory and statistics, variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. Variance is always non-negative: a small variance indicates that the data points tend to be very close to the mean, and hence to each other, while a high variance indicates that the data points are very spread out around the mean and from each other.
An equivalent measure is the square root of the variance, called the standard deviation. The standard deviation has the same dimension as the data, and hence is comparable to deviations from the mean.
The variance is one of several descriptors of a probability distribution. In particular, the variance is one of the moments of a distribution. In that context, it forms part of a systematic approach to distinguishing between probability distributions. While other such approaches have been developed, those based on moments are advantageous in terms of mathematical and computational simplicity.
The variance is a parameter that describes, in part, either the actual probability distribution of an observed population of numbers, or the theoretical probability distribution of a not-fully-observed population from which a sample of numbers has been drawn. In the latter case, a sample of data from such a distribution can be used to construct an estimate of the variance of the underlying distribution; in the simplest cases this estimate can be the sample variance.
The variance of a random variable X is its second central moment, the expected value of the squared deviation from the mean μ = E[X]:

    Var(X) = E[(X − μ)²].
This definition encompasses random variables that are discrete, continuous, neither, or mixed. The variance can also be thought of as the covariance of a random variable with itself:

    Var(X) = Cov(X, X).
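As a minimal sketch of these two statements (not taken from the source text), the following Python snippet computes the variance of a discrete random variable directly from the definition E[(X − μ)²] and checks that it equals Cov(X, X); the fair six-sided die is an arbitrary illustrative choice.

    # Variance of a discrete random variable from its definition,
    # using a fair six-sided die as an assumed example distribution.
    values = [1, 2, 3, 4, 5, 6]           # possible outcomes of X
    probs  = [1/6] * 6                    # their probabilities

    mu = sum(x * p for x, p in zip(values, probs))                # mu = E[X]
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))   # E[(X - mu)^2]

    # Covariance of X with itself, E[(X - mu)(X - mu)], is the same quantity.
    cov_xx = sum((x - mu) * (x - mu) * p for x, p in zip(values, probs))

    print(mu, var, cov_xx)   # 3.5, ~2.9167, ~2.9167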
If the random variable X is continuous with probability density function f(x), then the variance is given by

    Var(X) = ∫ (x − μ)² f(x) dx,

where μ = ∫ x f(x) dx is the expected value, and where the integrals are definite integrals taken for x ranging over the range of X.
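The continuous case can be illustrated numerically. The sketch below, which assumes NumPy and uses an exponential density f(x) = λ·exp(−λx) purely as an example, approximates both integrals on a truncated grid and compares the result with the known values 1/λ and 1/λ² for the mean and variance.

    import numpy as np

    lam = 2.0
    x = np.linspace(0.0, 20.0, 200_001)      # truncated range of X
    f = lam * np.exp(-lam * x)               # probability density f(x)

    mu = np.trapz(x * f, x)                  # E[X] = integral of x f(x) dx
    var = np.trapz((x - mu) ** 2 * f, x)     # integral of (x - mu)^2 f(x) dx

    print(mu, var)   # close to 0.5 and 0.25 (i.e. 1/lam and 1/lam^2)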
The coefficient of variation (CV) is defined as the ratio of the standard deviation σ to the mean μ:

    CV = σ / μ.

It shows the extent of variability in relation to the mean of the population.
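For a sample, the same ratio is computed from the sample statistics. A minimal sketch, assuming NumPy and using made-up data values for illustration:

    import numpy as np

    data = np.array([12.0, 15.0, 9.0, 11.0, 14.0, 13.0])

    cv = data.std(ddof=0) / data.mean()      # CV = standard deviation / mean
    print(f"CV = {cv:.3f}  ({cv:.1%} of the mean)")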
The coefficient of variation should be computed only for data measured on a ratio scale, that is, measurements that can only take non-negative values. The coefficient of variation may not have any meaning for data on an interval scale.[2] For example, most temperature scales are interval scales that can take both positive and negative values, whereas the Kelvin scale has an absolute zero, and negative values are nonsensical; hence, the Kelvin scale is a ratio scale. While the standard deviation (SD) can be derived on both the Kelvin and the Celsius scale (with both leading to the same SD), the CV is only relevant as a measure of relative variability for the Kelvin scale.
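A small numerical illustration of that point, with invented temperature readings and assuming NumPy: shifting the origin from Celsius to Kelvin leaves the SD unchanged but changes the CV, which is only meaningful on the ratio (Kelvin) scale.

    import numpy as np

    celsius = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
    kelvin = celsius + 273.15                # same data on a ratio scale

    print(celsius.std(), kelvin.std())       # identical SDs
    print(celsius.std() / celsius.mean())    # CV on Celsius (not meaningful)
    print(kelvin.std() / kelvin.mean())      # CV on Kelvin (meaningful)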
Measurements that are log-normally distributed exhibit a stationary CV; the SD, in contrast, varies with the expected value of the measurements. This is the case, for example, for laboratory values measured by chromatographic methods.
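A minimal simulation sketch of this behaviour, assuming NumPy and arbitrary parameter choices: for a log-normal distribution the CV depends only on the log-scale sigma, CV = sqrt(exp(σ²) − 1), so varying the expected value (via the log-scale mean) changes the mean and the SD but leaves the CV roughly constant.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.4
    for log_mean in (0.0, 1.0, 3.0):         # different expected values
        sample = rng.lognormal(mean=log_mean, sigma=sigma, size=200_000)
        print(sample.mean(), sample.std(), sample.std() / sample.mean())

    print(np.sqrt(np.exp(sigma ** 2) - 1.0))  # theoretical CV, ~0.417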
A nonparametric possibility is the quartile coefficient of dispersion, i.e. half the interquartile range divided by the midhinge, (Q3 − Q1)/(Q3 + Q1).
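A minimal sketch of that measure, assuming NumPy and using illustrative data values:

    import numpy as np

    data = np.array([3.0, 7.0, 8.0, 5.0, 12.0, 14.0, 21.0, 13.0, 18.0])

    q1, q3 = np.percentile(data, [25, 75])
    print((q3 - q1) / (q3 + q1))             # quartile coefficient of dispersion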