TIP: Statistics
Introduction
A few weeks ago, I needed to find the correlation between two variables at work. I searched the 'Net for the words "correlation" and "pearson" but couldn't find a decent piece of code.
Let me define the word correlation first:
"A correlation gives the strength of the relationship between variables" (mathworld.wolfram.com).
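The Pearson value is a normalized covariance: the covariance divided by the product of the two standard deviations. It ranges from -1 to +1. A value of +1 means there is a perfect positive linear relationship between the variables, and -1 means there is a perfect negative linear relationship. A value of 0 means there is no linear relationship, although the variables may still be related in a nonlinear way.
To get the covariance and the Pearson value, you first need a few building blocks: the average, the variance, and the standard deviation.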
Implementation
Average
You sum up all values and divide the sum by the number of values.
/// <summary>
/// Get average
/// </summary>
public static double GetAverage( double[] data )
{
    int len = data.Length;
    // An empty array has no average; fail early instead of dividing by zero
    if ( len == 0 )
        throw new ArgumentException( "No data" );
    double sum = 0;
    for ( int i = 0; i < len; i++ )
        sum += data[i];
    return sum / len;
}
Variance & Standard Deviation
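The variance is the average of the squared differences from the average. The code below divides by the number of values, which gives the population variance rather than the sample variance (which divides by the number of values minus one). The standard deviation is the square root of the variance.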
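Written out as formulas, the definitions look like this. The symbols are notation introduced here, not taken from the code: N is the number of values, x_i is the i-th value, and \bar{x} is the average.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad
\sigma_x = \sqrt{\sigma_x^2}
```

For example, for the values 1, 2, 3 the average is 2, the variance is (1 + 0 + 1) / 3 = 2/3, and the standard deviation is the square root of 2/3.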
/// <summary>
/// Get variance
/// </summary>
public static double GetVariance( double[] data )
{
    int len = data.Length;
    // Get average
    double avg = GetAverage( data );
    double sum = 0;
    for ( int i = 0; i < len; i++ )
        sum += Math.Pow( ( data[i] - avg ), 2 );
    return sum / len;
}
/// <summary>
/// Get standard deviation
/// </summary>
public static double GetStdev( double[] data )
{
    return Math.Sqrt( GetVariance( data ) );
}
Covariance & Pearson
To calculate the covariance, you need the average of each variable. Sum the products of (x - Avg(x)) and (y - Avg(y)) over all pairs of values, then divide the sum by the number of pairs. To get the Pearson value, divide the covariance by the product of the two standard deviations, stdevX and stdevY.
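The same two steps can be written as formulas. The symbols are notation introduced here: N is the number of pairs, \bar{x} and \bar{y} are the averages, \sigma_x and \sigma_y are the standard deviations (dividing by N, as the code in this tip does), and r is the Pearson value.

```latex
\operatorname{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}), \qquad
r = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
```

As a quick check, take x = (1, 2, 3) and y = (2, 4, 6). The averages are 2 and 4, so the covariance is (2 + 0 + 2) / 3 = 4/3. The standard deviations are the square roots of 2/3 and 8/3, and their product is also 4/3, which gives r = 1, as expected for y = 2x.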
/// <summary>
/// Get correlation
/// </summary>
public static void GetCorrelation( double[] x,
                                   double[] y,
                                   ref double covXY,
                                   ref double pearson )
{
    if ( x.Length != y.Length )
        throw new ArgumentException( "Length of sources is different" );
    double avgX = GetAverage( x );
    double stdevX = GetStdev( x );
    double avgY = GetAverage( y );
    double stdevY = GetStdev( y );
    int len = x.Length;
    // Reset the accumulator so a non-zero value passed in by the caller cannot skew the sum
    covXY = 0;
    for ( int i = 0; i < len; i++ )
        covXY += ( x[i] - avgX ) * ( y[i] - avgY );
    covXY /= len;
    // Note: if either input is constant, its standard deviation is 0 and the result is NaN
    pearson = covXY / ( stdevX * stdevY );
}
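A minimal usage sketch may help show how the pieces fit together. It is an illustration under stated assumptions, not the original project: the class name StatsSketch, the Main method, and the sample data are all made up here, and the helper methods are condensed copies so the listing compiles on its own. The accumulator is set to zero inside GetCorrelation, so the result does not depend on what the caller passes in.

```csharp
using System;

// Hypothetical wrapper class; the article does not name the class that holds these methods.
public static class StatsSketch
{
    public static double GetAverage( double[] data )
    {
        if ( data.Length == 0 )
            throw new ArgumentException( "No data" );
        double sum = 0;
        foreach ( double value in data )
            sum += value;
        return sum / data.Length;
    }

    public static double GetStdev( double[] data )
    {
        double avg = GetAverage( data );
        double sum = 0;
        foreach ( double value in data )
            sum += ( value - avg ) * ( value - avg );
        // Population form: divide by the number of values
        return Math.Sqrt( sum / data.Length );
    }

    public static void GetCorrelation( double[] x, double[] y,
                                       ref double covXY, ref double pearson )
    {
        if ( x.Length != y.Length )
            throw new ArgumentException( "Length of sources is different" );
        double avgX = GetAverage( x );
        double avgY = GetAverage( y );
        // Start from zero whatever the caller passed in
        covXY = 0;
        for ( int i = 0; i < x.Length; i++ )
            covXY += ( x[i] - avgX ) * ( y[i] - avgY );
        covXY /= x.Length;
        pearson = covXY / ( GetStdev( x ) * GetStdev( y ) );
    }

    public static void Main()
    {
        // Made-up sample data: y is exactly 2 * x
        double[] x = { 1, 2, 3, 4, 5 };
        double[] y = { 2, 4, 6, 8, 10 };
        double cov = 0, pearson = 0;
        GetCorrelation( x, y, ref cov, ref pearson );
        Console.WriteLine( "Covariance: {0}", cov );
        Console.WriteLine( "Pearson: {0}", pearson );
    }
}
```

With this data, the covariance comes out as 4 and the Pearson value as 1, up to floating-point rounding, because y is a perfect positive linear function of x.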
