TIP: Statistics


A few weeks ago, I needed to find the correlation between two variables at work. I began searching the 'Net for the words "correlation" and "pearson" but couldn't find any decent piece of code.

Let me define the word correlation first:

"A correlation gives the strength of the relationship between variables" (mathworld.wolfram.com).

The Pearson value is the covariance normalized by the standard deviations of the two variables. It ranges from -1 to +1. A value of +1 means that there is a perfect positive linear relationship between the variables. If it's -1, there is a perfect negative linear relationship. 0 means no linear relationship at all.

To get the covariance and the Pearson value, you need to compute a few things first.

Average

You sum up all values and divide the sum by the number of values.

// The methods in this article assume "using System;" and an enclosing static class.

/// <summary>
/// Get average
/// </summary>
public static double GetAverage( double[] data )
{
   int len = data.Length;
   if ( len == 0 )
      throw new Exception("No data");

   // Sum all values, then divide by the count
   double sum = 0;
   for ( int i = 0; i < len; i++ )
      sum += data[i];
   return sum / len;
}

Variance & Standard Deviation

The variance is the average of the squared differences from the average. The standard deviation is the square root of the variance.
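
In symbols, the two definitions can be written roughly as follows. The notation is an assumption made here for illustration and does not appear in the original text: $x_i$ are the values, $N$ is their count, and $\bar{x}$ is the average. This is the population form, dividing by $N$, which matches the GetVariance method in this article.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad
\sigma_x = \sqrt{\sigma_x^2}
```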

/// <summary>
/// Get variance
/// </summary>
public static double GetVariance( double[] data )
{
   int len = data.Length;
   // Get average
   double avg = GetAverage( data );

   // Sum the squared differences from the average
   double sum = 0;
   for ( int i = 0; i < len; i++ )
      sum += Math.Pow( ( data[i] - avg ), 2 );
   return sum / len;
}

/// <summary>
/// Get standard deviation
/// </summary>
public static double GetStdev( double[] data )
{
   return Math.Sqrt( GetVariance( data ) );
}
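
As a quick sanity check of the average, variance, and standard deviation described above, here is a minimal, self-contained sketch. The class name VarianceSketch and its method names are hypothetical and were chosen for illustration only; they are not part of the original code. Like the article's version, the sketch divides by the number of values (the population form).

```csharp
using System;

// Hypothetical helper class, not from the original article.
public static class VarianceSketch
{
    // Arithmetic mean of the values.
    public static double Average(double[] data)
    {
        if (data.Length == 0)
            throw new ArgumentException("No data");
        double sum = 0;
        foreach (double v in data)
            sum += v;
        return sum / data.Length;
    }

    // Population variance: mean of the squared deviations from the average.
    public static double Variance(double[] data)
    {
        double avg = Average(data);
        double sum = 0;
        foreach (double v in data)
            sum += (v - avg) * (v - avg);
        return sum / data.Length;
    }

    // Standard deviation: square root of the variance.
    public static double Stdev(double[] data)
    {
        return Math.Sqrt(Variance(data));
    }
}
```

For the values { 2, 4, 4, 4, 5, 5, 7, 9 }, the average is 5, the squared differences sum to 32, the variance is 32 / 8 = 4, and the standard deviation is 2.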

Covariance & Pearson

To calculate the covariance, you need the average of each variable. You sum the products of (x - Avg(x)) and (y - Avg(y)) over all value pairs and then divide by the number of pairs. To get the Pearson value, you also need the standard deviation of each variable: divide the covariance by the product of stDevX and stDevY.
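
Written as formulas, the description above looks roughly like this. The symbols are assumptions introduced here for illustration: $N$ is the number of value pairs, $\bar{x}$ and $\bar{y}$ are the averages, $\sigma_x$ and $\sigma_y$ are the standard deviations (computed by dividing by $N$), and $r$ is the Pearson value.

```latex
\operatorname{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}), \qquad
r = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
```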

/// <summary>
/// Get correlation
/// </summary>
public static void GetCorrelation( double[] x,
                                   double[] y,
                                   ref double covXY,
                                   ref double pearson)
{
   if ( x.Length != y.Length )
      throw new Exception("Length of sources is different");
   double avgX = GetAverage( x );
   double stdevX = GetStdev( x );
   double avgY = GetAverage( y );
   double stdevY = GetStdev( y );
   int len = x.Length;

   // Reset the accumulator so a non-zero value passed in by the caller
   // cannot leak into the result
   covXY = 0;
   for ( int i = 0; i < len; i++ )
      covXY += ( x[i] - avgX ) * ( y[i] - avgY );
   covXY /= len;
   // If either series is constant, its standard deviation is 0 and pearson is NaN
   pearson = covXY / ( stdevX * stdevY );
}
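
To see the +1 and -1 cases in practice, here is a minimal, self-contained sketch of the same calculation. The class name CorrelationSketch, the method name Correlate, and the tuple return type are hypothetical choices made for this example, not the article's API. It also computes the Pearson value directly from the summed products, which is algebraically the same as dividing the covariance by the product of the two standard deviations.

```csharp
using System;

// Hypothetical helper class, not from the original article.
public static class CorrelationSketch
{
    // Returns the population covariance and the Pearson value as a tuple,
    // so the caller does not need ref parameters.
    public static (double Covariance, double Pearson) Correlate(double[] x, double[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Length of sources is different");
        if (x.Length == 0)
            throw new ArgumentException("No data");

        int n = x.Length;
        double avgX = 0, avgY = 0;
        for (int i = 0; i < n; i++)
        {
            avgX += x[i];
            avgY += y[i];
        }
        avgX /= n;
        avgY /= n;

        // Accumulate the cross products and the squared deviations in one pass.
        double sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++)
        {
            double dx = x[i] - avgX;
            double dy = y[i] - avgY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }

        double cov = sumXY / n;
        // Same as cov / (stdevX * stdevY); NaN if either series is constant.
        double pearson = sumXY / Math.Sqrt(sumXX * sumYY);
        return (cov, pearson);
    }
}
```

Calling Correlate with x = { 1, 2, 3, 4, 5 } and y = { 2, 4, 6, 8, 10 } should give a covariance of 4 and a Pearson value of about +1. Reversing y to { 10, 8, 6, 4, 2 } should give a covariance of -4 and a Pearson value of about -1.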

About the Author

Eran Aharonovich

Been a programmer since 1999. Experience in: .Net, C++, C#, VB, VB.NET, ASP, ASP.NET, DLLs, COM etc. www.Noviway.com Israel


