# TIP: Statistics

### Introduction

A few weeks ago, I needed to find the correlation between two variables at work. I began searching the 'Net for the words "correlation" and "Pearson" but couldn't find any decent piece of code.

Let me define the word correlation first:

"A correlation gives the strength of the relationship between variables" (mathworld.wolfram.com).

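The covariance normalized by the product of the two standard deviations is called the Pearson correlation coefficient, or Pearson value. The Pearson value ranges from -1 to +1. A value of +1 means that there is a perfect positive linear relationship between the variables. A value of -1 means a perfect negative linear relationship. A value of 0 means no linear relationship (the variables may still be related in a non-linear way).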
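Written as a formula, the Pearson value $r$ is the covariance divided by the product of the standard deviations. The symbols are chosen here for illustration: $n$ pairs of values $(x_i, y_i)$, averages $\bar{x}$ and $\bar{y}$, and standard deviations $\sigma_x$ and $\sigma_y$ (computed by dividing by $n$, as the code in this article does):

$$
r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}
  = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y},
\qquad
-1 \le r \le 1
$$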

To compute the covariance and the Pearson value, you first need a few building blocks: the average, the variance and the standard deviation of each variable.

### Implementation

#### Average

You sum up all values and divide the sum by the number of values.
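As a formula, with $n$ values $x_1, \dots, x_n$ (symbols chosen here for illustration), followed by a small worked example on the made-up data set 2, 4, 4, 4, 5, 5, 7, 9:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\text{e.g. } \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5
$$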

```csharp
/// <summary>
/// Get average (arithmetic mean)
/// </summary>
public static double GetAverage( double[] data )
{
    // Reject null or empty input: the average of no values is undefined
    if ( data == null || data.Length == 0 )
        throw new ArgumentException( "No data", nameof( data ) );

    double sum = 0;
    for ( int i = 0; i < data.Length; i++ )
        sum += data[i];
    return sum / data.Length;
}
```
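For comparison, .NET 3.5 and later ship the same calculation as LINQ's `Enumerable.Average` extension method. The sketch below assumes `System.Linq` is available; the class name `LinqAverage` and method name `Mean` are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Linq;

public static class LinqAverage
{
    // Arithmetic mean via Enumerable.Average: it sums the values and divides
    // by the count, like GetAverage. For an empty array it throws
    // InvalidOperationException instead of a custom "No data" exception.
    public static double Mean( double[] data )
    {
        if ( data == null )
            throw new ArgumentNullException( nameof( data ) );
        return data.Average();
    }
}
```

On the data set 2, 4, 4, 4, 5, 5, 7, 9, `LinqAverage.Mean` returns 5. The hand-written loop gives you control over the exception type and message; the LINQ version is shorter.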

#### Variance & Standard Deviation

The variance is the average of the squared differences from the average. The standard deviation is the square root of the variance.
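The same in symbols, using the population variance (dividing by $n$, as `GetVariance` does), with the made-up data set 2, 4, 4, 4, 5, 5, 7, 9 (average 5) as a worked example:

$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
\sigma = \sqrt{\sigma^2}
$$

$$
\sigma^2 = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4,
\qquad
\sigma = 2
$$

If the data is a sample drawn from a larger population, the usual unbiased estimate (the sample variance) divides by $n - 1$ instead, which here would give $32 / 7 \approx 4.571$.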

```csharp
/// <summary>
/// Get variance (population variance: divides by the number of values)
/// </summary>
public static double GetVariance( double[] data )
{
    // Get average (throws if data is null or empty)
    double avg = GetAverage( data );

    double sum = 0;
    for ( int i = 0; i < data.Length; i++ )
    {
        double diff = data[i] - avg;
        sum += diff * diff;   // squared difference from the average
    }
    return sum / data.Length;
}

/// <summary>
/// Get standard deviation
/// </summary>
public static double GetStdev( double[] data )
{
    return Math.Sqrt( GetVariance( data ) );
}
```
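`GetVariance` reads the data twice: once inside `GetAverage` and once to sum the squared differences. If the values arrive as a stream, a single pass may be preferable. The sketch below swaps in Welford's online algorithm, a well-known technique that is not part of the original code; the class and method names are hypothetical. Unlike the naive one-pass "sum of squares minus square of sum" formula, it avoids catastrophic cancellation. It still divides by the number of values, matching `GetVariance`.

```csharp
using System;

public static class OnePassStats
{
    // Population variance in a single pass (Welford's online algorithm)
    public static double GetVarianceOnePass( double[] data )
    {
        if ( data == null || data.Length == 0 )
            throw new ArgumentException( "No data", nameof( data ) );

        double mean = 0;   // running mean of the values seen so far
        double m2 = 0;     // running sum of squared differences from the mean
        int n = 0;

        foreach ( double value in data )
        {
            n++;
            double delta = value - mean;    // distance from the old mean
            mean += delta / n;              // update the running mean
            m2 += delta * ( value - mean ); // uses both old and new mean
        }
        return m2 / n;     // divide by n: population variance, as in GetVariance
    }

    public static double GetStdevOnePass( double[] data ) =>
        Math.Sqrt( GetVarianceOnePass( data ) );
}
```

On the data set 2, 4, 4, 4, 5, 5, 7, 9 this gives a variance of 4 and a standard deviation of 2 (up to floating-point rounding), the same as the two-pass version.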

#### Covariance & Pearson

To calculate the covariance, you need the average of each variable. For every pair of values, multiply (x - Avg(x)) by (y - Avg(y)), sum these products, and divide the sum by the number of pairs. To get the Pearson value, you also need the standard deviation of each variable: divide the covariance by the product of stdevX and stdevY.
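A small worked example, using the made-up data $x = 1, 2, 3, 4, 5$ and $y = 2, 4, 6, 8, 10$, so that $\bar{x} = 3$ and $\bar{y} = 6$:

$$
\operatorname{cov}(x, y) = \frac{(-2)(-4) + (-1)(-2) + 0 \cdot 0 + 1 \cdot 2 + 2 \cdot 4}{5} = \frac{20}{5} = 4
$$

$$
\sigma_x = \sqrt{\tfrac{4 + 1 + 0 + 1 + 4}{5}} = \sqrt{2},
\qquad
\sigma_y = \sqrt{\tfrac{16 + 4 + 0 + 4 + 16}{5}} = \sqrt{8},
\qquad
r = \frac{4}{\sqrt{2}\,\sqrt{8}} = \frac{4}{4} = 1
$$

Since $y = 2x$ exactly, a Pearson value of $+1$ (a perfect positive linear relationship) is what the definition predicts.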

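```csharp
/// <summary>
/// Get covariance and Pearson correlation of two equally long series
/// </summary>
public static void GetCorrelation( double[] x,
                                   double[] y,
                                   out double covXY,
                                   out double pearson )
{
    if ( x.Length != y.Length )
        throw new ArgumentException( "Length of sources is different" );

    double avgX = GetAverage( x );
    double stdevX = GetStdev( x );
    double avgY = GetAverage( y );
    double stdevY = GetStdev( y );
    int len = x.Length;

    // Start from zero; accumulating into a caller-supplied "ref" value
    // would add whatever the caller passed in to the result
    covXY = 0;
    for ( int i = 0; i < len; i++ )
        covXY += ( x[i] - avgX ) * ( y[i] - avgY );
    covXY /= len;

    // If either standard deviation is 0 (a constant series), this is 0 / 0 = NaN
    pearson = covXY / ( stdevX * stdevY );
}
```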
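As an alternative design, the same calculation can return both results as a C# 7 named tuple instead of through parameters. This is a hypothetical sketch (`Correlation.Compute` is not from the original article), assuming C# 7.0 or later for the tuple syntax. It uses the same formulas but accumulates both sums of squared differences in the same loop as the covariance, so each array is scanned once after the averages are known.

```csharp
using System;
using System.Linq;

public static class Correlation
{
    // Returns population covariance and Pearson coefficient as a named tuple:
    // cov = sum((x - avgX) * (y - avgY)) / n,  pearson = cov / (stdevX * stdevY)
    public static (double Covariance, double Pearson) Compute( double[] x, double[] y )
    {
        if ( x == null ) throw new ArgumentNullException( nameof( x ) );
        if ( y == null ) throw new ArgumentNullException( nameof( y ) );
        if ( x.Length != y.Length )
            throw new ArgumentException( "Length of sources is different" );
        if ( x.Length == 0 )
            throw new ArgumentException( "No data" );

        double avgX = x.Average();
        double avgY = y.Average();
        int n = x.Length;

        double cov = 0, varX = 0, varY = 0;
        for ( int i = 0; i < n; i++ )
        {
            double dx = x[i] - avgX;
            double dy = y[i] - avgY;
            cov += dx * dy;    // sum of cross products
            varX += dx * dx;   // sum of squared differences for x
            varY += dy * dy;   // sum of squared differences for y
        }
        cov /= n;

        // Denominator is stdevX * stdevY; if either series has zero
        // variance this is 0 / 0 and the result is NaN
        double pearson = cov / Math.Sqrt( ( varX / n ) * ( varY / n ) );
        return ( cov, pearson );
    }
}
```

For example, `Correlation.Compute(new double[] { 1, 2, 3, 4, 5 }, new double[] { 2, 4, 6, 8, 10 })` returns a covariance of 4 and a Pearson value of 1, and reversing the second array gives -4 and -1. Callers can deconstruct the result with `var (cov, r) = Correlation.Compute(x, y);` and should check `double.IsNaN(r)` when a series might be constant.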

#### Eran Aharonovich

Been a programmer since 1999. Experience in: .NET, C++, C#, VB, VB.NET, ASP, ASP.NET, DLLs, COM, etc. www.Noviway.com, Israel