Correlation and regression using numpy

Blog Post created by Dan_Patterson Champion on Aug 17, 2014

No explanation for now; I posted this so I wouldn't forget it. More will follow once the graphing code is finished. It was a good opportunity to explore numpy in more detail, and no effort was made to simplify the code further. I also plan to add shapefile reading. Just open the script and run it; a small sample data set is included. I have pasted the code here until the issues with Python encoding and IE 11 are sorted out...sorry.



'''
Author:  Dan.Patterson@carleton.ca

  Calculates the correlation coefficient and regression parameters for simple
  correlation using numpy.
'''
import numpy as np


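def correlation(xs, ys):
  '''Pearson's r from the sample standard deviations and sample covariance'''
  # ddof=1 gives sample statistics, matching np.cov's default (N - 1) normalization
  s_x = np.std(xs, ddof=1, dtype=np.float64)
  s_y = np.std(ys, ddof=1, dtype=np.float64)
  covar = np.cov(xs, ys)[0, 1]  # off-diagonal entry of the 2x2 covariance matrix
  r = covar / (s_x * s_y)
  return r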


def regress1D(xs, ys, dim=1):
  '''least squares polynomial regression; the default dim=1 gives y=mx+b'''
  coeff = np.polyfit(xs, ys, dim)   # coefficients, highest power first
  polynomial = np.poly1d(coeff)
  y_cal = polynomial(xs)            # fitted y values at the input xs
  return [coeff, y_cal]


if __name__ == "__main__":
  xs = [0,1,2,3,4,5,6,7,8,9,10]
  ys = [0.2,0.7,2.7,2.6,4.1,5,5.7,7.3,7.9,9.1,9.8]
  r = correlation(xs, ys)
  coeff, y_cal = regress1D(xs, ys, 1)
  text = 'y = %.3fx + %.3f' % (coeff[0], coeff[1])
  # single-argument print() calls run under both Python 2 and Python 3
  print("\nPearson's r: %s" % r)
  print("Equation (y=mx+b): %s" % text)
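
A quick sanity check, not part of the original script: for a first-order fit, the slope should equal r * s_y / s_x and the line should pass through the point of means. The sketch below reuses the same sample data and compares those values against np.polyfit, with np.corrcoef standing in for the hand-rolled correlation function. The variable names (slope, intercept, m, b) are my own choices for illustration.

```python
import numpy as np

# Same sample data as in the script above.
xs = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float64)
ys = np.array([0.2, 0.7, 2.7, 2.6, 4.1, 5, 5.7, 7.3, 7.9, 9.1, 9.8])

# Pearson's r taken directly from the 2x2 correlation matrix.
r = np.corrcoef(xs, ys)[0, 1]

# For a first-order fit, slope = r * s_y / s_x, and the fitted line
# passes through (mean(xs), mean(ys)).
s_x = np.std(xs, ddof=1)
s_y = np.std(ys, ddof=1)
slope = r * s_y / s_x
intercept = ys.mean() - slope * xs.mean()

# np.polyfit returns [slope, intercept] for degree 1.
m, b = np.polyfit(xs, ys, 1)

print("r = %.4f" % r)
print("slope:     %.4f (from r) vs %.4f (polyfit)" % (slope, m))
print("intercept: %.4f (from r) vs %.4f (polyfit)" % (intercept, b))
```

If the two columns agree to the printed precision, the correlation and regression results are consistent with each other.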