TLinearRegression class

This article gives the source code for a nice, clean implementation of linear regression by the linear, least-squares error algorithm.

(This article originally appeared in the March 17, 2000 issue

of The Unofficial Newsletter of

Delphi Users

TLinearRegression Class

This article gives the source code for a nice, clean implementation of linear regression by the linear, least-squares error algorithm.

The class provides properties returning the average for both the independent and dependent variables, as well as the variance, covariance, and correlation. A function method is provided to interpolate along the line defined by the computed slope and intercept.

The code is well documented, giving the basic theoretical outline and some notes as to what the various methods are doing.

An example project is also included.

{ ****************************************************************** }

{ }

{ Class implementing linear regression by linear-least-squares }

{ }

{ Copyright © 2000, Berend M. Tober. All rights reserved. }

{ Author's E-mail - mailto:[email protected] }

{ Other components at }

{ }

{ }

{ ****************************************************************** }

unit linregr;

{Linear Regression Class}


  Note Regarding the Problem Statement


  This class is used for finding the best-fit slope (m) and y-intercept (b), as in the equation for a straight line

    y := m*x + b

  in the case where you have more than two (x,y) data point pairs, i.e., you have (Xi, Yi) for i:=1,2,...,n, with n>2. In this situation you can write n equations:

    Y1 := m*X1 + b + E1

    Y2 := m*X2 + b + E2




    Yn := m*Xn + b + En

  giving a system of n linear equations that you need to solve for two unknowns, namely, m and b, where the Ei are error terms. The linear- least-squares error algorithm employed here finds the best m and b such that the sum of the squared error terms is minimized.

  The linear algebra behind the solution is to write an n-by-2 matrix A that has rows looking like [1, Xi] for i=1,..,n and a 2-by-1 vector u of unknowns with the intercept and slope, i.e., u = [b,m]', where (') indicates "take the transpose of". There's a bunch of other technical detail stuff like "finding an orthogonal basis for a sub-space", and "guaranteed unique solutions", etc., that I vaguely recall from my course in real vectors, but it boils down to writing

        e'e = (y - Au)'(y - Au) = |y - Au|^2

  to show the best solution is found by solving for u in the equation

        Au = y

  where, to summarize:

       y = [Y1,Y2,...,Yn]'

       u = [b,m]'

       A = [[1,X1]', [1,X2]',...[1,Xn]']'

  In the process of solving this you eventually get to the explicit solution

        u = inv(A'A)(A'y)

  where inv() means "take the inverse of". This is the equation implemented in this class. It calculates the statistics "on the fly", so it works for any number of two or more data points.

  This is all well-known, straight-forward stuff out of a textbook. I don't claim any credit for the math...I'm not inventing anything new here, just making a slick implementation.

  Note Regarding Exceptions


  This class does not raise exceptions: your controlling application must trap 'divide by zero' errors that can occur in

        function GetCorrelation

        function GetIntercept

        function GetSlope

  I didn't build in this exception handling because it would entail a USES clause. As it is, this unit has NO explicit dependencies on ANY other units, and I couldn't bear to destroy that purity!

  The 'divide by zero' circumstance can occur when accessing any of the derived, covariance-related properties if:

  1. you have accumulated less than two data points, or

  2. you have "degenerate" data, i.e., all the x's are the same (this corresponds to slope of infinity), in which case you will find that CovarianceXX will be zero (or within some small error range of zero).

  Note Regarding Normalizing Data


  It is recommeded that you normalize data in advance of applying this linear-least-squares algorithm, if you can.

  For purely multiplicative scaling, apply the following:

  If you scale data input as (x',y') := (Ax,By), then you must

    1) adjust computed intercept by b := b'/B, and

    2) adjust computed slope by m := m'*A/B





     N :Integer;

{ Count of number of data points accumulated }

     AvgX :Double;

{ Average of independent variables }

     AvgY :Double;

{ Average of dependent variables }

     AvgXX :Double;

{ Average of squared independent variables }

     AvgYY :Double;

{ Average of squared dependent variables }

     AvgXY :Double;

{ Average of cross product }





      function GetCorrelation:Double;

      function GetCovarianceXX:Double;

      function GetCovarianceYY:Double;

      function GetCovarianceXY:Double;

      function GetIntercept:Double;

      function GetSlope:Double;


      constructor Create;

      procedure Clear;

      function Add(X,Y:Double):Integer;

      function Interpolate(x:Double):Double;

      property AverageX:Double Read FRegressionRec.AvgX;

      property AverageY:Double read FRegressionRec.AvgY;

      property Count:Integer read FRegressionRec.N;

      property Correlation:Double Read GetCorrelation;

      property CovarianceXX:Double Read GetCovarianceXX;

      property CovarianceYY:Double Read GetCovarianceYY;

      property CovarianceXY:Double Read GetCovarianceXY;

      property Intercept:Double Read GetIntercept;

      property Slope:Double Read GetSlope;



function Accumulate


  N:Integer; {Sequence number of this new data point}

  a,{The existing average (of N-1 data points)}

  x:Double{The new data value to be included in running average}):Double;

{Returns the running average}



  Result:=a*Result + x/N;


constructor TLinearRegression.Create;


  inherited Create;



function TLinearRegression.Add(X,Y:Double):Integer;

  {This function 'accumulates' another (x,y) data point}


  with FRegressionRec DO



    AvgX := Accumulate(N,AvgX,X);

    AvgY := Accumulate(N,AvgY,Y);

    AvgXX := Accumulate(N,AvgXX,X*X);

    AvgYY := Accumulate(N,AvgYY,Y*Y);

    AvgXY := Accumulate(N,AvgXY,X*Y);




procedure TLinearRegression.Clear;


  with FRegressionRec do


    N := 0;

    AvgX := 0;

    AvgY := 0;

    AvgXX:= 0;

    AvgYY:= 0;

    AvgXY:= 0;



function TLinearRegression.GetCovarianceXX:Double;


  with FRegressionRec do

    Result := AvgXX-AvgX*AvgX;


function TLinearRegression.GetCovarianceYY:Double;


  with FRegressionRec do

    Result := AvgYY-AvgY*AvgY;


function TLinearRegression.GetCovarianceXY:Double;


    with FRegressionRec do

      Result := AvgXY-AvgX*AvgY;


function TLinearRegression.GetCorrelation:Double;

  var CovXX,CovYY:Double;


    with FRegressionRec do





        if CovXX = 0.0 then

          Result := 1.0

{Degenerate case with infinite slope}

        else if CovYY = 0.0 then

          Result := 1.0

{Data has zero slope}


          Result := GetCovarianceXY/Sqrt(CovXX*CovYY);



function TLinearRegression.GetIntercept:Double;


  {Note that if CovXX is zero, then no unique y-intercept exists}

  with FRegressionRec do



function TLinearRegression.GetSlope:Double;


  {Note that if CovXX is zero, then slope is infinite}

  with FRegressionRec do

    Result := GetCovarianceXY/GetCovarianceXX;


function TLinearRegression.Interpolate(x:Double):Double;


  with FRegressionRec do

    Result:=x*Slope + Intercept;



Sample project file follows:

program Example;



type TDataArray=Array[1..10,1..2]

of Real;
















  These are the "correct" results for the data set


  r=0.891, m=0.513, b=39.658



  ( {Zero slope}













  ( {Infinite slope}












procedure TestRegression(Data:TDataArray);

  var i:Word;


  with TLinearRegression.Create do


    for i:= 1 to 10 do






      writeln('Average of X''s',AverageX:12:3);

      writeln('Average of Y''s',AverageY:12:3);

      writeln('Std Dev of X''s',sqrt(CovarianceXX):12:3);

      writeln('Std Dev of Y''s',sqrt(CovarianceYY):12:3);








    writeln('This is what the correlation, slope and intercept values should be:');


    writeln('Sample results:');


    writeln(#13'Data with zero slope');


    writeln(#13'Degenerate data with infinite slope');




Share this article!

Follow us!

Find more helpful articles: