Linear regression is a statistical method for modeling the relationship between a dependent variable $Y$ and independent variables $X$ by estimating the coefficients $b_0, \dots, b_p$ of the linear form

$$y = b_0 + b_1 x_1 + \dots + b_p x_p$$

where each term $x_i$ is some expression in the original independent variables ($X^{(1)}, \dots, X^{(k)}$). For example, with a single variable $X$ one could take $x_1 = X$, $x_2 = X^2$. The form is linear in the coefficients, not necessarily in the original variables.
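
To make the example concrete, the same model can be written in matrix notation. This is only an illustrative sketch: the symbols $t_1, \dots, t_n$ (the observed values of $X$) and $y_1, \dots, y_n$ (the matching observations of $Y$) are introduced here and are not part of the essay. With $x_1 = X$ and $x_2 = X^2$, each observation contributes one row:

```latex
\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\approx
\begin{pmatrix}
1 & t_1 & t_1^2 \\
1 & t_2 & t_2^2 \\
\vdots & \vdots & \vdots \\
1 & t_n & t_n^2
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix}
\]
```

The column of ones carries the constant term $b_0$, and the remaining columns hold the terms $x_1$ and $x_2$ evaluated at each observation.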

Least Squares Method

In the least squares method, the coefficients of the linear regression are chosen to minimize the sum of squared deviations between the observations $y_i$ and their estimates $\hat{y}_i$:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \to \min$$
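
As a minimal sketch of this minimization in J, the script below fits a straight line to hypothetical toy data. The names x, y, A, b1 and b2 are made up for illustration. Dyadic %. returns the least squares solution directly; the second computation solves the normal equations $A^T A b = A^T y$, the standard condition for the minimum, and should give the same coefficients.

```j
mp =: +/ . *                           NB. matrix product

NB. Hypothetical toy data: y = 1 + 2x exactly, sampled at four points
x =: 0 1 2 3
y =: 1 + 2 * x
A =: 1 ,. x                            NB. design matrix: a column of ones, then x

b1 =: y %. A                           NB. least squares solution via dyadic %.
b2 =: ((|: A) mp y) %. (|: A) mp A     NB. same solution via the normal equations

smoutput b1 ,: b2                      NB. show both coefficient rows for comparison
```

Because the toy data lie exactly on a line, both rows should recover the intercept and slope used to generate y, up to floating point rounding.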

Surface Fit Example

As an example, we will take the bi-quadratic form

$$z = 1 + 0.2 X_1^2 + 0.3 X_2 - 0.4 X_2^2$$

then add a small amount of noise to simulate observed data, and try to reconstruct the coefficients using the least squares method.

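The three surfaces below show the original form, the noisy data, and the least squares estimate. Each is followed by the plot command that produced it, using names defined in the session that follows.

inline:lsq_form.png

   'surface' plot X1;X2;FORM

inline:lsq_data.png

   'surface' plot X1;X2;DATA

inline:lsq_estm.png

   'surface' plot X1;X2;COEF mp XMAT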

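   load 'plot'
   mp =: +/ . *                NB. matrix product

   NB. X1 and X2 are 17 by 17 coordinate grids over _8..8
   'X1 X2' =: |: ,"0/~ i:8
   NB. one 17 by 17 layer per term: 1, X1, X1^2, X2, X1*X2, X2^2
   $XMAT   =: 1 , X1 , (X1^2) , X2 , (X1*X2) ,: (X2^2)
6 17 17

   NB. coefficients are listed in the same order as the terms of XMAT
   FORM    =: 1   0     0.2     0.3   0    _0.4 mp XMAT
   FORM    -: 1 + (0.2*X1^2) + (0.3*X2) + (_0.4*X2^2)
1

   NB. uniform noise between _2 and 2, from the fixed-seed generator ?.
   NOISE   =: 4 * _0.5 + ($X1) ?.@$ 0
   $DATA   =: FORM + NOISE
17 17

   NB. least squares fit: 289 observations against 6 terms
   COEF    =: (,DATA) %. |:,"2 XMAT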

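Now we can compare the coefficients obtained from the noisy data (first row) with those recovered from the noise-free form (second row), which reproduce the original formula.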

   0j4": COEF  ,: (,FORM) %. |:,"2 XMAT
1.0011 _0.0144 0.2005 0.3104 0.0024 _0.4013
1.0000  0.0000 0.2000 0.3000 0.0000 _0.4000

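Additional regression analysis is provided in the 'stats' package. In the call below, the leading layer of ones is dropped from XMAT with }. and the intercept appears in the output as variable 0.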

   load 'stats'
   (|:}.,"2 XMAT) regression ,DATA

             Var.       Coeff.         S.E.           t  
              0        1.00105        0.12654        7.91
              1       _0.01444        0.01375       _1.05
              2        0.20052        0.00316       63.55
              3        0.31036        0.01375       22.56
              4        0.00241        0.00281        0.86
              5       _0.40131        0.00316     _127.17
                                                         
  Source     D.F.        S.S.          M.S.           F  
Regression    5    27192.76720     5438.55344     4144.49
Error       283      371.36300        1.31224            
Total       288    27564.13020                           
                                                         
S.E. of estimate         1.14553                         
Corr. coeff. squared     0.98653                         

The $R^2$ value (0.98653, reported above as "Corr. coeff. squared") shows a high degree of agreement between the observations and their estimates.
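
For reference, $R^2$ can also be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares about the mean. The script below is a sketch on hypothetical toy data, not the essay's data set. The names x, y, A, b, est and r2 and the helper verbs mean and ssq are introduced here for illustration.

```j
mp   =: +/ . *                 NB. matrix product
mean =: +/ % #                 NB. arithmetic mean of a list
ssq  =: +/@:*:                 NB. sum of squares of a list

NB. Hypothetical observations, roughly following y = 1 + 2x
x =: 0 1 2 3 4
y =: 1.1 2.9 5.2 6.8 9.1

A   =: 1 ,. x                  NB. design matrix: a column of ones, then x
b   =: y %. A                  NB. least squares coefficients
est =: A mp b                  NB. estimates at the observed points

NB. R^2 = 1 - (residual sum of squares) % (total sum of squares about the mean)
r2 =: 1 - (ssq y - est) % ssq y - mean y
smoutput r2
```

The same expression applied to the raveled DATA and the estimate COEF mp XMAT from the session above should correspond to the "Corr. coeff. squared" entry in the regression table.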

See Also

Essays/Linear Regression (last edited 2008-12-08 10:45:29 by )