## R: Matrix Algebra v. Regression Output

In order to do matrix algebra in R, you have to use what sometimes seems to be awkward notation. Assuming matrix A and B are square and conformable, in order to:

multiply using A %*% B

transpose using t(A)

inverse by solve(A)

There are many good primers on the internet to learn R matrix notation.^[http://www.r-tutor.com/r-introduction/matrix]

A=matrix(c(4,8,6,13),2,2)
A

##      [,1] [,2]
## [1,]    4    6
## [2,]    8   13

B=matrix(c(13/4,-2,-3/2,1),2,2)
B

##       [,1] [,2]
## [1,]  3.25 -1.5
## [2,] -2.00  1.0

A+B

##      [,1] [,2]
## [1,] 7.25  4.5
## [2,] 6.00 14.0

ab = A %*% B
ab

##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1

t(ab)

##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1

btat = t(B) %*% t(A)
# Notice that (AB)' = B'A' =
C=matrix(c(1,3,4,2),2,2)
C

##      [,1] [,2]
## [1,]    1    4
## [2,]    3    2

Cinv=solve(C)
Cinv

##      [,1] [,2]
## [1,] -0.2  0.4
## [2,]  0.3 -0.1


You can copy the R code here, and put it into your own code. First time through, you would need to install packages by removing the “#” on each line, and only run this code if you have not installed the packages before. Then you need to attach all the “libaries” that you just downloaded and installed.

options(repos = "https://cran.cnr.berkeley.edu/")
# install.packages("devtools")
# install.packages("dplyr")
# install_github("JustinMShea/wooldridge")
# install.packages("backports")
library(devtools)

## Warning: package 'devtools' was built under R version 3.5.3

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.5.3

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

library(wooldridge)
library(backports)

## Warning: package 'readxl' was built under R version 3.5.3


## A Wooldridge Example

Using R’s Matrix language,, and the “Wooldridge” package, we download the dataset “wage2” and get our results by using the solution to b = (XX)−1(Xy):

yX=na.omit(data.matrix(wage2))
y=data.matrix(yX[,c(1)])
const=rep(1,nrow(y))
sibs=data.matrix(yX[,c(13)])
meduc=data.matrix(yX[,c(15)])
feduc=data.matrix(yX[,c(16)])
X=cbind(const, sibs, meduc, feduc)
b=solve(t(X)%*%X)%*%(t(X)%*%y)
b

##            [,1]
## const 624.04105
##       -12.81975
##        16.72589
##        21.39708


Note that we get the same output using R’s “lm” function which is code for ‘linear model’

model <- lm(y ~ sibs + meduc + feduc)
summary(model)

##
## Call:
## lm(formula = y ~ sibs + meduc + feduc)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -847.6 -269.2  -43.7  198.0 2022.1
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  624.041     72.638   8.591  < 2e-16 ***
## sibs         -12.820      7.039  -1.821  0.06903 .
## meduc         16.726      6.716   2.490  0.01301 *
## feduc         21.397      5.658   3.782  0.00017 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 390.7 on 659 degrees of freedom
## Multiple R-squared:  0.08065,    Adjusted R-squared:  0.07646
## F-statistic: 19.27 on 3 and 659 DF,  p-value: 5.494e-12


## A Studenmund Example

In this example, we download a file from my website called “FINAID.xlsx”, and put the data in matrix form. We again get our results by using the solution to b = (XX)−1(Xy):

url1<-'https://aneveu.com/data/FINAID.xlsx'
p1f <- tempfile()
FINAID <-read_excel(path = p1f, sheet = 1)
# View(FINAID)

ones <- rep(1,nrow(FINAID))
rhs <- as.matrix(cbind(ones,FINAID[1:nrow(FINAID),3:4]))
lhs <- as.matrix(FINAID[1:nrow(FINAID),2])
Y=lhs;
X=rhs;
b=solve(t(X)%*%X)%*%(t(X)%*%Y)
b

##              FINAID
## ones   8926.9290669
## PARENT   -0.3567721
## HSRANK   87.3781524


Note that we get the same coefficeints using R’s “lm” function which is code for ‘linear model’

fa=data.matrix(FINAID[,2])
parent=data.matrix(FINAID[,3])
hsrank=data.matrix(FINAID[,4])
model <- lm(fa ~ parent + hsrank)
summary(model)

##
## Call:
## lm(formula = fa ~ parent + hsrank)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6353.1 -1905.9   280.9  1942.5  6675.5
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8926.92907 1739.08253   5.133 5.36e-06 ***
## parent        -0.35677    0.03169 -11.260 6.06e-15 ***
## hsrank        87.37815   20.67413   4.226 0.000108 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2771 on 47 degrees of freedom
## Multiple R-squared:  0.7441, Adjusted R-squared:  0.7332
## F-statistic: 68.33 on 2 and 47 DF,  p-value: 1.229e-14