R: Matrix Algebra v. Regression Output
In order to do matrix algebra in R, you have to use what sometimes seems to be awkward notation. Assuming matrix A and B are square and conformable, in order to:
multiply using A %*% B
transpose using t(A)
inverse by solve(A)
There are many good primers on the internet to learn R matrix notation.^[http://www.r-tutor.com/r-introduction/matrix]
A=matrix(c(4,8,6,13),2,2)
A
## [,1] [,2]
## [1,] 4 6
## [2,] 8 13
B=matrix(c(13/4,-2,-3/2,1),2,2)
B
## [,1] [,2]
## [1,] 3.25 -1.5
## [2,] -2.00 1.0
A+B
## [,1] [,2]
## [1,] 7.25 4.5
## [2,] 6.00 14.0
ab = A %*% B
ab
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
t(ab)
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
btat = t(B) %*% t(A)
# Notice that (AB)' = B'A' =
C=matrix(c(1,3,4,2),2,2)
C
## [,1] [,2]
## [1,] 1 4
## [2,] 3 2
Cinv=solve(C)
Cinv
## [,1] [,2]
## [1,] -0.2 0.4
## [2,] 0.3 -0.1
You can copy the R code here, and put it into your own code. First time through, you would need to install packages by removing the “#” on each line, and only run this code if you have not installed the packages before. Then you need to attach all the “libaries” that you just downloaded and installed.
options(repos = "https://cran.cnr.berkeley.edu/")
# install.packages("devtools")
# install.packages("dplyr")
# install_github("JustinMShea/wooldridge")
# install.packages("backports")
library(devtools)
## Warning: package 'devtools' was built under R version 3.5.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(wooldridge)
library(backports)
library(readxl)
## Warning: package 'readxl' was built under R version 3.5.3
A Wooldridge Example
Using R’s Matrix language,, and the “Wooldridge” package, we download the dataset “wage2” and get our results by using the solution to b = (X′X)−1(X′y):
yX=na.omit(data.matrix(wage2))
y=data.matrix(yX[,c(1)])
const=rep(1,nrow(y))
sibs=data.matrix(yX[,c(13)])
meduc=data.matrix(yX[,c(15)])
feduc=data.matrix(yX[,c(16)])
X=cbind(const, sibs, meduc, feduc)
b=solve(t(X)%*%X)%*%(t(X)%*%y)
b
## [,1]
## const 624.04105
## -12.81975
## 16.72589
## 21.39708
Note that we get the same output using R’s “lm” function which is code for ‘linear model’
model <- lm(y ~ sibs + meduc + feduc)
summary(model)
##
## Call:
## lm(formula = y ~ sibs + meduc + feduc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -847.6 -269.2 -43.7 198.0 2022.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 624.041 72.638 8.591 < 2e-16 ***
## sibs -12.820 7.039 -1.821 0.06903 .
## meduc 16.726 6.716 2.490 0.01301 *
## feduc 21.397 5.658 3.782 0.00017 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 390.7 on 659 degrees of freedom
## Multiple R-squared: 0.08065, Adjusted R-squared: 0.07646
## F-statistic: 19.27 on 3 and 659 DF, p-value: 5.494e-12
A Studenmund Example
In this example, we download a file from my website called “FINAID.xlsx”, and put the data in matrix form. We again get our results by using the solution to b = (X′X)−1(X′y):
url1<-'https://aneveu.com/data/FINAID.xlsx'
p1f <- tempfile()
download.file(url1, p1f, mode="wb");
FINAID <-read_excel(path = p1f, sheet = 1)
# View(FINAID)
ones <- rep(1,nrow(FINAID))
rhs <- as.matrix(cbind(ones,FINAID[1:nrow(FINAID),3:4]))
lhs <- as.matrix(FINAID[1:nrow(FINAID),2])
Y=lhs;
X=rhs;
b=solve(t(X)%*%X)%*%(t(X)%*%Y)
b
## FINAID
## ones 8926.9290669
## PARENT -0.3567721
## HSRANK 87.3781524
Note that we get the same coefficeints using R’s “lm” function which is code for ‘linear model’
fa=data.matrix(FINAID[,2])
parent=data.matrix(FINAID[,3])
hsrank=data.matrix(FINAID[,4])
model <- lm(fa ~ parent + hsrank)
summary(model)
##
## Call:
## lm(formula = fa ~ parent + hsrank)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6353.1 -1905.9 280.9 1942.5 6675.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8926.92907 1739.08253 5.133 5.36e-06 ***
## parent -0.35677 0.03169 -11.260 6.06e-15 ***
## hsrank 87.37815 20.67413 4.226 0.000108 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2771 on 47 degrees of freedom
## Multiple R-squared: 0.7441, Adjusted R-squared: 0.7332
## F-statistic: 68.33 on 2 and 47 DF, p-value: 1.229e-14