---
title: "Properties of Least Squares"
author: "Rob"
date: "2/5/2018"
output:
  beamer_presentation:
    keep_tex: true
    toc: true
    slide_level: 2

fontsize: 10pt
  
linkcolor: blue

urlcolor: blue

header-includes:
   - \usepackage[]{graphicx}
   - \usepackage[]{color}
   - \usepackage{amsmath}
   - \usepackage{relsize}
   - \usepackage{algorithm2e}
   - \usepackage{animate}
   - \newcommand{\sko}{\vspace{.1in}}
   - \newcommand{\skoo}{\vspace{.2in}}
   - \newcommand{\skooo}{\vspace{.3in}}
   - \newcommand{\rd}[1]{\textcolor{red}{#1}}
   - \newcommand{\bl}[1]{\textcolor{blue}{#1}}
   - \newcommand{\C}{\; | \;}
   - \newcommand{\tbf}[1]{\textbf{\texttt{#1}}}
   - \newcommand{\ird}[1]{\textit{\textcolor{red}{#1}}}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(dev = 'pdf')
library(BART)
```


<!-- Section: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  -->
# The Data
***

Let's read in some data that we will use to illustrate properties of
least squares regression.  

\scriptsize
```{r rdat,include=TRUE,echo=TRUE}
yx = read.csv("sim-reg-data.csv")
print(summary(yx))
```
\normalsize

So we have a response y and 5 x variables.  

<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  -->
***

Let's go ahead and run the regression:

\scriptsize
```{r rreg,include=TRUE,echo=TRUE}
lmf = lm(y~.,yx)   # regress y on all of the other columns in yx
print(summary(lmf))
```
\normalsize

What do *most people think the stars mean*?
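
***

As a mechanical reference point: the stars printed by `summary()` only bin each coefficient's p-value using the cutoffs shown in the `Signif. codes` legend. Below is a minimal sketch of that binning. The vector `p` is hypothetical (made-up p-values, not taken from the regression above), chosen so that one value lands in each band.

\scriptsize
```r
# Hypothetical p-values, one per band of the "Signif. codes" legend
p <- c(0.0004, 0.004, 0.03, 0.08, 0.6)

# Cutpoints and symbols as in the legend printed by summary():
# 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
stars <- symnum(p, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols   = c("***", "**", "*", ".", " "))

print(data.frame(p = p, stars = as.character(stars)))
```
\normalsize

So a star is a statement about a single coefficient's t-test p-value. It does not measure how well a subset of x's predicts.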

<!-- Section: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  -->
# Train/Test Loop
***

Let's see if x1 and x2 are the *best* x's.  

We will compare that subset to the subset consisting of x3 alone, using out-of-sample RMSE over repeated random 75/25 train/test splits.

\scriptsize
```{r ttloop,include=TRUE,echo=TRUE,results='hide',cache=TRUE}
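n = nrow(yx)
nd = 100        # number of random train/test splits
set.seed(99)
# root mean squared error of predictions yhat for outcomes y
rmse = function(y,yhat) {sqrt(mean((y-yhat)^2))}
ntrain = floor(n*.75)        # 75% of the rows go to the training set
resM = matrix(0.0,nd,2)      # test RMSE: column 1 = x1+x2, column 2 = x3
for(i in 1:nd) {
  print(i)
  ii = sample(1:n,ntrain)    # training rows for this split
  dftrain = yx[ii,]; dftest = yx[-ii,]
  
  # fit on the training rows, score on the held-out rows
  lm12 = lm(y~x1+x2,dftrain)
  resM[i,1] = rmse(dftest$y,predict(lm12,dftest))
  
  lm3 = lm(y~x3,dftrain)
  resM[i,2] = rmse(dftest$y,predict(lm3,dftest))
}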
```
\normalsize
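
***

For reference, here is the quantity the `rmse` function in the loop computes on each split. The notation is an assumption made for this slide: $m$ is the number of held-out rows (`n - ntrain`), $y_i$ is a test-set outcome, and $\hat{y}_i$ is its prediction from the model fit on the training rows.

$$
\text{RMSE} \;=\; \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}
$$

Smaller is better, and the value is in the same units as y.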

<!-- slide: %%%%%%%%%%%%%%%%%% -->
***

OK, now let's use boxplots to compare the columns of resM, which hold the out-of-sample RMSEs for the two subsets.  

\scriptsize 
```{r plrMat,include=TRUE,echo=TRUE,out.width='60%',fig.align='center',dependson='ttloop'}
colnames(resM)= c("x12","x3")
boxplot(resM)
```
\normalsize
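
***

A numerical summary can complement the boxplots. Here is a minimal sketch. The matrix `res_demo` is a hypothetical stand-in with invented numbers, so it says nothing about the actual comparison. It only has the same layout as `resM`: one row per split, one column per subset.

\scriptsize
```r
# Hypothetical stand-in for resM (invented numbers, 5 splits)
res_demo <- cbind(x12 = c(1.10, 1.05, 1.20, 1.15, 1.08),
                  x3  = c(1.00, 1.07, 1.02, 1.01, 0.99))

# Mean and standard deviation of test RMSE for each subset
print(rbind(mean = colMeans(res_demo), sd = apply(res_demo, 2, sd)))

# Each row uses the same train/test split for both models,
# so the comparison can be made row by row
d <- res_demo[, "x12"] - res_demo[, "x3"]
frac_x3_better <- mean(d > 0)   # share of splits where x3 alone has lower RMSE
print(frac_x3_better)
```
\normalsize

Replacing `res_demo` with `resM` (after the column names are set) applies the same summary to the real splits.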




