Introduction to Statistical Learning
with R
Seungju Chae
January 27, 2025
At the beginning of the 19th century, Legendre and Gauss published papers on the method of least squares, which implemented the earliest form of linear regression.
Linear regression is used for predicting quantitative values, such as an individual's salary. In order to predict qualitative values, such as whether a patient survives or dies, or whether the stock market increases or decreases, Fisher proposed linear discriminant analysis in 1936.
In the 1940s, various authors put forth an alternative approach, logistic regression. In the early 1970s, Nelder and Wedderburn coined the term generalized linear models for an entire class of statistical learning methods that includes both linear and logistic regression as special cases.
In the 1980s, Breiman, Friedman, Olshen, and Stone introduced classification and regression trees, and were among the first to demonstrate the power of a detailed practical implementation of a method, including cross-validation for model selection.
Statistical learning is a set of tools for understanding data, either supervised or unsupervised.

Let X = (X1, X2, …, Xp) be a set of input variables, let Y be a quantitative response, and let ϵ be a random error term. We assume there is some function f, which can take multiple inputs, such that Y = f(X) + ϵ.
Prediction: we try to reduce the reducible error (the irreducible error ϵ cannot be reduced), for example through optimization methods such as gradient descent (GD), in order to improve the accuracy of our predictions.
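As a minimal sketch of the idea above, the following uses gradient descent to shrink the reducible error of a simple linear estimate f̂(X) = b0 + b1·X on synthetic data. (The data, learning rate, and variable names are illustrative assumptions, not from the text.)

```python
import random

random.seed(0)
# Synthetic data: Y = 2 + 3*X + eps, where eps is the irreducible error.
X = [random.uniform(0, 1) for _ in range(200)]
Y = [2.0 + 3.0 * x + random.gauss(0, 0.1) for x in X]
n = len(X)

# Gradient descent on the mean squared error of f_hat(X) = b0 + b1*X.
b0, b1 = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    g0 = sum((b0 + b1 * x) - y for x, y in zip(X, Y)) * 2 / n  # d(MSE)/db0
    g1 = sum(((b0 + b1 * x) - y) * x for x, y in zip(X, Y)) * 2 / n  # d(MSE)/db1
    b0 -= lr * g0
    b1 -= lr * g1

print(b0, b1)  # converges near the true intercept 2 and slope 3
```

No matter how long we iterate, the fitted line cannot predict the noise ϵ itself; only the gap between f̂ and the true f shrinks.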

In many cases, a set of inputs X is readily available. We then predict Y using Ŷ = f̂(X), where f̂ represents our estimate of f and Ŷ represents the resulting prediction for Y.
The plot displays income as a function of years of education and seniority in the Income data set. The blue surface represents the true underlying relationship between income and years of education and seniority
The accuracy of Ŷ depends on two quantities: the reducible error, which comes from the estimate f̂, and the irreducible error, which comes from ϵ.
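This split can be written out with the standard expected-squared-error decomposition (assuming f̂ and X are held fixed):

```latex
\mathbb{E}\,(Y - \hat{Y})^2
  = \mathbb{E}\,\bigl[f(X) + \epsilon - \hat{f}(X)\bigr]^2
  = \underbrace{\bigl[f(X) - \hat{f}(X)\bigr]^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}
```

A better choice of f̂ can drive the first term toward zero, but Var(ϵ) places an upper bound on the accuracy of any prediction of Y.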
Inference: studying the relationships between the dependent variable and the independent variables. We are interested in how Y is affected as X1, X2, …, Xp change, and in understanding the relationship between X and Y. Here, f cannot be treated as a black box: we need to understand what happens inside f and how it affects Y.
Which predictors are associated with the response? It is often the case that only a small fraction of the available predictors are substantially associated with Y . Identifying the few important predictors among a large set of possible variables can be extremely useful.
What is the relationship between the response and each predictor? Some predictors may have a positive relationship with Y while other predictors may have the opposite relationship.
We estimate f:
Through linear or non-linear approaches.
Using a set of n different data points (the training data, or observations), which we use to train, or teach, our method how to estimate f.
With the goal of applying different approaches/methods to the training data in order to estimate f. In other words, we would like to find a function f̂ such that Y ≈ f̂(X).
Most statistical learning methods can be characterised as either parametric or non-parametric.
Parametric methods
We make an assumption about the shape of f, e.g. a linear form:
f(X) = β0 + β1X1 + β2X2 + … + βpXp.
Here we estimate the parameters β0, β1, β2, …, βp.
After the model form is selected, we need to train the model: find values of these parameters such that
Y ≈ β0 + β1X1 + β2X2 + … + βpXp.
For this linear form, we can use (ordinary) least squares, though there are many other ways to fit the linear model.
This approach reduces the problem of estimating f down to one of estimating a set of parameters, which simplifies the problem and makes it easier to solve. The disadvantage is that the form we choose will usually not match the true unknown form of f, so the accuracy of f̂ might not be good enough.
Example: income ≈ β0 + β1 × education + β2 × seniority.
Since we have assumed a linear relationship between the response and the two predictors, the entire fitting problem reduces to estimating β0, β1, and β2, which we do using least squares linear regression.
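The fitting step can be sketched as follows on synthetic data (the true coefficients, sample size, and noise level are illustrative assumptions, not values from the Income data set):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
education = rng.uniform(10, 22, n)   # years of education
seniority = rng.uniform(0, 40, n)    # years of seniority
noise = rng.normal(0, 5, n)          # irreducible error
income = 20 + 3.0 * education + 0.5 * seniority + noise

# Design matrix with an intercept column; least squares solves for (b0, b1, b2).
A = np.column_stack([np.ones(n), education, seniority])
beta, *_ = np.linalg.lstsq(A, income, rcond=None)
print(beta)  # close to the true coefficients (20, 3.0, 0.5)
```

Estimating f has been reduced to estimating three numbers, which is exactly the simplification the parametric approach buys.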

The first picture shows the true f and the second shows our estimate f̂, obtained with least squares in the parametric method. The true f has some curvature that f̂ does not capture. However, f̂ is good at capturing the positive relationship between years of education and income, as well as the slightly less positive relationship between seniority and income. It may be the best we can do with a small number of observations.
Non-parametric methods