Introduction to Statistical Learning
with R
Seungju Chae
January 27, 2025
At the beginning of the 19th century, Legendre and Gauss published papers on the method of least squares, which implemented the earliest form of linear regression.
Linear regression is used for predicting quantitative values, such as an individual's salary. In order to predict qualitative values, such as whether a patient survives or dies, or whether the stock market increases or decreases, Fisher proposed linear discriminant analysis in 1936.
In the 1940s, various authors put forth an alternative approach, logistic regression. In the early 1970s, Nelder and Wedderburn coined the term generalized linear models for an entire class of statistical learning methods that includes both linear and logistic regression as special cases.
In the 1980s, Breiman, Friedman, Olshen, and Stone introduced classification and regression trees, and were among the first to demonstrate the power of a detailed practical implementation of a method, including cross-validation for model selection.
Statistical learning is a set of tools for understanding data, either supervised or unsupervised.

Let X = (X1, X2, …, Xp) be a set of input variables, let Y be a quantitative response, and let ϵ be a random error term. We assume there is some function f, which can take multiple inputs, such that Y = f(X) + ϵ.
Prediction: we try to reduce the reducible error (the irreducible error ϵ cannot be reduced), for example through optimization methods such as gradient descent (GD), in order to improve the accuracy of our predictions.
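As a minimal sketch of the idea above, the following uses gradient descent to shrink the reducible error of a simple linear estimate f̂(X) = b0 + b1·X on synthetic data. (The data, learning rate, and variable names are illustrative assumptions, not from the text.)

```python
import random

random.seed(0)
# Synthetic data: Y = 2 + 3*X + eps, where eps is the irreducible error.
X = [random.uniform(0, 1) for _ in range(200)]
Y = [2.0 + 3.0 * x + random.gauss(0, 0.1) for x in X]
n = len(X)

# Gradient descent on the mean squared error of f_hat(X) = b0 + b1*X.
b0, b1 = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    g0 = sum((b0 + b1 * x) - y for x, y in zip(X, Y)) * 2 / n  # d(MSE)/db0
    g1 = sum(((b0 + b1 * x) - y) * x for x, y in zip(X, Y)) * 2 / n  # d(MSE)/db1
    b0 -= lr * g0
    b1 -= lr * g1

print(b0, b1)  # converges near the true intercept 2 and slope 3
```

No matter how long we iterate, the fitted line cannot predict the noise ϵ itself; only the gap between f̂ and the true f shrinks.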

In many cases, a set of inputs X is readily available. We then predict Y using Ŷ = f̂(X), where f̂ represents our estimate of f and Ŷ represents the resulting prediction for Y.
The plot displays income as a function of years of education and seniority in the Income data set. The blue surface represents the true underlying relationship between income and years of education and seniority
The accuracy of Ŷ depends on two quantities: the reducible error, which comes from the estimate f̂, and the irreducible error, which comes from ϵ.
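This split can be written out with the standard expected-squared-error decomposition (assuming f̂ and X are held fixed):

```latex
\mathbb{E}\,(Y - \hat{Y})^2
  = \mathbb{E}\,\bigl[f(X) + \epsilon - \hat{f}(X)\bigr]^2
  = \underbrace{\bigl[f(X) - \hat{f}(X)\bigr]^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}
```

A better choice of f̂ can drive the first term toward zero, but Var(ϵ) places an upper bound on the accuracy of any prediction of Y.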
Inference: studying the relationships between the dependent variable and the independent variables. We are interested in how Y is affected as X1, X2, …, Xp change, and in understanding the relationship between X and Y. Here, f cannot be treated as a black box: we need to understand what happens inside f and how it affects Y.
Which predictors are associated with the response? It is often the case that only a small fraction of the available predictors are substantially associated with Y . Identifying the few important predictors among a large set of possible variables can be extremely useful.
What is the relationship between the response and each predictor? Some predictors may have a positive relationship with Y while other predictors may have the opposite relationship.
We estimate f:
Through linear or non-linear approaches.
Using a set of n different data points (the training data, or observations), which we use to train, or teach, our method how to estimate f.
With the goal of applying different approaches/methods to the training data in order to estimate f. In other words, we would like to find a function f̂ such that Y ≈ f̂(X).
Most statistical learning methods can be characterised as either parametric or non-parametric.
Parametric methods
We make an assumption about the shape of f, e.g. a linear form:
f(X) = β0 + β1X1 + β2X2 + … + βpXp.
Here we estimate the parameters β0, β1, β2, …, βp.
After the model form is selected, we need to train the model: find values of these parameters such that
Y ≈ β0 + β1X1 + β2X2 + … + βpXp.
For this linear form, we can use (ordinary) least squares, though there are many other ways to fit the linear model.
This approach reduces the problem of estimating f down to one of estimating a set of parameters, which simplifies the problem and makes it easier to solve. The disadvantage is that the form we choose will usually not match the true unknown form of f, so the accuracy of f̂ might not be good enough.
Example: income ≈ β0 + β1 × education + β2 × seniority.
Since we have assumed a linear relationship between the response and the two predictors, the entire fitting problem reduces to estimating β0, β1, and β2, which we do using least squares linear regression.
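The fitting step can be sketched as follows on synthetic data (the true coefficients, sample size, and noise level are illustrative assumptions, not values from the Income data set):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
education = rng.uniform(10, 22, n)   # years of education
seniority = rng.uniform(0, 40, n)    # years of seniority
noise = rng.normal(0, 5, n)          # irreducible error
income = 20 + 3.0 * education + 0.5 * seniority + noise

# Design matrix with an intercept column; least squares solves for (b0, b1, b2).
A = np.column_stack([np.ones(n), education, seniority])
beta, *_ = np.linalg.lstsq(A, income, rcond=None)
print(beta)  # close to the true coefficients (20, 3.0, 0.5)
```

Estimating f has been reduced to estimating three numbers, which is exactly the simplification the parametric approach buys.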

The first picture shows the true f and the second shows our estimate f̂, obtained with least squares in the parametric method. The true f has some curvature that f̂ does not capture. However, f̂ is good at capturing the positive relationship between years of education and income, as well as the slightly less positive relationship between seniority and income. It may be the best we can do with a small number of observations.
Non-parametric methods