📗 -> 04/02/25: Machine Learning Basics


— ML Basics

🎤 Vocab

โ— Unit and Larger Context

Intro to common ML techniques. Not deep learning yet

✒️ -> Scratch Notes

Machine Learning

Computers learning from historical data and using it to make predictions about future data

Main objective

Learn a function mapping from historical data to predict:

  • Discrete class labels - classification (supervised)
  • Continuous values - regression (supervised)
  • Cluster indicators - clustering (unsupervised)

Also have semi-supervised and reinforcement learning
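
A minimal sketch of the three task types above (classification, regression, clustering), assuming scikit-learn and synthetic toy data rather than anything from the course:

```python
from sklearn.datasets import make_classification, make_regression, make_blobs
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Classification: predict discrete class labels (supervised)
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(Xc, yc)
print("predicted labels:", clf.predict(Xc[:5]))

# Regression: predict continuous values (supervised)
Xr, yr = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("predicted values:", reg.predict(Xr[:5]))

# Clustering: assign cluster indicators without any labels (unsupervised)
Xu, _ = make_blobs(n_samples=200, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xu)
print("cluster indicators:", km.labels_[:5])
```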

For a model written as, e.g., f(·; θ) with hyperparameters λ:

  • · is a placeholder for the input (could be x, etc.)
  • θ denotes the learnable parameters
  • λ denotes the hyperparameters

Describe an architecture by its depth (how many layers the input passes through) and its dimensions (the number of parameters in each layer)
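
As a rough sketch of that bookkeeping (the layer widths below are made-up numbers, not from the lecture):

```python
# Describe a fully connected architecture by its depth and per-layer parameter count.
layer_widths = [10, 64, 32, 1]   # input dim, two hidden layers, output dim (illustrative)

depth = len(layer_widths) - 1    # number of layers the input passes through
# A dense layer mapping n_in -> n_out has n_in * n_out weights plus n_out biases.
params_per_layer = [n_in * n_out + n_out
                    for n_in, n_out in zip(layer_widths[:-1], layer_widths[1:])]

print("depth:", depth)                             # 3
print("parameters per layer:", params_per_layer)   # [704, 2080, 33]
print("total parameters:", sum(params_per_layer))
```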

Supervised Learning

Training: learn a model with a training data set
Testing: test the model using an unseen test data set
Evaluation: measure the model's performance on the test set

A computer system is said to learn from data D to perform task T if, after being fit to data D, its performance on task T, as evaluated by metric M, improves.

Assumption: the distribution of training data instances is identical to the distribution of testing data instances (including future data instances, e.g. the ones the model will see once deployed).

  • Not always true in reality. Violating this assumption results in poor model performance
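
A minimal sketch of why this matters, using synthetic data with made-up ranges (scikit-learn assumed): the same model scores well on test data drawn from the training distribution and badly on shifted data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
f = lambda x: x ** 2                          # true underlying relationship

X_train = rng.uniform(0, 2, size=(200, 1))    # training inputs drawn from [0, 2]
X_same = rng.uniform(0, 2, size=(100, 1))     # test inputs from the same distribution
X_shift = rng.uniform(8, 10, size=(100, 1))   # test inputs from a shifted distribution

model = LinearRegression().fit(X_train, f(X_train).ravel())

print("R^2, same distribution:   ", r2_score(f(X_same).ravel(), model.predict(X_same)))
print("R^2, shifted distribution:", r2_score(f(X_shift).ravel(), model.predict(X_shift)))
# The first score is decent; the second is very poor (large negative R^2),
# because the identical-distribution assumption is violated.
```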

Training vs Test

Training data is used to train and fit our model
Test data is used only to evaluate the model

How to partition data? (A sketch of these options follows the list.)

  • Pre-defined (sometimes you are given pre-split data, e.g. in Kaggle contests)
  • Random sampling (not stable)
    • Randomly split the data into train and test (e.g. 70% train, 30% test)
    • Not stable: different results every time
  • Cross Validation (k-fold CV)
    • Split the data into k partitions (folds); each fold takes a turn as the test set while the remaining folds are used for training
    • If results are good, train a final model on full dataset and deploy
  • Leave one out (LOOCV)
    • The special case of k-fold CV where k = n
    • Leave a single data point out as the test set, and repeat for every point in the data set
  • Sometimes properties of the data dictate how you can split it
    • Stock market data is time dependent, so you might use the oldest 75% for training and the newest 25% for testing
    • In this case, random sampling wouldn't make sense
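
A minimal sketch of the splitting strategies above, assuming scikit-learn and a small synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (KFold, LeaveOneOut, TimeSeriesSplit,
                                     cross_val_score, train_test_split)

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression()

# Random sampling: 70% train / 30% test (results change with the random seed)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print("hold-out accuracy:", model.fit(X_tr, y_tr).score(X_te, y_te))

# k-fold CV: each fold takes a turn as the test set
kf = KFold(n_splits=5, shuffle=True, random_state=0)
print("5-fold CV accuracies:", cross_val_score(model, X, y, cv=kf))

# Leave-one-out: the k = n case, one instance held out per fold
print("LOOCV accuracy:", cross_val_score(model, X, y, cv=LeaveOneOut()).mean())

# Time-ordered data: always train on older instances, test on newer ones
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    print("train up to index", train_idx.max(), "-> test", test_idx.min(), "to", test_idx.max())
```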

Classification: SVM

SVM:

  • A linear classifier finding a hyperplane to separate two classes of data (i.e. positive vs negative)
  • A kernel function allows the SVM to perform non-linear separation (by implicitly mapping the data into a higher-dimensional feature space)
  • Given training examples (x_i, y_i), where x_i is the feature vector and y_i in {-1, +1} is the class label:
    • SVM finds a linear function f(x) = w·x + b, parameterized by w and b, such that the sign of f(x_i) matches y_i (ideally with a large margin)

f(x) = w·x + b can be thought of as: a score whose sign gives the predicted class, with f(x) = 0 defining the separating hyperplane.
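
A minimal sketch of a linear SVM and a kernelized SVM, assuming scikit-learn and synthetic data (the datasets and parameters are illustrative only):

```python
from sklearn.datasets import make_circles, make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Roughly linearly separable data: a linear SVM finds a hyperplane w·x + b = 0
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           class_sep=2.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
linear_svm = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
print("linear SVM accuracy:", linear_svm.score(X_te, y_te))
print("w =", linear_svm.coef_, " b =", linear_svm.intercept_)

# Concentric circles are not linearly separable; an RBF kernel handles them
Xc, yc = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, test_size=0.3, random_state=0)
rbf_svm = SVC(kernel="rbf").fit(Xc_tr, yc_tr)
print("RBF-kernel SVM accuracy:", rbf_svm.score(Xc_te, yc_te))
```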

🧪 -> Refresh the Info

Did you generally find the overall content understandable, compelling, or relevant (or not), and why? Which aspects of the reading were most novel or challenging for you, and which were most familiar or straightforward?

Did a specific aspect of the reading raise questions for you or relate to other ideas and findings you've encountered? Are there other related issues you wish had been covered?

Resources

  • Put useful links here

Connections

  • Link all related words