04/02/25: Machine Learning Basics
Vocab
Unit and Larger Context
Intro to common ML techniques. Not deep learning yet
Scratch Notes
Machine Learning
Computers learning from historical data and using what they learn to make predictions about future data
Main objective
Learn a function mapping from historical data to predict:
- Discrete class labels - classification (supervised)
- Continuous values - regression (supervised)
- Cluster indicators - clustering (unsupervised)
Also have semi-supervised and reinforcement learning
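A minimal sketch of these three task types using scikit-learn; the library choice, models, and toy datasets are illustrative assumptions, not from the lecture.

```python
# Sketch of the three task types above, using scikit-learn toy data.
# (Models and datasets are illustrative assumptions, not from the notes.)
from sklearn.datasets import load_iris, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

# Classification: predict discrete class labels (supervised)
X_cls, y_cls = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X_cls, y_cls)
print("predicted labels:", clf.predict(X_cls[:5]))

# Regression: predict continuous values (supervised)
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)
reg = LinearRegression().fit(X_reg, y_reg)
print("predicted values:", reg.predict(X_reg[:5]))

# Clustering: assign cluster indicators with no labels (unsupervised)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_cls)
print("cluster indicators:", km.labels_[:5])
```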
Notation: the input symbol is just a placeholder (could be x, etc.); learnable parameters are the values fitted from the data (e.g., the w and b of an SVM below); hyperparameters are settings chosen before training rather than learned.
Describe an architecture by its depth (how many layers the input passes through) and its dimensions (the number of parameters in each layer)
Supervised Learning
Training: learn a model with a training data set
Testing: test the model using unseen test data set
Evaluation: measure the model performance on testing set
A computer system is said to learn from data D to perform task T if, after fitting the system with data D, it performs better on task T as evaluated by metric M.
Assumption: the distribution of training data instances is identical to the distribution of testing data instances (including future instances the model will see once deployed in an application)
- Not always true in reality; violations of this assumption will result in poor model performance
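A minimal sketch of the train/test/evaluate workflow described above; the dataset (breast cancer), model (logistic regression), and metric (accuracy) are assumptions for illustration.

```python
# Sketch of the supervised workflow: learn from data D (training split),
# perform task T (binary classification), evaluate with metric M (accuracy).
# Dataset, model, and metric choices are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)            # Training: fit on the training set only

y_pred = model.predict(X_test)         # Testing: predict on unseen data
print("Accuracy (metric M):", accuracy_score(y_test, y_pred))  # Evaluation
```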
Training vs Test
Training is used to train and fit our model
Testing is used only to evaluate model
How to partition data? (see the code sketch after this list)
- Pre-defined (sometimes you are given split data, Kaggle contests ex.)
- Random sampling (not stable)
- Randomly split data into train and test (70% train, 30% test)
- Not stable; different results every time the split is re-drawn
- Cross Validation (K fold CV)
- Split the data into k folds; each fold takes a turn as the held-out test set while the remaining k-1 folds are used for training
- If results are good, train a final model on full dataset and deploy
- Leave one out (LOOCV)
- The case of k-fold CV where k = n.
- Leave a single data point out, and repeat for the entire data set.
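A minimal sketch of the splitting strategies listed above, using scikit-learn utilities (train_test_split, KFold, LeaveOneOut); the dataset and k = 5 are assumptions.

```python
# Sketch of random splitting, k-fold CV, and LOOCV.
# Dataset (iris), model, and k = 5 are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Random sampling: 70/30 split; a different random_state gives different results
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# K-fold CV: each fold takes a turn as the test set
kfold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("5-fold accuracies:", np.round(kfold_scores, 3))

# LOOCV: k = n, one data point held out per iteration
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOOCV mean accuracy:", loo_scores.mean())

# If CV results look good, train a final model on the full dataset and deploy
final_model = model.fit(X, y)
```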
Sometimes features of the data will define how you can split it.
- Stock market data is time dependent, and you might use the oldest 75% for training, and the newest 25% for testing.
- In this case, random sampling wouldn't make sense
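A small sketch of such a chronological split on synthetic daily price data; the column names and the 75/25 ratio follow the note above, and the data itself is made up.

```python
# Sketch of a chronological split for time-ordered data (e.g., stock prices).
# The 75/25 ratio follows the note above; the data is synthetic.
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-01", periods=1000, freq="D")
prices = pd.DataFrame({"date": dates, "close": np.cumsum(np.random.randn(1000))})

prices = prices.sort_values("date")   # ensure chronological order
cutoff = int(len(prices) * 0.75)
train = prices.iloc[:cutoff]          # oldest 75% for training
test = prices.iloc[cutoff:]           # newest 25% for testing
print(train["date"].max(), "<", test["date"].min())  # no leakage from the future
```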
Classification: SVM
SVM:
- A linear classifier finding a hyperplane to separate two classes of data (i.e. positive vs negative)
- A kernel function allows the SVM to handle non-linearly separable data by implicitly mapping it into a higher-dimensional feature space (kernel substitution, a.k.a. the kernel trick)
- Given training examples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$ is the class label:
- SVM finds a linear function $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$, parameterized by $\mathbf{w}$ and $b$, such that $y_i (\mathbf{w}^\top \mathbf{x}_i + b) > 0$ for the training examples
- The prediction $\operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)$ can be thought of as checking which side of the hyperplane $\mathbf{w}^\top \mathbf{x} + b = 0$ the point falls on
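A minimal sketch of a linear SVM versus a kernelized SVM in scikit-learn; the make_moons dataset and the RBF kernel are illustrative assumptions, not from the lecture.

```python
# Sketch: linear SVM (separating hyperplane) vs. kernelized SVM (non-linear boundary).
# Dataset and kernel choice are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)  # labels in {0, 1}
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_tr, y_tr)  # hyperplane w^T x + b = 0
rbf_svm = SVC(kernel="rbf").fit(X_tr, y_tr)        # kernel trick: non-linear boundary

print("linear SVM accuracy:", linear_svm.score(X_te, y_te))
print("RBF SVM accuracy:", rbf_svm.score(X_te, y_te))

# Signed distance-like values w^T x + b; their sign gives the predicted class
print("decision values:", linear_svm.decision_function(X_te[:3]))
```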
Refresh the Info
Did you generally find the overall content understandable, compelling, or relevant (or not), and why? Which aspects of the reading were most novel or challenging for you, and which were most familiar or straightforward?
Did a specific aspect of the reading raise questions for you or relate to other ideas and findings you've encountered? Are there other related issues you wish had been covered?
Links
Resources
- Put useful links here
Connections
- Link all related words