Dataview:
list from [[]] and !outgoing([[]])
-> Insert word
Vocab
- Skewing ML analysis: Distorting machine learning model performance and accuracy.
- Isolation forest: Algorithm that isolates anomalies by randomly partitioning data.
- Autoencoder: Neural network that compresses and reconstructs data to find anomalies.
Information
Outlier detection helps keep a dataset from being biased or skewed by a few extreme values.
Outliers can be abnormal data points, fraudulent transactions, faulty sensor readings, etc. Detecting them is an important data-cleaning step because it avoids skewing ML analysis.
-> Methodology
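Different methods and detection criteria exist.
Statistical:
- Z-score: Measures how many standard deviations (SD) a point lies from the mean. Points beyond a chosen threshold (commonly |z| > 3) are flagged.
- IQR / Quantile Range: Commonly flags points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1.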
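The two statistical criteria can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: the function names, the `readings` array, and the thresholds (2.5 for the z-score, 1.5 for the IQR multiplier) are assumptions chosen for the example.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Boolean mask of points whose |z-score| exceeds `threshold`."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values identical: nothing can be an outlier.
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

def iqr_outliers(values, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical sensor readings with one obviously faulty value.
readings = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 55.0])

# With only 10 points, |z| can never exceed sqrt(n - 1) = 3 when the
# population SD is used, so a lower threshold is assumed here.
z_mask = zscore_outliers(readings, threshold=2.5)
iqr_mask = iqr_outliers(readings)

print(readings[z_mask])
print(readings[iqr_mask])
```

The z-score is itself distorted by the outlier it is trying to find (the extreme value inflates both the mean and the SD), which is one reason the quantile-based rule is often preferred on small samples.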
ML:
- Isolation Forests: Build trees that recursively partition the data with random splits. Outliers are isolated sooner, i.e. with fewer partitions, than normal points.
- One-class SVMs: Learn a boundary around normal data points. Samples outside the boundary are flagged as anomalies.
- Autoencoders: Learn compressed representations of data. Samples that cannot be reconstructed well (high reconstruction error) are potential outliers.
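The first two ML methods are available in scikit-learn, and a rough sketch of how they might be used is shown below. The synthetic data, the three hand-placed outliers, and the hyperparameters (`contamination=0.02`, `nu=0.05`) are assumptions for illustration only. Autoencoders are left out because they need a deep-learning framework.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

# Synthetic data: a Gaussian cluster plus three far-away points (assumed).
rng = np.random.RandomState(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([inliers, outliers])

# Isolation Forest: fit on the mixed data; `contamination` is the assumed
# fraction of outliers and sets the decision threshold.
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
iso_labels = iso.fit_predict(X)  # +1 = inlier, -1 = outlier

# One-class SVM: fit on data assumed to be normal, then score new samples.
svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
svm.fit(inliers)
svm_labels = svm.predict(outliers)  # +1 = inside boundary, -1 = outside

print(iso_labels[-3:])
print(svm_labels)
```

Note the different usage patterns: the Isolation Forest here is fitted on contaminated data, while the one-class SVM is fitted only on samples believed to be normal and then applied to new ones.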
-> Usage
- scikit-learn's EllipticEnvelope estimator works well when the inliers are roughly Gaussian:
from sklearn.covariance import EllipticEnvelope
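A minimal usage sketch for `EllipticEnvelope` follows. The synthetic correlated Gaussian data, the two injected points, and `contamination=0.01` are assumptions made for the example, not values from this note.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Synthetic correlated Gaussian data plus two far-away points (assumed).
rng = np.random.RandomState(42)
X_gauss = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.6], [0.6, 1.0]], size=300
)
X_far = np.array([[6.0, -6.0], [7.0, 7.0]])
X_all = np.vstack([X_gauss, X_far])

# Fit a robust covariance estimate and flag the points that fall
# outside the fitted ellipse.
env = EllipticEnvelope(contamination=0.01, random_state=42)
env_labels = env.fit_predict(X_all)  # +1 = inlier, -1 = outlier

# Squared Mahalanobis distance of each sample to the fitted distribution.
distances = env.mahalanobis(X_all)

print(env_labels[-2:])
print(distances[-2:])
```

Because the estimator assumes a single elliptical cluster, it is likely to perform poorly on multimodal or strongly non-Gaussian data, where an Isolation Forest or one-class SVM may be a better fit.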
-> Example
- Define examples where it can be used
-> Related Word
- Link all related words