Dataview:

list from [[]] and !outgoing([[]])

📗 -> Insert word

Medium Article

🎤 Vocab

  • Skewing ML analysis: Distorting the performance and accuracy of a machine learning model.
  • Isolation forest: Algorithm that isolates anomalies by randomly partitioning the data.
  • Autoencoder: Neural network that compresses and reconstructs data; poorly reconstructed points are candidate anomalies.

❗ Information

Outlier detection is key to keeping data from becoming biased or skewed.

Outliers can be abnormal data points, fraudulent transactions, faulty sensor readings, etc. Detecting them is an important part of data cleaning because it avoids skewing ML analysis.

📄 -> Methodology

Different methods and detection criteria exist:

Statistical:

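  • Z-score: Measures how many standard deviations (SD) a point lies from the mean. Points whose absolute z-score exceeds a chosen threshold (commonly 3) are flagged.
  • IQR / Interquartile Range: Flags points that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where IQR = Q3 - Q1.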
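
A minimal sketch of both statistical checks, assuming NumPy and 1-D numeric data. The function names, the sample values, and the thresholds (`threshold=3.0`, `k=1.5`) are illustrative choices, not part of the original note.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask of points whose |z-score| exceeds `threshold`."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values identical: nothing can be an outlier.
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

def iqr_outliers(values, k=1.5):
    """Return a boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical sample: eight ordinary readings and one suspicious value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 95]

# With only 9 points, |z| cannot exceed sqrt(n - 1) ~= 2.83, so the usual
# threshold of 3 flags nothing here; a lower threshold is used for the demo.
print(zscore_outliers(data, threshold=2.5))
print(iqr_outliers(data))
```

On small samples the IQR rule tends to be the more robust of the two, because a single extreme value inflates the mean and standard deviation that the z-score depends on.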

ML:

  • Isolation Forests: Build an ensemble of random decision trees that partition the data recursively. Outliers are isolated in fewer partitions than normal points, so a short average path length marks a point as anomalous.
  • One-class SVMs: Learn a boundary around normal data points. Samples outside are flagged as anomalies.
  • Autoencoders: Learn compressed representations of the data. Samples that cannot be reconstructed well (high reconstruction error) are potential outliers.
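
A short sketch of the first two ML methods using scikit-learn. The synthetic data, the `contamination` and `nu` values, and the variable names are assumptions made for illustration. Autoencoders are left out because they need a deep-learning framework.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

# Synthetic data (assumed for this sketch): a Gaussian blob plus 3 far-away points.
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0]])
X = np.vstack([inliers, outliers])

# Isolation Forest: `contamination` is the expected fraction of outliers.
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
iso_labels = iso.fit_predict(X)  # 1 = inlier, -1 = outlier

# One-class SVM: fit a boundary on data assumed to be normal, then score everything.
# `nu` upper-bounds the fraction of training points allowed outside the boundary.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm.fit(inliers)
svm_labels = ocsvm.predict(X)  # 1 = inside the boundary, -1 = outside

print("Isolation Forest flagged rows:", np.where(iso_labels == -1)[0])
print("One-class SVM flagged rows:", np.where(svm_labels == -1)[0])
```

Both estimators return `-1` for suspected outliers and `1` for inliers, so the two masks can be compared directly.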

✒️ -> Usage

  • An EllipticEnvelope estimator is effective when the inliers are roughly Gaussian:
    • from sklearn.covariance import EllipticEnvelope
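
One possible way to use the estimator, sketched on synthetic Gaussian data. The data, the `contamination=0.01` setting, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Synthetic correlated Gaussian inliers plus two planted outliers (assumed data).
rng = np.random.default_rng(0)
X_in = rng.multivariate_normal(mean=[0, 0], cov=[[1.0, 0.6], [0.6, 1.0]], size=300)
X_out = np.array([[6.0, -6.0], [7.0, 7.0]])
X = np.vstack([X_in, X_out])

# Fit a robust covariance estimate; `contamination` is the expected outlier fraction.
env = EllipticEnvelope(contamination=0.01, random_state=0)
labels = env.fit_predict(X)  # 1 = inlier, -1 = outlier

# Squared Mahalanobis distance of each sample to the fitted Gaussian.
distances = env.mahalanobis(X)

print("Flagged rows:", np.where(labels == -1)[0])
print("Largest squared distance:", distances.max())
```

Because the estimator fits an ellipse around the bulk of the data, it is a weaker fit for multi-modal or strongly non-Gaussian data, where Isolation Forests or one-class SVMs may suit better.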

🧪 -> Example

  • Define examples where it can be used
  • Link all related words