Lectures 9-12

  • Autoencoders
  • Kmeans/DBSCAN
  • GMMs, Biclustering
  • Dimensionality Reduction

Autoencoders

Pretty simple: compress the data set into a latent feature representation. A larger latent dimension = better reconstruction, but weaker compression
Topics:

  • Loss functions?
  • Benefits?
  • Outlier detection?
  • Variations beyond FFA?
  • Variational AE (VAE)?
  • Properties?

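The topics above can be grounded with a minimal sketch: a single-hidden-layer linear autoencoder trained by full-batch gradient descent on MSE reconstruction loss. The function name and hyperparameters are my own, not from the lecture.

```python
import numpy as np

def train_autoencoder(X, latent_dim=2, epochs=500, lr=0.1, seed=0):
    """Sketch of a linear autoencoder: encode with We, decode with Wd,
    minimize mean squared reconstruction error by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    We = rng.normal(0, 0.1, (d, latent_dim))   # encoder weights
    Wd = rng.normal(0, 0.1, (latent_dim, d))   # decoder weights
    losses = []
    for _ in range(epochs):
        Z = X @ We            # encode: project into latent space
        Xhat = Z @ Wd         # decode: reconstruct the input
        err = Xhat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the MSE loss w.r.t. decoder and encoder weights.
        gWd = Z.T @ err * (2 / (n * d))
        gWe = X.T @ (err @ Wd.T) * (2 / (n * d))
        Wd -= lr * gWd
        We -= lr * gWe
    return We, Wd, losses
```

With a low-rank data set and `latent_dim` at least the true rank, the reconstruction loss should fall toward zero; a high reconstruction error on a new point is the basis of the outlier-detection use mentioned above.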
Clustering

Kmeans, hierarchical clustering, and DBSCAN: straightforward clustering algos
Topics:

  • Kmeans algo
  • Properties of kmeans, convergence, failure cases?
  • Finding number of clusters?
    • MDL, BIC, AIC, Hierarchical?
  • Types of hier clust
    • Agglomerative, divisive (bottom up vs top down)
  • Distance metrics
  • Density-based clustering (DBSCAN)
    • How and why?
  • DBSCAN algo
    • Parameters? varying epsilon?
  • Hierarchical DBSCAN
    • What does it change
  • Comparison of algos
  • Similarity measures
  • Norms
  • Pearson corr (r)

GMMs (Gaussian Mixture Models)

More clustering algos: GMMs and biclustering
Topics:

  • Approach of GMMs
    • Multiple components, soft clustering, expectation maximization (EM)
  • Component conditional density, mixture density, point membership, log likelihood
  • Algo to find the parameters:
    • Expectation step, maximization step
    • Limitations?
  • Biclustering, why?
  • Biclustering, how?
  • Algos to find biclusters: MSR, Plaid Models, Spectral co-clustering, ITCC
  • Measuring accuracy of clustering result
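The EM steps listed above can be sketched for a 1-D two-component mixture. The E-step computes responsibilities (the soft point memberships), and the M-step re-estimates weights, means, and variances from them; the quantile-based initialization is my own choice, not from the lecture.

```python
import numpy as np

def gmm_em(x, k=2, iters=200):
    """EM sketch for a 1-D Gaussian mixture model."""
    n = len(x)
    pi = np.full(k, 1 / k)                            # mixture weights
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)     # spread-out init means
    var = np.full(k, np.var(x))                       # shared init variance
    for _ in range(iters):
        # E-step: weight times component-conditional density, per point.
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)  # responsibilities
        # M-step: weighted maximum-likelihood updates.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```

One listed limitation shows up directly in the M-step: if a component's responsibility concentrates on a single point, its variance collapses toward zero, which is why practical implementations add a variance floor or prior.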

Dimensionality Reduction

PCA and t-SNE
Topics:

  • PCA:
    • Why?
    • Pitfalls
  • PCA Algo
  • t-SNE
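A minimal PCA sketch via SVD of the centered data matrix (one standard way to compute it; the function name is my own). It also flags the classic pitfall from the list: forgetting to center (and, when features have different units, scale) the data first.

```python
import numpy as np

def pca(X, n_components):
    """PCA sketch: center the data, take the SVD, keep the top
    right-singular vectors as principal directions."""
    Xc = X - X.mean(axis=0)                  # center first (classic pitfall)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # principal directions (rows)
    explained = (S ** 2) / (S ** 2).sum()    # fraction of variance per direction
    return Xc @ components.T, explained[:n_components]
```

The explained-variance ratios answer the "why" bullet: they show how few directions carry most of the spread, which is what justifies dropping the rest.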