Review Seshes

Content Review:

  • Suffix Trees
  • Optimality Recurrence
  • BLAST Hash Tables
  • Additive Matrix
  • UPGMA

Content Review:

  • MT 2
    • 5b part 3
    • part 2d
  • Writing pseudo code?
  • Light perl scripting?

ECS124-Final-Prep 2025-03-17 16.03.31.excalidraw

Content

Labs:
  1. 1
  2. 2
  3. 3
  4. 4
Midterm 1

Sequence alignment

  • Global / Local / Semi-global
    • Global: Needleman–Wunsch algorithm
    • Local: Smith-Waterman
    • Sem
    • Biological uses for each one
  • Longest common substring and subsequence
    • How do LCS and global alignment relate?
      Blast
      Recurrence relations
      Runtime analysis (Big-O)
      Writing code - Perl and Python
  • Working with files, opening/writing/reading
  • Regex, matching and replacing
Midterm 2

Clustering and Trees
Bayes Method
K-means
De Bruin Graph

  • Genome assembly?
    Suffix Trees
  • Common sequences between paths
Post Midterm 2

Hierarchical Clustering

  • UPGMA
  • Nearest neighbor
  • General probability of a path and model combo.
  • Viterbi algo

Ultrametric Tree

  • A tree where all leaves are equidistant from the root

  • In an ultrametric tree, the two longest distances among any three points must be identical.

    • There is no unique maximum, it will repeat
  • Matrix M is only ultrametric if the maximum value in any 3 rows is not unique
    UPGMA - Unweighted Pair Group Method with Arithmetic Mean

  • Go through a matrix combining rows/columns as you combine nodes (species/taxa)

  • Combines the ‘closest’ neighbors iteratively

    • A and B make node AB combined
  • With an ultrametric matrix, it will create a tree that is ultrametric and additive

  • Additive trees

  • Jukes Cantor

Nearest Neighbor?
Neighbor Joining

Hidden Markov Models
Viterbi

  • The viterbi algorithm is a Dynamic Programming algorithm that uses a hidden markov model and its transitions to find the most likely path to have produced emissions. It iteratively goes through the most likely thing to have emitted a character at every stage, and finds probability for both models
  • Uses the max probability of arriving to a state, factoring in the immediately preceding state. Linear time, DP, very cool!
ABCD
A854
B73
C2
D
Nearest joining:

r(i) = d(A,C) + d(B,C) + d(C,C) + d(D,C)

  • 6+7+0+2 = 15 / 2 = 7.5
    r(j) = d(A,D) + d(B,D) + d(C,D) + d(D,D)
  • 4+3+2 + 0 = 9 / 2 = 4.5
    D(C,D) = 2 - 7.5 - 4.5 = -10
UPGMA
ABCD
A0246
B046
C06
D0

Heigh of AB is the average of their cluster distance

dist(AB,C) = ( D(A,C) + D(B,C) ) / 2 = 4
dist(AB,D) = ( D(A,D) + D(B,D) ) / 2 = 6

ABCD
AB146
C06
D0
D(ABC, D) = ( D(AB, D) + D(C, D) ) / 2 = 6
  • Distance between old node (AB) and new node (C) to D
ABCD
ABC26
D0

  • Makes this tree at the end.
  • Final height is 3