๐Ÿ“— -> 03/11/25: ECS124-L16


[Lecture Slide Link]

๐ŸŽค Vocab

โ— Unit and Larger Context

Small summary

โœ’๏ธ -> Scratch Notes

Which model - Classifiation is responsible for sequence

Transclude of ECS124-L16-2025-03-11-18.22.33.excalidraw

What is the most likely path that generated observation GGCATGAA
What is the prob of set path:

  • LLHHHLLL
  • GGCATGAA

  • .5 is the probability of moving into either state to start
  • Then, probability of selecting any letter
  • Then, probability of state transitioning * the probability of any letter in that state

Dynamic Probability: Alignment Recurrence

x[1...n-1]x(n)
y[1...m-1]y(m)
OPT(n,m) = 

GGCAT
OPT(T,5) = probability of letter T in position 5 printing from model H
= P_H(T)

  • In english:
    • The best path that takes you to A
      times
    • The highest probability of producing T from model H
// E = transition edge probabilities
OPT_L(C,3)=P_L(C) * max( 
	OPT_H(G,2) * E[E, L], 
	OPT_L(G,2) * E[L, L], 
)
12345
GGCAT
H->
L
  • Probability of up to (H,j) = (H,1)

Then, to calculate subsequent cells:

OPT(H,2)=prob(G in H) * max(
	OPT(H,1) * E[H,H],
	OPT(L,1) * E[L,H]
)
// Solving gives:
= (.3) * max(
	(.5)(.3) * (.5),
	(.5)(.2) * (.4)
)
// Max is the first option


OPT(L,2)=prob(G in L) * max(
	OPT(L,1) * E[L,L],
	OPT(H,1) * E[H,L]
)

topics?

Markov Chains

  1. Given a path and emissions, calc the prob the path geenrated the emissions
  2. Provide the most prob path given emissions, viterbi algo

Phylogeny

  1. Type characteristics, ultrametric tress, adaptive trees
  2. 2 algo for finding phylogenetic trees
    1. UPGMA
    2. Neighbor-joining -> u(cluster)

Suffix Tree / Data Structure

Genome assembly to find reference locations
or if searching for a gene in a genome:

  • If we have a long text, we can search for short patterns
  1. Helpful genome assembly for assembly w/ reference genome

Clustering

  1. K-means
  2. Heirarchical
    1. UPGMA
    2. Nearest Neighbor
Pros and Cons:

K-Means

  • Good for learning about data
  • First step to classification

Hierarchical

  • The hierarchical data is good for phylogeny

Supervised Learning

Good for classification
We learned about Naive Bayes

BLAST

Preprocessing
Running

๐Ÿงช -> Refresh the Info

Did you generally find the overall content understandable or compelling or relevant or not, and why, or which aspects of the reading were most novel or challenging for you and which aspects were most familiar or straightforward?)

Did a specific aspect of the reading raise questions for you or relate to other ideas and findings youโ€™ve encountered, or are there other related issues you wish had been covered?)

Resources

  • Put useful links here

Connections

  • Link all related words