📗 -> 03/11/25: ECS124-L16

[Lecture Slide Link]

🎤 Vocab

❗ Unit and Larger Context

Small summary

✒️ -> Scratch Notes

Which model is responsible for the sequence? (Classification)

Transclude of ECS124-L16-2025-03-11-18.22.33.excalidraw

What is the most likely path that generated the observation GGCATGAA?
What is the probability of a given path:

LLHHHLLL
GGCATGAA

.5 is the probability of starting in either state (H or L)
Then, multiply by the probability of emitting the first letter in that state
Then, for each later position: probability of the state transition * probability of emitting the letter in the new state
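A minimal sketch of the path-probability calculation described above. Only the start probability (.5), P_H(G) = .3, P_L(G) = .2, E[H,H] = .5 and E[L,H] = .4 appear in these notes. E[H,L] and E[L,L] follow from each row summing to 1 with two states, and the emission values for A, C and T are placeholder assumptions, not values from the lecture.

```python
# Model parameters. Values marked "assumed" are NOT from the notes.
START = {"H": 0.5, "L": 0.5}
TRANS = {
    ("H", "H"): 0.5, ("H", "L"): 0.5,  # E[H,L] = 1 - E[H,H]
    ("L", "H"): 0.4, ("L", "L"): 0.6,  # E[L,L] = 1 - E[L,H]
}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # A, C, T assumed
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # A, C, T assumed
}


def path_probability(path, obs):
    """Joint probability of a state path and the letters it emits."""
    if len(path) != len(obs) or not path:
        raise ValueError("path and obs must be non-empty and the same length")
    # Start probability times the emission of the first letter.
    prob = START[path[0]] * EMIT[path[0]][obs[0]]
    # Each later position: transition probability times emission probability.
    for prev, cur, letter in zip(path, path[1:], obs[1:]):
        prob *= TRANS[(prev, cur)] * EMIT[cur][letter]
    return prob


print(path_probability("LLHHHLLL", "GGCATGAA"))
```

The printed number is only meaningful under the assumed emission table; with the real table from the slides, swap in those values.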

Dynamic Programming: Alignment Recurrence

x[1...n-1]x(n)
y[1...m-1]y(m)
OPT(n,m) = 

GGCAT
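OPT_H(T,5) = probability of the best path that ends with letter T in position 5 emitted from model H
= P_H(T) * (probability of the best path to A in position 4, including the transition into H)

In English:
- The probability of the best path that takes you to A (with the transition into H)
  times
- The probability of producing T from model H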

// E = transition edge probabilities
OPT_L(C,3)=P_L(C) * max( 
	OPT_H(G,2) * E[H, L], 
	OPT_L(G,2) * E[L, L], 
)

	1	2	3	4	5
	G	G	C	A	T
H		->
L

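Cell (H,j) holds the probability of the best path up to position j that ends in H
Start with (H,1): OPT(H,1) = (.5) * prob(G in H) = (.5)(.3), and likewise OPT(L,1) = (.5)(.2)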

Then, to calculate subsequent cells:

OPT(H,2)=prob(G in H) * max(
	OPT(H,1) * E[H,H],
	OPT(L,1) * E[L,H]
)
// Solving gives:
= (.3) * max(
	(.5)(.3) * (.5),
	(.5)(.2) * (.4)
)
// Max is the first option (.075 vs .04), so OPT(H,2) = (.3)(.075) = .0225


OPT(L,2)=prob(G in L) * max(
	OPT(L,1) * E[L,L],
	OPT(H,1) * E[H,L]
)
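The recurrence above, filled in column by column, can be sketched as follows. This is a rough illustration, not the lecture's code: the function name `viterbi` and the table layout are my own, and the emission values for A, C and T are placeholder assumptions (only the G emissions, the .5 start, E[H,H] = .5 and E[L,H] = .4 are in the notes).

```python
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {
    ("H", "H"): 0.5, ("H", "L"): 0.5,  # E[H,L] = 1 - E[H,H]
    ("L", "H"): 0.4, ("L", "L"): 0.6,  # E[L,L] = 1 - E[L,H]
}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # A, C, T assumed
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # A, C, T assumed
}


def viterbi(obs, states, start, trans, emit):
    """Return (best path, its probability, full OPT table)."""
    # Base case: start probability times emission of the first letter.
    opt = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = []  # back[j][s] = best previous state for state s in column j + 1
    for letter in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            # Pick the previous state that maximizes OPT(prev) * E[prev, s].
            best_prev = max(states, key=lambda p: opt[-1][p] * trans[(p, s)])
            col[s] = emit[s][letter] * opt[-1][best_prev] * trans[(best_prev, s)]
            ptr[s] = best_prev
        opt.append(col)
        back.append(ptr)
    # Trace back from the best cell in the last column.
    last = max(states, key=lambda s: opt[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), opt[-1][last], opt


best_path, best_prob, table = viterbi("GGCATGAA", STATES, START, TRANS, EMIT)
print(best_path, best_prob)
print(table[1])  # column 2: OPT(H,2) and OPT(L,2)
```

For long sequences the products underflow, so a practical version would sum log-probabilities instead of multiplying.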

topics?

Markov Chains

Given a path and emissions, calculate the probability that the path generated the emissions
Find the most probable path given the emissions (Viterbi algorithm)

Phylogeny

Type characteristics, ultrametric trees, additive trees
2 algorithms for finding phylogenetic trees:
1. UPGMA
2. Neighbor-joining -> u(cluster)

Suffix Tree / Data Structure

Genome assembly to find reference locations
or if searching for a gene in a genome:

If we have a long text, we can search for short patterns

Helpful for genome assembly with a reference genome
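A small sketch of searching a long text for short patterns. Note this uses a suffix array (sorted suffix start positions) as a simpler stand-in for the suffix tree from lecture, and the function names are my own. The construction here is the naive one and would be far too slow for a real genome.

```python
def build_suffix_array(text):
    """Start positions of all suffixes, in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions of pattern in text, via binary search on sa."""
    m = len(pattern)
    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


genome = "GGCATGAA"
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "G"))
print(find_occurrences(genome, sa, "CAT"))
```

The preprocessing is done once per text; after that each pattern lookup only needs a binary search, which is the same trade-off that makes the suffix tree useful for mapping reads to a reference.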

Clustering

K-means
Hierarchical
1. UPGMA
2. Nearest Neighbor

Pros and Cons:

K-Means

Good for learning about data
First step to classification

Hierarchical

The hierarchical data is good for phylogeny

Supervised Learning

Good for classification
We learned about Naive Bayes

BLAST

Preprocessing

Running

🧪 -> Refresh the Info

Did you generally find the overall content understandable or compelling or relevant or not, and why, or which aspects of the reading were most novel or challenging for you and which aspects were most familiar or straightforward?

Did a specific aspect of the reading raise questions for you or relate to other ideas and findings you’ve encountered, or are there other related issues you wish had been covered?
