-> 03/11/25: ECS124-L16
[Lecture Slide Link]
Vocab
Unit and Larger Context
Small summary
-> Scratch Notes
Which model is responsible for the sequence? (Classification)
Transclude of ECS124-L16-2025-03-11-18.22.33.excalidraw
What is the most likely path that generated the observation GGCATGAA?
What is the probability of a set path?
Path: LLHHHLLL
Observation: GGCATGAA
- .5 is the probability of moving into either state to start
- Then, the probability of that state emitting the first letter
- Then, for each later position, the probability of the state transition * the probability of the observed letter in the new state
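The three steps above can be sketched as a short function. This is a minimal sketch, not the lecture's code: only the .5 start probability, P_H(G) = .3, P_L(G) = .2, E[H,H] = .5 and E[L,H] = .4 appear in these notes. E[H,L] and E[L,L] follow from each row summing to 1, and the remaining emission values are assumptions chosen for illustration.

```python
# Two-state model (H and L). Emission values other than P_H(G) and P_L(G)
# are ASSUMED for illustration; they are not given in the notes.
START = {"H": 0.5, "L": 0.5}
TRANS = {("H", "H"): 0.5, ("H", "L"): 0.5, ("L", "H"): 0.4, ("L", "L"): 0.6}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def path_probability(path, obs):
    """Joint probability that `path` is followed and emits `obs`."""
    if len(path) != len(obs) or not path:
        raise ValueError("path and observation must be non-empty and equal length")
    # Start: probability of entering the first state, times its emission.
    prob = START[path[0]] * EMIT[path[0]][obs[0]]
    # Each later position: transition probability times emission probability.
    for i in range(1, len(obs)):
        prob *= TRANS[(path[i - 1], path[i])] * EMIT[path[i]][obs[i]]
    return prob


p = path_probability("LLHHHLLL", "GGCATGAA")
```

The result is a product of 16 factors (one start, seven transitions, eight emissions), so it gets small quickly; for long sequences, summing log probabilities instead avoids underflow.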
Dynamic Programming: Alignment Recurrence
x[1...n-1]x(n)
y[1...m-1]y(m)
OPT(n,m) =
GGCAT
OPT_H(T,5) = probability of the best path that ends with model H printing letter T in position 5
= P_H(T) * max(OPT_H(A,4) * E[H,H], OPT_L(A,4) * E[L,H])
- In English:
- The probability of producing T from model H
- times the highest probability among the best paths that take you to A, each multiplied by its transition into H
// E = transition edge probabilities
OPT_L(C,3)=P_L(C) * max(
OPT_H(G,2) * E[H, L],
OPT_L(G,2) * E[L, L]
)
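| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| | G | G | C | A | T |
| H | -> | | | | |
| L | | | | | |
- Each cell (H,j) holds the probability of the best path up to position j that ends in H. Start with (H,1): OPT(H,1) = (.5) * prob(G in H)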
Then, to calculate subsequent cells:
OPT(H,2)=prob(G in H) * max(
OPT(H,1) * E[H,H],
OPT(L,1) * E[L,H]
)
// Solving gives:
= (.3) * max(
(.5)(.3) * (.5),
(.5)(.2) * (.4)
)
// Max is the first option: (.3) * (.075) = .0225
OPT(L,2)=prob(G in L) * max(
OPT(L,1) * E[L,L],
OPT(H,1) * E[H,L]
)
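The table-filling recurrence above can be sketched end to end as follows. This is a minimal sketch under assumptions, not the lecture's implementation: the start probability (.5), P_H(G), P_L(G), E[H,H] and E[L,H] come from the notes, E[H,L] and E[L,L] follow from rows summing to 1, and all other emission values are assumed for illustration. The function name `viterbi` and its argument layout are hypothetical.

```python
# Emission values other than P_H(G) = .3 and P_L(G) = .2 are ASSUMED.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {("H", "H"): 0.5, ("H", "L"): 0.5, ("L", "H"): 0.4, ("L", "L"): 0.6}
EMIT = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return (best path, its probability, the filled OPT table)."""
    # opt[j][s]: probability of the best path that emits obs[:j+1] and ends in s.
    opt = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = [{}]  # back[j][s]: state at position j-1 on that best path
    for j in range(1, len(obs)):
        opt.append({})
        back.append({})
        for s in states:
            # Pick the predecessor maximizing OPT(prev, j-1) * E[prev, s].
            prev = max(states, key=lambda r: opt[j - 1][r] * trans[(r, s)])
            opt[j][s] = emit[s][obs[j]] * opt[j - 1][prev] * trans[(prev, s)]
            back[j][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: opt[-1][s])
    path = [last]
    for j in range(len(obs) - 1, 0, -1):
        path.append(back[j][path[-1]])
    path.reverse()
    return "".join(path), opt[-1][last], opt


best_path, best_prob, table = viterbi("GGCATGAA")
```

`table[1]["H"]` corresponds to OPT(H,2) worked out by hand above, so it is a convenient cell to check against the notes. Storing back-pointers while filling the table is one common design choice; the alternative is to recompute the argmax during traceback.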
topics?
Markov Chains
- Given a path and emissions, calculate the probability that the path generated the emissions
- Find the most probable path given the emissions (Viterbi algorithm)
Phylogeny
- Type characteristics, ultrametric trees, additive trees
- 2 algorithms for finding phylogenetic trees
- UPGMA
- Neighbor-joining -> u(cluster)
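For review, UPGMA can be sketched in a few lines: repeatedly merge the two closest clusters and set the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. This is a minimal sketch; the input format (a dict of leaf-pair distances), the nested-tuple tree representation, and the example distances are all assumptions made for illustration.

```python
from itertools import combinations


def leaf_count(tree):
    """Number of leaves in a nested-tuple tree (a leaf is a string)."""
    return 1 if isinstance(tree, str) else leaf_count(tree[0]) + leaf_count(tree[1])


def upgma(leaf_dist):
    """leaf_dist maps (a, b) leaf-name pairs to distances (hypothetical format).

    Returns (tree, merges), where merges lists (cluster, height) in merge order.
    """
    dist = {frozenset(pair): d for pair, d in leaf_dist.items()}
    clusters = set()
    for pair in dist:
        clusters |= pair
    merges = []
    while len(clusters) > 1:
        # Closest pair of current clusters (sorted for deterministic tie-breaks).
        a, b = min(combinations(sorted(clusters, key=str), 2),
                   key=lambda p: dist[frozenset(p)])
        na, nb = leaf_count(a), leaf_count(b)
        merged = (a, b)
        clusters -= {a, b}
        # New distance = average over all leaf pairs, weighted by cluster size.
        for c in clusters:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        # The merged node sits at half the distance between its two children.
        merges.append((merged, dist[frozenset((a, b))] / 2))
        clusters.add(merged)
    return clusters.pop(), merges


# Hypothetical distances, not from the lecture.
tree, merges = upgma({("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6})
```

Because every merge places the new node at half the inter-cluster distance, the resulting tree is ultrametric by construction, which ties back to the first bullet above.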
Suffix Tree / Data Structure
Genome assembly to find reference locations
or if searching for a gene in a genome:
- If we have a long text, we can search for short patterns
- Helpful for genome assembly with a reference genome
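The idea of indexing a long text once and then searching for short patterns can be sketched with a suffix array, a simpler relative of the suffix tree covered in lecture (this is a swapped-in structure, not the suffix tree itself). It is a minimal sketch: the function names are hypothetical, and the construction simply sorts all suffixes, which is fine for small examples but too slow for real genomes.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text`, in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """All start positions of `pattern` in `text`, via two binary searches."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest suffix.
        return text[suffix_array[k]:suffix_array[k] + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


genome = "GGCATGAA"
index = build_suffix_array(genome)
hits = find_occurrences(genome, index, "G")
```

Once the index is built, each query costs roughly O(m log n) comparisons for a pattern of length m in a text of length n, which is why the preprocessing pays off when many short reads are matched against one reference.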
Clustering
- K-means
- Hierarchical
- UPGMA
- Nearest Neighbor
Pros and Cons:
K-Means
- Good for learning about data
- First step to classification
Hierarchical
- The hierarchical data is good for phylogeny
Supervised Learning
Good for classification
We learned about Naive Bayes
BLAST
Preprocessing
Running
🧪 -> Refresh the Info
Did you generally find the overall content understandable or compelling or relevant or not, and why, or which aspects of the reading were most novel or challenging for you and which aspects were most familiar or straightforward?
Did a specific aspect of the reading raise questions for you or relate to other ideas and findings you’ve encountered, or are there other related issues you wish had been covered?
-> Links
Resources
- Put useful links here
Connections
- Link all related words