📗 -> 06/04/25: ECS189G-L27
🎤 Vocab
❗ Unit and Larger Context
Summary of Sec_16 Graph Bert
SGC operator and its problem
- What is the suspended animation problem?
GResNet with Graph Residual Learning
- Graph residual terms in GResNet
- GResNet architecture and performance
Graph-Bert
- Graph-Bert overall architecture
- Sub-graph batching with PageRank
- Positional embedding
- Graph-Bert pre-training and fine-tuning
✒️ -> Scratch Notes
Starts from “A dive into Graph-Bert model architecture”
Step 2: Initial Embeddings for any Node in subgraph
Step 3: Graph-Transformer Based Encoder for Subgraph Representation Learning
Step 4: Subgraph Representation Fusion as Target Node Final Representation
For each node in the graph, define a corresponding embedding vector (sketch below)
- Similar to GCN architecture
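A minimal sketch (not the official Graph-Bert code) of what the initial node embedding could look like, assuming it is simply the sum of a linear projection of the raw node attributes and a learned positional embedding for each node in the sampled subgraph. The class name and all dimensions (feat_dim, hidden_dim, max_position) are illustrative placeholders.

```python
import torch
import torch.nn as nn

class InitialNodeEmbedding(nn.Module):
    """Hypothetical step-2 embedding: raw attribute embedding + positional embedding."""
    def __init__(self, feat_dim: int, hidden_dim: int, max_position: int):
        super().__init__()
        self.raw_feature_embed = nn.Linear(feat_dim, hidden_dim)       # raw attribute embedding
        self.position_embed = nn.Embedding(max_position, hidden_dim)   # learned positional embedding

    def forward(self, x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        # x:         (num_nodes_in_subgraph, feat_dim) raw node attributes
        # positions: (num_nodes_in_subgraph,) integer positions within the subgraph
        return self.raw_feature_embed(x) + self.position_embed(positions)

# Toy usage: a subgraph of 5 nodes with 8-dim raw features
emb = InitialNodeEmbedding(feat_dim=8, hidden_dim=16, max_position=10)
h0 = emb(torch.randn(5, 8), torch.arange(5))
print(h0.shape)  # torch.Size([5, 16])
```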
Subgraph sampling:
- Each sampled subgraph serves as a minibatch (sketch below)
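One plausible way to implement the PageRank-based sub-graph batching mentioned in the summary, assuming the intimacy scores come from a personalized-PageRank-style matrix S = alpha * (I - (1 - alpha) * A_norm)^(-1) and each target node's subgraph is just its top-k highest-scoring context nodes. The value of alpha, the choice of k, and the row normalization are assumptions for illustration, not values from the lecture.

```python
import numpy as np

def intimacy_scores(A: np.ndarray, alpha: float = 0.15) -> np.ndarray:
    """Personalized-PageRank-style intimacy matrix (assumed form)."""
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    A_norm = A / deg                                  # row-normalized adjacency (an assumption)
    n = A.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * A_norm)

def sample_subgraph(A: np.ndarray, target: int, k: int = 3) -> np.ndarray:
    """Return the target node plus its k most 'intimate' context nodes (one minibatch unit)."""
    scores = intimacy_scores(A)[target].copy()
    scores[target] = -np.inf                          # exclude the target itself from the ranking
    top_k = np.argsort(scores)[::-1][:k]
    return np.concatenate(([target], top_k))

# Toy usage on a 5-node path graph: prints the target node plus its 3 top-scoring neighbors
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
print(sample_subgraph(A, target=2, k=3))
```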
Step 5: Pre-Training, Transfer, Fine-Tuning
Unsupervised Pre-training
- Node raw attribute reconstruction (reconstruct the ground-truth raw attributes from the learned node representation)
- Graph structure recovery (a sketch of both objectives follows below)
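A rough sketch of the two unsupervised pre-training objectives as described above, assuming (a) the raw attributes are reconstructed from the node representations with an MSE loss, and (b) graph structure is recovered by comparing pairwise representation similarity against the observed adjacency/intimacy. The exact loss forms, the linear reconstruction head, and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, feat_dim, num_nodes = 16, 8, 5
H = torch.randn(num_nodes, hidden_dim)                      # learned node representations
X = torch.randn(num_nodes, feat_dim)                        # raw node attributes (ground truth)
A = torch.randint(0, 2, (num_nodes, num_nodes)).float()     # observed graph structure

# (a) node raw-attribute reconstruction: predict X back from H
reconstruct = nn.Linear(hidden_dim, feat_dim)
attr_loss = F.mse_loss(reconstruct(H), X)

# (b) graph structure recovery: pairwise representation similarity vs. observed structure
H_norm = F.normalize(H, dim=1)
S_hat = H_norm @ H_norm.t()                                 # predicted pairwise similarity
struct_loss = F.mse_loss(S_hat, A)

pretrain_loss = attr_loss + struct_loss
print(float(pretrain_loss))
```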
Fine-Tuning
- Node classification - cross-entropy loss between predicted and ground-truth labels
- Graph clustering - distance from each node representation to the mean of its cluster (a sketch of both follows below)
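A rough sketch of the two fine-tuning objectives: a linear classification head trained with cross-entropy, and a k-means-style clustering term measuring the squared distance from each node representation to its cluster mean. The classifier head, the cluster-assignment scheme, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, num_classes, num_nodes, num_clusters = 16, 3, 10, 2
H = torch.randn(num_nodes, hidden_dim)            # node representations from the encoder
y = torch.randint(0, num_classes, (num_nodes,))   # ground-truth node labels

# node classification: cross-entropy between predicted and ground-truth labels
classifier = nn.Linear(hidden_dim, num_classes)
cls_loss = F.cross_entropy(classifier(H), y)

# graph clustering: squared distance from each node representation to its cluster mean
assign = torch.arange(num_nodes) % num_clusters   # placeholder cluster assignments
means = torch.stack([H[assign == c].mean(dim=0) for c in range(num_clusters)])
clu_loss = ((H - means[assign]) ** 2).sum(dim=1).mean()

print(float(cls_loss), float(clu_loss))
```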
Moving onto Final Review Slides
Final Exam
Take home, Sat+Sun (48 hrs)
Open everything, but individual
25% of grade, cumulative
1 calculation question on the transformer:
- Given the input tables (X, weight matrices, etc.), calculate the resulting Z (worked sketch below)
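A toy numpy walkthrough of that kind of calculation, assuming the standard single-head self-attention formula Z = softmax(Q K^T / sqrt(d_k)) V with Q = X W_Q, K = X W_K, V = X W_V. The matrices below are made up for illustration and are not the exam's tables.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the last axis."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

X   = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 tokens, 2-dim features (toy input table)
W_Q = np.array([[1.0, 0.0], [0.0, 1.0]])                # toy query weights
W_K = np.array([[0.0, 1.0], [1.0, 0.0]])                # toy key weights
W_V = np.array([[1.0, 1.0], [0.0, 1.0]])                # toy value weights

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
d_k = Q.shape[-1]
Z = softmax(Q @ K.T / np.sqrt(d_k)) @ V                 # resulting Z: (3 tokens, 2-dim)
print(Z)
```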
Topics Review

RNN & LSTM
Sequential data
RNN specifically, architecture
Gradient exploding/vanishing problem
LSTM
NLP & Word Embedding
NLP history
Word embeddings - specifically word2vec
Text generation with RNN
Transformer & BERT
Attention
Transformer
BERT
GNN
Graph neural network intro
SGC operator and GCN
GAT
GNN Application
GNN Applications & Network Embedding
Heterogeneous information networks (HIN)
GNN for heterogeneous information network embedding
GNN for recommender system learning
GResNet & Graph-Bert
SGC operator and its problem
GResNet with Graph Residual Learning
Graph-Bert
What the course did and didn't cover
Covered:
- Architecture
- Motivation
Didn't cover (the “tricks”):
- dropout
- weight decay
- batchnorm
- Xavier initialization
- Data augmentation
- etc.
Suggestions
CS changes quickly
Find an interest and study hard on it, even through self-study
Keep an open mind, but don't get swept away by hype.
Covered:
Existing DL models/algorithms & the year each was proposed
- Back-propagation Algorithm (1974)
- GD, SGD, Mini-Batch SGD
- Momentum (1999), Adagrad (2011), Adam (2014)
- Perceptron (1958), Multi-Layer Perceptron (1974)
- Auto-Encoder (1986), VAE (2014)
- CNN (1998), ResNet (2015)
- GAN (2014)
- RNN (1986), Bi-RNN (1997)
- LSTM (1997), GRU (2014)
- Skip-gram (2013), CBOW (2013)
- Transformer (2017), BERT (2019)
- GCN (2017), GAT (2018)
- RGCN (2017), BPR (2009)
- GResNet (2019), Graph-Bert (2020)
🧪 -> Refresh the Info
Did you generally find the overall content understandable, compelling, or relevant (or not), and why? Which aspects of the reading were most novel or challenging for you, and which were most familiar or straightforward?
Did a specific aspect of the reading raise questions for you, or relate to other ideas and findings you've encountered? Are there other related issues you wish had been covered?
🔗 -> Links
Resources
- Put useful links here
Connections
- Link all related words