📗 -> 06/04/25: ECS189G-L27


Graph-Bert
Final Overview

🎤 Vocab

❗ Unit and Larger Context

Summary of Sec_16: Graph-Bert

SGC operator and its problem

  • What is the suspended animation problem?

GResNet with Graph Residual Learning

  • Graph residual terms in GResNet
  • GResNet architecture and performance

Graph-Bert

  • Graph-Bert overall architecture
  • Sub-graph batching with PageRank
  • Positional embedding
  • Graph-Bert pre-training and fine-tuning

✒️ -> Scratch Notes

Starts from “A dive into graph-bert model architecture”

Step 2: Initial Embeddings for any Node in subgraph

Step 3: Graph-Transformer Based Encoder for Subgraph Representation Learning

Step 4: Subgraph Representation Fusion as Target Node Final Representation

For each node in the graph, define a corresponding embedding vector; this is the final representation learned by the model (a minimal sketch of Steps 2-4 follows the bullet below)

  • Similar to GCN architecture
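Since Steps 2-4 are only named above, here is a minimal PyTorch sketch of how they chain together. This is my own reconstruction, not the authors' code: the class name, the particular embedding choices, and all dimensions are assumptions (Graph-Bert sums several positional signals, e.g. WL role, intimacy rank, and hop distance).

```python
# Hypothetical sketch of Graph-Bert Steps 2-4 -- not the paper's implementation.
import torch
import torch.nn as nn

class GraphBertSketch(nn.Module):
    def __init__(self, feat_dim, hid_dim, n_heads=4, n_layers=2, max_pos=512):
        super().__init__()
        # Step 2: initial embeddings -- raw features plus positional signals
        # (the paper sums several, e.g. WL role / intimacy rank / hop distance).
        self.raw_embed = nn.Linear(feat_dim, hid_dim)
        self.pos_embed = nn.Embedding(max_pos, hid_dim)   # e.g. intimacy-rank position
        self.hop_embed = nn.Embedding(max_pos, hid_dim)   # e.g. hop distance to target
        # Step 3: graph-transformer encoder over the sampled subgraph's nodes.
        layer = nn.TransformerEncoderLayer(hid_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x, pos_rank, hop_dist):
        # x: (batch, k+1, feat_dim) -- target node first, then its k sampled nodes
        h = self.raw_embed(x) + self.pos_embed(pos_rank) + self.hop_embed(hop_dist)
        h = self.encoder(h)    # contextualize nodes within the subgraph
        # Step 4: fuse subgraph representations into the target node's final
        # representation; averaging here, the paper also uses the target's output.
        return h.mean(dim=1)   # (batch, hid_dim)
```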

Subgraph sampling (see the PageRank sketch below):

  • Each sampled subgraph serves as one minibatch instance
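The PageRank-based batching mentioned in the summary can be sketched as follows. The function name, the alpha value, and the closed-form intimacy matrix are my assumptions (a standard personalized-PageRank form), not pulled from the lecture slides.

```python
# Sketch of PageRank-based subgraph batching (names and alpha are assumptions).
import numpy as np

def topk_intimate_subgraph(A, target, k=5, alpha=0.15):
    """Return the target node plus its k most 'intimate' nodes,
    ranked by personalized-PageRank-style scores."""
    n = A.shape[0]
    # Column-normalize the adjacency matrix (random-walk transition matrix).
    A_hat = A / np.maximum(A.sum(axis=0, keepdims=True), 1)
    # Closed-form PageRank intimacy: S = alpha * (I - (1 - alpha) * A_hat)^-1
    S = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * A_hat)
    scores = S[:, target].copy()
    scores[target] = -np.inf                  # exclude the target itself
    neighbors = np.argsort(-scores)[:k]       # k highest-intimacy nodes
    return np.concatenate(([target], neighbors))

# Each returned node set is one minibatch instance for the transformer.
A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], dtype=float)
print(topk_intimate_subgraph(A, target=0, k=2))
```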

Step 5: Pre-Training, Transfer, Fine-Tuning

Unsupervised Pre-training
  • Node raw attribute reconstruction (reconstruct the ground-truth attributes from the learned representation)
  • Graph structure recovery
Fine-Tuning (loss sketches for both phases follow below)
  • Node classification - cross-entropy loss on the class predictions
  • Graph clustering - distance between each node representation and its cluster mean
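A sketch of the four objectives listed above, in PyTorch. The function names and the exact loss forms (e.g. MSE for reconstruction and structure recovery) are my assumptions about reasonable instantiations, not necessarily the paper's exact losses.

```python
# Assumed loss functions for Graph-Bert pre-training / fine-tuning.
import torch
import torch.nn.functional as F

# z: (n, d) node representations; x: (n, feat) raw attributes; A: (n, n) adjacency.

def attribute_reconstruction_loss(z, x, decoder):
    # Pre-training 1: reconstruct raw node attributes from the representation.
    return F.mse_loss(decoder(z), x)

def structure_recovery_loss(z, A):
    # Pre-training 2: recover the graph structure from pairwise similarity.
    sim = torch.sigmoid(z @ z.t())
    return F.mse_loss(sim, A)

def node_classification_loss(z, labels, classifier):
    # Fine-tuning 1: cross-entropy on the class predictions.
    return F.cross_entropy(classifier(z), labels)

def graph_clustering_loss(z, assignments, centroids):
    # Fine-tuning 2: squared distance from each node representation
    # to the mean (centroid) of its assigned cluster.
    return ((z - centroids[assignments]) ** 2).sum(dim=1).mean()
```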

Moving onto Final Review Slides

Final Exam

Take home, Sat+Sun (48 hrs)
Open everything, but individual

25% of grade, cumulative

One calculation question on the transformer:

  • Given the input tables (X, weight matrices, etc.), calculate the resulting Z (worked toy example below)
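For practice, here is a toy single-head attention calculation of the kind the question describes: given an input table X and weight matrices, compute Z = softmax(QK^T / sqrt(d)) V. All numbers are made up for illustration.

```python
# Toy scaled dot-product attention calculation (made-up numbers).
import numpy as np

X   = np.array([[1., 0.], [0., 1.]])     # two tokens, d = 2
W_q = np.array([[1., 0.], [0., 1.]])     # identity weights keep arithmetic easy
W_k = np.array([[1., 0.], [0., 1.]])
W_v = np.array([[1., 2.], [3., 4.]])

Q, K, V = X @ W_q, X @ W_k, X @ W_v

scores  = Q @ K.T / np.sqrt(X.shape[1])  # scaled dot-product: QK^T / sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row softmax
Z = weights @ V                          # attention output

print(Z)   # each row mixes the value rows by the attention weights
```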

Topics Review

RNN & LSTM

Sequential data
RNN specifically, architecture
Gradient exploding/vanishing problem
LSTM

NLP & Word Embedding

NLP history
Word embeddings - specifically word2vec
Text generation with RNN

Transformer & BERT

Attention
Transformer
BERT

GNN

Graph neural network intro
SGC operator and GCN
GAT
GNN Application

GNN Applications & Network Embedding

Heterogeneous information networks (HIN)
GNN for heterogeneous information network embedding
GNN for recommender system learning

GResNet & Graph-Bert

SGC operator and its problem
GResNet with Graph Residual Learning
Graph-Bert

What the course did and didn't cover

Covered:

  • Architecture
  • Motivation

Didn't cover:

  • “tricks” (sketch below)
    • Dropout
    • Weight decay
    • Batchnorm
    • Xavier initialization
    • Data augmentation
    • etc.
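For reference, a quick sketch of where those tricks plug into an ordinary PyTorch setup (toy model, arbitrary dimensions); data augmentation would live in the data pipeline rather than the model, so it is only noted in a comment.

```python
# Toy example showing where common training "tricks" attach.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64),
    nn.BatchNorm1d(64),      # batchnorm
    nn.ReLU(),
    nn.Dropout(p=0.5),       # dropout
    nn.Linear(64, 10),
)

# Xavier initialization for the linear layers.
for m in model:
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

# Weight decay is passed to the optimizer (L2 regularization).
# Data augmentation would be applied in the DataLoader / transforms, not here.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```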

Suggestions

CS changes quickly
Find an interest and study hard on it; self-study even
Keep an open mind, but don't get swept away by hype

Covered

Existing DL Models/Algorithms & their proposed year

  • Back-propagation Algorithm (1974)
    • GD, SGD, Mini-Batch SGD
    • Momentum (1999), Adagrad (2011), Adam (2014)
  • Perceptron (1958), Multi-Layer Perceptron (1974)
  • Auto-Encoder (1986), VAE (2014)
  • CNN (1998), ResNet (2015)
  • GAN (2014)
  • RNN (1986), Bi-RNN (1997)
  • LSTM (1997), GRU (2014)
  • Skip-gram (2013), CBOW (2013)
  • Transformer (2017), BERT (2019)
  • GCN (2017), GAT (2018)
  • RGCN (2017), BPR (2009)
  • GResNet (2019), Graph-Bert (2020)

🧪 -> Refresh the Info

Did you generally find the overall content understandable, compelling, or relevant (or not), and why? Which aspects of the reading were most novel or challenging for you, and which were most familiar or straightforward?

Did a specific aspect of the reading raise questions for you or relate to other ideas and findings you've encountered? Are there other related issues you wish had been covered?

Resources

  • Put useful links here

Connections

  • Link all related words