Computer Architecture 2024 Spring
Final Project Part 2
Overview
Tutorial
● Gem5 Introduction
● Environment Setup
Projects
● Part 1 (5%)
○ Write a C++ program to analyze the specification of the L1 data cache.
● Part 2 (5%)
○ Given the hardware specifications, try to get the best performance for a more complicated program.
Project 2
Description
In this project, we will use a computer system with a two-level cache. Your task is to write a ViT (Vision Transformer) in C++ and optimize it. You can see more details of the system specification on the next page.
System Specifications
● ISA: X86
● CPU: TimingSimpleCPU (no pipeline, CPU stalls on every memory request)
● Caches
|          | I cache size | I cache associativity | D cache size | D cache associativity | Policy | Block size |
| L1 cache | 16KB         | 8                     | 16KB         | 4                     | LRU    | 32B        |
| L2 cache | –            | –                     | 1MB          | 16                    | LRU    | 32B        |
* L1 I cache and L1 D cache connect to the same L2 cache
● Memory size: 8192MB
ViT(Vision Transformer) – Transformer Overview
● A basic transformer block consists of
○ Layer Normalization
○ MultiHead Self-Attention (MHSA)
○ Feed Forward Network (FFN)
○ Residual connection (Add)
● You only need to focus on how to implement the function in the red box
● If you only want to complete the project without understanding the full ViT algorithm, you can skip the sections marked in red
ViT(Vision Transformer) – Image Pre-processing
● Normalize, resize to (300,300,3), and center crop to (224,224,3)
ViT(Vision Transformer) – Patch Encoder
● In this project, we use a Conv2D as the Patch Encoder, with kernel_size = (16,16), stride = (16,16), and output_channel = 768
● (224,224,3) -> (14,14, 16*16*3) -> (196, 768)
ViT(Vision Transformer) – Class Token
● Now we have 196 tokens and each token has 768 features
● In order to record global information, we concatenate one learnable class token with the 196 tokens
● (196,768) -> (197,768)
ViT(Vision Transformer) – Position Embedding
● Add the learnable position information to the patch embedding
● (197,768) + position_embedding(197,768) -> (197,768)
ViT(Vision Transformer) – Layer Normalization
● Normalize each token
● You need to normalize with the layer normalization formula, applied to each token over its C features
(T = # of tokens, C = embedded dimension)
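As a reference for the layernorm testbench, here is a minimal sketch of per-token layer normalization over a flat [T*C] row-major buffer. The gamma/beta (scale/shift) parameters and the epsilon value are assumptions, so match the argument order and shapes used in the provided ./layer code.

#include <cmath>

// Minimal layer-norm sketch: normalize each of the T tokens over its C
// features, then apply a learnable scale (gamma) and shift (beta).
// gamma/beta and eps are assumptions -- match the provided layer interface.
void layernorm(const float* in, float* out, const float* gamma,
               const float* beta, int T, int C, float eps = 1e-5f) {
    for (int t = 0; t < T; ++t) {
        const float* x = in + t * C;
        float* y = out + t * C;

        float mean = 0.0f;
        for (int c = 0; c < C; ++c) mean += x[c];
        mean /= C;

        float var = 0.0f;
        for (int c = 0; c < C; ++c) {
            float d = x[c] - mean;
            var += d * d;
        }
        var /= C;

        float inv_std = 1.0f / std::sqrt(var + eps);
        for (int c = 0; c < C; ++c)
            y[c] = (x[c] - mean) * inv_std * gamma[c] + beta[c];
    }
}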
ViT(Vision Transformer) – MultiHead Self Attention (1)
● Wk, Wq, Wv ∈ R^(C×C)
● bq, bk, bv ∈ R^C
● Wo ∈ R^(C×C)
● bo ∈ R^C
[Figure: X -> Input Linear Projection (Wq, Wk, Wv, bq, bk, bv) -> split into heads -> Attention per head -> merge heads -> Output Linear Projection (Wo, bo) -> Y]
ViT(Vision Transformer) – MultiHead Self Attention (2)
● Get Q, K, V ∈ R^(T×(NH*H)) after the input linear projection
● Split Q, K, V into Q1, Q2, ..., QNH; K1, K2, ..., KNH; V1, V2, ..., VNH ∈ R^(T×H)
Linear projection: Q = X·Wq^T + bq, K = X·Wk^T + bk, V = X·Wv^T + bv
(T = # of tokens, C = embedded dimension, H = hidden dimension, NH = # of heads, C = H * NH)
[Figure: linear projection and split into heads]
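A minimal sketch of the input linear projection, assuming a row-major [T*C] input and a packed [T*3*C] qkv buffer with the q, k, and v vectors stored contiguously per token; the buffer layout and function signature are assumptions, so follow the provided ./layer code.

// Sketch of Q = X*Wq^T + bq (and likewise for K, V), writing into a packed
// qkv buffer of shape [T*3*C] with q, k, v contiguous per token.
// Layout and signature are assumptions -- match the provided interface.
void qkv_projection(const float* X,                    // [T*C]
                    const float* Wq, const float* bq,  // [C*C], [C]
                    const float* Wk, const float* bk,
                    const float* Wv, const float* bv,
                    float* qkv,                        // [T*3*C]
                    int T, int C) {
    const float* W[3] = {Wq, Wk, Wv};
    const float* b[3] = {bq, bk, bv};
    for (int t = 0; t < T; ++t) {
        for (int m = 0; m < 3; ++m) {        // 0 = q, 1 = k, 2 = v
            float* out = qkv + t * 3 * C + m * C;
            for (int o = 0; o < C; ++o) {
                float acc = b[m][o];
                for (int i = 0; i < C; ++i)
                    acc += X[t * C + i] * W[m][o * C + i];  // X * W^T
                out[o] = acc;
            }
        }
    }
}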
ViT(Vision Transformer) – MultiHead Self Attention (2)
● For each head i, compute Si = Qi·Ki^T / sqrt(H) ∈ R^(T×T)
● Pi = Softmax(Si) ∈ R^(T×T), where Softmax is a row-wise function
● Oi = Pi·Vi ∈ R^(T×H)
[Figure: Qi, Ki -> matrix multiplication and scale -> Softmax -> matrix multiplication with Vi -> Oi]
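A minimal sketch of one attention head: Si = Qi·Ki^T / sqrt(H), a row-wise softmax, and Oi = Pi·Vi. The pointers Qh, Kh, Vh, Oh are assumed to be contiguous [T*H] slices for a single head; in the real code they must be indexed out of the packed buffers according to the provided layout.

#include <cmath>
#include <vector>

// One attention head: S = Q K^T / sqrt(H), P = row-wise softmax(S), O = P V.
// Qh/Kh/Vh/Oh are [T*H] slices for a single head (an assumption -- the real
// buffers are packed, so index them as the provided layer code expects).
void attention_head(const float* Qh, const float* Kh, const float* Vh,
                    float* Oh, int T, int H) {
    const float scale = 1.0f / std::sqrt((float)H);
    std::vector<float> row(T);
    for (int i = 0; i < T; ++i) {
        // scores of query i against every key j, scaled by 1/sqrt(H)
        float maxv = -1e30f;
        for (int j = 0; j < T; ++j) {
            float s = 0.0f;
            for (int k = 0; k < H; ++k)
                s += Qh[i * H + k] * Kh[j * H + k];
            row[j] = s * scale;
            if (row[j] > maxv) maxv = row[j];
        }
        // numerically stable row-wise softmax
        float sum = 0.0f;
        for (int j = 0; j < T; ++j) {
            row[j] = std::exp(row[j] - maxv);
            sum += row[j];
        }
        for (int j = 0; j < T; ++j) row[j] /= sum;
        // O_i = P_i * V
        for (int k = 0; k < H; ++k) {
            float acc = 0.0f;
            for (int j = 0; j < T; ++j)
                acc += row[j] * Vh[j * H + k];
            Oh[i * H + k] = acc;
        }
    }
}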
ViT(Vision Transformer) – MultiHead Self Attention (3)
● Oi ∈ R^(T×H), O = [O1, O2, ..., ONH] ∈ R^(T×C)
● output = O·Wo^T + bo
(T = # of tokens, C = embedded dimension, H = hidden dimension, NH = # of heads)
[Figure: merge heads and output linear projection]
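A minimal sketch of the output projection output = O·Wo^T + bo, after the NH head outputs have been concatenated back into a [T*C] buffer (C = NH*H); the signature is an assumption.

// Output projection after merging heads: output = O * Wo^T + bo, where O is
// [T*C] with the head outputs concatenated along the feature dimension.
// Signature is an assumption -- match the provided layer code.
void output_projection(const float* O, const float* Wo, const float* bo,
                       float* out, int T, int C) {
    for (int t = 0; t < T; ++t)
        for (int o = 0; o < C; ++o) {
            float acc = bo[o];
            for (int i = 0; i < C; ++i)
                acc += O[t * C + i] * Wo[o * C + i];
            out[t * C + o] = acc;
        }
}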
ViT(Vision Transformer) – Feed Forward Network
● Apply an input linear projection from C to OC features, then GeLU, then an output linear projection back from OC to C features
● (T, C) -> (T, OC) -> GeLU -> (T, OC) -> (T, C)
(T = # of tokens, C = embedded dimension, OC = hidden dimension)
[Figure: Input Linear Projection -> GeLU -> Output Linear Projection]
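A minimal sketch of the feed forward network: an input projection from C to OC, GeLU, then an output projection back to C. The weight shapes ([OC*C] and [C*OC]), the signature, and the erf-based GeLU used here are assumptions; follow the provided weights and the gelu layer you implement.

#include <cmath>

// Exact (erf-based) GeLU, used here only for illustration; the project may
// expect the tanh approximation instead (see the GeLU slide).
static float gelu(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// FFN sketch: hidden = GeLU(X*W1^T + b1), out = hidden*W2^T + b2.
// W1 is [OC*C], W2 is [C*OC]; shapes and signature are assumptions.
void feedforward(const float* X, const float* W1, const float* b1,
                 const float* W2, const float* b2,
                 float* hidden /*[T*OC]*/, float* out /*[T*C]*/,
                 int T, int C, int OC) {
    for (int t = 0; t < T; ++t) {
        for (int o = 0; o < OC; ++o) {
            float acc = b1[o];
            for (int i = 0; i < C; ++i)
                acc += X[t * C + i] * W1[o * C + i];
            hidden[t * OC + o] = gelu(acc);
        }
        for (int o = 0; o < C; ++o) {
            float acc = b2[o];
            for (int i = 0; i < OC; ++i)
                acc += hidden[t * OC + i] * W2[o * OC + i];
            out[t * C + o] = acc;
        }
    }
}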
ViT(Vision Transformer) – GeLU
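The original slide shows the GeLU formula and curve. Below is a sketch of the widely used tanh approximation of GeLU; whether gelu_tb expects this approximation or the exact erf-based form is an assumption, so verify against the testbench.

#include <cmath>

// Tanh approximation of GeLU:
// GeLU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
// Whether the testbench expects this form or the exact erf-based GeLU is an
// assumption -- verify against gelu_tb.
float gelu_tanh(float x) {
    const float k = 0.7978845608f;  // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}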
ViT(Vision Transformer) – Classifier
● Contains a Linear layer to transform 768 features into 200 classes
○ (197, 768) -> (197, 200)
● Only the first token (class token) is used for the prediction
○ (197, 200) -> (1, 200)
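In the project this step is run in Python (classifier.py); the sketch below only illustrates the computation: a linear layer applied to the class token (first row), followed by argmax. The [num_classes * C] weight layout is an assumption.

// Classifier sketch: logits = class_token * W^T + b, then argmax.
// Applying the linear layer only to the first row is equivalent to applying
// it to all 197 tokens and then keeping row 0. W layout is an assumption.
int classify(const float* tokens /*[T*C]*/, const float* W, const float* b,
             int C, int num_classes) {
    const float* cls = tokens;          // first token (class token)
    int best = 0;
    float best_logit = -1e30f;
    for (int o = 0; o < num_classes; ++o) {
        float logit = b[o];
        for (int i = 0; i < C; ++i)
            logit += cls[i] * W[o * C + i];
        if (logit > best_logit) { best_logit = logit; best = o; }
    }
    return best;
}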
ViT(Vision Transformer) – Work Flow
[Workflow diagram: Load_weight -> m5_dump_init -> Pre-processing -> Embedder -> Transformer x12 (layernorm -> MHSA -> residual -> layernorm -> FFN -> residual) -> layernorm -> Classifier -> Argmax -> prediction (e.g. Black footed Albatross) -> m5_dump_stat; MHSA = matmul + attention, FFN = matmul + gelu + matmul]
● Testbench commands for each stage:
$ make layernorm_tb
$ make matmul_tb
$ make gelu_tb
$ make MHSA_tb
$ make feedforward_tb
$ make transformer_tb
$ run_all.sh
ViT(Vision Transformer) – Shape of array
● layernorm input/output: [T*C]
● MHSA input/output/o: [T*C]
● MHSA qkv: [T*3*C] (a q, k, and v vector of length C for every token)
● feedforward input/output: [T*C]
● feedforward gelu: [T*OC]
(arrays are stored token by token: token 1, token 2, ..., token T)
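A small sketch of how these flat arrays can be indexed, assuming row-major storage with tokens as rows; the packed qkv layout (q, k, v contiguous per token) is an assumption inferred from the diagram, so confirm it against the provided code.

// Row-major indexing for the flat buffers described above.
// input/output: [T*C]   -> element (t, c) at t*C + c
// gelu buffer:  [T*OC]  -> element (t, o) at t*OC + o
// MHSA qkv:     [T*3*C] -> assumed layout: q, k, v contiguous per token
inline float& at(float* buf, int t, int c, int C) { return buf[t * C + c]; }

inline float& qkv_at(float* qkv, int t, int which /*0=q,1=k,2=v*/, int c, int C) {
    return qkv[t * 3 * C + which * C + c];
}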
Common problem
● Segmentation fault
○ Ensure that you are not accessing an invalid memory address
○ If the segmentation fault comes from a stack overflow (e.g. large local arrays), enter the command $ ulimit -s unlimited
All you have to do is
● Download TA’s Gem5 image
○ docker pull yenzu/ca_final_part2:2024
● Write the C++ code in the ./layer folder, understanding the algorithms described above
○ make clean
○ make <layer>_tb
○ ./<layer>_tb
All you have to do is
● Ensure that the ViT successfully classifies the bird
○ python3 embedder.py --image_path images/Black_Footed_Albatross_0001_796111.jpg --embedder_path weights/embedder.pth --output_path embedded_image.bin
○ g++ -static main.cpp layer/*.cpp -o process
○ ./process
○ python3 run_model.py --input_path result.bin --output_path torch_pred.bin --model_path weights/model.pth
○ python3 classifier.py --prediction_path torch_pred.bin --classifier_path weights/classifier.pth
○ After running the above commands, you will get the top-5 prediction.
● Evaluate the performance of the part of the ViT consisting of layernorm + MHSA + residual
○ The simulation needs about 3.5 hours to finish
○ Check stat.txt
Grading Policy
● (50%) Verification
○ (10%) matmul_tb
○ (10%) layernorm_tb
○ (10%) gelu_tb
○ (10%) MHSA_tb
○ (10%) transformer_tb
● (50%) Performance
○ max(sigmoid((27.74 - student latency) / student latency) * 70, 50)
● You will get 0 performance points if your design is not verified.
Submission
● Please submit your code on E3 before 23:59 on June 20, 2024.
● Format
○ Code: please put your code in a folder named FP2_team<ID>_code and compress it into a zip file.
● Late submission is not allowed.
● Plagiarism is forbidden, otherwise you will get 0 points!!!
FP2_team<ID>_code folder
● You should include the following files
○ matmul.cpp
○ layernorm.cpp
○ gelu.cpp
○ attention.cpp
○ residual.cpp