
1DT038 Computer Architecture 2024 Spring Final Project Part 2: ViT (Vision Transformer)


Computer Architecture 2024 Spring
Final Project Part 2

Overview

  • Tutorial
  • Part 1 (5%): Write a C++ program to analyze the specification of the L1 data cache.
  • Part 2 (5%): Given the hardware specifications, try to get the best performance for a more complicated program.

Description

In this project, we will use a two-level cache computer system. Your task is to write a ViT (Vision Transformer) in C++ and optimize it. You can see more details of the system specification below.

System Specifications CourseNana.COM

I cache size CourseNana.COM

I cache associativity CourseNana.COM

D cache size CourseNana.COM

D cache associativity CourseNana.COM

Block size CourseNana.COM

L1 cache CourseNana.COM

L2 cache CourseNana.COM

* L1 I cache and L1 D cache connect to the same L2 cache Memory size: 8192MB CourseNana.COM

ViT (Vision Transformer) – Transformer Overview

ViT (Vision Transformer) – Image Pre-processing
Normalize, resize to (300, 300, 3), and center crop to (224, 224, 3).

ViT (Vision Transformer) – Patch Encoder

  • In this project, we use a Conv2D as the patch encoder, with kernel_size = (16, 16), stride = (16, 16), and output_channel = 768 (a sketch follows below).
  • (224, 224, 3) -> (14, 14, 16*16*3) -> (196, 768)
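A minimal sketch of the patch encoder, assuming row-major image[224][224][3] and weight[768][16][16][3] layouts (the handout does not fix these). Because the kernel size equals the stride, the Conv2D reduces to a linear projection of each flattened 16x16x3 patch, producing the (196, 768) patch embeddings.

// Patch encoder as a strided Conv2D: kernel 16x16, stride 16, 3 input
// channels, 768 output channels.  Each output position is a linear
// projection of one flattened 16x16x3 patch, so the (224,224,3) image
// becomes (14*14, 768) = (196, 768) patch embeddings.
void patch_encoder(const float *image,   // [224 * 224 * 3]
                   const float *weight,  // [768 * 16 * 16 * 3]
                   const float *bias,    // [768]
                   float *out)           // [196 * 768]
{
    const int P = 16, C = 3, OC = 768, GRID = 14;
    for (int py = 0; py < GRID; ++py)
        for (int px = 0; px < GRID; ++px) {
            const int token = py * GRID + px;
            for (int oc = 0; oc < OC; ++oc) {
                float acc = bias[oc];
                for (int ky = 0; ky < P; ++ky)
                    for (int kx = 0; kx < P; ++kx)
                        for (int c = 0; c < C; ++c) {
                            const int iy = py * P + ky, ix = px * P + kx;
                            acc += image[(iy * 224 + ix) * C + c] *
                                   weight[((oc * P + ky) * P + kx) * C + c];
                        }
                out[token * OC + oc] = acc;
            }
        }
}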

ViT (Vision Transformer) – Class Token

  • Now we have 196 tokens, and each token has 768 features.
  • In order to record global information, we need to concatenate one learnable class token with the 196 tokens.
  • (196, 768) -> (197, 768)

ViT (Vision Transformer) – Position Embedding

  • Add the learnable position information to the patch embedding (a combined sketch of the class token and position embedding follows below).
  • (197, 768) + position_embedding(197, 768) -> (197, 768)
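A minimal sketch of the two steps above, assuming the class token is prepended as token 0 and that all buffers are row-major [tokens * features]; the function and buffer names are illustrative, not the interface of the handout code.

#include <cstring>

// Prepend the learnable class token, then add the learnable position
// embedding element-wise: 196 patch tokens of 768 features become
// (197, 768), and the position embedding has the same (197, 768) shape.
void add_class_token_and_positions(const float *patches,   // [196 * 768]
                                   const float *cls_token, // [768]
                                   const float *pos_emb,   // [197 * 768]
                                   float *tokens)          // [197 * 768]
{
    const int T = 197, C = 768;
    std::memcpy(tokens, cls_token, C * sizeof(float));             // token 0
    std::memcpy(tokens + C, patches, (T - 1) * C * sizeof(float)); // tokens 1..196
    for (int i = 0; i < T * C; ++i)
        tokens[i] += pos_emb[i];                                   // position embedding
}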

ViT (Vision Transformer) – Layer Normalization
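A minimal layer-normalization sketch over the feature dimension, matching the [T*C] layout used later in the handout (T tokens, C features per token). The epsilon value and the learnable gamma/beta parameters are assumptions based on the usual ViT formulation.

#include <cmath>

// For each token, normalize its C features to zero mean and unit variance,
// then apply the learnable scale (gamma) and shift (beta).
void layernorm(const float *in, const float *gamma, const float *beta,
               float *out, int T, int C, float eps = 1e-5f)
{
    for (int t = 0; t < T; ++t) {
        const float *x = in + t * C;
        float *y = out + t * C;
        float mean = 0.f, var = 0.f;
        for (int c = 0; c < C; ++c) mean += x[c];
        mean /= C;
        for (int c = 0; c < C; ++c) var += (x[c] - mean) * (x[c] - mean);
        var /= C;
        const float inv = 1.f / std::sqrt(var + eps);
        for (int c = 0; c < C; ++c)
            y[c] = (x[c] - mean) * inv * gamma[c] + beta[c];
    }
}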

ViT (Vision Transformer) – MultiHead Self Attention (1)

The block consists of: an input linear projection (with weights Wq, Wk, Wv and biases bq, bk, bv), a split into heads, per-head attention, a merge of the heads, and an output linear projection.

ViT (Vision Transformer) – MultiHead Self Attention (2)

Notation: T = number of tokens, C = embedded dimension, H = hidden dimension (per head), NH = number of heads, and C = H * NH.

  • Input linear projection: Q = X Wq^T + bq, K = X Wk^T + bk, V = X Wv^T + bv, giving Q, K, V in R^(T x (NH*H)).
  • Split Q, K, V into heads: Q1, Q2, ..., QNH, K1, K2, ..., KNH, V1, V2, ..., VNH, each in R^(T x H).
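A minimal sketch of the input linear projection and the head split, assuming the weight matrices are stored row-major as [C_out][C_in] (so Y = X*W^T + b is a dot product of rows) and that head i occupies columns [i*H, (i+1)*H) of Q, K, V. These layouts are assumptions, not fixed by the handout.

// Y = X * W^T + b for one of Q, K, V (call three times with Wq/Wk/Wv).
void linear_projection(const float *X, const float *W, const float *b,
                       float *Y, int T, int C_in, int C_out)
{
    for (int t = 0; t < T; ++t)
        for (int o = 0; o < C_out; ++o) {
            float acc = b[o];
            for (int c = 0; c < C_in; ++c)
                acc += X[t * C_in + c] * W[o * C_in + c];
            Y[t * C_out + o] = acc;
        }
}

// Splitting into heads needs no copy: element (t, h) of head i's slice Q_i.
inline float head_elem(const float *Q, int t, int i, int h, int NH, int H)
{
    return Q[t * (NH * H) + i * H + h];
}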

ViT (Vision Transformer) – MultiHead Self Attention (2)

For each head i, compute:
  • Matrix multiplication and scale: Si = Qi Ki^T / sqrt(H), Si in R^(T x T)
  • Softmax: Pi = Softmax(Si) in R^(T x T), where Softmax is a row-wise function
  • Matrix multiplication: Oi = Pi Vi, Oi in R^(T x H)
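A minimal per-head attention sketch for the three steps above. Qh, Kh, Vh, Oh are one head's [T*H] slices; the per-row scratch buffer and the max-subtracted softmax are implementation choices, not requirements from the handout.

#include <cmath>
#include <vector>

// S_i = Q_i*K_i^T / sqrt(H), P_i = row-wise softmax(S_i), O_i = P_i*V_i,
// computed one query row at a time.
void attention_head(const float *Qh, const float *Kh, const float *Vh,
                    float *Oh, int T, int H)
{
    const float scale = 1.0f / std::sqrt((float)H);
    std::vector<float> row(T);
    for (int i = 0; i < T; ++i) {
        // scaled scores of query token i against every key token
        float maxv = -1e30f;
        for (int j = 0; j < T; ++j) {
            float s = 0.f;
            for (int h = 0; h < H; ++h) s += Qh[i * H + h] * Kh[j * H + h];
            row[j] = s * scale;
            if (row[j] > maxv) maxv = row[j];
        }
        // row-wise softmax (max subtracted for numerical stability)
        float sum = 0.f;
        for (int j = 0; j < T; ++j) { row[j] = std::exp(row[j] - maxv); sum += row[j]; }
        for (int j = 0; j < T; ++j) row[j] /= sum;
        // one row of O_i = P_i * V_i
        for (int h = 0; h < H; ++h) {
            float acc = 0.f;
            for (int j = 0; j < T; ++j) acc += row[j] * Vh[j * H + h];
            Oh[i * H + h] = acc;
        }
    }
}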

ViT(Vision Transformer) – MultiHead Self Attention (3) CourseNana.COM

Oi RTH, O = [O1, O2,...,O2 ] Linear Projection CourseNana.COM

C CourseNana.COM

embedded dimension CourseNana.COM

T CourseNana.COM

# of tokens CourseNana.COM

output = OW CourseNana.COM

T + b oo CourseNana.COM

merge heads and Linear Projection CourseNana.COM
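A minimal sketch of the merge and output projection. If each head writes its Oi into columns [i*H, (i+1)*H) of one [T*C] buffer (the layout assumed in the earlier sketches), the heads are already merged, and the remaining step is the same Y = X*W^T + b pattern; the [C][C] layout of Wo is an assumption.

// output = O * Wo^T + bo, with O the merged [T*C] head outputs.
void mhsa_output(const float *O, const float *Wo, const float *bo,
                 float *out, int T, int C)
{
    for (int t = 0; t < T; ++t)
        for (int o = 0; o < C; ++o) {
            float acc = bo[o];
            for (int c = 0; c < C; ++c)
                acc += O[t * C + c] * Wo[o * C + c];
            out[t * C + o] = acc;
        }
}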

ViT (Vision Transformer) – Feed Forward Network

The feed-forward network is applied to each token independently: an input linear projection from the embedded dimension C to the hidden dimension, a GeLU activation, and an output linear projection back to C (T = number of tokens). A sketch follows below.
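A minimal feed-forward sketch under the layouts assumed earlier (row-major [out][in] weights, [T*C] activations). The tanh GeLU approximation used here is an assumption; whether the provided weights expect this or the exact erf form should be checked against the testbench.

#include <cmath>

// tanh approximation of GeLU (assumption; see the GeLU section below).
static float gelu_scalar(float x)
{
    return 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
}

// Per-token: linear C -> OC, GeLU, then linear OC -> C.
void feedforward(const float *in,                    // [T * C]
                 const float *W1, const float *b1,   // [OC * C], [OC]
                 const float *W2, const float *b2,   // [C * OC], [C]
                 float *hidden,                      // [T * OC] scratch (gelu buffer)
                 float *out,                         // [T * C]
                 int T, int C, int OC)
{
    for (int t = 0; t < T; ++t) {
        for (int o = 0; o < OC; ++o) {               // x * W1^T + b1, then GeLU
            float acc = b1[o];
            for (int c = 0; c < C; ++c) acc += in[t * C + c] * W1[o * C + c];
            hidden[t * OC + o] = gelu_scalar(acc);
        }
        for (int c = 0; c < C; ++c) {                // hidden * W2^T + b2
            float acc = b2[c];
            for (int o = 0; o < OC; ++o) acc += hidden[t * OC + o] * W2[c * OC + o];
            out[t * C + c] = acc;
        }
    }
}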

ViT (Vision Transformer) – GeLU
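For reference, the exact form is GELU(x) = x * Phi(x) = 0.5 * x * (1 + erf(x / sqrt(2))), applied element-wise; the sketch below uses this form on the [T*OC] hidden buffer. Whether the reference weights expect this or the tanh approximation shown in the feed-forward sketch is an assumption to verify with gelu_tb.

#include <cmath>

// Element-wise GeLU over a buffer of n values (e.g. the [T*OC] gelu buffer).
void gelu(float *x, int n)
{
    for (int i = 0; i < n; ++i)
        x[i] = 0.5f * x[i] * (1.0f + std::erf(x[i] * 0.70710678f));  // 1/sqrt(2)
}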

ViT (Vision Transformer) – Classifier

  • Contains a linear layer that transforms 768 features into 200 classes: (197, 768) -> (197, 200)
  • Only the first token (the class token) is used: (197, 200) -> (1, 200)
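A minimal classifier sketch; since only the class token is kept, it is enough to project token 0. The [200][768] weight layout and the final argmax are assumptions for illustration (in the provided flow the classification itself is done by classifier.py).

// Linear 768 -> 200 on the class token (row 0), then argmax over classes.
int classify(const float *tokens,   // [197 * 768]
             const float *W,        // [200 * 768]
             const float *b)        // [200]
{
    const int C = 768, NCLASS = 200;
    float best = -1e30f;
    int best_idx = 0;
    for (int k = 0; k < NCLASS; ++k) {
        float acc = b[k];
        for (int c = 0; c < C; ++c)
            acc += tokens[c] * W[k * C + c];   // only token 0 (class token) is used
        if (acc > best) { best = acc; best_idx = k; }
    }
    return best_idx;
}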

Testbenches:

$ make layernorm_tb
$ make MHSA_tb
$ make matmul_tb
$ make gelu_tb
$ run_all.sh
$ make transformer_tb
$ make feedforward_tb

ViT (Vision Transformer) – Work Flow

Pre-processing -> Embedder -> Transformer x12 -> layernorm -> Classifier -> Argmax -> predicted class (e.g. "Black Footed Albatross").

Each Transformer layer runs: layernorm -> MHSA (matmul + attention) -> residual -> layernorm -> matmul -> gelu -> matmul -> residual.

Load_weight runs first; m5_dump_init and m5_dump_stat bracket the measured region (layernorm + MHSA + residual).
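A sketch of one encoder layer as read off the workflow above, in pre-norm form. layernorm() and feedforward() are the sketches shown earlier; mhsa() is a hypothetical stand-in for the whole attention block, and none of these signatures are the actual interfaces of the code in ./layer.

void layernorm(const float *in, const float *gamma, const float *beta,
               float *out, int T, int C, float eps);                 // sketched earlier
void feedforward(const float *in, const float *W1, const float *b1,
                 const float *W2, const float *b2, float *hidden,
                 float *out, int T, int C, int OC);                   // sketched earlier
void mhsa(const float *in, float *out, int T, int C);                 // hypothetical placeholder

// layernorm -> MHSA -> residual -> layernorm -> matmul -> gelu -> matmul -> residual
void transformer_layer(float *x, float *normed, float *block_out, float *hidden,
                       const float *g1, const float *beta1,
                       const float *g2, const float *beta2,
                       const float *W1, const float *b1,
                       const float *W2, const float *b2,
                       int T, int C, int OC)
{
    layernorm(x, g1, beta1, normed, T, C, 1e-5f);
    mhsa(normed, block_out, T, C);
    for (int i = 0; i < T * C; ++i) x[i] += block_out[i];   // residual 1

    layernorm(x, g2, beta2, normed, T, C, 1e-5f);
    feedforward(normed, W1, b1, W2, b2, hidden, block_out, T, C, OC);
    for (int i = 0; i < T * C; ++i) x[i] += block_out[i];   // residual 2
}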

ViT (Vision Transformer) – Shape of array

  • layernorm: input/output [T*C]
  • MHSA: input/output/o [T*C]; qkv [T*3*C] (the q, k, and v vectors of token 1 through token T stored in one buffer)
  • feedforward: input/output [T*C]; gelu [T*OC]
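Most of the measured time goes into matrix multiplications on these row-major buffers, so their cache behavior matters on the given two-level cache system. Below is a loop-tiling (cache-blocking) sketch for C = A * B^T, the pattern used by the linear projections and by Q*K^T; the tile size BLK is a tunable assumption and should be chosen from the cache parameters in the specification.

#include <cstring>

// Blocked C = A * B^T: computing small BLK x BLK tiles keeps the rows of A
// and B that are being reused inside the L1 D cache.
constexpr int BLK = 32;

void matmul_bt_tiled(const float *A,   // [M * K]
                     const float *B,   // [N * K], used as B^T
                     float *Cout,      // [M * N]
                     int M, int N, int K)
{
    std::memset(Cout, 0, sizeof(float) * M * N);
    for (int i0 = 0; i0 < M; i0 += BLK)
        for (int j0 = 0; j0 < N; j0 += BLK)
            for (int k0 = 0; k0 < K; k0 += BLK)
                for (int i = i0; i < i0 + BLK && i < M; ++i)
                    for (int j = j0; j < j0 + BLK && j < N; ++j) {
                        float acc = Cout[i * N + j];
                        for (int k = k0; k < k0 + BLK && k < K; ++k)
                            acc += A[i * K + k] * B[j * K + k];
                        Cout[i * N + j] = acc;
                    }
}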

Common problem

Segmentation fault
  ○ Ensure that you are not accessing a nonexistent memory address.
  ○ Enter the command $ ulimit -s unlimited

All you have to do is

  • Download the TA's gem5 image: docker pull yenzu/ca_final_part2:2024
  • Write the C++ code in the ./layer folder, understanding the algorithm.

All you have to do is

  • Ensure the ViT successfully classifies the bird:
    ○ python3 embedder.py --image_path images/Black_Footed_Albatross_0001_796111.jpg --embedder_path weights/embedder.pth --output_path embedded_image.bin
    ○ g++ -static main.cpp layer/*.cpp -o process
    ○ ./process
    ○ python3 run_model.py --input_path result.bin --output_path torch_pred.bin --model_path weights/model.pth
    ○ python3 classifier.py --prediction_path torch_pred.bin --classifier_path weights/classifier.pth
    ○ After running the above commands, you will get the top-5 prediction.
  • Evaluate the performance of part of the ViT, namely layernorm + MHSA + residual.

Grading Policy

performance points = max(sigmoid((27.74 - student latency) / student latency) * 70, 50)

You will get 0 performance points if your design is not verified.
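As a worked example of the formula, with sigmoid(x) = 1 / (1 + e^(-x)): a verified design whose latency equals the 27.74 reference gives sigmoid(0) * 70 = 35, so the score is max(35, 50) = 50; halving the latency to 13.87 gives sigmoid(1) * 70, roughly 51 points.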

Submission

Please submit your code on E3 before 23:59 on June 20, 2024.

Format
  • Code: please put your code in a folder named FP2_team<ID>_code and compress it into a zip file.
  • In the FP2_team<ID>_code folder, you should attach the following documents: matmul.cpp, layernorm.cpp, gelu.cpp, attention.cpp, residual.cpp
