
Programming GPUs
- CUDA (NVIDIA only)
  - Write C-like code that runs directly on the GPU
  - Higher-level optimized libraries on top: cuBLAS, cuFFT, cuDNN, etc.
- OpenCL
  - Similar to CUDA, but runs on anything; usually much slower in practice
- HIP
  - AMD's CUDA-like API; comes with tools to convert CUDA code to HIP
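
Most deep learning code never calls these libraries directly; frameworks dispatch to them behind the scenes. As a minimal sketch (assuming a CUDA-capable machine with PyTorch installed), moving tensors to the GPU is enough to route subsequent operations through NVIDIA's optimized libraries, e.g. matrix multiplies through cuBLAS:

import torch

if torch.cuda.is_available():
    device = torch.device('cuda')  # ops on these tensors dispatch to cuBLAS/cuDNN
else:
    device = torch.device('cpu')   # fall back to CPU kernels

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a.mm(b)  # on a CUDA device this matmul runs via cuBLAS under the hood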
Deep Learning Software
PyTorch
- Tensor: imperative ndarray, but runs on GPU
- Variable: node in a computational graph; stores data and gradients (merged into Tensor in PyTorch 0.4+, via requires_grad)
- Module: a neural network layer; may store state or learnable weights
import torch

# N = batch size; D_in, H, D_out = input, hidden, and output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Random input data and targets
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Weights of a two-layer network; requires_grad=True asks autograd to
# build a graph and compute gradients for these tensors
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: matmul -> ReLU (clamp at 0) -> matmul
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: compute gradients of the loss w.r.t. w1 and w2
    loss.backward()

    # Gradient descent update; no_grad so these ops aren't tracked by autograd
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Zero gradients before the next iteration; backward() accumulates
        w1.grad.zero_()
        w2.grad.zero_()
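
The Module abstraction packages this same pattern at a higher level. A sketch of the equivalent two-layer network using torch.nn and torch.optim (lr=1e-4 here is just a reasonable default for this setup, not from the original notes):

import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# nn.Sequential chains Modules; each Module holds its own learnable weights
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# The optimizer encapsulates the update rule and gradient zeroing
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    optimizer.zero_grad()  # clear accumulated gradients
    loss.backward()        # compute fresh gradients
    optimizer.step()       # apply the gradient descent update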
Static vs Dynamic Graphs
- Static graphs
  - Framework can optimize the graph before it runs (e.g., fuse operations)
  - Once the graph is built, it can be serialized and run without the code that built it (see the sketch below)
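
PyTorch builds graphs dynamically, but a static graph can still be captured and serialized; a minimal sketch using TorchScript tracing (torch.jit.trace), which records the graph from one example run and saves it so it can later be loaded without the model-building code:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

# Trace the model with an example input to record a static graph
example = torch.randn(64, 1000)
traced = torch.jit.trace(model, example)

# Serialize the captured graph; it can run without the original Python code
traced.save('model.pt')
loaded = torch.jit.load('model.pt')
y = loaded(example)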
- Dynamic graphs
  - Graph building and execution are intertwined, so the code that built the graph must always be kept around
  - Conditional operations can be written as plain Python control flow (see the sketch after this list)
  - Use cases that benefit from dynamic graphs:
    - Recurrent networks (variable-length sequences)
    - Recursive networks (data-dependent structure, e.g. parse trees)
    - Modular networks (composition decided at runtime)
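
A sketch of a conditional operation in a dynamic graph (shapes chosen only for illustration): plain Python control flow picks the computation at each step, the graph is rebuilt from scratch every forward pass, and autograd follows whichever path actually ran:

import torch

x = torch.randn(64, 100)
w1 = torch.randn(100, 100, requires_grad=True)
w2 = torch.randn(100, 100, requires_grad=True)

prev = x
for t in range(10):
    # Ordinary Python conditional: the branch taken can differ every
    # iteration, so a new graph is built each time this code runs
    if prev.sum() > 0:
        prev = prev.mm(w1).clamp(min=0)
    else:
        prev = prev.mm(w2).clamp(min=0)

loss = prev.pow(2).sum()
loss.backward()  # gradients flow through the executed path only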