
Programming GPUs
- CUDA (NVIDIA only)
  - Write C-like code that runs directly on the GPU
  - Higher-level optimized libraries on top: cuBLAS, cuFFT, cuDNN, etc.
- OpenCL
  - Similar to CUDA, but runs on anything; usually much slower in practice
- HIP
  - AMD's CUDA-like API; comes with tools to convert CUDA code to HIP
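
Most deep learning code never calls these libraries directly; frameworks dispatch to them behind the scenes. As a minimal sketch (assuming a CUDA-capable machine with PyTorch installed), moving tensors to the GPU is enough to route subsequent operations through NVIDIA's optimized libraries, e.g. matrix multiplies through cuBLAS:

import torch

if torch.cuda.is_available():
    device = torch.device('cuda')  # ops on these tensors dispatch to cuBLAS/cuDNN
else:
    device = torch.device('cpu')   # fall back to CPU kernels

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a.mm(b)  # on a CUDA device this matmul runs via cuBLAS under the hood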
Deep Learning Software
PyTorch
- Tensor: imperative ndarray, but runs on GPU
- Variable: node in a computational graph; stores data and gradients (merged into Tensor in PyTorch 0.4+, via requires_grad)
- Module: a neural network layer; may store state or learnable weights
import torch

# N = batch size; D_in, H, D_out = input, hidden, and output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Random input data and targets
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Weights of a two-layer network; requires_grad=True asks autograd to
# build a graph and compute gradients for these tensors
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: matmul -> ReLU (clamp at 0) -> matmul
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: compute gradients of the loss w.r.t. w1 and w2
    loss.backward()

    # Gradient descent update; no_grad so these ops aren't tracked by autograd
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Zero gradients before the next iteration; backward() accumulates
        w1.grad.zero_()
        w2.grad.zero_()
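
The Module abstraction packages this same pattern at a higher level. A sketch of the equivalent two-layer network using torch.nn and torch.optim (lr=1e-4 here is just a reasonable default for this setup, not from the original notes):

import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# nn.Sequential chains Modules; each Module holds its own learnable weights
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# The optimizer encapsulates the update rule and gradient zeroing
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    optimizer.zero_grad()  # clear accumulated gradients
    loss.backward()        # compute fresh gradients
    optimizer.step()       # apply the gradient descent update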
Static vs Dynamic Graphs
- Static graphs
  - Framework can optimize the graph before it runs (e.g., fuse operations)
  - Once the graph is built, it can be serialized and run without the code that built it (see the sketch below)
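
PyTorch builds graphs dynamically, but a static graph can still be captured and serialized; a minimal sketch using TorchScript tracing (torch.jit.trace), which records the graph from one example run and saves it so it can later be loaded without the model-building code:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1000, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

# Trace the model with an example input to record a static graph
example = torch.randn(64, 1000)
traced = torch.jit.trace(model, example)

# Serialize the captured graph; it can run without the original Python code
traced.save('model.pt')
loaded = torch.jit.load('model.pt')
y = loaded(example)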
- Dynamic graphs
  - Graph building and execution are intertwined, so the code that built the graph must always be kept around
  - Conditional operations can be written as plain Python control flow (see the sketch after this list)
  - Use cases that benefit from dynamic graphs:
    - Recurrent networks (variable-length sequences)
    - Recursive networks (data-dependent structure, e.g. parse trees)
    - Modular networks (composition decided at runtime)
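
A sketch of a conditional operation in a dynamic graph (shapes chosen only for illustration): plain Python control flow picks the computation at each step, the graph is rebuilt from scratch every forward pass, and autograd follows whichever path actually ran:

import torch

x = torch.randn(64, 100)
w1 = torch.randn(100, 100, requires_grad=True)
w2 = torch.randn(100, 100, requires_grad=True)

prev = x
for t in range(10):
    # Ordinary Python conditional: the branch taken can differ every
    # iteration, so a new graph is built each time this code runs
    if prev.sum() > 0:
        prev = prev.mm(w1).clamp(min=0)
    else:
        prev = prev.mm(w2).clamp(min=0)

loss = prev.pow(2).sum()
loss.backward()  # gradients flow through the executed path only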