Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:
Resume:
1.-The speaker is a recovering speech recognition scientist now working on image recognition in the deep learning infrastructure group at Google.
2.-Convolutional nets are a powerful "hammer" approach that works well across many computer vision tasks like search, labeling, segmentation, and detection.
3.-Example of fine-grained dog breed classification - reusing a pre-trained ImageNet model on the new data took 2nd place, showing the power of transfer learning.
4.-Goal is to build better convolutional nets faster, make training more efficient as you scale to larger data and models.
5.-Two approaches to parallelize neural net training - model parallelism (split network across machines) and data parallelism (copy model, split data).
6.-Model parallelism pays a communication penalty exchanging activations between machines. Data parallelism pays a communication penalty synchronizing parameters between workers.
7.-They implemented a distributed system that does both model and data parallelism, using asynchronous SGD (see the first sketch after this list). But efficiency is still poor as machines are added.
8.-It works best with low compute density (fast network, slow cores). It breaks down with high-density cores like GPUs, where communication becomes the bottleneck.
9.-Goal is to design parallelization approach that works across different compute topologies and densities to keep up with fast-changing hardware.
10.-New idea from Alex Krizhevsky - use data parallelism for the convolutional layers and model parallelism for the fully connected layers.
11.-The challenge is that the fully connected layers need the conv-layer activations from every worker, causing a communication bottleneck at the point where the parallelization approach switches.
12.-Solution - broadcast the convolutional layers' output to all workers and have each one compute its own chunk of the next layer in parallel (see the hybrid sketch after this list).
13.-Clever pipelining - start broadcasting the next batch's activations to the fully connected layers while the current batch is still being computed, overlapping communication with computation (also sketched after this list).
14.-This gets a 3.74x speedup on 4 GPUs (about 94% efficiency, near the optimal 4x) and 6.32x on 8 GPUs (about 79%) - faster than other approaches in the literature.
15.-Next topic - making convnets faster and more efficient. Filters are often redundant; e.g., the R, G, and B filters in the first layer end up very similar.
16.-Separable convolutions - first convolve each input channel independently to produce many thin feature maps, then apply a 1x1 convolution to project them (see the sketch after this list).
17.-This uses many fewer parameters - a 5-10x reduction is typical - while being just as accurate, converging faster, and staying simple to implement. It works best for large-scale tasks.
18.-The next frontier is scaling object detection with convnets to more classes and more data. Current approaches are slow and class-specific.
19.-New approach - build a generic "salient object" detector that uses conv features to propose object regions directly, with no sliding windows.
20.-Gets competitive results on VOC and ImageNet detection in a much more scalable way by restricting model and region proposal complexity.
21.-Video classification is the next challenge. There is no clear best approach yet - late fusion, early fusion, 3D convolution, a hybrid? Progress is computationally limited.
22.-But transfer learning looks promising - convnet features learned on YouTube videos beat the state of the art on the UCF-101 benchmark, without any convnet fine-tuning.
23.-Over and over, a big convnet model plus smaller-scale task-specific fine-tuning yields state-of-the-art results, much more robustly than past ML approaches.
24.-This robustness to new data suggests deep learning models are closer to the "right" approach, not just overfitting.
25.-The biggest bottleneck is computation - with unlimited compute, one could train much bigger nets with more dropout regularization and almost surely improve results.
26.-Currently model size is limited by GPU memory. Some models train for months to eke out small gains. More computation would help a lot.
27.-In the separable convolutions, they typically use a "depth multiplier" of 8 - each input channel is turned into 8 feature maps before the 1x1 projection.
28.-Very large minibatches work surprisingly well and don't usually hurt convergence. They also allow better overlapping of communication and computation during distributed training.
29.-Physical modeling and inverting the graphics pipeline could help with video, acting as a strong prior for compressing the data. But it's unclear how much this helps in general.
30.-Details of the bounding-box inference approach in the slides were not discussed in depth - left for the audience to discuss with the authors afterwards.
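
Below are a few illustrative sketches for the parallelism and separable-convolution points above. First, a minimal simulation of the asynchronous data-parallel SGD setup from points 5-7, run serially in NumPy. The ParameterServer/worker names, the toy least-squares problem, and all sizes are assumptions for illustration, not the actual Google system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: least squares on synthetic data.
X = rng.normal(size=(1024, 16))
w_true = rng.normal(size=16)
y = X @ w_true

class ParameterServer:
    """Holds the global model; workers fetch it and push gradients back."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def fetch(self):
        return self.w.copy()      # may already be stale when it is used

    def push(self, grad):
        self.w -= self.lr * grad  # applied with no synchronization barrier

def worker_step(server, shard_X, shard_y, batch=32):
    w = server.fetch()                        # 1. grab current parameters
    idx = rng.choice(len(shard_X), batch)
    Xb, yb = shard_X[idx], shard_y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch   # 2. gradient on own data shard
    server.push(grad)                         # 3. send gradient to the server

server = ParameterServer(dim=16)
shards = np.array_split(np.arange(len(X)), 4)  # data parallelism: 4 shards
for step in range(200):
    for s in shards:  # round-robin stands in for truly concurrent workers
        worker_step(server, X[s], y[s])
print("parameter error:", np.linalg.norm(server.w - w_true))
```

The round-robin loop stands in for concurrent workers; the essential property is that each gradient may be computed from parameters that are already stale when it is applied.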
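
Next, a NumPy sketch of the hybrid scheme from points 10-12: data parallelism through the convolutional part, then a switch to model parallelism for the fully connected part. The shapes, the ReLU stand-in for the conv stack, and concatenate-as-all-gather are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4    # workers
B = 32   # per-worker minibatch
F = 256  # flattened conv feature size
H = 512  # fully connected layer width

def conv_features(x):
    # Stand-in for each worker's data-parallel conv stack (identical weights).
    return np.maximum(x, 0.0)

# Each worker owns a different minibatch shard and a different slice of W_fc.
shards = [rng.normal(size=(B, F)) for _ in range(K)]
W_fc_slices = [rng.normal(size=(F, H // K)) * 0.01 for _ in range(K)]

# Step 1 (data parallel): every worker runs the convs on its own shard.
feats = [conv_features(x) for x in shards]

# Step 2 (the switch): broadcast/all-gather the activations so every worker
# sees the full combined batch of K*B examples.
full_batch = np.concatenate(feats, axis=0)          # shape (K*B, F)

# Step 3 (model parallel): each worker multiplies by only its slice of W_fc,
# producing one column-chunk of the fully connected output.
out_chunks = [full_batch @ W for W in W_fc_slices]  # each (K*B, H//K)
fc_out = np.concatenate(out_chunks, axis=1)         # shape (K*B, H)
print(fc_out.shape)
```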
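
The pipelining trick from point 13, sketched with a background thread standing in for an asynchronous inter-GPU transfer; `broadcast` is a hypothetical placeholder. The double-buffered loop keeps the transfer of batch i+1 in flight while batch i is being computed.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

rng = np.random.default_rng(0)
batches = [rng.normal(size=(128, 256)) for _ in range(8)]
W = rng.normal(size=(256, 128)) * 0.01

def broadcast(x):
    # Stand-in for shipping conv activations to the fully connected workers.
    return x.copy()

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(broadcast, batches[0])   # prefetch batch 0
    for i in range(len(batches)):
        current = pending.result()                 # wait for transfer to land
        if i + 1 < len(batches):                   # start the next transfer...
            pending = pool.submit(broadcast, batches[i + 1])
        out = np.maximum(current @ W, 0.0)         # ...while computing FC here
```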
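
Finally, a PyTorch sketch of the separable convolution from points 15-17, using the depth multiplier of 8 from point 27. The channel counts and the 7x7 kernel are illustrative; the parameter saving is largest for big spatial kernels, which is where a 5-10x reduction is plausible.

```python
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=7, multiplier=8):
        super().__init__()
        # Depthwise stage: each input channel is convolved independently
        # into `multiplier` thin feature maps (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch * multiplier, kernel_size=k,
                                   padding=k // 2, groups=in_ch)
        # Pointwise stage: a 1x1 convolution projects the thin maps to out_ch.
        self.pointwise = nn.Conv2d(in_ch * multiplier, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

dense = nn.Conv2d(64, 256, kernel_size=7, padding=3)
sep = SeparableConv(64, 256)
x = torch.randn(1, 64, 32, 32)
print(dense(x).shape, sep(x).shape)    # identical output shapes
print(n_params(dense), n_params(sep))  # ~803k vs ~157k parameters
```

With these sizes the separable block has roughly 5x fewer parameters than the dense 7x7 convolution while producing the same output shape.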
Knowledge Vault built by David Vivancos 2024