Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:
Resume:
1.- ResNet won all 5 main tracks of the 2015 ImageNet & COCO competitions, often by a large margin
2.- ImageNet benchmark shows increasing network depth over time, from non-deep methods to 150+ layer networks
3.- Increasing depth has greatly improved results on tasks like PASCAL VOC object detection
4.- AlexNet (8 layers) was state-of-the-art in 2012, VGGNet/GoogLeNet (19/22 layers) in 2014, ResNet (150+ layers) in 2015
5.- Simply stacking more layers does not guarantee better performance
6.- Experiments show deeper plain networks can have higher training and test error than shallower networks
7.- Intuition: A deeper model's solution space contains the shallower model's (the extra layers can simply be identity), so it shouldn't have higher training error
8.- Hypothesis: Current solvers (SGD, backprop) have optimization difficulties for very deep networks
9.- ResNet solution: Have layers learn residual functions F(x) with reference to the layer input x, added back through identity skip connections (see the residual-block sketch after this list)
10.- Hypothesis: If identity is optimal, it's easier to drive the residual weights to 0 than to fit identity with stacked nonlinear layers, and easier to learn small fluctuations around identity (output y = F(x) + x)
11.- ResNet design: Similar to VGG - stacks of 3x3 conv layers, doubling the filters when halving the spatial size; the plain net becomes a ResNet by adding identity skip connections
12.- Results on CIFAR-10: Plain nets' error increases with depth, ResNets' error decreases even past 100 layers
13.- ImageNet: 34-layer ResNet outperforms 18-layer one, error decreases up to 152 layers while keeping lower complexity than VGG
14.- Hypothesis: Expressiveness of deeper models means fewer filters needed, allowing deeper ResNets with low complexity
15.- ResNets are useful as feature extractors for other vision tasks beyond classification (see the feature-extraction sketch after this list)
16.- ResNet-101 features gave a 28% relative gain over VGG-16 features for COCO object detection
17.- COCO object detection: 80-category detector trained on ResNet-101 features detects many object classes in images/video
18.- ResNets lead on many benchmarks - PASCAL VOC, the VQA challenge, human pose estimation, depth estimation, segment proposals
19.- ResNets also used beyond vision - image generation, NLP, speech recognition, computational advertising
20.- Central idea is going deeper by making it easier to train very deep nets
21.- Conclusions: ResNets are easy to train, gain accuracy from depth, and provide good transferable features
22.- Follow-up work: 200-layer ImageNet, 1000-layer CIFAR-10 ResNets
23.- Released pretrained ImageNet models in Caffe; Facebook released Torch training code. Many 3rd-party implementations are available
24.- Author doesn't expect million-layer networks by the next CVPR
25.- Depth is one dimension of network design space to explore, along with width etc.
26.- Going deeper is not always the most economical choice for a given computational budget
27.- ResNets make very deep nets trainable, but depth still needs to be balanced against other design factors
28.- Deeper models are more expressive, so they can potentially use fewer filters
29.- Simply replacing VGG-16 with ResNet-101 gave large object detection gains, showing feature transferability
30.- ResNets are state-of-the-art across many vision benchmarks and have applications beyond vision as well
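Residual-block sketch (points 9-11): a minimal, illustrative PyTorch version of a basic (non-bottleneck) residual block, assuming batch-normalized 3x3 convolutions; class and variable names are hypothetical and this is not the authors' released Caffe/Torch code.

```python
# Minimal sketch of a basic residual block (points 9-11), assuming PyTorch.
# Illustrative only; not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicResidualBlock(nn.Module):
    """Two 3x3 conv layers learn a residual F(x); the output is F(x) + x."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # When halving spatial size and doubling filters (point 11), the skip
        # path needs a 1x1 projection so shapes match; otherwise it is identity.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(residual + self.shortcut(x))  # y = F(x) + x


# Usage: a block that halves spatial size (stride 2) while doubling filters.
block = BasicResidualBlock(in_channels=64, out_channels=128, stride=2)
y = block(torch.randn(1, 64, 32, 32))  # -> shape (1, 128, 16, 16)
```

If the convolutional branch learns weights near zero, the block reduces to roughly an identity mapping, which is the talk's argument for why adding such blocks shouldn't hurt training error.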
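Feature-extraction sketch (points 15 and 29): a minimal example of reusing a pretrained ResNet-101 as a frozen backbone. It assumes a recent torchvision (the weights-enum API) and simply strips the classifier head; it is not the detection pipeline from the talk.

```python
# Minimal sketch (points 15, 29): pretrained ResNet-101 as a frozen feature
# extractor. Assumes a recent torchvision; illustrative only.
import torch
import torchvision

weights = torchvision.models.ResNet101_Weights.IMAGENET1K_V1
backbone = torchvision.models.resnet101(weights=weights)
backbone.fc = torch.nn.Identity()    # drop the 1000-way classifier head
backbone.eval()

for p in backbone.parameters():      # freeze: features only, no fine-tuning
    p.requires_grad_(False)

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)   # dummy batch of images
    features = backbone(images)             # (4, 2048) pooled features
print(features.shape)
```

A downstream task (detection head, pose estimator, etc.) would then consume these features, which is the transfer pattern the talk credits for the detection gains over VGG-16.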
Knowledge Vault built by David Vivancos 2024