Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Adv | Llama 3:
Resume:
1.-Convolutional networks have been getting deeper over time to improve performance on ImageNet classification.
2.-The work evaluates convolutional networks on ImageNet that share the same architectural design and differ only in depth.
3.-Models are much deeper compared to prior state-of-the-art like AlexNet.
4.-Deeper features are evaluated on other datasets.
5.-Models were made publicly available for the community to download and use.
6.-A single family of networks is explored where only the depth differs, fixing other key design choices.
7.-Very small 3x3 convolutional kernels are used in all layers, with stride 1 and padding 1, differing from prior work that used larger kernels in the early layers.
8.-Other conventional components are used, such as 2x2 max pooling, dropout, and fully connected layers, with the last layer performing the 1000-way classification.
9.-A stack of 3x3 conv layers without pooling in between has the same effective receptive field as a single layer with a larger kernel: two stacked 3x3 layers cover 5x5, three cover 7x7.
10.-Compared to that single large-kernel layer, the stack adds more non-linearities, making the decision function more discriminative, and uses fewer parameters (three 3x3 layers need 27C^2 weights versus 49C^2 for one 7x7 layer with C channels).
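A quick worked check of that receptive-field and parameter arithmetic (plain Python; C is an arbitrary example channel count, and biases are ignored):

```python
# Effective receptive field of a stack of n 3x3 conv layers with stride 1:
# each additional layer widens the field by 2 pixels, so rf = 1 + 2*n.
def receptive_field(num_3x3_layers):
    return 1 + 2 * num_3x3_layers

# Weight count for a stack of conv layers with C channels in and out.
def conv_stack_params(num_layers, kernel, channels):
    return num_layers * (kernel ** 2) * channels ** 2

C = 256  # example channel width; any C gives the same 27:49 ratio
print(receptive_field(2), receptive_field(3))   # 5, 7
print(conv_stack_params(3, 3, C))               # 27 * C^2 (three 3x3 layers)
print(conv_stack_params(1, 7, C))               # 49 * C^2 (one 7x7 layer)
```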
11.-Committing to 3x3 kernels throughout makes architecture design easier.
12.-Architectures are constructed by starting with 11 weight layers and injecting additional 3x3 conv layers to obtain 13-, 16- and 19-layer variants.
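As an illustration, the four configurations can be written down roughly as below. This is a minimal PyTorch-style sketch in the spirit of the paper's configuration table, not the original Caffe code; the classifier head shows the conventional dropout / fully connected layers from point 8.

```python
import torch.nn as nn

# Channel widths per stage; 'M' marks a 2x2 max-pool. Counting conv layers
# plus the 3 fully connected layers gives 11, 13, 16 and 19 weight layers.
cfgs = {
    "vgg11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "vgg19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

def make_features(cfg):
    layers, in_ch = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            # every conv is 3x3, stride 1, padding 1 (spatial size preserved)
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

# Conventional head: dropout + fully connected layers, 1000-way output.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
    nn.Linear(4096, 1000),
)

features = make_features(cfgs["vgg16"])
```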
13.-The input is a fixed 224x224 crop. The conventional approach is to rescale the image's smallest side (preserving aspect ratio) and then take a random 224x224 crop.
14.-Multi-scale training is used, rescaling the smallest side of each image to a size randomly sampled from [256, 512] before taking the fixed-size crop.
15.-Standard augmentations such as random horizontal flips and RGB colour shifts are used, but no more elaborate automatic distortions.
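A hedged sketch of this training-time preprocessing using torchvision transforms as a modern substitute for the original pipeline (the paper's random RGB colour shift is only noted in a comment rather than implemented):

```python
import random
from torchvision import transforms
from torchvision.transforms import functional as TF

class RandomSmallestSideScale:
    """Rescale the smallest image side to a size sampled from [s_min, s_max]."""
    def __init__(self, s_min=256, s_max=512):
        self.s_min, self.s_max = s_min, s_max

    def __call__(self, img):
        s = random.randint(self.s_min, self.s_max)
        return TF.resize(img, s)          # int size -> smallest side, aspect kept

train_transform = transforms.Compose([
    RandomSmallestSideScale(256, 512),    # scale jittering over [256, 512]
    transforms.RandomCrop(224),           # fixed 224x224 training crop
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # the paper additionally applies a random RGB colour shift, omitted here
])
```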
16.-Networks are optimized with mini-batch gradient descent with momentum. Despite the depth, training converged in about 74 epochs, helped by the implicit regularisation of small kernels and depth and by pre-initialising some layers.
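Roughly, the reported optimisation setup (batch size 256, momentum 0.9, weight decay 5e-4, initial learning rate 1e-2, dropout 0.5) could be written as follows; the PyTorch optimizer is only a modern stand-in for the original Caffe solver:

```python
import torch
from torchvision import models

model = models.vgg16()                    # untrained 16-layer net as a stand-in
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-2,      # divided by 10 when validation accuracy plateaus
                            momentum=0.9,
                            weight_decay=5e-4)
# Batch size 256 and dropout 0.5 on the first two FC layers complete the setup.
```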
17.-The 11-layer model is initialized from a Gaussian distribution; its trained weights are then used to initialize the first conv layers and the fully connected layers of the deeper nets, without freezing those layers.
18.-Entirely random initialization of every layer is also possible if the weights are scaled to preserve activation magnitudes (Glorot-style initialization).
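A sketch of both initialization routes, assuming hypothetical `shallow_net` and `deep_net` models: copying the early conv layers from the trained shallower net mirrors the paper's procedure, and the scale-preserving alternative is Glorot/Xavier initialization.

```python
import torch.nn as nn

def init_from_shallower(deep_net, shallow_net, num_shared_conv=4):
    # Copy the first few conv layers (and, in the paper, also the FC layers)
    # from the trained shallower model; the copied layers stay trainable.
    deep_convs = [m for m in deep_net.modules() if isinstance(m, nn.Conv2d)]
    shallow_convs = [m for m in shallow_net.modules() if isinstance(m, nn.Conv2d)]
    for d, s in zip(deep_convs[:num_shared_conv], shallow_convs[:num_shared_conv]):
        d.weight.data.copy_(s.weight.data)
        d.bias.data.copy_(s.bias.data)

def init_random(net):
    # Alternative: scale-preserving random init (Glorot & Bengio, 2010).
    for m in net.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)
```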
19.-Two testing approaches are used: sampling multiple crops and combining their predictions, and dense (fully convolutional) evaluation in which the fully connected layers are converted to convolutions to produce class score maps.
20.-Both testing approaches are tried, as well as combining their predictions. Multi-scale evaluation, applying the network at multiple test resolutions and averaging, also helps.
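A rough sketch of multi-scale test-time averaging; the full dense evaluation (converting the FC layers to convolutions and averaging the resulting class score map) is omitted for brevity, so this version simply averages softmax outputs over scales and horizontal flips:

```python
import torch
import torch.nn.functional as F
from torchvision.transforms import functional as TF

@torch.no_grad()
def multiscale_predict(model, image, scales=(256, 384, 512)):
    # Average softmax class scores over several test scales and their flips.
    probs = []
    for s in scales:
        x = TF.to_tensor(TF.center_crop(TF.resize(image, s), 224)).unsqueeze(0)
        for inp in (x, torch.flip(x, dims=[3])):    # original + horizontal flip
            probs.append(F.softmax(model(inp), dim=1))
    return torch.stack(probs).mean(dim=0)
```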
21.-The implementation used a modified Caffe toolbox supporting multiple GPUs with synchronous data parallelism, giving a 3.7x speedup on 4 GPUs.
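For comparison, the same synchronous data-parallel idea is a one-liner with modern tooling; `torch.nn.DataParallel` is only an analogue of the modified Caffe setup, not the original code:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16()                    # stand-in network
if torch.cuda.device_count() > 1:
    # Each mini-batch is split across GPUs; per-GPU gradients are combined
    # before the weight update, so the result matches single-GPU training
    # on the full batch (synchronous data parallelism).
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```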
22.-Results show that depth is important, with the 16- and 19-layer nets beating the 11-layer net substantially. Multi-scale training and testing also help.
23.-Dense evaluation and multiple crop evaluation yield comparable results and are complementary when combined.
24.-The approach won the 2014 ImageNet localization challenge and placed 2nd in classification, after GoogLeNet. A single model achieved roughly 7% top-5 error.
25.-Both VGG and GoogLeNet used very deep networks with multi-scale training; VGG used simple stacks of 3x3 kernels, while GoogLeNet used the more complex Inception modules.
26.-Even better results were reported afterwards, building on the deep 3x3 VGG design but with wider layers and more aggressive data augmentation.
27.-Deep representations work well as feature extractors on other datasets; deeper features beat shallower ones even with simple classifiers.
28.-The publicly released 16- and 19-layer models enabled subsequent advances in object detection, segmentation, and captioning.
29.-Convolutional depth is very important for ImageNet classification. Networks built with stacked 3x3 conv layers work well.
30.-The 16- and 19-layer models were released and can be used in any package with a Caffe or Torch backend.
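For example, torchvision ships ports of both models today (a minimal sketch, assuming torchvision >= 0.13 and its pretrained-weights API):

```python
from torchvision import models
from torchvision.models import VGG16_Weights, VGG19_Weights

vgg16 = models.vgg16(weights=VGG16_Weights.IMAGENET1K_V1)   # 16-layer model
vgg19 = models.vgg19(weights=VGG19_Weights.IMAGENET1K_V1)   # 19-layer model
vgg16.eval()   # inference / feature-extraction mode
```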
Knowledge Vault built by David Vivancos 2024