The End Of Knowledge - Vault 5/91 - CVPR - 2024 - Computer vision at scale: Driving customer innovation and industry adoption

graph LR
    classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef amazon fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef ai_services fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef ml_tech fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef challenges fill:#f9d4f9, font-weight:bold, font-size:14px
    Main[Computer vision at scale: Driving customer innovation and industry adoption] --> A[Amazon AI and ML]
    Main --> B[AI Services and Tools]
    Main --> C[ML Technologies]
    Main --> D[Challenges and Solutions]
    A --> A1[25-year AI/ML innovation journey 2]
    A --> A2[Customer-focused innovation scales solutions 3]
    A --> A3[AWS makes ML accessible to organizations 8]
    B --> B1[AWS VP oversees AI services, tools 1]
    B --> B2[Bedrock offers foundational AI model access 12]
    B --> B3[Titan generates images, mitigates bias 13]
    B --> B4[Rekognition extracts info from images, video 9]
    B --> B5[Textract analyzes text from various documents 10]
    B --> B6[Panorama brings vision to on-premise cameras 11]
    C --> C1[AI detects product defects, reports damage 4]
    C --> C2[AI generates ad creatives from images 5]
    C --> C3[Palm recognition for contactless payments 6]
    C --> C4[AI enhances NFL viewer experience 7]
    C --> C5[Trainium: efficient ML training chip 22]
    C --> C6[B-Mojo: open-source modular hybrid architecture 21]
    D --> D1[Hallucination and control]
    D --> D2[Research and development]
    D --> D3[Enterprise challenges]
    D1 --> D1a[Hallucination: generated data misaligns with facts 15]
    D1 --> D1b[Visual grounding controls multimodal AI hallucinations 16]
    D1 --> D1c[THRONE measures vision-language model hallucinations 17]
    D2 --> D2a[SSMs improve memory, reduce hallucinations 20]
    D2 --> D2b[Hybrid models gain AI research popularity 30]
    D2 --> D2c[Research credits for Trainium experiments 23]
    D3 --> D3a[Enterprises scale foundational model applications 24]
    D3 --> D3b[Accessible ML tools crucial for adoption 25]
    D3 --> D3c[AWS focuses on domain-specific model customization 26]
    class Main main
    class A,A1,A2,A3 amazon
    class B,B1,B2,B3,B4,B5,B6 ai_services
    class C,C1,C2,C3,C4,C5,C6 ml_tech
    class D,D1,D2,D3,D1a,D1b,D1c,D2a,D2b,D2c,D3a,D3b,D3c challenges

Summary:

1.- Dr. Swami Sivasubramanian is VP of AI and data at AWS, overseeing AI services and tools supporting innovation across multiple levels of the AI stack.

2.- Amazon has been working on AI and ML for over 25 years, including ongoing innovations in computer vision used throughout their operations.

3.- Amazon's approach to innovation focuses on customer obsession, working backwards from customer problems, and scaling solutions effectively.

4.- Project PI uses multimodal foundational models to identify product defects in Amazon fulfillment centers and reports the damage in plain language.

5.- Amazon Ads Image Generator uses AI to create multiple ad creatives from product images, logos, and text prompts.

6.- Amazon One uses computer vision to recognize palm prints for contactless payments and identification, trained on synthetic data.

7.- Prime Video uses computer vision and AI to provide next-gen stats during NFL games, enhancing the viewer experience.

8.- AWS aims to make machine learning and computer vision accessible to millions of organizations through a comprehensive set of tools.

9.- Amazon Rekognition is a fully managed service that extracts information from images and video files using machine learning.

10.- Amazon Textract uses complex deep learning models to extract and analyze text from various document types.

11.- AWS Panorama allows organizations to bring computer vision to on-premises cameras for local predictions and insights.

12.- Amazon Bedrock is a generative AI platform service offering access to various foundational models from Amazon and third-party providers.

13.- Titan Image Generator produces high-quality, realistic images using natural language prompts, with built-in mitigations for toxic or biased content.

14.- AWS implements invisible watermarks in AI-generated images to help reduce the spread of misinformation.

15.- Hallucination in AI models occurs when generated data doesn't align with reality or the knowledge base of facts.

16.- Visual grounding is crucial for controlling hallucinations in multimodal AI models.

17.- THRONE is a benchmark developed by Amazon's team to measure hallucinations in vision-language models.

18.- Controlling generation and grounding to knowledge bases can help reduce hallucination rates in multimodal foundational models.

19.- Transformer-based models may hallucinate because they have limited ability to retain information about the input prompt once it falls outside their context window.

20.- State space models (SSMs) offer potential improvements in memory retention and hallucination control compared to transformer architectures.

21.- Amazon plans to open-source B'MOJO, a class of modular hybrid architectures designed for efficient memory and inference computation.

22.- AWS Trainium is a purpose-built chip for training machine learning models, optimized for efficient computation.

23.- Amazon Research Awards offer promotional credits for researchers to run experiments on Trainium.

24.- Enterprises are moving from experimentation to scaling foundational model applications, facing challenges like hallucination detection and compliance.

25.- Making ML tools more accessible to non-ML experts is crucial for wider adoption of generative AI applications.

26.- Customization of foundational models for specific domains is becoming easier and is an area of focus for AWS.

27.- Amazon's computer vision technology powers recommendation engines, robotic picking in fulfillment centers, and Prime Air drones.

28.- Phillips 66 uses AWS Panorama for real-time monitoring and data gathering in their connected stores.

29.- Visual perception in AI can be described as controlled hallucination, where internal representations generate data aligned with reality.

30.- Hybrid variants of state space models and attention mechanisms are gaining popularity in AI research.
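
Points 19 and 20 contrast a fixed context window with the running state of a state space model. The toy sketch below illustrates that contrast only. It assumes a scalar linear recurrence with hand-picked constants `a`, `b`, `c`, and uses a sliding-window sum as a stand-in for a bounded context; the function names are hypothetical, and this is not the architecture presented in the talk.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    The constants a, b, c are illustrative assumptions; real SSMs learn
    matrix-valued (and often input-dependent) parameters.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


def window_sum(inputs, window=4):
    """Toy stand-in for a bounded context: output t sees only the last `window` inputs."""
    return [
        sum(inputs[max(0, t - window + 1): t + 1])
        for t in range(len(inputs))
    ]


# A single impulse at step 0, followed by silence.
signal = [1.0] + [0.0] * 9

ssm_out = ssm_scan(signal)
win_out = window_sum(signal)

# The recurrent state still carries a decayed trace of the impulse at the
# last step, while the windowed view has dropped it entirely.
print("SSM output at last step:   ", ssm_out[-1])
print("Window output at last step:", win_out[-1])
```

With `|a| < 1` the trace of old inputs fades rather than disappears abruptly, which is the trade-off in this toy: memory cost stays constant per step, but distant information is attenuated. Hybrid designs such as those in point 30 combine a state of this kind with attention over recent tokens.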

Knowledge Vault built by David Vivancos 2024