1. Artificial Intelligence (AI)
Artificial Intelligence is the broad umbrella field of computer science focused on building systems that simulate human intelligence, such as reasoning, perception, learning, decision-making, and natural language understanding. It includes both rule-based systems and data-driven learning systems.
Key Points
- AI represents the highest-level umbrella discipline covering all intelligent systems
- It simulates human cognitive abilities like reasoning, perception, and decision-making
- AI systems may operate using explicit rules or learned patterns from data
- It includes subfields such as Machine Learning, Deep Learning, and Generative AI
- AI is used across automation, robotics, analytics, and intelligent assistants
Example Scenario
A smart assistant answering questions, controlling devices, and recognizing speech is an AI system combining multiple intelligence techniques.
Exam Tips
- AI is always the parent category in exam questions
- Do not confuse AI with Machine Learning or Deep Learning
- Focus on capability-based descriptions, not algorithms
2. Machine Learning (ML)
Machine Learning is a subset of AI that enables systems to automatically learn patterns from data and improve performance without explicit programming. It relies on statistical models trained using historical data.
Key Points
- ML enables systems to learn patterns directly from structured and unstructured data
- It eliminates the need for explicit rule-based programming logic
- Includes supervised, unsupervised, and reinforcement learning techniques
- Widely used for classification, regression, prediction, and clustering tasks
- Performance improves as more data becomes available over time
Example Scenario
A fraud detection system learns from past transactions to classify future transactions as legitimate or suspicious.
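A minimal supervised-learning sketch of this scenario using scikit-learn; the feature columns, values, and labels are invented for illustration:

```python
# Toy fraud classifier: learn from labelled historical transactions, then
# classify a new one. Features and labels are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# Each row: [amount, hour_of_day, is_foreign_merchant]; label 1 = fraudulent
X_train = [[120.0, 14, 0], [9800.0, 3, 1], [45.5, 10, 0], [7200.0, 2, 1]]
y_train = [0, 1, 0, 1]

model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)                    # learn patterns from past data

print(model.predict([[8500.0, 4, 1]]))         # classify a new transaction -> [1]
```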
Exam Tips
- ML always means learning from data, not hard-coded rules
- Know differences between supervised and unsupervised learning
- Regression and decision tree algorithms are frequently tested
3. Deep Learning (DL)
Deep Learning is a specialized subset of Machine Learning that uses multi-layer neural networks to learn hierarchical representations of data. It is especially powerful for complex tasks involving images, speech, and language.
CNN (Convolutional Neural Network) is a deep learning architecture primarily designed for processing image and visual data. It automatically detects patterns such as edges, shapes, textures, and objects using convolution operations.
RNN (Recurrent Neural Network) is a deep learning architecture designed for sequential and time-dependent data where previous inputs influence future outputs.
Key Points
- Deep Learning uses multi-layer artificial neural network architectures
- Requires large datasets and high computational power for training
- Automatically extracts hierarchical and complex features from raw data
- Highly effective in image recognition, speech processing, and NLP tasks
- Forms the foundation of modern transformer-based architectures
Example Scenario
A facial recognition system identifies individuals from images using deep convolutional neural networks.
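A minimal PyTorch sketch of the convolution-then-classify idea behind such a system; the layer sizes and number of classes are arbitrary:

```python
# Tiny CNN sketch (PyTorch); layer sizes are arbitrary, for illustration only.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # detect low-level edges/textures
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample to build a hierarchy
        )
        self.classifier = nn.Linear(16 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))            # one fake 32x32 RGB image
print(logits.shape)                                       # torch.Size([1, 10])
```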
Exam Tips
- CNN is used for image-based processing tasks
- RNN is used for sequential and time-series data
- Transformers power modern Large Language Models
4. Generative AI
Generative AI refers to systems that create new content such as text, images, audio, or code by learning patterns from large datasets. It is built on foundation models that are pre-trained and adaptable.
Key Points
- Generative AI focuses on creating new content instead of only predicting outcomes
- It is built using large-scale foundation models trained on massive datasets
- Supports multiple output types including text, images, audio, and code
- Can be adapted using fine-tuning or retrieval-based augmentation methods
- Used in creative applications such as chatbots, design, and content generation
Example Scenario
A system generating marketing text, product descriptions, and images for an e-commerce platform.
Exam Tips
- GenAI = content generation, not classification
- Foundation models are the core building blocks
- Often confused with traditional predictive ML
5. Large Language Models (LLMs)
Large Language Models are deep learning models trained on massive text datasets to understand, generate, and process human language. They use transformer architecture and token-based processing.
Key Points
- LLMs are specialized models for language understanding and generation
- Built using transformer-based neural network architectures
- Trained on extremely large datasets containing diverse text sources
- Support tasks like summarization, translation, Q&A, and reasoning
- Use tokens and context windows for processing input and output
Example Scenario
A chatbot answering technical questions using a transformer-based language model.
Exam Tips
- LLMs are a subset of Generative AI
- Context window defines model memory capacity
- Token processing affects cost and performance
AI vs ML vs DL vs LLMs
| Feature | Artificial Intelligence (AI) | Machine Learning (ML) | Deep Learning (DL) | Large Language Models (LLMs) |
|---|---|---|---|---|
| Definition | Broad field focused on simulating human intelligence | Subset of AI that learns from data | Subset of ML using deep neural networks | Specialized deep learning models for language understanding |
| Scope | Broadest domain | Narrower than AI | Narrower than ML | Narrowest specialized category |
| Main Goal | Intelligent behavior and decision-making | Learn patterns and make predictions | Learn complex representations automatically | Understand and generate human language |
| Learning Method | Rule-based or learning-based | Statistical learning from data | Multi-layer neural networks | Transformer-based deep learning |
| Data Requirement | Can work with limited data | Requires moderate data | Requires massive datasets | Requires extremely large text datasets |
| Common Techniques | Logic systems, search, ML | Regression, decision trees, clustering | CNN, RNN, Transformers | Transformer architectures |
| Primary Use Cases | Robotics, automation, assistants | Prediction and classification | Vision, speech, NLP | Chatbots, summarization, code generation |
| Input Types | Structured and unstructured | Mostly structured/unstructured data | Images, audio, text, video | Primarily text tokens |
| Output Type | Decisions or actions | Predictions and classifications | Complex pattern recognition | Human-like language generation |
| Examples | Virtual assistants, robotics | Fraud detection, recommendation systems | Image recognition, speech recognition | OpenAI GPT, Anthropic Claude, Meta Llama |
6. Tokens & Context Window
Tokens are the smallest units of text processed by language models, while the context window defines how many tokens a model can process in a single request.
Key Points
- Tokens represent words, subwords, or characters depending on model design
- Context window defines the maximum input + output token limit
- Larger context windows allow processing of longer documents
- Token usage directly impacts inference cost in AI systems
- Exceeding context limits results in truncation of earlier content
Example Scenario
A long document exceeding token limits gets partially truncated during summarization.
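A worked token-cost calculation of the kind the exam expects; the per-1,000-token prices are placeholders, not actual Bedrock rates:

```python
# Worked token-pricing example; the per-1,000-token rates are placeholders,
# not actual Bedrock prices.
input_tokens = 3_500          # prompt + retrieved context
output_tokens = 800           # generated summary
price_per_1k_input = 0.003    # USD per 1K input tokens (assumed)
price_per_1k_output = 0.015   # USD per 1K output tokens (assumed)

cost = (input_tokens / 1000) * price_per_1k_input \
     + (output_tokens / 1000) * price_per_1k_output
print(f"Request cost: ${cost:.4f}")   # $0.0225
```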
Exam Tips
- Token count directly affects pricing in Bedrock
- Context window = model memory size
- Very common exam calculation concept
Tokens & Context Quick Reference
| Concept | Meaning | Exam Importance |
|---|---|---|
| Token | Smallest text unit processed by model | Directly impacts cost |
| Context Window | Maximum tokens model can process at once | Defines model memory |
| Input Tokens | Tokens sent to model | Affects pricing |
| Output Tokens | Tokens generated by model | Affects pricing |
| Truncation | Earlier content removed when limit exceeded | Common scenario question |
| Latency | Time taken to generate response | Important for real-time apps |
7. Embeddings
Embeddings are numerical vector representations of text that capture semantic meaning, enabling machines to understand relationships between words and sentences.
Key Points
- Embeddings convert text into high-dimensional numerical vector space
- They capture semantic similarity between different pieces of text
- Enable advanced search systems like semantic search and recommendation engines
- Serve as the foundation for retrieval-augmented generation systems
- Improve search accuracy beyond keyword-based matching
Example Scenario
“Car” and “automobile” return similar search results due to semantic closeness in embeddings.
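A toy illustration of embedding similarity using cosine similarity; the 4-dimensional vectors are invented, whereas real embedding models output hundreds or thousands of dimensions:

```python
# Toy embedding-similarity check; the 4-dimensional vectors are invented.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

car        = np.array([0.90, 0.10, 0.80, 0.20])
automobile = np.array([0.85, 0.15, 0.75, 0.25])
banana     = np.array([0.10, 0.90, 0.05, 0.70])

print(cosine_similarity(car, automobile))  # close to 1.0 -> semantically similar
print(cosine_similarity(car, banana))      # much lower  -> unrelated
```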
Exam Tips
- Embeddings are NOT raw text or keywords
- Used heavily in vector search and RAG systems
- Similarity is measured using distance metrics
Tokens vs Embeddings
| Feature | Tokens | Embeddings |
|---|---|---|
| Purpose | Text processing | Semantic representation |
| Used For | Generation | Similarity search |
| Format | Raw text units | Numerical vectors |
| Impacts Cost | Yes, directly | Usually storage/search cost |
| Used in RAG | Generation stage | Retrieval stage |
| Example | “Hello world” split into units | Vector representation of sentence |
8. Vector Databases
Vector databases are specialized storage systems designed to store embeddings and perform similarity searches using algorithms like k-nearest neighbors (k-NN).
Key Points
- Store high-dimensional embedding vectors efficiently at scale
- Enable semantic similarity search instead of keyword matching
- Use k-NN algorithms to find closest matching vectors
- Power recommendation systems and retrieval-augmented generation
- Optimized for fast and scalable AI retrieval operations
Example Scenario
A search engine returning semantically similar documents instead of exact keyword matches.
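A sketch of a k-NN similarity query against an OpenSearch index using the opensearch-py client; the endpoint, index name, vector field, and query vector are assumptions:

```python
# Sketch of a k-NN similarity query with opensearch-py; the endpoint, index name,
# vector field, and query vector are assumptions for illustration.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "my-domain.example.com", "port": 443}], use_ssl=True)

query = {
    "size": 3,
    "query": {
        "knn": {
            "doc_embedding": {                    # vector field defined in the index mapping
                "vector": [0.12, 0.87, 0.33],     # embedding of the user's query
                "k": 3                            # return the 3 nearest neighbours
            }
        }
    }
}

results = client.search(index="knowledge-base", body=query)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```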
Exam Tips
- Amazon OpenSearch supports vector search capabilities
- Core component in RAG architecture
- Understand k-NN similarity concept clearly
Vector Database vs Traditional Database
| Feature | Vector Database | Traditional Database |
|---|---|---|
| Stores | Embeddings/vectors | Rows & columns |
| Search Type | Similarity search | Exact matching |
| Best For | AI retrieval systems | Transactional systems |
| Data Structure | High-dimensional vectors | Structured tables |
| AI Optimized | Yes | No |
| Common Use Cases | RAG, recommendation systems | Banking, ERP |
| AWS Example | OpenSearch vector engine | RDS, DynamoDB |
9. Prompt Engineering
Prompt engineering is the process of designing effective input instructions to guide large language models toward producing accurate and relevant outputs.
Key Points
- Controls and shapes behavior of language models through input design
- Includes zero-shot, one-shot, and few-shot prompting techniques
- Chain-of-thought prompting improves multi-step reasoning accuracy
- Well-designed prompts significantly improve response quality
- Used to optimize generative AI performance without retraining models
Example Scenario
A structured prompt guiding a model to generate step-by-step troubleshooting instructions.
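A small few-shot prompt sketch; the reviews and labels are invented, and the point is that in-prompt examples steer the model's output format without any retraining:

```python
# Few-shot prompt sketch: two worked examples steer the model toward the desired
# output format. The reviews and labels are invented for illustration.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and charges fast."
Sentiment: Positive

Review: "The screen cracked after one week."
Sentiment: Negative

Review: "Setup took five minutes and everything just worked."
Sentiment:"""

# Sending this prompt to a text model should yield "Positive" as the next tokens.
print(few_shot_prompt)
```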
Exam Tips
- Few-shot prompting improves accuracy using examples
- Chain-of-thought is used for reasoning problems
- Common scenario-based exam topic
10. Retrieval-Augmented Generation (RAG)
RAG combines information retrieval systems with generative AI to improve accuracy by grounding responses in external knowledge sources.
Key Points
- Combines search systems with generative AI models
- Retrieves relevant documents before generating responses
- Converts documents and user queries into embeddings for similarity matching
- Reduces hallucinations by grounding responses in real data
- Widely used in enterprise knowledge-based chatbots
Example Scenario
A chatbot answering company policy questions using internal document retrieval.
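A structural sketch of the RAG flow, assuming embed_query, search_vectors, and generate are placeholders wrapping an embedding model, a vector database, and an LLM call:

```python
# Structural RAG sketch: embed the question, retrieve similar chunks, then ground
# the generation in what was retrieved. embed_query, search_vectors, and generate
# are placeholders for an embedding model, a vector database, and an LLM call.
def answer_with_rag(question, embed_query, search_vectors, generate):
    query_vector = embed_query(question)                  # 1. embed the user question
    chunks = search_vectors(query_vector, top_k=3)        # 2. retrieve similar documents
    context = "\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)                               # 3. grounded generation
```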
Exam Tips
- RAG is the primary solution for hallucination problems
- Understand the full flow: embed the query → retrieve relevant context → augment the prompt → generate
- Frequently tested architecture question
11. Amazon Bedrock
Amazon Bedrock is a fully managed service providing access to foundation models via API without infrastructure management.
Key Points
- Fully managed serverless platform for generative AI applications
- Provides access to multiple foundation models via API
- Supports chatbots, embeddings, and RAG-based systems
- Includes guardrails for safety and compliance enforcement
- No need to deploy or manage underlying infrastructure
Example Scenario
A chatbot built using a Claude model in Bedrock with enterprise data integration.
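A minimal sketch of calling a Bedrock model through the boto3 Converse API; the model ID is an example and must be enabled in your account and Region:

```python
# Sketch of a Bedrock inference call with the Converse API (boto3); the model ID
# is an example and must match a model enabled in your account and Region.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize our return policy in two sentences."}]}],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
print(response["usage"])   # input/output token counts -> drives cost
```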
Exam Tips
- Bedrock = managed inference service, not a custom ML training platform
- No infrastructure management required
- Key AWS generative AI service
12. Amazon SageMaker
Amazon SageMaker is a fully managed machine learning platform for building, training, deploying, and monitoring custom ML models.
Key Points
- End-to-end platform for custom machine learning model lifecycle
- Supports training, tuning, deployment, and monitoring workflows
- Includes tools like Studio, Canvas, and JumpStart
- Used for predictive analytics and classification problems
- Supports fine-tuning of custom ML models
Example Scenario
A company building a demand forecasting model using historical sales data.
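A sketch of a custom training job with the SageMaker Python SDK; the IAM role ARN, S3 paths, hyperparameters, and container version are placeholders:

```python
# Sketch of a custom training job with the SageMaker Python SDK; the role ARN,
# S3 paths, hyperparameters, and container version are placeholders.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/forecast-model/output",            # placeholder
)
estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)

estimator.fit({"train": "s3://my-bucket/forecast-model/train.csv"})  # launches training
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```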
Exam Tips
- SageMaker = custom ML development platform
- Unlike Bedrock, it does not expose ready-made foundation model APIs; models are deployed and managed by the customer
- Frequently compared with Bedrock in exams
Amazon Bedrock vs Amazon SageMaker
| Feature | Amazon Bedrock | Amazon SageMaker |
|---|---|---|
| Primary Purpose | Managed foundation model access for generative AI applications | End-to-end machine learning platform for building, training, and deploying ML models |
| Main Focus | Generative AI inference | Full ML lifecycle |
| Infrastructure Management | Fully serverless and managed by AWS | Customer manages more configurations and resources |
| Model Hosting | AWS hosts foundation models | Customer deploys and manages models/endpoints |
| Model Training | No training required for basic use | Supports full custom model training |
| Foundation Models | Access to multiple FMs through API | Can deploy and fine-tune open-source FMs |
| Typical Use Cases | Chatbots, summarization, Q&A, RAG, image generation | Fraud detection, forecasting, recommendation systems, predictive analytics |
| Generative AI Support | Native and primary purpose | Supported through JumpStart and custom deployments |
| Custom ML Algorithms | Limited | Full support |
| Fine-Tuning | Supported for selected Bedrock models | Extensive fine-tuning and custom training support |
| Continued Pre-training | Supported in limited Bedrock workflows | Fully supported |
| Inference Management | Fully managed inference APIs | Endpoints created and maintained by the customer |
| RAG Support | Built-in Knowledge Bases and embeddings integration | Requires custom architecture setup |
| Vector Database Integration | Easier integration for GenAI workflows | More manual integration work |
| Prompt Engineering | Core usage pattern | Optional depending on ML workload |
| Experiment Tracking | Limited | Extensive experiment tracking |
| Hyperparameter Tuning | Not primary feature | Native feature |
| AutoML | No | SageMaker Autopilot |
| Monitoring | CloudWatch integration | Model Monitor + CloudWatch |
| Governance | Guardrails and IAM integration | Model Registry, lineage, approvals |
| Security Responsibility | Mostly AWS-managed | More customer responsibility |
| Compliance Controls | Guardrails, IAM, KMS | IAM, VPC, encryption, registry |
| Cost Model | Token-based pricing | Compute/storage/inference pricing |
| Scaling | Automatic serverless scaling | Endpoint-based scaling |
| Batch Processing | Batch inference supported | Batch transform jobs |
| Latency Optimization | Provisioned Throughput | Real-time inference endpoints |
| Multimodal Support | Text, image, embeddings | Depends on deployed model |
| Guardrails | Native Bedrock Guardrails | Requires custom implementation |
13. Inference in AI
Inference is the process where a trained machine learning or foundation model uses learned patterns to generate predictions, classifications, recommendations, or content from new input data. In generative AI, inference occurs whenever a user sends prompts to a model and receives generated outputs.
Key Points
- Inference happens after model training is completed
- Uses trained models to process real-time or batch input requests
- In generative AI, inference generates text, images, audio, or code outputs
- Inference performance depends on latency, throughput, scalability, and token usage
- Inference cost is usually based on compute resources or token consumption
- Foundation models in Amazon Bedrock are primarily used for inference workloads
- Inference can be real-time, streaming, asynchronous, or batch-based
- Optimization techniques improve response speed and reduce operational cost
Example Scenario
A customer support chatbot uses a foundation model in Amazon Bedrock to generate answers instantly when users submit questions.
Types of Inference
1. Real-Time Inference
Real-time inference processes requests immediately and returns low-latency responses for interactive applications.
Key Points
- Designed for immediate response generation
- Used in chatbots, fraud detection, and recommendation systems
- Requires low latency and high availability
- Common in APIs and interactive AI systems
- Supports synchronous request-response workflows
Example Scenario
An online shopping website instantly recommends products while a customer browses items.
Exam Tips
- Real-time inference = immediate prediction or response
- Low latency is the primary requirement
- Commonly tested with chatbot and recommendation scenarios
2. Batch Inference
Batch inference processes large volumes of data together instead of individually in real time.
Key Points
- Optimized for cost-efficient large-scale processing
- Does not require immediate responses
- Suitable for offline analytics and reporting workloads
- Processes data in scheduled batches
- Lower operational cost compared to real-time inference
Example Scenario
A bank processes millions of transaction records overnight to identify fraud patterns.
Exam Tips
- Batch inference = cheapest inference option
- Used when latency is not important
- Common in analytics and large-scale prediction tasks
3. Streaming Inference
Streaming inference continuously processes incoming real-time data streams.
Key Points
- Handles continuous event-driven data processing
- Used for IoT, sensors, clickstreams, and live analytics
- Supports near real-time predictions
- Continuously updates predictions from incoming events
- Requires scalable infrastructure for high-throughput workloads
Example Scenario
A smart traffic system continuously analyzes live camera feeds to optimize traffic signals.
Exam Tips
- Streaming inference = continuous data processing
- Common with IoT and sensor-based workloads
- Frequently associated with event-driven architectures
Inference in Amazon Bedrock
Amazon Bedrock provides fully managed inference access to foundation models through APIs without infrastructure management.
Key Points
- Bedrock focuses mainly on inference, not full model training
- Supports text, image, embedding, and multimodal inference
- Uses serverless architecture for automatic scaling
- Supports multiple foundation models through a unified API
- Token-based pricing applies during inference requests
- Includes Guardrails for safe inference outputs
- Integrates with Knowledge Bases for RAG inference workflows
Example Scenario
A company uses the Amazon Titan Text model in Bedrock to summarize customer support tickets automatically.
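A minimal streaming-inference sketch using the Bedrock ConverseStream API; the model ID and prompt are examples:

```python
# Streaming-inference sketch with the Bedrock ConverseStream API; the model ID is
# an example and must be enabled in your account and Region.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse_stream(
    modelId="amazon.titan-text-express-v1",
    messages=[{"role": "user", "content": [{"text": "Summarize this support ticket: ..."}]}],
)

for event in response["stream"]:
    delta = event.get("contentBlockDelta", {}).get("delta", {}).get("text")
    if delta:
        print(delta, end="", flush=True)   # tokens arrive incrementally, lowering perceived latency
```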
Exam Tips
- Bedrock = managed inference service
- No infrastructure management required
- Token usage directly impacts inference cost
Inference Optimization in Amazon Bedrock
Inference optimization balances performance, latency, throughput, and operational cost.
1. On-Demand Inference
Processes requests dynamically without reserved capacity.
Key Points
- No long-term commitment required
- Automatically scales based on demand
- Best for unpredictable workloads
- Pay only for usage consumed
- Simplest deployment option
Exam Tips
- On-demand = flexible and serverless
- Best for variable traffic workloads
2. Provisioned Throughput
Reserved inference capacity for predictable workloads.
Key Points
- Provides guaranteed throughput and lower latency
- Best for production enterprise applications
- Reduces performance variability during peak usage
- More expensive but predictable
- Suitable for mission-critical applications
Example Scenario
A banking chatbot reserves inference capacity to maintain consistent response times during peak hours.
Exam Tips
- Provisioned Throughput = steady workloads
- Used for predictable traffic and low latency requirements
3. Batch Inference in Bedrock
Processes multiple inference requests together asynchronously.
Key Points
- Lowest-cost inference processing option
- Suitable for summarization and document analysis at scale
- Optimized for high-volume offline processing
- Does not provide immediate responses
- Reduces operational expenses significantly
Exam Tips
- Batch inference = cost optimization strategy
- Best for large-scale offline workloads
4. Priority Inference
Provides higher priority request handling for latency-sensitive applications.
Key Points
- Optimized for mission-critical low-latency applications
- Faster request processing compared to standard inference
- Suitable for customer-facing production systems
- Higher operational cost compared to normal inference
- Ensures predictable response performance
Exam Tips
- Priority = lowest latency option
- Used for premium production workloads
Inference Modes Comparison
| Feature | Real-Time Inference | Batch Inference | Streaming Inference |
|---|---|---|---|
| Response Type | Immediate full response | Delayed bulk response | Incremental response |
| Latency | Low | High | Very low perceived latency |
| Cost | Medium to High | Lowest | Medium |
| Throughput | Moderate | Very High | Moderate |
| Processing Style | One request at a time | Large grouped requests | Continuous token streaming |
| Best For | Interactive apps | Offline processing | Conversational AI |
| Infrastructure Need | Always active | Scheduled workloads | Persistent connection |
| Common Protocols | REST/HTTP APIs | Batch jobs | WebSocket/Event streams |
| Typical AWS Services | Bedrock, Lambda | S3, Batch, SageMaker | Bedrock Streaming, API Gateway WebSocket |
14. Selection of API Layer
API layers in Generative AI architectures manage how applications communicate with foundation models and backend services. AWS provides multiple API management options such as Amazon API Gateway and AWS AppSync depending on latency, streaming, real-time communication, and data orchestration requirements.
1. Amazon API Gateway
Amazon API Gateway is a fully managed AWS service used to create, publish, secure, monitor, and scale REST, HTTP, and WebSocket APIs. In Generative AI systems, it commonly acts as the entry point between client applications and services such as AWS Lambda or Amazon Bedrock.
Key Points
- API Gateway securely exposes GenAI APIs to applications and users
- Supports REST APIs, HTTP APIs, and WebSocket APIs
- Frequently integrated with AWS Lambda for GenAI orchestration
- Commonly used with Amazon Bedrock inference endpoints
- Provides throttling, authentication, authorization, and monitoring
- Supports synchronous and asynchronous communication patterns
- Helps scale GenAI APIs automatically based on request traffic
- Integrates with IAM, Cognito, CloudWatch, and WAF for security
1.1 Synchronous Invocation (Request/Response)
Synchronous invocation means the client waits until the model finishes generating the response. This pattern is commonly used for interactive AI applications where immediate answers are required.
Key Points
- Client sends prompt and waits for immediate model response
- API Gateway invokes Lambda synchronously
- Lambda interacts with Bedrock or other GenAI services
- Suitable for low-latency conversational applications
- Common in chatbots, summarization, and Q&A systems
- Works well when response generation finishes quickly
- Best for interactive user experiences requiring instant output
Example Scenario
A customer support chatbot sends a user question through API Gateway → Lambda → Bedrock Claude model → response immediately returned to the web application.
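A sketch of the Lambda side of this synchronous path, assuming a proxy integration event and an example model ID:

```python
# Sketch of the synchronous path above: API Gateway invokes this Lambda handler,
# which calls Bedrock and returns the answer in the same request/response cycle.
# The model ID and proxy-integration event shape are assumptions.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    body = json.loads(event["body"])                     # prompt sent by the client
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": body["prompt"]}]}],
    )
    answer = response["output"]["message"]["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```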
Common GenAI Use Cases
- AI chatbots
- Text summarization APIs
- Q&A assistants
- Content generation APIs
- Real-time prompt-response applications
- AI-powered search assistants
Challenges and Issues
- REST and HTTP APIs have approximately 29-second timeout limits
- Long-running GenAI responses may fail before completion
- Large LLM outputs increase latency significantly
- Streaming responses are difficult with basic REST APIs
- High concurrency may increase Lambda execution costs
1.2 WebSocket API (Persistent Streaming)
WebSocket APIs maintain persistent bidirectional connections between client and server, enabling streaming AI responses in real time.
Key Points
- Supports continuous two-way communication
- Ideal for token streaming from LLMs
- Improves user experience for long responses
- Reduces perceived latency during generation
- Common for AI chat applications and live assistants
- Enables incremental response delivery token-by-token
Example Scenario
A coding assistant streams generated code gradually to the browser instead of waiting for the full response to complete.
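A sketch of the server side of this pattern: a backend relays each generated chunk to the connected client through the API Gateway management API. The endpoint URL and connection ID are placeholders, and stream_model_tokens stands in for any generator of text chunks (for example, a ConverseStream response):

```python
# Sketch of relaying a streamed model response over an API Gateway WebSocket.
# The endpoint URL and connection ID are placeholders; stream_model_tokens stands
# in for any generator yielding text chunks (e.g. from a ConverseStream response).
import boto3

ws = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://abc123.execute-api.us-east-1.amazonaws.com/prod",  # placeholder stage URL
)

def relay_stream(connection_id, stream_model_tokens):
    for chunk in stream_model_tokens:
        # push each partial chunk to the open WebSocket connection as it arrives
        ws.post_to_connection(ConnectionId=connection_id, Data=chunk.encode("utf-8"))
```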
Exam Tips
- REST API = request/response pattern
- WebSocket = persistent streaming communication
- API Gateway commonly integrates with Lambda and Bedrock
- 29-second timeout is a highly tested exam topic
- WebSocket is preferred for long GenAI responses
1.3 Asynchronous Invocation Pattern
Asynchronous processing is used when GenAI tasks require long execution time and users do not need immediate responses.
Key Points
- Client submits request without waiting for completion
- Backend processes request independently
- Results stored in S3, database, or notification system
- Improves scalability for long-running AI tasks
- Reduces timeout limitations in synchronous APIs
- Often uses SQS, EventBridge, or Step Functions
Example Scenario
A document analysis system uploads PDFs for AI summarization and sends completion notifications after processing finishes.
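A sketch of the submission side of this asynchronous pattern using SQS; the queue URL is a placeholder, and a separate worker would consume the queue, run inference, and store results:

```python
# Sketch of the asynchronous pattern: the API returns immediately with a job ID,
# and a separate worker consumes the queue, runs inference, and writes the result
# to S3 or a database. The queue URL is a placeholder.
import json
import uuid
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/genai-jobs"  # placeholder

def submit_job(document_s3_key):
    job_id = str(uuid.uuid4())
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"job_id": job_id, "key": document_s3_key}),
    )
    return {"job_id": job_id, "status": "SUBMITTED"}   # client polls or is notified later
```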
Common Async Use Cases
- Large document summarization
- Batch inference workloads
- Video or audio processing
- Long-running content generation
- Enterprise report generation
Exam Tips
- Async pattern avoids API timeout issues
- Good for heavy inference workloads
- Step Functions often orchestrate async GenAI workflows
- SQS frequently appears in decoupled AI architectures
2. AWS AppSync
AWS AppSync is a fully managed AWS service used to build GraphQL APIs with real-time synchronization and offline capabilities. It is commonly used for GenAI applications requiring live updates and efficient data retrieval.
Key Points
- Fully managed GraphQL API service
- Allows clients to request only required data
- Combines multiple backend sources in one query
- Supports real-time subscriptions using WebSockets
- Integrates with Lambda and Bedrock for AI workflows
- Provides offline synchronization for mobile/web apps
- Reduces over-fetching and under-fetching problems
- Supports DynamoDB, Lambda, OpenSearch, and HTTP sources
2.1 GraphQL Concept
GraphQL is a query language for APIs where clients request exactly the data they need instead of receiving fixed responses.
Key Points
- Client controls requested response structure
- Multiple resources retrieved in one request
- Efficient for frontend-heavy applications
- Reduces unnecessary network traffic
- Useful for dynamic GenAI interfaces and dashboards
2.2 Real-Time Subscriptions
AppSync supports live data streaming using GraphQL subscriptions over WebSockets.
Key Points
- Enables instant frontend updates
- Supports collaborative and live AI applications
- Automatically pushes updates to connected clients
- Useful for streaming AI responses and live dashboards
- Reduces polling overhead on frontend applications
Example Scenario
A collaborative AI writing application streams generated content updates to multiple connected users in real time.
API Layer Comparison
| Feature | HTTP API | REST API | WebSocket API | Private API |
|---|---|---|---|---|
| Primary Purpose | Lightweight API communication | Full-featured REST management | Persistent bidirectional communication | Internal-only API access |
| Communication Type | Request/Response | Request/Response | Full duplex real-time | Internal request/response |
| Protocol | HTTP | HTTP/HTTPS | WebSocket | HTTP/HTTPS |
| Real-Time Support | Limited | Limited | Excellent | Depends on implementation |
| Persistent Connection | No | No | Yes | No |
| Best Use Case | Simple low-cost APIs | Enterprise REST APIs | Streaming and live apps | Internal enterprise systems |
| Latency | Low | Moderate | Very low after connection established | Low |
| Cost | Lowest | Higher | Connection + message based | Similar to REST/HTTP |
| API Features | Basic routing and auth | Advanced API management | Real-time messaging | Restricted internal access |
| Authorization Options | IAM, JWT, Lambda auth | IAM, Cognito, Lambda auth | IAM, Lambda auth | VPC-based access |
| Caching Support | Limited | Advanced caching | Not typical | Depends on gateway type |
| Request Validation | Basic | Advanced validation | Minimal | Depends on implementation |
| Transformation Support | Limited | Strong mapping/transformation | Minimal | Supported |
| Usage Plans & API Keys | Limited | Full support | Limited | Supported |
| Custom Domain Support | Yes | Yes | Yes | Yes |
| Monitoring | CloudWatch | CloudWatch | CloudWatch | CloudWatch |
| Security Level | High | Very High | High | Highest internal isolation |
| Internet Accessible | Yes | Yes | Yes | No |
| VPC Integration | Supported | Supported | Supported | Mandatory |
| Streaming Capability | Limited | Limited | Native support | Depends |
| Common GenAI Usage | Lightweight AI APIs | Enterprise AI APIs | LLM streaming/chat | Internal AI systems |
15. Amazon Comprehend
Amazon Comprehend is a natural language processing service that extracts insights from text such as sentiment, entities, key phrases, and PII.
Key Points
- Fully managed NLP service for text analysis
- Performs sentiment analysis and entity recognition
- Detects key phrases and language automatically
- Identifies sensitive PII data in text documents
- No model training required from customers
Example Scenario
Analyzing customer reviews to detect sentiment trends and product feedback.
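A minimal sketch of sentiment analysis with the Comprehend API; the review text is invented:

```python
# Sketch of sentiment analysis with Amazon Comprehend; the review text is invented.
import boto3

comprehend = boto3.client("comprehend")

review = "The checkout process was slow and the package arrived damaged."
result = comprehend.detect_sentiment(Text=review, LanguageCode="en")

print(result["Sentiment"])        # e.g. NEGATIVE
print(result["SentimentScore"])   # confidence scores per sentiment class
```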
Exam Tips
- Works only on text-based input
- No training required
- Focused on extraction, not generation
16. Guardrails for Amazon Bedrock
Guardrails provide safety and compliance controls for generative AI systems by filtering unsafe content and enforcing policy rules.
Key Points
- Acts as safety layer for generative AI input and output
- Filters harmful, toxic, or restricted content automatically
- Detects and removes sensitive personal data (PII)
- Ensures compliance with organizational policies and regulations
- Helps enforce responsible AI usage in production systems
Example Scenario
Blocking a chatbot response that contains sensitive personal information.
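A sketch of attaching a guardrail to a Bedrock inference call; the guardrail identifier, version, and model ID are placeholders for resources created in your own account:

```python
# Sketch of attaching a guardrail to a Bedrock inference call; the guardrail ID,
# version, and model ID are placeholders for resources in your own account.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "What is Jane Doe's home address?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "my-guardrail-id",   # placeholder guardrail ID
        "guardrailVersion": "1",
    },
)
# If the guardrail intervenes on the input or output, Bedrock returns the
# configured blocked message instead of the raw model output.
print(response["output"]["message"]["content"][0]["text"])
```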
Exam Tips
- Guardrails apply before and after model inference
- Essential for responsible AI compliance
- Frequently tested in security-related questions
17. Monitoring (CloudWatch & CloudTrail)
Monitoring ensures observability, performance tracking, and auditing of AI systems using AWS services.
Key Points
- CloudWatch monitors system performance metrics in real time
- Tracks latency, errors, and usage patterns in AI applications
- CloudTrail records all API calls for auditing and compliance
- Provides security traceability and operational visibility
- Helps detect anomalies and performance issues
Example Scenario
Tracking high latency in a generative AI application using CloudWatch metrics.
Exam Tips
- CloudWatch = metrics and performance monitoring
- CloudTrail = API activity logging and auditing
- Very commonly tested distinction
18. Cost Management in AI Systems
Cost management in AI focuses on optimizing spending across model usage, training, inference, and infrastructure while maintaining performance and scalability.
Key Points
- Bedrock uses token-based pricing for both input and output usage
- Provisioned throughput allows reserved capacity for predictable workloads
- SageMaker charges based on compute instances, storage, and training time
- Batch processing reduces cost by running inference offline instead of in real time
- AWS Cost Explorer helps analyze and visualize AI service spending trends
- AWS Budgets enables proactive alerts when AI usage exceeds thresholds
Example Scenario
A company reduces chatbot cost by switching from real-time inference to batch summarization jobs for non-urgent queries.
Exam Tips
- Bedrock cost = tokens consumed (input + output)
- Batch inference is always cheaper than real-time inference
- Provisioned throughput = predictable high-volume workloads
19. Observability (Monitoring & Logging for AI Systems)
Observability ensures visibility into AI system performance, behavior, and failures using metrics, logs, and traces.
Key Points
- CloudWatch provides real-time monitoring of latency, errors, and usage metrics
- CloudWatch Logs store application and model execution logs for debugging
- CloudTrail captures all API calls for auditing and traceability
- Observability helps detect performance degradation in AI applications
- Enables proactive alerting using alarms and anomaly detection
- Essential for production-grade generative AI systems
Example Scenario
Detecting increased response latency in a Bedrock-powered chatbot using CloudWatch alarms.
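A sketch of a CloudWatch latency alarm for this scenario; the namespace, metric name, dimension, threshold, and SNS topic are assumptions and should be checked against the metrics your account actually emits:

```python
# Sketch of a latency alarm for a Bedrock-backed chatbot; namespace, metric name,
# dimension, threshold, and SNS topic are assumptions for illustration.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="bedrock-chatbot-high-latency",
    Namespace="AWS/Bedrock",                      # assumed namespace for Bedrock metrics
    MetricName="InvocationLatency",               # assumed metric name
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}],
    Statistic="Average",
    Period=300,                                   # evaluate in 5-minute windows
    EvaluationPeriods=2,
    Threshold=3000,                               # alarm if average latency exceeds 3 s (ms)
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder SNS topic
)
```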
Exam Tips
- CloudWatch = metrics + performance monitoring
- CloudTrail = API audit history ("who did what and when")
- Logs + metrics + traces together = full observability
20. Security for AI Systems
Security ensures protection of data, models, and AI outputs from unauthorized access, leakage, and misuse.
Key Points
- Encryption at rest is enforced using AWS KMS and secure S3 storage
- Encryption in transit uses TLS/SSL for secure API communication
- VPC endpoints allow private access to AI services without internet exposure
- IAM policies enforce least-privilege access control for AI resources
- Bedrock ensures customer data is not used for training base models
- Amazon Macie helps detect sensitive data such as PII in datasets
Example Scenario
A healthcare application securely processes patient data using encrypted S3 storage and private VPC endpoints.
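A sketch of encryption at rest for this scenario: uploading a record to S3 with a customer-managed KMS key. The bucket name and key ARN are placeholders:

```python
# Sketch of encryption at rest: upload a patient record to S3 encrypted with a
# customer-managed KMS key. Bucket name and key ARN are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="healthcare-ai-input",                        # placeholder bucket
    Key="records/patient-123.json",
    Body=b'{"patient_id": "123", "notes": "..."}',
    ServerSideEncryption="aws:kms",                      # encrypt at rest with KMS
    SSEKMSKeyId="arn:aws:kms:us-east-1:123456789012:key/placeholder-key-id",
)
# Encryption in transit is handled by boto3, which calls the S3 HTTPS endpoint by default.
```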
Exam Tips
- Encryption at rest = stored data protection
- Encryption in transit = API communication security
- IAM roles are preferred for temporary ML access
21. Governance for AI Systems
Governance ensures AI systems follow organizational policies, regulatory requirements, and audit standards.
Key Points
- AWS CloudTrail provides audit logs for all AI service activity
- AWS Config tracks configuration changes in AI infrastructure over time
- AWS Artifact provides compliance reports such as SOC, ISO, and HIPAA
- Model registries track versions, approvals, and lifecycle of ML models
- Governance ensures compliance with GDPR, HIPAA, and internal policies
- Enables accountability, traceability, and controlled model deployment
Example Scenario
An enterprise uses model registry approval workflows before deploying a new fraud detection model.
Exam Tips
- CloudTrail = audit logs (critical governance tool)
- Model registry = version control + approval pipeline
- Governance = compliance + traceability + control
