This post explains AWS CDK, the older approaches it replaced, and walks through a complete example: an image analysis pipeline that triggers on S3 uploads.
---
## 1. What is AWS CDK?
**AWS CDK (Cloud Development Kit)** is an infrastructure-as-code (IaC) framework that allows you to define AWS cloud resources using familiar programming languages like **TypeScript, Python, Java, C#, Go, and others**.
Instead of writing YAML or JSON templates manually, you write code that leverages object-oriented programming concepts:
- **Constructs**: The basic building blocks of CDK apps. They can be low-level resources (like an S3 bucket) or high-level components that encapsulate multiple resources.
- **Stacks**: A unit of deployment. All resources defined within a stack are deployed together.
- **Apps**: A container for one or more stacks.
The CDK synthesizes (compiles) your code into AWS CloudFormation templates and then deploys them, giving you the benefits of both programming (loops, conditionals, reuse) and CloudFormation's managed deployment capabilities.
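To make "synthesizes your code into CloudFormation templates" concrete, here is a plain-Python sketch (no CDK involved; the bucket names and helper are made up for illustration) of the kind of template generation CDK automates. A loop produces three bucket resources that would otherwise be copy-pasted YAML:

```python
import json

def make_bucket_resource(versioned: bool) -> dict:
    """Build one CloudFormation S3 bucket resource as a plain dict."""
    props = {}
    if versioned:
        props["VersioningConfiguration"] = {"Status": "Enabled"}
    return {"Type": "AWS::S3::Bucket", "Properties": props}

# A loop producing per-environment buckets: trivial in code,
# repetitive boilerplate in hand-written YAML
template = {
    "Resources": {
        f"{env.capitalize()}Bucket": make_bucket_resource(versioned=(env == "prod"))
        for env in ["dev", "staging", "prod"]
    }
}

print(json.dumps(template, indent=2))
```

CDK does this same dict-building for you behind typed construct classes, then hands the resulting template to CloudFormation.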
---
## 2. Older Ways: Before CDK
Prior to the CDK, AWS infrastructure provisioning evolved through several approaches, each with significant limitations:
### A. Manual Console Clicking
- **How it worked**: Engineers would log into the AWS Management Console and manually click through menus to create resources.
- **Problems**: Error-prone, unrepeatable, no version control, "snowflake" environments, and impossible to scale or audit.
### B. AWS CLI Scripts
- **How it worked**: Bash or PowerShell scripts that called `aws` commands (e.g., `aws s3 mb`, `aws lambda create-function`).
- **Problems**: Scripts became complex and fragile. Error handling was manual. No dependency management—if Lambda depended on DynamoDB, you had to ensure the database was created first. Teardown was nearly impossible.
### C. AWS CloudFormation (YAML/JSON)
- **How it worked**: Engineers wrote YAML or JSON templates describing all resources and their relationships. CloudFormation handled deployment order, rollbacks, and drift detection.
- **Problems**:
- YAML/JSON is **not a programming language**. You couldn't write loops, conditionals, or reuse logic easily.
- Templates became massive and unreadable (1000+ lines of YAML).
- You had to use intrinsic functions (`!Ref`, `!GetAtt`, `!Sub`) which were hard to debug.
- Sharing logic across stacks required manual copy-paste or nested stacks.
### D. Terraform (HCL)
- **How it worked**: HashiCorp Terraform used HCL (HashiCorp Configuration Language), which was more expressive than YAML but still not a full programming language.
- **Problems**: While more ergonomic than raw CloudFormation, HCL still lacked the full power of a general-purpose language (later versions added `count`, `for_each`, and built-in functions, but not arbitrary logic), and managing state files introduced operational overhead.
### The CDK Revolution
CDK solved these problems by bringing **real programming** to infrastructure:
```python
# CDK: Clean, readable, reusable
bucket = s3.Bucket(self, "MyBucket",
    versioned=True,
    removal_policy=RemovalPolicy.DESTROY
)

# vs CloudFormation: Verbose, repetitive
# MyBucket:
#   Type: AWS::S3::Bucket
#   Properties:
#     VersioningConfiguration:
#       Status: Enabled
#   DeletionPolicy: Delete
```
---
## 3. Complete Example: Image Analysis Pipeline
Let's build a CDK application in **Python** that creates:
1. An **S3 bucket** for image uploads
2. A **Lambda function** that triggers when images are uploaded
3. The Lambda uses **Amazon Rekognition** to analyze the image
4. Results are stored in **DynamoDB**
### Prerequisites
- AWS CLI configured
- Node.js (for CDK)
- Python 3.8+
- Docker (optional; only needed if you later bundle third-party dependencies for Lambda — this example uses only `boto3`, which the Lambda runtime provides)
### Step 1: Initialize CDK App
```bash
mkdir image-analyzer
cd image-analyzer
cdk init app --language python
source .venv/bin/activate
pip install -r requirements.txt   # aws-cdk-lib and constructs, generated by cdk init
```
### Step 2: Create the Lambda Function Code
Create `lambda/analyze_image.py`:
```python
import boto3
import json
import os
import logging
from datetime import datetime, timezone
from decimal import Decimal
from urllib.parse import unquote_plus

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize AWS clients outside the handler so they are reused across invocations
rekognition = boto3.client('rekognition')
dynamodb = boto3.resource('dynamodb')

# Environment variables
TABLE_NAME = os.environ['TABLE_NAME']


def lambda_handler(event, context):
    """
    Triggered by S3 PUT events.
    Analyzes uploaded image with Rekognition and stores results in DynamoDB.
    """
    logger.info(f"Received event: {json.dumps(event)}")
    table = dynamodb.Table(TABLE_NAME)

    # Extract S3 object details from the event
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 events (e.g. spaces become '+')
        key = unquote_plus(record['s3']['object']['key'])
        logger.info(f"Processing {key} from {bucket}")

        try:
            # Step 1: Detect labels in the image
            response = rekognition.detect_labels(
                Image={'S3Object': {'Bucket': bucket, 'Name': key}},
                MaxLabels=10,
                MinConfidence=70
            )
            labels = [
                {
                    'name': label['Name'],
                    # DynamoDB rejects Python floats; store confidences as Decimal
                    'confidence': Decimal(str(label['Confidence']))
                }
                for label in response['Labels']
            ]

            # Step 2: Detect faces (if any)
            face_response = rekognition.detect_faces(
                Image={'S3Object': {'Bucket': bucket, 'Name': key}},
                Attributes=['ALL']
            )
            face_count = len(face_response['FaceDetails'])

            # Step 3: Detect text in the image
            text_response = rekognition.detect_text(
                Image={'S3Object': {'Bucket': bucket, 'Name': key}}
            )
            detected_text = [
                text['DetectedText']
                for text in text_response['TextDetections']
                if text['Type'] == 'WORD'
            ]

            # Step 4: Prepare the DynamoDB item
            item = {
                'image_id': key,  # Partition key
                'timestamp': datetime.now(timezone.utc).isoformat(),
                'bucket': bucket,
                'labels': labels,
                'face_count': face_count,
                'detected_text': detected_text,
                'processed': True
            }

            # Step 5: Store in DynamoDB
            table.put_item(Item=item)
            logger.info(
                f"Successfully processed {key}. "
                f"Found {len(labels)} labels, {face_count} faces"
            )
        except Exception as e:
            logger.error(f"Error processing {key}: {str(e)}")
            # Store the failed item for debugging
            table.put_item(Item={
                'image_id': key,
                'timestamp': datetime.now(timezone.utc).isoformat(),
                'error': str(e),
                'processed': False
            })
            raise

    return {
        'statusCode': 200,
        'body': json.dumps('Processing complete')
    }
```
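One detail that often trips people up here: boto3's DynamoDB resource API rejects Python floats, while Rekognition returns confidence scores as floats, so values must be converted to `Decimal` before `put_item`. A minimal sketch of the conversion (the sample labels are made up):

```python
from decimal import Decimal

def to_dynamodb_number(value: float) -> Decimal:
    # Go through str() to avoid binary-float artifacts like
    # Decimal(99.12) == Decimal('99.1200000000000045...')
    return Decimal(str(value))

# Shape of Rekognition's detect_labels output, trimmed down
labels = [{"Name": "Dog", "Confidence": 99.12}, {"Name": "Beach", "Confidence": 87.5}]
items = [
    {"name": l["Name"], "confidence": to_dynamodb_number(l["Confidence"])}
    for l in labels
]
print(items)
```

Without this conversion, `put_item` raises `TypeError: Float types are not supported. Use Decimal types instead.`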
### Step 3: Create the CDK Stack
Create `stacks/image_analyzer_stack.py`:
```python
from aws_cdk import (
    Stack,
    aws_s3 as s3,
    aws_lambda as lambda_,
    aws_dynamodb as dynamodb,
    aws_lambda_event_sources as event_sources,
    aws_iam as iam,
    Duration,
    RemovalPolicy,
    CfnOutput
)
from constructs import Construct


class ImageAnalyzerStack(Stack):

    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Step 1: Create DynamoDB Table
        table = dynamodb.Table(
            self, "ImageAnalysisTable",
            table_name="image-analysis-results",
            partition_key=dynamodb.Attribute(
                name="image_id",
                type=dynamodb.AttributeType.STRING
            ),
            billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
            removal_policy=RemovalPolicy.DESTROY  # Only for dev/demo
        )

        # Step 2: Create S3 Bucket for image uploads
        bucket = s3.Bucket(
            self, "ImageUploadBucket",
            bucket_name=f"image-uploads-{self.account}-{self.region}",
            versioned=True,
            removal_policy=RemovalPolicy.DESTROY,
            auto_delete_objects=True,  # Clean up when stack is destroyed
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL
        )

        # Step 3: Create Lambda Function, packaged from the lambda/ directory
        lambda_function = lambda_.Function(
            self, "ImageAnalyzerFunction",
            function_name="image-analyzer",
            runtime=lambda_.Runtime.PYTHON_3_12,
            handler="analyze_image.lambda_handler",
            code=lambda_.Code.from_asset("lambda"),
            timeout=Duration.seconds(60),
            memory_size=512,
            environment={
                "TABLE_NAME": table.table_name
            }
        )

        # Step 4: Add S3 trigger to Lambda
        lambda_function.add_event_source(
            event_sources.S3EventSource(
                bucket,
                events=[s3.EventType.OBJECT_CREATED],
                filters=[
                    s3.NotificationKeyFilter(suffix=".jpg"),
                    s3.NotificationKeyFilter(suffix=".jpeg"),
                    s3.NotificationKeyFilter(suffix=".png")
                ]
            )
        )

        # Step 5: Grant permissions
        # Grant Lambda permissions to read from S3
        bucket.grant_read(lambda_function)
        # Grant Lambda permissions to write to DynamoDB
        table.grant_write_data(lambda_function)
        # Grant Lambda permissions to use Rekognition
        lambda_function.add_to_role_policy(
            iam.PolicyStatement(
                actions=[
                    "rekognition:DetectLabels",
                    "rekognition:DetectFaces",
                    "rekognition:DetectText"
                ],
                resources=["*"]  # Rekognition doesn't support resource-level permissions
            )
        )

        # Step 6: Outputs for reference
        CfnOutput(self, "BucketName", value=bucket.bucket_name)
        CfnOutput(self, "TableName", value=table.table_name)
        CfnOutput(self, "LambdaFunctionName", value=lambda_function.function_name)
```
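To appreciate what one line like `bucket.grant_read(lambda_function)` saves you, here is roughly the IAM policy statement it attaches to the function's execution role, sketched as a plain dict (the exact action list CDK emits can vary between versions, and `my-bucket` stands in for the real bucket ARN):

```python
import json

# Approximation of the statement generated by grant_read:
# read access to both the bucket itself and every object in it
statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject*", "s3:GetBucket*", "s3:List*"],
    "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
    ]
}
print(json.dumps(statement, indent=2))
```

With raw CloudFormation you would write this statement by hand, and typos in the action names or ARNs would only surface at runtime.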
### Step 4: Update App Entry Point
Update `app.py`:
```python
#!/usr/bin/env python3
import aws_cdk as cdk
from stacks.image_analyzer_stack import ImageAnalyzerStack
app = cdk.App()
ImageAnalyzerStack(app, "ImageAnalyzerStack",
env=cdk.Environment(
account=os.environ.get("CDK_DEFAULT_ACCOUNT"),
region=os.environ.get("CDK_DEFAULT_REGION", "us-east-1")
)
)
app.synth()
```
### Step 5: Deploy the Application
```bash
# Bootstrap CDK (only once per account/region)
cdk bootstrap
# Synthesize and view the CloudFormation template
cdk synth
# Deploy the stack
cdk deploy
```
### Step 6: Test the Application
```bash
# Upload an image to test (substitute your own account ID in the bucket name)
aws s3 cp test-image.jpg s3://image-uploads-123456789-us-east-1/
# Check DynamoDB for results
aws dynamodb scan --table-name image-analysis-results
```
---
## 4. What Happens Behind the Scenes?
When you upload an image to the S3 bucket:
1. **S3 Event Notification**: S3 detects the PUT event and invokes the Lambda function with event details.
2. **Lambda Execution**:
- Extracts bucket name and object key from the event
- Calls Amazon Rekognition's `detect_labels()` to identify objects, scenes, and concepts
- Calls `detect_faces()` to identify if people are in the image
- Calls `detect_text()` to extract any text visible
3. **Rekognition Analysis**:
- Returns labels like "Person", "Dog", "Beach" with confidence scores
- Returns facial attributes (age range, emotions, etc.)
- Returns detected words and their bounding boxes
4. **DynamoDB Storage**: All results are stored with the image ID as the partition key for fast retrieval
5. **Error Handling**: If any step fails, the error is logged and stored in DynamoDB with `processed: false`
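The event-parsing half of this flow can be exercised locally without touching AWS. Here is a sketch with a trimmed-down sample event (the bucket and key are made up); note that S3 URL-encodes object keys in event payloads, so a key with spaces arrives with `+` characters:

```python
from urllib.parse import unquote_plus

# Trimmed-down version of the payload S3 sends to Lambda
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "image-uploads-123456789-us-east-1"},
                # S3 URL-encodes object keys in event notifications
                "object": {"key": "holiday+photos/beach+day.jpg"}
            }
        }
    ]
}

def extract_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event, decoding each key."""
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event["Records"]
    ]

print(extract_objects(sample_event))
```

Feeding this kind of fixture into `lambda_handler` (with the AWS clients mocked) is also the natural starting point for unit-testing the function.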
---
## 5. Key CDK Advantages Demonstrated
| Feature | Benefit in This Example |
|---------|------------------------|
| **Resource Relationships** | `bucket.grant_read(lambda_function)` automatically creates the correct IAM policies |
| **Environment Variables** | `TABLE_NAME` is injected automatically from the created DynamoDB table |
| **Event Sources** | `S3EventSource` handles all the complex S3 notification configuration |
| **Type Safety** | IDE autocomplete prevents typos in resource names and methods |
| **Reusability** | The entire stack can be deployed to multiple environments with one line change |
---
## 6. Extending the Example
You could easily extend this CDK application to:
- Add an **Amazon SQS** dead-letter queue to capture failed processing attempts
- Add **CloudFront** distribution for serving processed results
- Add **API Gateway** to expose a REST API for querying results
- Add **Step Functions** for complex workflows (multiple analysis stages)
- Deploy to **multiple environments** (dev, staging, prod) with different configurations
This demonstrates the power of CDK: **infrastructure defined in real code** that is testable, shareable, and maintainable.