Saturday, March 28, 2026

What is CDK, with an Example

This post explains what AWS CDK is, the older approaches it replaced, and walks through a complete example: an image-analysis pipeline that triggers on S3 uploads.


---


## 1. What is AWS CDK?


**AWS CDK (Cloud Development Kit)** is an infrastructure-as-code (IaC) framework that allows you to define AWS cloud resources using familiar programming languages like **TypeScript, Python, Java, C#, Go, and others**.


Instead of writing YAML or JSON templates manually, you write code that leverages object-oriented programming concepts:

- **Constructs**: The basic building blocks of CDK apps. They can be low-level resources (like an S3 bucket) or high-level components that encapsulate multiple resources.

- **Stacks**: A unit of deployment. All resources defined within a stack are deployed together.

- **Apps**: A container for one or more stacks.


The CDK synthesizes (compiles) your code into AWS CloudFormation templates and then deploys them, giving you the benefits of both programming (loops, conditionals, reuse) and CloudFormation's managed deployment capabilities.
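The payoff of "real code" can be pictured without CDK at all. Below is a plain-Python sketch (hypothetical names, no `aws-cdk-lib` required) that generates a CloudFormation-style template from a loop: the same repetition-collapsing that CDK constructs perform for you before synthesis.

```python
import json

def build_template(environments):
    """Generate a CloudFormation-style template dict: one versioned
    bucket resource per environment, produced by a loop instead of
    copy-pasted YAML blocks."""
    resources = {}
    for env in environments:
        resources[f"UploadsBucket{env.capitalize()}"] = {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}

template = build_template(["dev", "staging", "prod"])
print(json.dumps(template, indent=2))
```

Three environments, one loop body; adding a fourth environment is a one-word change rather than another YAML block.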


---


## 2. Older Ways: Before CDK


Prior to the CDK, AWS infrastructure provisioning evolved through several approaches, each with significant limitations:


### A. Manual Console Clicking

- **How it worked**: Engineers would log into the AWS Management Console and manually click through menus to create resources.

- **Problems**: Error-prone, unrepeatable, no version control, "snowflake" environments, and impossible to scale or audit.


### B. AWS CLI Scripts

- **How it worked**: Bash or PowerShell scripts that called `aws` commands (e.g., `aws s3 mb`, `aws lambda create-function`).

- **Problems**: Scripts became complex and fragile. Error handling was manual. There was no dependency management: if a Lambda function depended on a DynamoDB table, you had to ensure the table was created first. Teardown was nearly impossible.


### C. AWS CloudFormation (YAML/JSON)

- **How it worked**: Engineers wrote YAML or JSON templates describing all resources and their relationships. CloudFormation handled deployment order, rollbacks, and drift detection.

- **Problems**: 

  - YAML/JSON is **not a programming language**. You couldn't write loops, conditionals, or reuse logic easily.

  - Templates became massive and unreadable (1000+ lines of YAML).

  - You had to use intrinsic functions (`!Ref`, `!GetAtt`, `!Sub`) which were hard to debug.

  - Sharing logic across stacks required manual copy-paste or nested stacks.
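For context on why the intrinsics were painful: the YAML short forms are sugar for JSON objects, and deploy-time errors surface against the desugared form. A minimal illustration (resource names are hypothetical):

```python
# JSON equivalents of the YAML short-form intrinsics (names are
# illustrative); debugging happens against these desugared forms.
ref = {"Ref": "MyBucket"}                        # !Ref MyBucket
get_att = {"Fn::GetAtt": ["MyBucket", "Arn"]}    # !GetAtt MyBucket.Arn
sub = {"Fn::Sub": "arn:aws:s3:::${MyBucket}/*"}  # !Sub with interpolation

print(ref, get_att, sub)
```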


### D. Terraform (HCL)

- **How it worked**: HashiCorp Terraform used HCL (HashiCorp Configuration Language), which was more expressive than YAML but still not a full programming language.

- **Problems**: While better than CloudFormation, HCL lacked native programming constructs, and managing state files introduced operational overhead.


### The CDK Revolution

CDK solved these problems by bringing **real programming** to infrastructure:

```python

# CDK: Clean, readable, reusable

bucket = s3.Bucket(self, "MyBucket",

    versioned=True,

    removal_policy=RemovalPolicy.DESTROY

)


# vs CloudFormation: Verbose, repetitive

# MyBucket:

#   Type: AWS::S3::Bucket

#   Properties:

#     VersioningConfiguration:

#       Status: Enabled

#   DeletionPolicy: Delete

```


---


## 3. Complete Example: Image Analysis Pipeline


Let's build a CDK application in **Python** that creates:

1. An **S3 bucket** for image uploads

2. A **Lambda function** that triggers when images are uploaded

3. The Lambda uses **Amazon Rekognition** to analyze the image

4. Results are stored in **DynamoDB**


### Prerequisites

- AWS CLI configured

- Node.js (for CDK)

- Python 3.8+

- Docker (optional; only needed if you later bundle Lambda dependencies or layers)


### Step 1: Initialize CDK App

```bash

mkdir image-analyzer

cd image-analyzer

cdk init app --language python

source .venv/bin/activate

pip install aws-cdk-lib boto3

```


### Step 2: Create the Lambda Function Code

Create `lambda/analyze_image.py`:


```python

import boto3

import json

import os

from datetime import datetime

import logging


logger = logging.getLogger()

logger.setLevel(logging.INFO)


# Initialize AWS clients

rekognition = boto3.client('rekognition')

dynamodb = boto3.resource('dynamodb')

s3 = boto3.client('s3')


# Environment variables

TABLE_NAME = os.environ['TABLE_NAME']


def lambda_handler(event, context):

    """

    Triggered by S3 PUT events.

    Analyzes uploaded image with Rekognition and stores results in DynamoDB.

    """

    logger.info(f"Received event: {json.dumps(event)}")

    

    # Extract S3 object details from event

    for record in event['Records']:

        bucket = record['s3']['bucket']['name']

        key = record['s3']['object']['key']

        

        logger.info(f"Processing {key} from {bucket}")

        

        try:

            # Step 1: Detect labels in the image

            response = rekognition.detect_labels(

                Image={

                    'S3Object': {

                        'Bucket': bucket,

                        'Name': key

                    }

                },

                MaxLabels=10,

                MinConfidence=70

            )

            

            # DynamoDB's resource API rejects Python floats, so convert
            # confidence scores to Decimal before storing
            from decimal import Decimal

            labels = [

                {

                    'name': label['Name'],

                    'confidence': Decimal(str(label['Confidence']))

                }

                for label in response['Labels']

            ]

            

            # Step 2: Detect faces (if any)

            face_response = rekognition.detect_faces(

                Image={

                    'S3Object': {

                        'Bucket': bucket,

                        'Name': key

                    }

                },

                Attributes=['ALL']

            )

            

            face_count = len(face_response['FaceDetails'])

            

            # Step 3: Detect text in image

            text_response = rekognition.detect_text(

                Image={

                    'S3Object': {

                        'Bucket': bucket,

                        'Name': key

                    }

                }

            )

            

            detected_text = [

                text['DetectedText'] 

                for text in text_response['TextDetections']

                if text['Type'] == 'WORD'

            ]

            

            # Step 4: Prepare DynamoDB item

            table = dynamodb.Table(TABLE_NAME)

            item = {

                'image_id': key,  # Partition key

                'timestamp': datetime.utcnow().isoformat(),

                'bucket': bucket,

                'labels': labels,

                'face_count': face_count,

                'detected_text': detected_text,

                'processed': True

            }

            

            # Step 5: Store in DynamoDB

            table.put_item(Item=item)

            

            logger.info(f"Successfully processed {key}. Found {len(labels)} labels, {face_count} faces")

            

        except Exception as e:

            logger.error(f"Error processing {key}: {str(e)}")

            # Store failed item for debugging

            table = dynamodb.Table(TABLE_NAME)

            table.put_item(Item={

                'image_id': key,

                'timestamp': datetime.utcnow().isoformat(),

                'error': str(e),

                'processed': False

            })

            raise

    

    return {

        'statusCode': 200,

        'body': json.dumps('Processing complete')

    }

```
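The handler's event parsing can be exercised locally against a hand-built event. The payload below follows the standard S3 notification shape the handler iterates over; the bucket and key values are made up. A minimal sketch of the same extraction logic:

```python
# Minimal S3 PUT event in the shape the handler iterates over
# (bucket and key values are hypothetical).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "image-uploads-demo"},
                "object": {"key": "photos/cat.jpg"},
            }
        }
    ]
}

def extract_objects(event):
    """Mirror the handler's loop: return (bucket, key) per record."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event["Records"]
    ]

print(extract_objects(sample_event))  # [('image-uploads-demo', 'photos/cat.jpg')]
```

Note that object keys in real S3 events are URL-encoded; applying `urllib.parse.unquote_plus(key)` before use is a common hardening step.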


### Step 3: Create the CDK Stack

Create `stacks/image_analyzer_stack.py`:


```python

from aws_cdk import (

    Stack,

    aws_s3 as s3,

    aws_lambda as lambda_,

    aws_dynamodb as dynamodb,

    aws_lambda_event_sources as event_sources,

    aws_iam as iam,

    Duration,

    RemovalPolicy,

    CfnOutput

)

from constructs import Construct

import os


class ImageAnalyzerStack(Stack):

    def __init__(self, scope: Construct, id: str, **kwargs) -> None:

        super().__init__(scope, id, **kwargs)

        

        # Step 1: Create DynamoDB Table

        table = dynamodb.Table(

            self, "ImageAnalysisTable",

            table_name="image-analysis-results",

            partition_key=dynamodb.Attribute(

                name="image_id",

                type=dynamodb.AttributeType.STRING

            ),

            billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,

            removal_policy=RemovalPolicy.DESTROY  # Only for dev/demo

        )

        

        # Step 2: Create S3 Bucket for image uploads

        bucket = s3.Bucket(

            self, "ImageUploadBucket",

            bucket_name=f"image-uploads-{self.account}-{self.region}",

            versioned=True,

            removal_policy=RemovalPolicy.DESTROY,

            auto_delete_objects=True,  # Clean up when stack is destroyed

            encryption=s3.BucketEncryption.S3_MANAGED,

            block_public_access=s3.BlockPublicAccess.BLOCK_ALL

        )

        

        # Step 3: Create Lambda Function

        # Package the Lambda code

        lambda_function = lambda_.Function(

            self, "ImageAnalyzerFunction",

            function_name="image-analyzer",

            runtime=lambda_.Runtime.PYTHON_3_12,

            handler="analyze_image.lambda_handler",

            code=lambda_.Code.from_asset("lambda"),

            timeout=Duration.seconds(60),

            memory_size=512,

            environment={

                "TABLE_NAME": table.table_name

            }

        )

        

        # Step 4: Add S3 trigger to Lambda

        # S3 allows only one suffix rule per notification filter,
        # so register a separate event source per image extension
        for suffix in (".jpg", ".jpeg", ".png"):

            lambda_function.add_event_source(

                event_sources.S3EventSource(

                    bucket,

                    events=[s3.EventType.OBJECT_CREATED],

                    filters=[s3.NotificationKeyFilter(suffix=suffix)]

                )

            )

        

        # Step 5: Grant permissions

        

        # Grant Lambda permissions to read from S3

        bucket.grant_read(lambda_function)

        

        # Grant Lambda permissions to write to DynamoDB

        table.grant_write_data(lambda_function)

        

        # Grant Lambda permissions to use Rekognition

        lambda_function.add_to_role_policy(

            iam.PolicyStatement(

                actions=[

                    "rekognition:DetectLabels",

                    "rekognition:DetectFaces",

                    "rekognition:DetectText"

                ],

                resources=["*"]  # Rekognition doesn't support resource-level permissions

            )

        )

        

        # Step 6: Outputs for reference

        CfnOutput(self, "BucketName", value=bucket.bucket_name)

        CfnOutput(self, "TableName", value=table.table_name)

        CfnOutput(self, "LambdaFunctionName", value=lambda_function.function_name)

```


### Step 4: Update App Entry Point

Update `app.py`:


```python

#!/usr/bin/env python3

import os

import aws_cdk as cdk

from stacks.image_analyzer_stack import ImageAnalyzerStack


app = cdk.App()

ImageAnalyzerStack(app, "ImageAnalyzerStack",

    env=cdk.Environment(

        account=os.environ.get("CDK_DEFAULT_ACCOUNT"),

        region=os.environ.get("CDK_DEFAULT_REGION", "us-east-1")

    )

)


app.synth()

```


### Step 5: Deploy the Application

```bash

# Bootstrap CDK (only once per account/region)

cdk bootstrap


# Synthesize and view the CloudFormation template

cdk synth


# Deploy the stack

cdk deploy

```


### Step 6: Test the Application

```bash

# Upload an image to test (use the bucket name from the BucketName
# stack output: image-uploads-<your-account-id>-<region>)

aws s3 cp test-image.jpg s3://image-uploads-<your-account-id>-<region>/


# Check DynamoDB for results

aws dynamodb scan --table-name image-analysis-results

```
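One gotcha when reading results back: the boto3 DynamoDB resource API returns numbers as `decimal.Decimal`, which `json.dumps` cannot serialize. A small helper (a sketch, not part of the deployed stack) converts scanned items for display:

```python
import json
from decimal import Decimal

def to_jsonable(obj):
    """Recursively convert Decimal values so items can be json.dumps'd."""
    if isinstance(obj, Decimal):
        # Keep whole numbers as int, everything else as float
        return int(obj) if obj == obj.to_integral_value() else float(obj)
    if isinstance(obj, dict):
        return {k: to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_jsonable(v) for v in obj]
    return obj

# Hypothetical item as the resource API would return it
item = {"image_id": "cat.jpg",
        "labels": [{"name": "Cat", "confidence": Decimal("97.5")}]}
print(json.dumps(to_jsonable(item)))
```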


---


## 4. What Happens Behind the Scenes?


When you upload an image to the S3 bucket:


1. **S3 Event Notification**: S3 detects the PUT event and invokes the Lambda function with event details.


2. **Lambda Execution**:

   - Extracts bucket name and object key from the event

   - Calls Amazon Rekognition's `detect_labels()` to identify objects, scenes, and concepts

   - Calls `detect_faces()` to identify if people are in the image

   - Calls `detect_text()` to extract any text visible


3. **Rekognition Analysis**:

   - Returns labels like "Person", "Dog", "Beach" with confidence scores

   - Returns facial attributes (age range, emotions, etc.)

   - Returns detected words and their bounding boxes


4. **DynamoDB Storage**: All results are stored with the image ID as the partition key for fast retrieval


5. **Error Handling**: If any step fails, the error is logged and stored in DynamoDB with `processed: false`


---


## 5. Key CDK Advantages Demonstrated


| Feature | Benefit in This Example |

|---------|------------------------|

| **Resource Relationships** | `bucket.grant_read(lambda_function)` automatically creates the correct IAM policies |

| **Environment Variables** | `TABLE_NAME` is injected automatically from the created DynamoDB table |

| **Event Sources** | `S3EventSource` handles all the complex S3 notification configuration |

| **Type Safety** | IDE autocomplete prevents typos in resource names and methods |

| **Reusability** | The entire stack can be deployed to multiple environments with one line change |
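The reusability row boils down to a simple pattern: keep per-environment settings as plain data and feed whichever one you select into the same stack class. A CDK-free sketch of the idea (all names and values are hypothetical):

```python
# Hypothetical per-environment settings consumed by one stack definition.
ENV_CONFIGS = {
    "dev":  {"memory_mb": 512,  "min_confidence": 70, "retain_data": False},
    "prod": {"memory_mb": 1024, "min_confidence": 80, "retain_data": True},
}

def settings_for(env_name):
    """Return the settings a stack instance would be built with."""
    try:
        return ENV_CONFIGS[env_name]
    except KeyError:
        raise ValueError(f"Unknown environment: {env_name}") from None

print(settings_for("dev"))
```

In a real app these settings would be passed into the stack's `__init__` (or read via CDK context), so deploying to a new environment is a matter of selecting a different entry.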


---


## 6. Extending the Example


You could easily extend this CDK application to:

- Add **Amazon SQS** for dead-letter queue to handle failed processing

- Add **CloudFront** distribution for serving processed results

- Add **API Gateway** to expose a REST API for querying results

- Add **Step Functions** for complex workflows (multiple analysis stages)

- Deploy to **multiple environments** (dev, staging, prod) with different configurations


This demonstrates the power of CDK: **infrastructure defined in real code** that is testable, shareable, and maintainable.

