Friday, April 3, 2026

How does OpenClaw work?

 ## How OpenClaw Works


OpenClaw is an **always-on agent runtime** that acts as a control plane for AI automations . Think of it as a small operating system for agents - it continuously listens for events, manages sessions, queues work, and executes tools .


### The Agent Loop (Core Mechanism)


OpenClaw operates through a **serialized agentic loop** per session . Here's how it works:


```mermaid

flowchart TD

    A[Input from Channels/CLI/API] --> B[Gateway Control Plane]

    B --> C[Session Management & Queue]

    C --> D[Agent Runtime]

    

    subgraph D [Agent Loop Execution]

        D1[Load Skills Snapshot] --> D2[Build System Prompt]

        D2 --> D3[Model Inference]

        D3 --> D4{Tool Called?}

        D4 -->|Yes| D5[Execute Tool]

        D5 --> D3

        D4 -->|No| D6[Stream Response]

    end

    

    D --> E[Persistence & Memory]

    

    style D fill:#f9f,stroke:#333,stroke-width:2px

```


**Key phases of the agent loop** :


1. **Intake** - Receives requests from messaging channels (WhatsApp, Telegram, Slack), CLI, or APIs

2. **Context Assembly** - Loads skills snapshots, bootstrap files, and session state

3. **Model Inference** - Calls the LLM with assembled prompt

4. **Tool Execution** - If the model calls a tool, it executes and feeds results back

5. **Streaming** - Outputs are streamed as assistant deltas and tool events

6. **Persistence** - Session state is saved for continuity


### Architecture Layers 


| Layer | Purpose |

| :--- | :--- |

| **Control Interfaces** | Desktop app, CLI, web UI for human interaction |

| **Messaging Channels** | WhatsApp, Telegram, Slack, iMessage - event sources |

| **Gateway Control Plane** | Routes requests, enforces access, manages sessions |

| **Agent Runtime** | Core AI reasoning, prompt construction, tool orchestration |

| **Tools Layer** | Bash, browser, filesystem, cron - actual execution |


### Queueing & Concurrency


Runs are **serialized per session** to prevent tool/session races and maintain consistency . Sessions can have different queue modes: `collect`, `steer`, or `followup` .


---


## What are Skills in OpenClaw?


Skills are **portable knowledge packages** that teach OpenClaw how to perform specific tasks . Each skill is a directory containing a `SKILL.md` file with YAML frontmatter and Markdown instructions.


### Skill Directory Structure 


```

skill-name/                    # lowercase, hyphens only

├── SKILL.md                   # REQUIRED - frontmatter + instructions

├── scripts/                   # OPTIONAL - executable code (Python, Bash, etc.)

├── references/                # OPTIONAL - detailed documentation loaded on demand

└── assets/                    # OPTIONAL - templates, images, static files

```


### SKILL.md Format 


```markdown

---

name: my-skill

description: What this does. Use when user asks about X.

license: MIT

metadata: { "openclaw": { "requires": { "bins": ["python3"] } } }

---


# Skill Instructions


Write clear, imperative instructions here. Use {baseDir} to reference skill folder.


## Step 1

Do this: `command --arg`


## Troubleshooting

Common error → fix

```


### Frontmatter Fields 


| Field | Required | Description |

| :--- | :--- | :--- |

| `name` | **Yes** | 1-64 chars, lowercase alphanumeric-hyphens |

| `description` | **Yes** | 1-1024 chars, include "Use when..." |

| `license` | No | SPDX identifier (MIT, Apache-2.0) |

| `metadata.openclaw` | No | Gating rules, installers, requirements |


### Progressive Disclosure (Token Efficiency)


Skills use a **three-stage loading model** to save context tokens :


| Stage | What Loads | When |

| :--- | :--- | :--- |

| **Discovery** | Only `name` + `description` | Session start (~100 tokens) |

| **Activation** | Full `SKILL.md` body | When skill is triggered |

| **Resources** | `references/` files | Only when explicitly referenced |


### Skill Locations & Priority 


OpenClaw loads skills from multiple locations with this priority order:


1. **Workspace skills** - `<workspace>/skills` (highest priority)

2. **Project agent skills** - `<workspace>/.agents/skills`

3. **Personal agent skills** - `~/.agents/skills`

4. **Managed skills** - `~/.openclaw/skills`

5. **Bundled skills** - shipped with OpenClaw (lowest priority)


### Skill Gating (Load-Time Filtering)


Skills can be **conditionally loaded** based on environment :


```markdown

metadata: {

  "openclaw": {

    "requires": {

      "bins": ["docker", "python3"],

      "env": ["OPENAI_API_KEY"],

      "config": ["browser.enabled"]

    },

    "os": ["darwin", "linux"],

    "emoji": "🐳"

  }

}

```


**Gating options**:

- `requires.bins` - binaries must be in PATH

- `requires.env` - environment variables must exist

- `requires.config` - config paths must be truthy

- `os` - restrict to specific platforms


### ClawHub (Skill Registry)


OpenClaw has a public skill registry at [clawhub.com](https://clawhub.com) . You can:


```bash

openclaw skills install <skill-slug>   # Install to workspace

openclaw skills update --all            # Update all skills

```


---


## Can You Make a Generic Agent That Accepts a skills.md File?


**Yes, absolutely.** The Agent Skills format is an **open standard** from [agentskills.io](https://agentskills.io) . This means skills are **portable across multiple platforms**, including:


- Claude Code

- Cursor

- GitHub Copilot

- OpenClaw

- VS Code (via symlinks)

- Any custom agent that implements the spec


### Building Your Own Generic Agent


You can build an agent that:

1. **Scans directories** for folders containing `SKILL.md`

2. **Parses YAML frontmatter** to get `name` and `description`

3. **Injects the manifest** into the system prompt

4. **Loads full SKILL.md** when the LLM indicates the skill is relevant

5. **Provides tool execution** for actions described in the skill


### Example: Minimal Agent Logic


```python

# Pseudocode for skill loading

skills = []

for skill_dir in scan_directories():

    if (skill_dir / "SKILL.md").exists():

        metadata = parse_frontmatter(skill_dir / "SKILL.md")

        skills.append({

            "name": metadata["name"],

            "description": metadata["description"],

            "path": skill_dir

        })


# Inject manifest into system prompt

system_prompt = f"Available skills: {skills}\n\nWhen a skill is relevant, ask to load it."


# On skill trigger

if triggered_skill:

    full_content = (triggered_skill["path"] / "SKILL.md").read_text()

    # Inject into context and continue

```


### Validation Tools


You can validate skills using the official CLI :


```bash

uv tool install git+https://github.com/agentskills/agentskills#subdirectory=skills-ref

skills-ref validate ./my-skill

skills-ref read-properties ./my-skill

skills-ref to-prompt ./my-skill

```


---


## What Other Files Exist Alongside SKILL.md?


Yes, skills can include **three optional subdirectories** :


### 1. `scripts/` - Executable Code


Contains runnable scripts that the agent can execute:


```

scripts/

├── validate.py

├── process_data.sh

└── generate_report.js

```


Use in SKILL.md: `Run: python scripts/validate.py --input {file}`


### 2. `references/` - Detailed Documentation


Loaded **on-demand** to save context tokens:


```

references/

├── api_documentation.md

├── policies.md

├── architecture.md

└── troubleshooting.md

```


Reference in SKILL.md: `See [references/policies.md](references/policies.md) for details`


### 3. `assets/` - Static Resources


Templates, images, fonts, or any static files:


```

assets/

├── report-template.docx

├── diagram.png

├── config-schema.json

└── logo.svg

```


### Complete Example: OpenClaw-Skill Structure


A real-world example (OpenClaw-Skill) has **51 reference files** covering everything from architecture to troubleshooting :


```

OpenClaw-Skill/

├── SKILL.md                     # Main entry point

└── references/

    ├── architecture.md          # Gateway architecture

    ├── agent_runtime.md         # Agent loop details

    ├── channels.md              # 20+ channel configs

    ├── providers.md             # 35+ model providers

    ├── security.md              # Security baseline

    └── ... (46 more files)

```


---


## Summary Table


| Question | Answer |

| :--- | :--- |

| **How does OpenClaw work?** | Always-on agent runtime with serialized agent loop per session; Gateway routes requests → Agent loads skills → Model reasons → Tools execute |

| **What are Skills?** | Portable directories with `SKILL.md` containing YAML metadata + Markdown instructions |

| **Required files?** | Only `SKILL.md` with `name` and `description` frontmatter |

| **Optional files?** | `scripts/`, `references/`, `assets/` subdirectories |

| **Generic agent possible?** | **Yes** - Agent Skills is an open standard; works across Claude Code, Cursor, OpenClaw, and custom agents |

| **Skill validation?** | Official `skills-ref validate` CLI tool |

| **Skill registry?** | ClawHub at clawhub.com |


Would you like me to help you create a specific skill for your use case, or dive deeper into building a custom agent that uses the Agent Skills format?

Thursday, April 2, 2026

How does claude code works?

 Based on the official documentation, here is a summary of how **Claude Code** works.


Claude Code is an agentic terminal assistant that operates in an **agentic loop** to complete tasks. It combines a reasoning model with a set of tools that allow it to act on your project.


### 🔄 The Agentic Loop: Core Operating Principle


When you give Claude a task, it works through three dynamic phases:


1.  **Gather Context:** It uses tools to search files, read code, explore your project structure, and understand the problem.

2.  **Take Action:** It uses tools to edit files, run shell commands (like tests or builds), or search the web.

3.  **Verify Results:** It runs tests, checks error outputs, or reviews changes to see if the goal was met.


Claude decides the sequence of steps based on what it learns from the previous one. It can chain dozens of actions together, course-correcting along the way. You can **interrupt at any point** to steer it in a different direction.


### 🛠️ What Makes Claude Code Agentic: Tools


The agentic loop is powered by two things: a **model** (Claude) that reasons, and **tools** that allow it to act. Without tools, Claude can only respond with text.


The built-in tools generally fall into five categories:


| Category | What Claude Can Do |

| :--- | :--- |

| **File operations** | Read files, edit code, create new files, rename and reorganize |

| **Search** | Find files by pattern, search content with regex, explore codebases |

| **Execution** | Run shell commands, start servers, run tests, use git |

| **Web** | Search the web, fetch documentation, look up error messages |

| **Code intelligence** | See type errors and warnings after edits, jump to definitions, find references (requires plugins) |


### 🗂️ What Claude Can Access


When you run `claude` in a directory, it can access:


-   **Your project files** (in the directory and subdirectories, with permission for files elsewhere).

-   **Your terminal** (any command you could run: build tools, git, package managers, scripts).

-   **Your git state** (current branch, uncommitted changes, recent commit history).

-   **`CLAUDE.md`** (a markdown file for project-specific instructions and conventions).

-   **Auto memory** (learnings Claude saves automatically between sessions, like project patterns).

-   **Extensions you configure** (MCP servers, skills, subagents).


### 🧠 Context Window Management


Claude Code manages the conversation's context window automatically:


-   **Filling up:** As you work, the context fills with conversation history, file contents, command outputs, etc.

-   **Compaction:** When the limit approaches, Claude clears older tool outputs first, then summarizes the conversation. Your requests and key code are preserved, but early detailed instructions may be lost.

    -   **Tip:** Put persistent rules in `CLAUDE.md` rather than relying on conversation history.

    -   Use `/context` to see what's using space.

-   **Skills and Subagents:** These help manage context. Skills load on demand (only name/description are always present). Subagents get their own fresh context, separate from your main conversation, and only return a summary.


### 🛡️ Safety: Checkpoints and Permissions


-   **Checkpoints:** Before editing any file, Claude Code snapshots the current contents. You can undo file changes by pressing `Esc` twice or asking Claude to undo.

-   **Permissions:** Press `Shift+Tab` to cycle through modes:

    -   **Default:** Claude asks before file edits and shell commands.

    -   **Auto-accept edits:** Edits files without asking, but still asks for commands.

    -   **Plan mode:** Uses **read-only tools only** to create a plan you approve before execution.

    -   **Auto mode:** Evaluates all actions with background safety checks (research preview).


### 💡 Tips for Effective Use


-   **It's a conversation:** Start with what you want, then refine. You don't need perfect prompts.

-   **Interrupt and steer:** If Claude goes down the wrong path, type your correction and press Enter.

-   **Be specific upfront:** Reference specific files, mention constraints, and point to example patterns for better first attempts.

-   **Give Claude something to verify against:** Include test cases or paste screenshots of expected UI so it can check its own work.

-   **Explore before implementing:** For complex problems, use **Plan mode** to analyze the codebase first, review the plan, then let Claude implement.

-   **Delegate, not dictate:** Give context and direction, then trust Claude to figure out the details (e.g., "The checkout flow is broken... the relevant code is in `src/payments/`. Can you investigate?").


### 📂 Sessions


-   Each session is tied to your current directory. Conversations are saved locally.

-   **Resume or fork:** Use `--continue` to resume a session. Use `--fork-session` to branch off a new session from a previous one without affecting the original.

-   **Switching branches:** Claude sees the new branch's files, but your conversation history stays the same.


In essence, Claude Code works as an agent that **autonomously navigates your project using a loop of gathering context, acting, and verifying**, while giving you full control to interrupt, steer, and manage its permissions. It's designed to be a conversational, flexible, and safe coding partner from your terminal.

Wednesday, April 1, 2026

What is Amazon Rekognition (quick context)

Amazon Rekognition is a pre-trained AI service that can:

  • Detect objects, scenes, faces

  • Perform image moderation (unsafe content)

  • Extract text from images

  • Do facial comparison, etc.


🎯 1. Important Clarification (Your Question)

“Is it about training a model with images to produce a new model?”

👉 Answer: It depends—there are TWO modes


✅ Mode 1 — Pre-trained (Most common)

  • No training needed

  • Works out-of-the-box

Examples:

  • Detect labels → “Car”, “Tree”, “Person”

  • Detect unsafe content

  • Face detection

👉 This is what most people use


✅ Mode 2 — Custom Training (Rekognition Custom Labels)

  • You train your own model using your images

  • Used for domain-specific use cases

Example:

  • Detect:

    • “Damaged equipment”

    • “Specific device types”

    • “Company logos”

👉 This is closer to what you described


🖥️ 2. How to Use Rekognition from AWS Console (UI)


🟢 A. Using Pre-trained Features (No Training)

Step-by-step:

1️⃣ Go to Rekognition Console

  • Open AWS Console

  • Navigate to Rekognition


2️⃣ Choose Feature

You’ll see options like:

  • Label detection

  • Face analysis

  • Text detection

  • Content moderation


3️⃣ Upload Image

  • Upload from local OR select from:

    • Amazon S3


4️⃣ View Results

  • Labels with confidence scores

  • Bounding boxes

  • Detected text

  • Moderation flags


👉 That’s it—no model training required


🧠 3. Using Custom Models (Custom Labels)

This is where training happens


🔁 Workflow:

Upload Images → Label Data → Train Model → Evaluate → Deploy → Use for Prediction

Step-by-step:

1️⃣ Prepare Dataset

  • Upload images to S3

  • Organize:

    • Train set

    • Test set


2️⃣ Create Project

  • Go to “Custom Labels” in Rekognition

  • Create project


3️⃣ Label Images

  • Use built-in labeling tool

  • Draw bounding boxes

  • Assign labels


4️⃣ Train Model

  • Click “Train”

  • AWS trains model automatically


5️⃣ Evaluate Model

  • Check:

    • Accuracy

    • Precision/Recall


6️⃣ Deploy Model

  • Start model endpoint

  • (Billing starts here)


7️⃣ Use Model

  • Call API to classify new images


⚙️ 4. How Prediction Works

For pre-trained:

Image → Rekognition API → Labels + Confidence

For custom:

Image → Your Custom Model → Domain-specific labels

🔗 5. Integration Options

Rekognition integrates with:

  • Amazon S3 → image storage

  • AWS Lambda → processing

  • Amazon API Gateway → expose APIs


Example architecture:

Upload Image → S3 → Lambda → Rekognition → Result → App/UI

🖼️ 6. Common Use Cases

✅ Pre-trained:

  • Image tagging

  • Content moderation

  • Face detection

  • OCR (basic)


✅ Custom Labels:

  • Manufacturing defect detection

  • Logo detection

  • Device classification

  • Medical imaging (basic cases)


⚠️ 7. Key Limitations

  • Custom training requires:

    • Labeled data

    • Time & cost

  • Not as flexible as full ML frameworks like:

    • Amazon SageMaker


🧠 8. When to Use What

Use CaseRecommendation
General object detectionPre-trained
ModerationPre-trained
Domain-specific detectionCustom Labels
Complex ML problemSageMaker

🏁 Final Answer

👉 Using Rekognition from UI does NOT always involve training

  • ✅ Most features → ready-to-use (no training)

  • ✅ Custom Labels → lets you train your own model


💬 One-line takeaway:

Rekognition is both a plug-and-play vision API and a lightweight custom model training platform—depending on your use case.


If you want, I can:

  • Show sample API code (Python / Node.js)

  • Compare Rekognition vs SageMaker for custom vision

  • Or design a pipeline using Rekognition + A2I (human review) 👍

What is Amazon Lex?

Amazon Lex is AWS’s service for building chatbots and voice bots using natural language understanding (NLU) and automatic speech recognition (ASR).

👉 In simple terms:

It lets users interact with applications using natural language (text or voice)—similar to Alexa (which actually uses Lex under the hood).


🧠 1. How Amazon Lex Works

Core building blocks:

🔹 Intents

  • What the user wants to do

  • Example: “Book a ticket”, “Check order status”


🔹 Utterances

  • Different ways users express an intent

  • Example:

    • “I want to book a flight”

    • “Reserve a ticket”


🔹 Slots

  • Parameters required to fulfill intent

  • Example:

    • Date

    • Location

    • Ticket type


🔹 Fulfillment

  • What happens after intent is understood

  • Typically:

    • Call backend API (via Lambda)

    • Return response


🔹 Dialog Management

  • Lex automatically:

    • Prompts for missing slots

    • Handles conversation flow


🔁 2. End-to-End Flow

User → Lex Bot → Intent Recognition → Slot Filling → Lambda/API → Response → User

Example:

User: “Book a flight to Delhi tomorrow”

  • Intent → BookFlight

  • Slots → Destination = Delhi, Date = tomorrow

  • Lambda → processes booking

  • Response → “Your flight is booked”


🛠️ 3. Creating a Chatbot using Amazon Lex (Console)

Step-by-step using AWS Console:


1️⃣ Create Bot

  • Go to Amazon Lex console

  • Click Create bot

  • Choose:

    • Blank bot OR template

  • Configure:

    • Language (e.g., English)

    • IAM role


2️⃣ Create Intents

  • Add intent (e.g., BookHotel)

  • Add utterances:

    • “Book a hotel”

    • “Reserve a room”


3️⃣ Define Slots

  • Example:

    • Location

    • Check-in date

  • Define slot types:

    • Built-in OR custom


4️⃣ Configure Prompts

  • Ask user:

    • “Which city?”

    • “What date?”


5️⃣ Fulfillment (Backend Integration)

  • Connect to:

    • AWS Lambda


6️⃣ Build & Test

  • Click Build

  • Test in console chat window


7️⃣ Deploy (Alias)

  • Create bot version + alias

  • Use alias in applications


🔗 4. Integration with Other Applications

✅ Option 1 — Web Application (Most common)

Embed chatbot UI using:

  • Lex Web UI

  • JavaScript SDK


Architecture:

Web App (Angular/React)
        ↓
   Lex API (SDK)
        ↓
    Lex Bot
        ↓
   Lambda / Backend

✅ Option 2 — Mobile Apps

  • iOS / Android SDK

  • Voice + text support


✅ Option 3 — Backend Integration

Call Lex using APIs:

  • RecognizeText

  • RecognizeUtterance


✅ Option 4 — Messaging Platforms

Integrate with:

  • Slack

  • Facebook Messenger

  • Twilio (SMS)


✅ Option 5 — Voice Assistants

  • Telephony systems

  • Contact center bots


🔌 5. Example Integration (Angular App)

Since you’re working with Angular:

👉 You can:

  • Use AWS SDK

  • Call Lex runtime APIs


Flow:

Angular UI → API Gateway → Lambda → Lex → Response → Angular UI

👉 Or directly:

Angular UI → AWS SDK → Lex Runtime API

⚙️ 6. Backend Integration (Important)

Lex typically integrates with:

  • AWS Lambda → business logic

  • Databases (DynamoDB / RDS)

  • External APIs


Example:

Intent → “Check device status”
→ Lambda calls your network API
→ Returns result to user


🧩 7. Advanced Features

🔹 Multi-turn conversations

  • Context-aware dialogs


🔹 Built-in slot types

  • Dates, numbers, locations


🔹 Custom slot types

  • Domain-specific (e.g., device IDs)


🔹 Confidence scores

  • Helps fallback handling


🔹 Fallback intents

  • Handle unknown queries


🚀 8. Where Lex Fits (Important for YOU)

Given your work (GenAI, planners, network automation):

👉 Lex can act as:

“Conversational Interface Layer”

Example:

User → Lex → Intent → Planner Agent → Execution → Response

👉 You can combine:

  • Lex → intent detection

  • LLM → reasoning

  • APIs → execution


🏁 Final Summary

  • Amazon Lex = chatbot + voice bot builder

  • Uses:

    • NLU (intent detection)

    • Slot filling

  • Built using:

    • Intents, utterances, slots

  • Integrates via:

    • Web apps, mobile apps, APIs, messaging platforms

  • Backend handled via:

    • Lambda or APIs


💬 One-line takeaway:

Lex is a managed conversational interface layer that connects user language → backend execution.


If you want, I can:

  • Show a real Angular + Lex integration code

  • Compare Lex vs Dialogflow vs LLM-based chatbot

  • Or design a GenAI + Lex hybrid architecture (very powerful for your use case)

RTK - RUST token Killer

 RTK (Rust Token Killer) is a fascinating tool that fits perfectly into your blog's second part about **system-level optimizations**. Here’s a high-level overview and a practical example you can include.


### 🧠 How RTK Works: High-Level Overview


RTK acts as a **transparent CLI proxy** that intercepts commands run by AI coding tools (like Claude Code, Cursor, or Copilot) and filters their output **before** it enters the LLM’s context window.


**Four Core Strategies:**

1.  **Smart Filtering** – Removes noise (comments, whitespace, boilerplate) from command outputs like `ls`, `git status`, or `cargo test`.

2.  **Grouping** – Aggregates similar items (e.g., files by directory, errors by type) to show structure without repetition.

3.  **Truncation** – Keeps only the most relevant context (e.g., first/last N lines, signatures of functions).

4.  **Deduplication** – Collapses repeated log lines into a single line with a count.


**The Result:** The AI tool receives the same *information* but uses **60–90% fewer tokens**. This directly translates to lower API costs, faster context processing, and less chance of hitting context limits.


### ⚙️ Example: Optimizing a `cargo test` Command


This is one of the most impactful use cases. A failed test in a medium-sized Rust project can output hundreds of lines, consuming thousands of tokens. Here’s how RTK transforms it:


**Without RTK (Standard Output)** – Sends ~25,000 tokens

```bash

$ cargo test

   Compiling myproject v0.1.0 (/Users/dev/myproject)

   ...

running 15 tests

test utils::test_parse ... ok

test utils::test_format ... ok

test api::test_login ... ok

test api::test_logout ... ok

test db::test_connection ... ok

test db::test_query ... ok

test auth::test_password_hash ... ok

test auth::test_token_verify ... ok

test handlers::test_index ... ok

test handlers::test_submit ... FAILED

test handlers::test_delete ... ok

test models::test_user ... ok

test models::test_session ... ok

test middleware::test_auth ... ok

test middleware::test_logging ... ok


failures:

---- handlers::test_submit stdout ----

thread 'handlers::test_submit' panicked at 'assertion failed: `(left == right)`

  left: `Some(ValidationError)`,

 right: `None`', src/handlers.rs:42:9

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:

    handlers::test_submit


test result: FAILED. 14 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

```


**With RTK (`rtk test cargo test`)** – Sends ~2,500 tokens (90% reduction!)

```bash

$ rtk test cargo test

running 15 tests

FAILED: 1/15 tests

  handlers::test_submit: panicked at src/handlers.rs:42:9 - assertion failed: left == right

```


### 🔧 How to Demonstrate in Your Blog


You can show a **before/after token count** using RTK’s built-in analytics. For example, after running a session with RTK, you can run:


```bash

rtk gain --graph

```


This would produce a simple ASCII graph showing token savings per command, which makes for a compelling visual in a blog post.


RTK is a perfect example of an **infrastructure-level optimization** that sits between the application and the model, dramatically improving efficiency without changing the application’s logic—a key theme for your Part 2.