Saturday, September 13, 2025

How to seed content for AI crawlers?


- llms.txt is a proposed file that gives AI crawlers a curated, machine-readable guide to your site's most important content.

- It is often pitched as a way to influence what AI systems ingest, but it is a curation guide rather than an access-control or opt-in/opt-out mechanism like robots.txt.


“llms.txt” is a proposed standard (initiated around September 2024 by Jeremy Howard) meant to let web owners provide a machine-readable, curated guide of their most important content (docs, APIs, canonical pages, etc.) so that LLMs / AI crawlers can better understand what to ingest.  
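The proposed format is a plain Markdown file served from the site root at /llms.txt. A minimal sketch, with a hypothetical project name and URLs:

# ExampleProject

> ExampleProject is a hypothetical analytics API; this one-line summary tells an LLM what the site is about.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): How to make your first API call
- [API Reference](https://example.com/docs/api.md): Endpoints, parameters, and error codes

## Optional

- [Changelog](https://example.com/changelog.md): Release history

Per the proposal, the H1 title and short blockquote summary come first, followed by H2 sections of curated links; a section named "Optional" marks links an LLM can skip when context is limited.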


Here are the facts / findings so far:

Very Low Adoption Among Top Sites

Scans of the top 1,000 websites show that only about 0.3% have an llms.txt file.

Some community directories list hundreds of domains using it, but many are smaller docs sites, startups, or developer-platforms.  

Major LLM Providers Do Not Officially Support It Yet

A key point repeated in many sources: OpenAI, Anthropic, Google, Meta etc. have not publicly committed to parsing or respecting llms.txt in their crawling / ingestion pipelines.  

For example, John Mueller (from Google) has said he is not aware of any AI services using llms.txt.  

Some Early Adopters / Use Cases

A number of documentation sites, developer platforms, and SaaS/digital product companies have published llms.txt (and sometimes llms-full.txt) in their docs or marketing domains. Examples include Cloudflare, Anthropic (for its docs), Mintlify etc.  

Also, tools and plugins are emerging (for WordPress, SEO tools, GitBook) to help create llms.txt files.  

Unclear Real-World Impact So Far

There is little evidence that having llms.txt causes an LLM to pick up content more accurately, or improves traffic, retrieval, or citations by LLMs, largely because major LLMs do not appear to check it. Server logs from sites that publish llms.txt also show that AI services do not seem to be requesting the file.

Emerging Tools & Community Momentum

Although official adoption is lacking, community interest is growing: directories of implementations, write-ups, generators, documentation, and discussion.  

Some sites also publish llms-full.txt (a more exhaustive content dump), which in some documentation contexts appears to get more crawler traffic, or at least more visits, than llms.txt itself.


Friday, September 12, 2025

What is Sentinel and Cluster in FalkorDB

FalkorDB is a fork of RedisGraph, and under the hood it runs on Redis. That’s why you see references to Sentinel.

🔹 What is is_sentinel?

It’s usually a configuration flag/parameter in client libraries (Python, Node, etc.) when connecting to Redis/FalkorDB.

is_sentinel=True tells the driver:

“Don’t connect directly to a single Redis instance. Instead, connect to a Redis Sentinel node to discover the master/replica topology.”



🔹 What is a Sentinel Connection?


Redis Sentinel is a high-availability (HA) system for Redis.

It monitors Redis instances.

Automatically handles failover (if master goes down, promote a replica).

Provides service discovery for clients.


So in FalkorDB:

If you deploy in standalone mode, you connect directly (no Sentinel).

If you deploy in HA mode (with replication & failover), you connect through Sentinel, and the driver will auto-route queries to the current master.



🔹 Example (Python FalkorDB client)


from redis.sentinel import Sentinel

# Connect to Sentinel (Sentinel listens on its own port, usually 26379)
sentinel = Sentinel([('localhost', 26379)], socket_timeout=0.1)

# Get a connection to the current master
master = sentinel.master_for('mymaster', password="yourpassword", db=0)

# Get a replica connection (read-only queries can go here)
replica = sentinel.slave_for('mymaster', password="yourpassword", db=0)

# Use the master to run queries (FalkorDB graph calls are still Redis commands)
print(master.execute_command("GRAPH.QUERY", "MyGraph", "MATCH (n) RETURN n"))


Here:

"mymaster" = name of the Redis master group managed by Sentinel.

Sentinel runs separately (usually on port 26379).

is_sentinel=True just signals to use this connection mode.
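The "mymaster" group is defined on the Sentinel side. A minimal sentinel.conf sketch (hypothetical addresses, password, and timeouts) looks roughly like this:

port 26379
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel auth-pass mymaster yourpassword
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000

The trailing 2 in the monitor line is the quorum: how many Sentinels must agree the master is down before a failover starts.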



🔑 Summary

is_sentinel → flag to tell the driver to connect via Redis Sentinel.

Sentinel connection → allows HA setup (failover, replication, service discovery) in FalkorDB just like Redis.

Useful if you run FalkorDB in production with clustering + replicas, not needed in local/dev (single-node) setups.


Gems, Gemini, Gemini Advanced, NotebookLM


🔹 1. Gems

What it is: Google's Gemini feature for creating custom AI assistants/agents.

How it works: You can set personality, role, and goals (e.g., “a fitness coach,” “a travel planner,” “a coding tutor”).

Think of it as: Similar to OpenAI's custom GPTs or Anthropic's Claude Projects.



🔹 2. Gemini

What it is: Google’s family of generative AI models (successor to PaLM 2).

Released in Dec 2023 and continuously improved.

Capabilities: Multimodal (text, images, code, audio, video).

Versions:

Gemini Ultra → largest, most powerful (enterprise/research).

Gemini Pro → balanced, used in free-tier chat.

Gemini Nano → lightweight, runs on devices like Pixel phones.



🔹 3. Gemini Advanced

What it is: The premium subscription plan for Gemini users.

Includes:

Access to Google's most capable Gemini models.

Longer context window (handles bigger documents/conversations).

Priority access to new AI features.

Cost: Around $20/month (similar to ChatGPT Plus).



🔹 4. NotebookLM

What it is: A Google AI tool for AI-powered research & note-taking.

Specialty: You can upload documents (Google Docs, PDFs, notes), and NotebookLM builds a source-grounded AI assistant.

Why it’s different: Instead of general chat, it answers only based on your provided sources → reduces hallucinations.

Use case: Study aid, research assistant, knowledge management.

Extra feature: Auto-generates summaries, FAQs, and “briefings” from your uploaded docs.



✅ Summary in 1 Line Each

Gems → Custom AI assistants you design.

Gemini → Google’s core AI model family.

Gemini Advanced → Paid plan with Gemini Ultra & extra features.

NotebookLM → AI research notebook grounded in your docs.





Thursday, September 11, 2025

What is LLM Seeding?

LLM seeding involves publishing content in places and formats that LLMs are more likely to crawl, understand, and cite. It's not traditional SEO or "prompt engineering"; the goal is to get your content to appear in AI-generated answers, even if no one clicks.

LLM seeding involves publishing content where large language models are most likely to access, summarize, and cite.

Unlike SEO, you’re not optimizing for clicks. Instead, you’re working toward citations and visibility in AI responses.

Formats like listicles, FAQs, comparison tables, and authentic reviews increase your chances of being cited.

Placement matters. Publish on third-party platforms, industry sites, forums, and review hubs. 

Track results and monitor brand mentions in AI tools, referral traffic from citations, and branded search growth from unlinked citations across the web.

LLM seeding is publishing content in formats and locations that LLMs like ChatGPT, Gemini, and Perplexity can access, understand, and cite.

Instead of trying to rank #1 in Google search results, you want to be the source behind AI-generated answers your audience sees. The goal is to show up in summaries, recommendations, or citations without needing a click. The fundamentals overlap with SEO best practices, but the platform you’re optimizing for has changed.

LLMs have been trained on massive datasets pulled from the public web, including blogs, forums, news sites, social platforms, and more. Some also use retrieval systems (like Bing or Google Search) to pull in fresh information. When someone asks a question, the model generates a response based on what it has learned and, in some cases, what it retrieves in real time.

Content that is well structured, clearly written, and hosted in the right places is more likely to be referenced in the response: an LLM citation. It's a huge shift because instead of optimizing almost exclusively for Google's algorithm, you're now engineering content for AI visibility and citations.

Traditional SEO focuses on ranking high on Google to earn clicks. You optimize for keywords, build backlinks, and improve page speed to attract traffic to your site.

With LLM seeding, you don't chase rankings. You build content for LLMs to reference, even if your page never breaks into the top 10. The focus shifts from traffic to trust signals: clear formatting, semantic structure, and authoritative insights. You provide unique insights and publish in places AI models scan frequently, like Reddit, Medium, or niche blogs, which increases your chances of being surfaced in AI results.

SEO asks, “How do I get more people to click to my website?”

LLM seeding asks, “How do I become the answer, even if there’s no click?”

Best Practices For LLM Seeding

If you want LLMs to surface and cite your content, you need to make it easy to find, read, and worth referencing. Here’s how to do that:

Create “Best of Listicles”

LLMs prioritize ranking-style articles and listicles, especially when they match user intent, such as “best tools for freelancers” or “top CRM platforms for startups.” Adding transparent criteria boosts trust.

Use Semantic Chunking

Semantic chunking breaks your content into clear, focused sections that use subheadings, bullet points, and short paragraphs to make it easier for people to read. This structure also helps LLMs understand and accurately extract details. If you're not sure where to start, think about FAQs, summary boxes, and consistent formatting throughout your content.

Write First-Hand Product Reviews

LLMs tend to favor authentic, detailed reviews that include pros, cons, and personal takeaways. Explain your testing process or experience to build credibility. 

Add Comparison Tables

Side-by-side product or service comparisons (especially Brand A vs. Brand B) are gold to LLMs. You’re more likely to be highlighted if you include verdicts like “Best for Enterprise” or “Best Budget Pick.” An example of a brand that does comparison tables particularly well is Nerdwallet.

Include FAQ Sections

Format your FAQs with the question as a subheading and a direct, short answer underneath. LLMs are trained on large amounts of Q&A-style text, so this structure makes it easier for them to parse and reuse your content. FAQ schema is also fundamental to placement in zero-click search elements like featured snippets. The structured format makes your content easier for AI systems to parse and reference. 
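On pages you control, FAQ schema is typically added as JSON-LD structured data. A minimal sketch using schema.org's FAQPage type (the question and answer text here are placeholders):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is LLM seeding?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "LLM seeding is publishing content in formats and locations that LLMs can access, understand, and cite."
      }
    }
  ]
}
</script>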

Offer Original Opinions

Hot takes, predictions, or contrarian views can stand out in LLM answers, especially when they’re presented clearly and backed by credible expertise. Structure them clearly and provide obvious takeaways.

Demonstrate Authority

Use author bios, cite sources, and speak from experience. LLMs use the cues to gauge trust and credibility. If you’ve been focusing on meeting E-E-A-T guidelines, much of your content will already have this baked in.

Layer in Multimedia

While ChatGPT may not show users photos inside the chat window, screenshots, graphs, and visuals with descriptive captions and alt text help LLMs (and users who do click through) better understand context. It also breaks up walls of text.
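For example, a screenshot embedded with descriptive alt text and a caption (the file name and wording are placeholders) gives an LLM something to parse even though it never renders the image:

<figure>
  <img src="referral-dashboard.png" alt="GA4 dashboard showing referral traffic from AI tools such as ChatGPT and Perplexity">
  <figcaption>Referral traffic from AI assistants, January to June.</figcaption>
</figure>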

How To Track LLM Seeding

Tracking LLM seeding is different from tracking SEO performance. You won’t always see clicks or referral traffic, but you can measure impact if you know where to look. These KPIs matter the most:

1. Brand Mentions in AI Tools

Tracking tools: Perplexity Pro lets you see citation sources, while ChatGPT Advanced Data Analysis can sometimes surface cited domains. Even enterprise tools like Semrush AIO have started to track brand mentions across AI models. There are also dedicated tools like Profound that specifically focus on AI visibility.

2.  Referral Traffic Growth

Tools like GA4 can help you gauge LLM seeding's effectiveness, but not via traditional metrics. For example, you can segment referral traffic from AI domains such as chatgpt.com, perplexity.ai, and gemini.google.com; see the sketch at the end of this section.

3. Unlinked Mentions

You have several options for seeking out unlinked mentions, such as brand-monitoring alerts or periodically searching for your brand name across forums, review sites, and AI answers.

4. Overall LLM Visibility

No matter which tools you use, building a log to track your monthly tests across AI platforms can provide insights. Document the tool(s) used, prompt asked, and the exact phrasing of the mention. You’ll also want to track your brand sentiment; is your brand being talked about in a positive, neutral, or negative light?
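If you export referral data (e.g., from GA4) into that log, a small helper can bucket referrer hostnames into AI platforms. This is a rough sketch; the hostname list is an assumption and should be adjusted to whatever actually appears in your analytics:

from urllib.parse import urlparse

# Hypothetical mapping of referral hostnames to AI platforms; extend it with
# whatever referrers show up in your own analytics exports.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}

def classify_referrer(referrer_url: str) -> str:
    """Return the AI platform behind a referral URL, or 'Other'."""
    host = urlparse(referrer_url).netloc.lower()
    return AI_REFERRERS.get(host, "Other")

# Example usage with a couple of sample referral URLs
for url in ["https://chatgpt.com/", "https://www.perplexity.ai/search?q=best+crm"]:
    print(url, "->", classify_referrer(url))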


Wednesday, September 10, 2025

How to set up a logging sidecar container in Kubernetes

 We’ll set up:

A main container (writes logs to a file).

A sidecar container (reads the log and prints it to stdout).

Deploy them together in a Pod.

Run it inside Minikube.


🔹 1. Kubernetes Pod YAML (main + sidecar)


Save as pod-sidecar.yaml:


apiVersion: v1
kind: Pod
metadata:
  name: main-sidecar-pod
spec:
  containers:
  - name: main-app
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
      - |
        while true; do
          echo "$(date) : Hello from MAIN container" >> /var/log/app.log;
          sleep 5;
        done
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log

  - name: sidecar-log-reader
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
      - |
        tail -n+1 -f /var/log/app.log
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log

  volumes:
  - name: shared-logs
    emptyDir: {}


What this does:

main-app writes a timestamped log to /var/log/app.log.

sidecar-log-reader tails the same log file (via emptyDir shared volume).

Both share the same directory /var/log.



🔹 2. Run on Minikube


Step 1: Start Minikube


minikube start


Step 2: Apply the Pod


kubectl apply -f pod-sidecar.yaml


Step 3: Verify the Pod


kubectl get pods


You should see:


main-sidecar-pod   2/2     Running   0          <time>


(2/2 containers means both main + sidecar are running.)



🔹 3. Check the logs


👉 Logs from sidecar (should continuously show main’s logs):


kubectl logs -f main-sidecar-pod -c sidecar-log-reader


👉 Logs from main (itself writes to file, not stdout):


kubectl exec -it main-sidecar-pod -c main-app -- tail -f /var/log/app.log




🔹 4. Clean up


kubectl delete pod main-sidecar-pod


What is a sidecar container? What would be a good example?


👉 In Kubernetes, there is no special keyword like sidecar: true in YAML.

A “sidecar container” is just another container in the same Pod. The sidecar pattern is a design convention, not a separate resource type.


Let’s break this down:

🔹 1. Why a Sidecar Works Without Special YAML

Pod = group of containers that share resources.

All containers inside a Pod:

Share the same network namespace (same IP address & ports).

Can communicate over localhost.

Can share volumes mounted into each container.

So when you run a container to assist the main app (logging, proxying, syncing, etc.), we call it a sidecar — by convention.

In the YAML:

spec:
  containers:
    - name: main-app
      ...
    - name: sidecar-log-reader
      ...

Both are just containers. The “sidecar” role comes from its purpose, not from Kubernetes magic.



🔹 2. How Sidecar Containers Work


A sidecar works by sharing Pod-level resources with its “main” container:

Shared filesystem → via emptyDir, configMap, secret, or persistentVolume.

Shared networking → can connect via localhost:<port> to the main container.

Shared lifecycle → sidecars start and stop when the Pod does.


That’s why in the log example:

The main container writes logs to /var/log/app.log on a shared volume.

The sidecar container tails the same file (because it mounts the same volume).



🔹 3. Access Rights of a Sidecar Container


Within a Pod:

1. Filesystem

Only shared volumes (emptyDir, PVC, configMap, etc.) are accessible across containers.

Sidecar cannot see the main container’s entire root filesystem unless explicitly shared.

Example: both mount /var/log → both see the same files.

2. Network

All containers in the Pod share the same IP and network namespace.

They can talk over localhost, e.g., curl http://localhost:8080.

But each container still has its own process namespace — they don’t see each other’s processes.

3. Lifecycle

If the Pod restarts, both restart together.

There’s no “parent/child” relationship — all containers are siblings in a Pod.

4. Security Context

Access is further controlled by:

Kubernetes security policies (Pod Security Admission, which replaced PodSecurityPolicy).

Container user IDs (run as root vs non-root).

Volume permissions.

So, a sidecar container only has as much access as you give it via volumes, ports, and security context; a sketch of what that can look like follows below.
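Here is a minimal, illustrative container spec for tightening that access, assuming the log-reader sidecar only needs to read the shared volume (the field names are standard Kubernetes; the values are placeholders):

  - name: sidecar-log-reader
    image: busybox
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log
      readOnly: true        # the sidecar can read logs but not modify them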

🔹 4. Key Insight

A sidecar is not a privileged child — it’s just a peer container in the same Pod, given access to the same sandbox (network, volumes, lifecycle).

We call it a sidecar only because it assists the main container.

✅ Example:

Logging sidecar → ships logs from main container’s shared volume.

Proxy sidecar (like Envoy in Istio) → intercepts the main container's network traffic (the localhost sketch below shows the shared-network plumbing this relies on).

Data loader sidecar → fetches config files into a shared volume before main starts.
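To make the shared-network point concrete, here is a minimal sketch where a sidecar reaches the main container over localhost (the image names and probe loop are illustrative, not a production proxy setup):

apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo-pod
spec:
  containers:
  - name: main-app
    image: nginx            # serves HTTP on port 80 inside the Pod
  - name: sidecar-probe
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
      - |
        while true; do
          wget -qO- http://localhost:80 > /dev/null && echo "$(date) : main-app is reachable over localhost";
          sleep 10;
        done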


What is FalkorDB?


FalkorDB is a graph database built on top of Redis, designed for real-time AI/ML applications.

It is a fork of RedisGraph (after Redis stopped maintaining RedisGraph in 2023).

Uses GraphBLAS (linear algebra-based graph processing) for speed.

Query language: Cypher-like syntax (similar to Neo4j).


Think of it as: Redis (fast in-memory DB) + Graph structure support + AI-friendly features.
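As a quick illustration of that combination, here is a minimal sketch using the falkordb Python client against a local instance (the package name, host, and port are assumptions based on a default local setup):

from falkordb import FalkorDB

# Connect to a local FalkorDB instance (it speaks the Redis protocol)
db = FalkorDB(host="localhost", port=6379)
graph = db.select_graph("social")

# Create two nodes and a relationship using Cypher-like syntax
graph.query("CREATE (:Person {name:'Alice'})-[:KNOWS]->(:Person {name:'Bob'})")

# Traverse the graph
result = graph.query("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name")
for row in result.result_set:
    print(row)

Under the hood these calls are still Redis GRAPH.QUERY commands, which is why the earlier Sentinel example issued execute_command("GRAPH.QUERY", ...) directly.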



🔹 Advantages of FalkorDB

1. Performance (In-Memory + GraphBLAS)

Extremely fast queries, thanks to in-memory Redis + linear algebra ops.

Good for low-latency use cases (e.g., recommendations, fraud detection).

2. Real-time AI/ML Support

Supports hybrid search (vector embeddings + graph search).

Can combine semantic search (vector DB) with graph traversals.

3. Cypher Query Language Support

Developers familiar with Neo4j/Cypher can adapt quickly.

4. Scalability

Inherits Redis cluster scalability.

Works well in distributed, high-throughput environments.

5. Open Source & Actively Maintained

Unlike RedisGraph (which is discontinued), FalkorDB is actively updated.

6. Integration with AI frameworks

Works nicely with LLMs, recommendation engines, and knowledge graphs.



🔹 Disadvantages of FalkorDB

1. Memory Intensive

Like Redis, it stores data in memory (RAM).

Expensive for very large graphs unless persistence layers are optimized.

2. Younger Ecosystem

Compared to Neo4j or ArangoDB, community and ecosystem are smaller.

Fewer third-party integrations, tutorials, and production deployments.

3. Feature Gap vs Neo4j

Neo4j still has richer tooling (Bloom visualization, enterprise features, plugins).

FalkorDB is more lightweight.

4. Operational Complexity

Needs careful memory management and persistence tuning.

Scaling beyond RAM can be tricky compared to disk-based graph DBs.

5. Limited Query Language Extensions

Cypher support is partial (not 100% Neo4j compatible).

Some advanced graph analytics require custom workarounds.



🔑 Summary

FalkorDB = high-performance, Redis-based graph + vector database for real-time AI/ML workloads.

Best for: recommendation systems, fraud detection, semantic search, knowledge graphs in LLM apps.

Trade-off: blazing-fast but RAM-heavy and still growing ecosystem compared to Neo4j.