Friday, May 1, 2026

Some techniques for graph edges and nodes

 I only keep 219 of the 329 extracted entities in the network — an entity has to be mentioned in at least two paragraphs to get a node. The long tail of single-mention entities adds clutter without signal.
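
A minimal sketch of that filter, assuming mentions arrive as (paragraph_id, entity) pairs from the alias matcher (the names here are illustrative, not the actual pipeline code):

```python
from collections import defaultdict

def filter_entities(mentions, min_paragraphs=2):
    """Keep entities mentioned in at least `min_paragraphs` distinct paragraphs.

    mentions: iterable of (paragraph_id, entity) pairs from the alias matcher.
    """
    paras = defaultdict(set)
    for para_id, entity in mentions:
        paras[entity].add(para_id)   # count distinct paragraphs, not raw mentions
    return {e for e, p in paras.items() if len(p) >= min_paragraphs}
```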

For the edges, I used paragraph-level co-occurrence. Two entities that appear in the same paragraph are connected; the edge weight is the number of paragraphs in which they both appear. This is a coarse proxy — it conflates “mentioned together” with “actually related” — but on a well-edited book, it works surprisingly well. Paragraphs are typically topical. If Morris Chang and TSMC appear in 34 paragraphs together, they're related regardless of what the verbs are.
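
A sketch of the edge construction with NetworkX, assuming each paragraph has already been reduced to the set of entities it mentions:

```python
from itertools import combinations
import networkx as nx

def cooccurrence_graph(paragraph_entities):
    """paragraph_entities: one set of (kept) entity names per paragraph."""
    G = nx.Graph()
    for entities in paragraph_entities:
        # every unordered pair in the same paragraph gets +1 edge weight
        for a, b in combinations(sorted(entities), 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)
    return G
```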

I layered PMI (pointwise mutual information) on top of raw weight to surface pairs that co-occur more often than you’d expect given their individual mention counts. PMI is how you separate United States + Intel (weight = 64, but PMI = -0.08 because both appear in half the book) from John Bardeen + Walter Brattain (weight = 7, PMI = 4.4 because they basically only ever appear in each other's company — they're the transistor co-inventors).
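
The PMI itself is a one-liner over paragraph counts; a sketch of the textbook form (the exact smoothing and normalization choices behind the numbers above may differ):

```python
import math

def pmi(n_a, n_b, n_ab, n_paragraphs):
    """PMI over paragraph probabilities: log( p(a,b) / (p(a) * p(b)) ).

    n_a, n_b: paragraphs mentioning each entity; n_ab: paragraphs mentioning both.
    """
    p_a, p_b, p_ab = n_a / n_paragraphs, n_b / n_paragraphs, n_ab / n_paragraphs
    return math.log(p_ab / (p_a * p_b))
```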



Graph with no LLMs

 I converted the PDF to text, segmented it into 1,156 paragraphs, and built a canonical schema with four entity types: people, companies, countries, and technologies. Each canonical entity carries an alias list. I ran regex matching for every alias against every paragraph to produce entity mentions. I built an undirected weighted graph where nodes are entities and edge weights are the number of paragraphs in which two entities co-occur, with PMI as a secondary score to surface surprising pairs. I ran Louvain community detection, computed the standard centrality measures (PageRank, betweenness, eigenvector, weighted degree), and then layered pattern-based typed relation extraction on top — 17 predicates like FOUNDED, INVENTED, ACQUIRED, SANCTIONED, filtered by semantic-type plausibility so (Apple, FOUNDED, Taiwan) gets dropped.

No LLMs. Just schema, regex, NetworkX, and Louvain. The whole pipeline is ~800 lines of Python and runs in about 20 seconds.
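
The typed-relation step is the only part with judgment in it. A minimal sketch of the plausibility filter, with an illustrative subset of the predicate table rather than the actual 17 predicates:

```python
# Illustrative subset: predicate -> allowed (subject_type, object_type) signatures
PLAUSIBLE = {
    "FOUNDED":    {("person", "company")},
    "INVENTED":   {("person", "technology")},
    "ACQUIRED":   {("company", "company")},
    "SANCTIONED": {("country", "company"), ("country", "country")},
}

def keep_triple(subj, pred, obj, entity_type):
    """Drop triples whose argument types don't fit the predicate."""
    return (entity_type[subj], entity_type[obj]) in PLAUSIBLE.get(pred, set())

types = {"Apple": "company", "Taiwan": "country",
         "Morris Chang": "person", "TSMC": "company"}
print(keep_triple("Morris Chang", "FOUNDED", "TSMC", types))   # True
print(keep_triple("Apple", "FOUNDED", "Taiwan", types))        # False: (company, country)
```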

What are spherical harmonics

 “Spherical harmonics” and “knowledge graphs” come from two very different areas, so the phrase can sound confusing. The key is that spherical harmonics are a mathematical tool, and in knowledge graphs they’re sometimes used as a way to represent or learn structure.

1. What spherical harmonics are

In math and physics, spherical harmonics are a set of functions defined on the surface of a sphere. They’re similar to sine/cosine waves, but wrapped around a sphere instead of a line.

They’re widely used in:

  • Quantum Mechanics (electron orbitals)

  • Computer Graphics (lighting, reflections)

  • Signal processing on spherical surfaces

They let you describe complex shapes or patterns on a sphere as a combination of simpler components.
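
A quick way to see one, as a sketch using SciPy's `sph_harm` (argument conventions vary across SciPy versions, so check your docs):

```python
import numpy as np
from scipy.special import sph_harm

# Evaluate the degree-2, order-1 harmonic Y_2^1 over the sphere.
# SciPy convention here: theta = azimuth in [0, 2pi], phi = polar angle in [0, pi].
theta = np.linspace(0, 2 * np.pi, 90)
phi = np.linspace(0, np.pi, 45)
T, P = np.meshgrid(theta, phi)

Y = sph_harm(1, 2, T, P)    # complex-valued pattern on the sphere
print(Y.real.shape)         # (45, 90) -- a banded pattern you could plot on a globe
```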


2. What a knowledge graph is

A Knowledge Graph is a network of entities (nodes) and relationships (edges), like:

  • (Paris) → (capital of) → (France)

  • (Einstein) → (developed) → (relativity)

These graphs are used in search engines, recommendation systems, and AI.


3. Where spherical harmonics come into knowledge graphs

They don’t belong to knowledge graphs by default, but appear in advanced machine learning methods, especially:

a) Geometric / embedding approaches

Modern knowledge graph models often map entities and relations into continuous spaces (vectors, manifolds).

Some newer approaches use:

  • Spherical spaces instead of flat (Euclidean) ones

  • Functions like spherical harmonics to represent directional or rotational patterns

Why?

  • Real-world relationships can have symmetry, hierarchy, or cyclic structure

  • Spherical representations can capture these better than plain vectors


b) Graph neural networks (GNNs)

In certain Geometric Deep Learning methods:

  • Graph data is treated with geometric tools

  • If the graph has spatial or rotational properties, spherical harmonics can be used for (sketched below):

    • Encoding node positions

    • Learning rotationally invariant features
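
For the second bullet, a sketch of one standard trick: expand a node's 3D neighborhood in spherical harmonics and keep only the per-degree power, which does not change when the whole neighborhood is rotated (the function name is illustrative):

```python
import numpy as np
from scipy.special import sph_harm

def neighborhood_descriptor(rel_pos, l_max=3):
    """Rotation-invariant descriptor of a node's 3D neighborhood.

    rel_pos: (N, 3) neighbor positions relative to the node
             (assumes no neighbor sits exactly at the origin).
    """
    r = np.linalg.norm(rel_pos, axis=1)
    theta = np.arctan2(rel_pos[:, 1], rel_pos[:, 0]) % (2 * np.pi)   # azimuth
    phi = np.arccos(np.clip(rel_pos[:, 2] / r, -1.0, 1.0))           # polar angle
    powers = []
    for l in range(l_max + 1):
        # expansion coefficients c_{lm} = sum over neighbors of Y_l^m
        c = np.array([sph_harm(m, l, theta, phi).sum() for m in range(-l, l + 1)])
        powers.append(np.sum(np.abs(c) ** 2))   # per-degree power is invariant
    return np.array(powers)
```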


c) Rotational or angular embeddings

Some knowledge graph embedding models try to represent relationships as:

  • Rotations

  • Angles

  • Transformations on manifolds

Spherical harmonics can help model these transformations smoothly.


4. Intuition (simple analogy)

Think of a knowledge graph like a map of relationships.

  • Traditional embeddings = placing everything on a flat sheet

  • Spherical methods = placing everything on a globe

Spherical harmonics = a way to describe patterns on that globe efficiently.


5. Bottom line

Spherical harmonics are not a core concept of knowledge graphs, but they are used in advanced representation learning techniques to:

  • Model complex relational patterns

  • Capture symmetry and geometry

  • Improve embeddings in non-Euclidean spaces



High-degree steerable models

 High-degree steerable models in knowledge graphs combine two ideas:

  1. knowledge graphs (KGs) as structured relational data, and

  2. steerability—the ability to control model behavior, reasoning paths, or outputs with fine granularity.

Let’s unpack this in a clear way.


1. What “high-degree” means in knowledge graphs

In a knowledge graph, entities (nodes) are connected by relations (edges). A high-degree node is one that has many connections.

  • Example: In a KG like Wikidata

    • A node like “United States” or “Human” has thousands of edges.

  • These nodes are:

    • Information-dense

    • Often ambiguous

    • Harder for models to represent efficiently


2. What “steerable models” are

Steerability refers to controlling how a model behaves. In the context of KGs, this includes:

  • Controlling which relations the model focuses on

  • Guiding reasoning paths

  • Adjusting bias toward certain subgraphs

  • Enforcing constraints or rules

Common model families:

  • Graph Neural Networks (GNNs)

  • Embedding models (e.g., TransE, RotatE)

  • Neuro-symbolic models


3. The challenge with high-degree nodes

High-degree nodes create problems:

a. Over-smoothing

In GNNs, repeated aggregation causes node representations to become similar.

b. Noise accumulation

Not all connections are relevant. High-degree nodes mix:

  • useful edges

  • irrelevant edges

c. Scalability

Computational cost grows with neighborhood size, and high-degree nodes have the largest neighborhoods.


4. How high-degree steerable models address this

(A) Attention mechanisms

Models assign weights to edges:

  • Focus on important neighbors

  • Ignore noisy connections

Example:

  • Graph Attention Networks (GAT)


(B) Relation-aware filtering

Instead of treating all edges equally:

  • Filter by relation type

  • Prioritize specific predicates


(C) Subgraph steering

The model is guided to operate on a selected subgraph:

  • Query-driven reasoning

  • Task-specific pruning


(D) Path-based reasoning

Instead of aggregating all neighbors:

  • Follow specific multi-hop paths

  • Improves interpretability


(E) User or task control signals

Steerability can come from:

  • Prompts (in LLM-integrated systems)

  • Constraints (rules, logic)

  • Reinforcement learning rewards


5. Mathematical intuition

Let a node $v$ have neighbors $N(v)$.

Standard aggregation:

$$h_v = \sum_{u \in N(v)} h_u$$

Steerable aggregation:

$$h_v = \sum_{u \in N(v)} \alpha_{vu} \cdot h_u$$

Where:

  • $\alpha_{vu}$ = learned attention weight

  • Can depend on:

    • relation type

    • query

    • external control signal
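
A minimal numpy sketch of that steerable aggregation, where the attention weights come from a relation-type steering table standing in for any of the control signals above (all names are illustrative):

```python
import numpy as np

def steerable_aggregate(h, neighbor_ids, edge_relations, rel_weights):
    """Aggregate neighbor embeddings with steerable attention.

    h:              (num_nodes, d) node embeddings
    neighbor_ids:   ids of the node's neighbors
    edge_relations: relation type of each neighbor edge
    rel_weights:    relation type -> steering score (the control signal)
    """
    scores = np.array([rel_weights.get(r, 0.0) for r in edge_relations])
    alpha = np.exp(scores) / np.exp(scores).sum()        # softmax over neighbors
    return (alpha[:, None] * h[neighbor_ids]).sum(axis=0)

# Turning a relation's score up or down steers which neighbors dominate:
h = np.random.randn(4, 8)
print(steerable_aggregate(h, [1, 2, 3], ["treats", "cites", "cites"],
                          {"treats": 2.0, "cites": -1.0}))
```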


6. Applications

Knowledge graph completion

Predict missing edges:

  • e.g., (drug → treats → disease)

Question answering

Used in systems combining KGs with LLMs like ChatGPT:

  • steer reasoning to relevant entities

Recommendation systems

Focus on relevant user-item interactions

Biomedical reasoning

High-degree nodes like proteins or diseases require filtering


7. Key research directions

a. Dynamic steering

Real-time control based on query

b. Neuro-symbolic integration

Combine:

  • symbolic logic

  • neural embeddings

c. Controllable reasoning paths

Explicitly generate interpretable chains

d. Handling extreme-degree nodes

  • Sampling techniques

  • Hierarchical aggregation


8. Simple intuition

Think of a high-degree node like a celebrity with millions of connections.

A non-steerable model:

listens to everyone → gets confused

A steerable model:

listens only to relevant voices depending on the question



Thursday, April 30, 2026

Permutation invariance and permutation equivariance

 Permutation invariance is a fundamental property of **Graph Neural Networks (GNNs)** that ensures the output of the network remains identical regardless of how the nodes in a graph are ordered or indexed. Since graphs do not have a natural spatial ordering (unlike pixels in an image or words in a sentence), the model must treat the set of nodes as unordered.

## The Mathematical Definition

In a graph with $n$ nodes, the structure is represented by an adjacency matrix $A$ and a node feature matrix $X$. If we apply a permutation matrix $P$ to reorder the nodes, the new adjacency matrix becomes $P A P^\top$ and the feature matrix becomes $P X$.

A function $f$ is **permutation invariant** if:

$$f(PX,\, PAP^\top) = f(X, A)$$

This means the final scalar or vector output (like a graph-level classification score) does not change even if we swap the IDs of the nodes.

## Why It Matters

Standard Neural Networks (like MLPs or CNNs) are sensitive to the order of input features. If you swap two input pixels in a CNN, the output changes because the convolutional filters are tied to specific spatial coordinates. In a graph, "Node 1" and "Node 2" are arbitrary labels. If a GNN were not permutation invariant, it would learn different representations for the exact same graph structure simply because the data was stored in a different order in memory.

## Permutation Equivariance vs. Invariance

While the final output of a graph-level task must be invariant, the intermediate layers of a GNN (node-level representations) are usually **permutation equivariant**.

A function is equivariant if permuting the input results in an identically permuted output:

$$f(PX,\, PAP^\top) = P\, f(X, A)$$

In simpler terms, if you swap the order of nodes in the input, the resulting node embeddings are swapped in the exact same way, but the content of those embeddings remains consistent.

## How GNNs Achieve This

The core mechanism for ensuring these properties is the use of **symmetric aggregation functions**. During the message-passing phase, a node collects information from its neighbors. To be permutation invariant, the aggregation step must use operations where the order of operands does not matter, such as:

 * **Summation ($\sum$):** Captures the total energy or scale of the neighborhood.

 * **Mean ($\frac{1}{N}\sum$):** Captures the average characteristic of the neighborhood.

 * **Max/Min:** Captures the most prominent features.

By applying these operations locally at every node and then globally for graph-level pooling, the GNN becomes robust to any arbitrary node indexing provided by the input dataset.
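
A quick numerical check of both properties, as a minimal numpy sketch (one sum-aggregation layer on a random graph):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                # symmetric adjacency, no self-loops
X = rng.normal(size=(n, d))
W = rng.normal(size=(d, d))

def layer(A, X):
    return np.tanh(A @ X @ W)                 # sum over neighbors, then transform

P = np.eye(n)[rng.permutation(n)]             # random permutation matrix

H = layer(A, X)
H_perm = layer(P @ A @ P.T, P @ X)

print(np.allclose(H_perm, P @ H))             # True: node embeddings are equivariant
print(np.allclose(H_perm.sum(0), H.sum(0)))   # True: sum-pooled output is invariant
```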


Thursday, April 23, 2026

What is the difference between an Attention Head and an Attention Layer in GAT?

Attention Layer vs Attention Head: Complete Explanation

This is a fundamental concept in Graph Attention Networks (GATs). Let me explain with clear examples and a simple analogy.


The Short Answer

Attention Layer = A complete level of processing in the network (like a floor in a building)


Attention Head = One "perspective" within a layer (like multiple people looking at the same problem from different angles)


SIMPLE ANALOGY: Medical Diagnosis Team

ATTENTION HEAD = A Single Doctor

  • Each doctor has their own expertise

  • Each examines the patient from their perspective

  • Each gives their own opinion

ATTENTION LAYER = The Entire Medical Team (Layer 1)

  • Contains MULTIPLE doctors (heads)

  • All doctors work in parallel

  • Their opinions are COMBINED

MULTIPLE LAYERS = Multiple Rounds of Consultation

  • Layer 1: General practitioners

  • Layer 2: Specialists

  • Each layer refines the understanding
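
As a concrete sketch, assuming PyTorch Geometric is available: each `GATConv` call is one attention layer, and its `heads` argument sets how many attention heads run in parallel inside that layer.

```python
import torch
from torch_geometric.nn import GATConv

class TwoLayerGAT(torch.nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        # Layer 1: 4 heads whose outputs are concatenated -> hidden * 4 features
        self.gat1 = GATConv(in_dim, hidden, heads=4, concat=True)
        # Layer 2: a single head producing the final embedding
        self.gat2 = GATConv(hidden * 4, out_dim, heads=1, concat=False)

    def forward(self, x, edge_index):
        x = torch.relu(self.gat1(x, edge_index))   # team 1: four parallel opinions
        return self.gat2(x, edge_index)            # team 2: refines the result
```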


Tuesday, April 21, 2026

What are TransE and ComplEx in Knowledge Graph Embeddings

 Both TransE and ComplEx are techniques from knowledge graph embeddings—a part of Machine Learning used to represent entities and relationships in a graph as vectors so that machines can reason over them.

Think of a knowledge graph as:

(Paris, CapitalOf, France)
(Sachin, PlaysFor, India)

The goal is to convert these into mathematical representations that preserve relationships.


1) TransE (Translating Embeddings)

Core Idea

Relationships are modeled as translations in vector space.

👉 If:

  • h = head entity

  • r = relation

  • t = tail entity

Then:

h + r ≈ t

Intuition

  • “Paris + CapitalOf ≈ France”

  • The relation acts like a vector shift

How it works

  • Each entity and relation is a vector

  • The model learns embeddings such that:

    distance(h + r, t) is minimized
    

Strengths

  • Simple and fast

  • Works well for:

    • One-to-one relationships
      (e.g., Country → Capital)

Limitations

  • Struggles with:

    • One-to-many (Parent → Children)

    • Many-to-many relationships

👉 Because one vector translation cannot map to multiple correct targets
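
A minimal numpy sketch of the TransE score and the one-to-many problem just described (the vectors are toy values):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE: lower is better -- distance between translated head and tail."""
    return np.linalg.norm(h + r - t)

# One-to-many failure: h + r is a single point, so it cannot be
# simultaneously close to several distinct children.
parent    = np.array([0.0, 0.0])
has_child = np.array([1.0, 0.0])
child_a   = np.array([1.0, 0.5])
child_b   = np.array([1.0, -0.5])

print(transe_score(parent, has_child, child_a))  # both scores > 0; improving
print(transe_score(parent, has_child, child_b))  # one worsens the other
```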


2) ComplEx (Complex Embeddings)

Core Idea

Uses complex numbers (real + imaginary parts) to represent embeddings.

Instead of:

vector = [1.2, 0.5, -0.3]

You have:

vector = [1.2 + 0.7i, 0.5 + 0.1i, ...]

Why complex numbers?

They allow modeling asymmetric relationships

Example:

  • “India isLocatedIn Asia” (true)

  • “Asia isLocatedIn India” (false)

👉 TransE struggles here
👉 ComplEx handles this well


Scoring Function (conceptually)

Instead of distance, ComplEx uses:

  • Complex dot product

  • Conjugate operations

👉 Captures directionality of relationships
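
A minimal numpy sketch of that score using complex vectors; swapping head and tail changes the value, which is exactly the asymmetry TransE cannot express (the vectors are toy values):

```python
import numpy as np

def complex_score(h, r, t):
    """ComplEx: Re(<h, r, conj(t)>). Higher means more plausible."""
    return np.real(np.sum(h * r * np.conj(t)))

h = np.array([1.0 + 0.7j, 0.5 + 0.1j])   # e.g. India
r = np.array([0.3 + 0.9j, 0.8 - 0.2j])   # e.g. isLocatedIn
t = np.array([0.2 - 0.4j, 1.1 + 0.6j])   # e.g. Asia

print(complex_score(h, r, t))   # score for (India, isLocatedIn, Asia)
print(complex_score(t, r, h))   # swapped direction gives a different score
```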


Strengths

  • Handles:

    • Asymmetric relations

    • Complex graph patterns

  • Better performance on real-world datasets


Limitations

  • More computationally expensive

  • Harder to interpret


3) Key Differences

| Feature           | TransE                  | ComplEx              |
|-------------------|-------------------------|----------------------|
| Embedding type    | Real vectors            | Complex vectors      |
| Relation modeling | Translation (h + r ≈ t) | Complex interactions |
| Asymmetry support | ❌ Poor                 | ✅ Strong            |
| Complexity        | Low                     | Medium               |
| Use cases         | Simple graphs           | Real-world KGs       |

4) When to Use What

Use TransE if:

  • You want a simple baseline

  • Graph is not very complex

  • You need speed and scalability

Use ComplEx if:

  • Relations are directional/asymmetric

  • You need higher accuracy

  • Real-world knowledge graphs


5) Real-World Applications

Both are used in:

  • Knowledge graph completion
    (predict missing links)

  • Recommendation systems

  • Search ranking

  • Question answering systems (RAG enhancements)


6) Simple Analogy

  • TransE → moving points with arrows

  • ComplEx → rotating and scaling in a richer space

