NeuG v0.1.3: The Era of Graph Computing Inside the Graph Database

Thu, 02 Jul 2026 00:00:00 +0000

NeuG v0.1.3 is out. The theme of this release: giving the graph database native graph computing capabilities. The GDS extension brings 9 graph algorithms, COPY TEMP lets you load external data for temporary analysis without polluting your production database.

Here’s why we built this, and what exactly shipped.

What NeuG Has Always Been Missing: Graph Computing

The core capability of graph databases is graph storage — representing entities and relationships as nodes and edges. With Cypher’s MATCH statement, you can do pattern-matching queries: find a node’s neighbors, match a path, filter subgraphs by conditions.

But there’s a class of problems that Cypher queries struggle to solve efficiently:

How many communities are in this graph? Which nodes belong together?
Which node is the most influential hub?
What are the shortest paths from node A to all other nodes?
What is the local clustering coefficient of each node?

These are graph algorithm problems, not graph query problems. The distinction: a graph query says “I know what pattern I’m looking for, find it”; a graph algorithm says “compute some structural property across the entire graph.”

Previously, since NeuG didn’t support graph algorithms, you had to export data and hand it off to NetworkX and other graph analysis libraries, compute results, then import them back. Data movement, format conversion, environment setup — the usual engineering overhead.

The GDS extension in v0.1.3 brings graph algorithms directly into the graph database. No data export, no second environment to maintain — just CALL and go.

GDS: 9 Algorithms, One API

Algorithm Overview

The GDS extension covers the three most common categories of graph algorithms:

Category	Algorithms	What It Solves
Traversal & Centrality	WCC, BFS, SSSP, PageRank	What’s reachable from A? Shortest path? How many connected components? Which node is most influential?
Community Detection	Louvain, Leiden, CDLP	How many communities? Which nodes belong together?
Structural Analysis	LCC, K-Core	How connected is the graph? Local structure density?

Unified Calling Convention

All algorithms follow the same pattern: project a subgraph, then call the algorithm.

-- 1. Project a subgraph (define which nodes and edges participate)
CALL project_graph(
    'social',
    ['person'],
    {'[person, knows, person]': ''}
);

-- 2. Call the algorithm
CALL page_rank('social', {max_iterations: 20})
RETURN node.fName, rank
ORDER BY rank DESC;

-- 3. Clean up
CALL drop_projected_graph('social');

project_graph defines the computation scope — which node labels and edge types participate. After projection, all algorithms use the unified CALL algo_name('graph_name', {options}) interface. Return values can be processed with standard Cypher clauses like ORDER BY, WHERE, and RETURN.

BFS and SSSP also support returning full paths:

CALL bfs('social', {source: '0'})
YIELD node, distance, path
RETURN node.fName, distance, nodes(path) AS path_nodes;

The path column is a standard Cypher PATH type, supporting nodes(path), relationships(path), and length(path).

Performance: LDBC Graphalytics Benchmark

We evaluated GDS on the LDBC Graphalytics benchmark — the recognized standard for graph analysis. It covers 5 datasets ranging from 1 to 1.8 billion edges, including synthetic R-MAT graphs (graph500-26), weighted social graphs (datagen-9 series), and a real-world social graph (com-friendster, 1.8B edges). We compared against SuiteSparse:GraphBLAS, GeminiGraph, and ladybug (ladybug implements PR and WCC only, with a significant performance gap). All tests ran on a 32-core/64-thread machine with 495GB RAM, all systems compiled with -O3.

Partial results:

Algorithm	Dataset	NeuG	GraphBLAS	GeminiGraph
CDLP	datagen-9_2-zf	19s	316s	—
LCC	datagen-9_0	21.8s	65.1s	—
SSSP	datagen-9_0	1.14s	6.47s	1.24s
BFS	com-friendster	0.84s	1.58s	0.99s
WCC	com-friendster	0.95s	1.14s	8.47s

Per-algorithm analysis:

CDLP (Label Propagation): fastest on all datasets — 16x faster than GraphBLAS on datagen-9_2-zf (GeminiGraph does not implement this algorithm)
LCC (Local Clustering Coefficient): fastest on all datasets — 3x faster than GraphBLAS on datagen-9_0 (GeminiGraph does not implement this algorithm)
SSSP (Single-Source Shortest Path): fastest on all three weighted datasets — 6x faster than GraphBLAS on datagen-9_0, slightly faster than GeminiGraph
BFS (Breadth-First Search) / WCC (Weakly Connected Components): lead or tie with the best on most datasets
PageRank: close to GraphBLAS; GeminiGraph slightly faster on some datasets

Full benchmark documentation: GitHub.

The complete algorithm list, parameters, and usage are in the GDS extension docs.

Temporary Graphs: Load External Data, Compute, Don’t Pollute

GDS solves the “how to compute” problem. But there’s another practical question: where does the data come from?

Suppose you want to use GDS to analyze a batch of external data — relationship data exported from another system, a delta change file, or a temporarily assembled test dataset. You don’t want to create tables in your production database because this is a one-time analysis; but you need the data to exist as a graph in the database to run GDS.

v0.1.3’s COPY TEMP solves this:

-- Load temporary node table (from CSV)
COPY TEMP temp_user FROM 'users.csv' (header=true, primary_key='id');

-- Load temporary edge table (specify from/to endpoint tables)
COPY TEMP temp_knows FROM 'edges.csv' (header=true, from='temp_user', to='temp_user');

Temporary tables have three key properties:

Lifecycle bound to Connection — closing the connection automatically cleans up all temporary tables. No manual DROP TABLE needed.
Not written to checkpoint — gone after restart, never pollutes persistent data.
Mixable with persistent data — standard Cypher JOIN works across temporary and persistent tables.

Supports CSV / JSON / JSONL / Parquet formats. Combined with v0.1.2’s httpfs extension, you can also load directly from OSS / S3 / HTTPS remote paths. See the data import docs for COPY TEMP details.

In Practice: Discovering New Concepts from Code Changes

Combining GDS and COPY TEMP solves a real-world problem: how do code changes impact the knowledge graph?

Scenario

Take the NeuG codebase as an example. We parse the NeuG codebase (up to v0.1.2) into a graph — functions as nodes, function call relationships as edges (the actual code graph schema would be more complex, also including imports and other relationships — simplified here for demonstration). The knowledge graph is overlaid on this same graph: each function is tagged with which concept it belongs to (e.g., execute_query() belongs to the “Query Execution” concept).

NeuG releases v0.1.3, and the code repository has new changes (new feature modules, documentation updates, etc.). We want to know: what impact will these code changes have on the knowledge graph? Do we need to add new concepts?

The approach: parse the new code from v0.1.3 into a delta graph (new function nodes + call edges), load it temporarily via COPY TEMP, then analyze its structure with GDS:

Which new code modules are the most influential?
How many communities do the new code changes form? Does each community suggest a new concept?

This is a pre-analysis — you don’t want to pollute the production knowledge base before deciding whether to apply the changes. COPY TEMP is the perfect fit.

Code Graph and Delta Data

The production database already has the current code graph — function nodes and call edges parsed from the v0.1.2 codebase, plus concept nodes and function → concept assignment relationships.

The new code from v0.1.3 is parsed into delta CSV files: new function nodes (e.g., project_graph, page_rank, leiden, bfs, etc.) and their call edges. We use COPY TEMP to temporarily load the delta, then use GDS to analyze its community structure — if new functions naturally cluster into communities, that suggests the knowledge graph may need new concepts.

Analysis Flow

from neug import Database
import tempfile, os

# 1. Create database (with persistent code graph, simulating production)
db = Database(db_path=tempfile.mkdtemp(), mode="w")
conn = db.connect()

# [Persistent code graph already exists: function nodes + call edges + concept assignments]

# 2. Load GDS extension
conn.execute("install gds")
conn.execute("load gds")

# 3. New functions and call edges parsed from v0.1.3 code (CSV provided by code analysis tool)
nodes_csv = os.path.join(tempfile.gettempdir(), "delta_funcs.csv")
edges_csv = os.path.join(tempfile.gettempdir(), "delta_calls.csv")
# ... write CSV files ...

# 4. COPY TEMP: load delta code as temporary graph (no pollution to persistent data)
#    Note: in practice, delta call edges may also connect to persistent func nodes
#    (new code calling existing functions, or vice versa); simplified to delta_func internal calls here
conn.execute("COPY TEMP delta_func FROM '{}' (header=true, primary_key='id')".format(nodes_csv))
conn.execute("COPY TEMP delta_call FROM '{}' (header=true, from='delta_func', to='delta_func')".format(edges_csv))

# 5. Project the delta code graph
conn.execute("""
    CALL project_graph(
        'delta_graph',
        ['delta_func'],
        {'[delta_func, delta_call, delta_func]': ''}
    )
""")

# 6. PageRank: which new code modules are most influential?
result = conn.execute("""
    CALL page_rank('delta_graph', {max_iterations: 20})
    RETURN node.name, rank
    ORDER BY rank DESC
""")
# (example results)
# project_graph        -> highest rank (called by page_rank, leiden, wcc, bfs; the delta's hub)
# ...
# NodeDatabase::Init   -> low (called only by InitAll)

# 7. Leiden: how many communities do the new code changes form?
#    Here we run community detection on the delta graph; in practice, you can also run
#    Leiden on the combined persistent + delta graph, comparing with original communities
#    to discover newly emerged communities → suggesting new concepts
result = conn.execute("""
    CALL leiden('delta_graph', {concurrency: 1})
    RETURN node.name, community
    ORDER BY community
""")
# (example results, actual community partitions may vary by algorithm parameters)
# Group A: project_graph, page_rank, leiden, wcc, bfs, parse_subgraph_entries
# Group B: InitAll, NodeDatabase::Init
# -> Two main groups: graph analytics functions (6 nodes) -> may correspond to new concept "GDS Extension"
#    Node.js binding functions (2 nodes) -> may correspond to new concept "Node.js Binding"

# 8. Clean up
conn.execute("CALL drop_projected_graph('delta_graph')")
conn.close()
# -> Temporary tables automatically cleaned up

# 9. Verify: reopen, persistent code graph intact
conn2 = db.connect()
result = conn2.execute("MATCH (n:func) RETURN count(n)")

conn2.close()
db.close()

Analysis Conclusions

PageRank reveals: project_graph is the most influential function among the new code — it is called by page_rank, leiden, and other algorithm functions, making it the core hub of the new code.

Leiden reveals: the new code forms multiple communities — GDS functions (project_graph, page_rank, leiden, etc., 6 nodes) are tightly coupled, suggesting that v0.1.3’s code changes may require adding a new “GDS Extension” concept to the knowledge graph; the Node.js binding functions (InitAll, NodeDatabase::Init, 2 nodes) are relatively independent and can be treated as a separate concept.

This is the core idea of “discovering new concepts from the code graph”: code updates are temporarily loaded via COPY TEMP, GDS analyzes their community structure, and newly emerging communities suggest that the knowledge graph needs new concepts. Throughout the analysis, the production database data was never modified. COPY TEMP’s temporary tables are automatically cleaned up when the connection closes, and persistent data remains intact.

We’ll publish a separate blog post covering the full code-graph-based concept discovery scenario in detail.

Also Worth Noting

A few more improvements in v0.1.3:

Node.js Binding (#424): Native C++ addon (N-API), providing Database, Connection, QueryResult high-level API. Supports embedded graph database + Cypher queries. npm install @graphscope-neug/neug.
COW Snapshot Isolation (#370): UpdateTransaction switched to Copy-on-Write snapshot isolation for safer read-write concurrency.
Unified Type System (#525): Compiler LogicalType and engine DataType unified, reducing type conversion overhead.
Database Directory Management Refactor (#148): Introduced Module, Checkpoint, and CheckpointManager for improved database directory management.
macOS ARM64 CI (#576): New dedicated macOS ARM64 testing workflow for Apple Silicon quality.
Key Bug Fixes: Non-standard column names causing parquet to load 0 rows (#455), macOS ARM64 nightly build failures (#515, #517).

Full release notes: GitHub Release.

Try It

pip install neug==0.1.3

Copy-paste the demo code above and run it. For a complete reproducible script (including CSV generation and assertions), see reproduce.py.

GDS extension docs: load_gds

GitHub: https://github.com/alibaba/neug

How does AI affect U.S. Employment? Replicating Harvard Business School Research with NeuGBI

Tue, 16 Jun 2026 00:00:00 +0000

TL;DR

We used NeuGBI to replicate the Harvard Business School paper Generative AI as Seniority-Biased Technological Change: same Revelio Lab U.S. employment dataset (300 million records), same research question — AI’s impact on jobs. The entire analysis was conducted autonomously by NeuGBI using its built-in analytical Skills, with no human intervention. The result matched the paper’s conclusion: junior position records declined by 29.4%, while senior positions only declined by 5.8%. On some dimensions, NeuGBI went finer than the original paper — for example, within software development roles, it’s specifically junior-level (L2) positions that nearly halved, while entry-level (L1) and senior (L3+) positions showed minimal decline.

A recent Harvard Business School paper, Generative AI as Seniority-Biased Technological Change, specifically studied this question: AI’s impact on employment is not evenly distributed — junior positions are hit more significantly, while senior positions have remained relatively stable in employment volume.

The paper uses Revelio Lab employment data (hereafter “Revelio data”). Revelio classifies positions into seniority levels L1–L7 (per the Harvard paper’s definitions): Entry, Junior, Specialist, Manager, Director, Executive, and C-Suite. The paper defines “junior positions” as Entry and Junior (L1–L2), and “senior positions” as L3 and above.

This kind of open-ended question is genuinely interesting — but does answering it really require an elite research team like Harvard’s? With the advancement of large language models, a natural thought arises: let LLMs handle this kind of open-ended exploration. They can understand natural language, generate code, and interpret results — seemingly well-suited for this type of layer-by-layer drill-down analysis.

Where Current LLM-Based Analytics Fall Short

But directly feeding Revelio data to an LLM clearly won’t work: 60GB, 300 million records — far exceeding any model’s context window. Even stepping back to keep the data in a database and use an LLM-driven text-to-SQL approach (similar to chatBI) runs into three unavoidable problems:

Data relationships are too complex. Employment data is inherently graph-structured — people, positions, companies, and skills form multi-hop relationships that are either awkward to model in SQL or prohibitively expensive to join.
Interactive latency is unacceptable. Exploratory analysis depends heavily on instant feedback loops (“see result → refine question → query again”), but at 300 million rows, traditional databases either can’t handle it or are too slow to maintain flow. No matter how fast the LLM is, it can’t save a slow query.
Analysis chains break. Each decomposition step depends on findings from the previous one, but text-to-SQL treats each query as an isolated task — context from earlier rounds is completely lost, preventing the LLM from truly “drilling down continuously.”

To address these pain points, we designed NeuGBI: the underlying NeuG graph database natively supports multi-hop relationships; the query layer uses end-to-end unbiased sampling so complex queries return in seconds; and the analysis model is built on Hypergraph reconstruction, naturally suited for continuous analysis and intermediate result reuse. On top of this, we packaged exploratory analysis Skills for LLMs to invoke, enabling them to autonomously conduct open-ended exploration on massive, complex data — decomposing questions, driving the engine, reading results, and deciding where to dig next.

Enough talk — let’s validate directly. Next, we’ll run NeuGBI on the Revelio dataset through the complete replication workflow and see whether its conclusions match the Harvard paper.

NeuGBI’s Query Logic

Now that we know why NeuGBI is needed, let’s look at how it decomposes this task step by step. The given question is: “How hard has AI hit U.S. employment?”

Faced with this question, NeuGBI doesn’t produce a simple conclusion. Instead, it first breaks it into queryable data problems — for example, whether overall U.S. employment changed before and after AI’s emergence; whether changes differ across occupations and regions; whether junior and senior positions changed differently. Here we examine two core analysis threads.

The first thread: examine AI’s impact on U.S. employment before and after its emergence. After observing the overall decline, continue examining different occupations and different states.

The second thread follows the Harvard paper’s main thesis: if junior positions declined significantly more than senior positions, continue drilling into different occupations and seniority levels to see exactly which category within junior positions is declining.

Exploratory analysis works exactly this way: see the previous round’s results, then decide what to query next.

1. U.S. Employment Before and After AI

First, NeuGBI queried U.S. employment conditions before and after AI’s emergence.

From 2021 to 2025, total employment records dropped from 21 million to approximately 10 million. Since Revelio’s coverage fluctuates over time, absolute numbers cannot be directly equated with real employment figures, but this trend tells us one thing: the data clearly shows significant contraction.

Just looking at overall U.S. employment trends isn’t enough — we don’t yet know which occupations the decline is concentrated in. So the next step: NeuGBI continues down the occupation path.

1.1 Impact on Different Industries

Due to space constraints, here we show AI’s impact on computer-related occupations (e.g., software development, systems engineering, web development). Computer-related hiring headcount declined 31% from 2022 to 2024 (2025 data is incomplete), while job posting volume declined 64%.

This data shows that U.S. computer-related hiring is declining overall, but we don’t yet know whether this decline is concentrated in just a few regions. So the next step: NeuGBI examines different states.

1.2 State-Level Breakdown

Here NeuGBI explored computer-related employment across different U.S. states. The chart shows the major states by hiring volume — most states’ computer-related hiring is declining. This indicates the decline isn’t isolated to one state but appears as a similar trend across multiple major states.

At this point, NeuGBI has analyzed AI’s impact on U.S. employment from both occupation and regional perspectives. But the results are still insufficient, so next NeuGBI pursues another path: is AI’s impact different for junior vs. senior positions?

2. AI’s Impact on Junior vs. Senior Positions

In this step, NeuGBI analyzed AI’s impact on junior and senior positions.

The results show that not every category declined equally. By record count, junior positions declined by 29.4%, while senior positions only declined by 5.8%. In other words, looking at share of total employment, senior positions actually rose from 35% in 2022 to 43.6% in 2025. This isn’t because senior positions grew — it’s because junior positions fell so much more, making senior positions’ share appear higher.

This result is consistent with the Harvard paper’s conclusion: junior positions contracted more significantly, while senior positions remained relatively stable.

But this answer isn’t granular enough. Junior positions themselves include both entry-level and junior-level. The data here only shows “junior positions are declining” but cannot tell us exactly which category is declining.

So here NeuGBI chose to continue drilling down, examining whether different seniority levels within specific occupations also show different patterns.

2.1 Drilling Deeper: Which Seniority Level Declined More in Software Development?

Here NeuGBI continued querying to see AI’s impact on different occupations and seniority levels. We focus primarily on software development.

The result is clear: within software development positions, what truly declined significantly is L2, not the entire junior category. Specifically:

L1: 1.2M → 1.1M, minor fluctuation
L2: 2.8M → 1.45M, nearly halved
L3+: 4.4M → 4.25M, essentially stable

In other words, at a coarse level it looks like junior positions are declining; but when broken apart, L1 shows no significant drop — it’s specifically L2 that’s dragging junior positions down.

This query is extremely complex. It requires aggregating data across different occupations, 7 seniority levels, and 5 years — a very demanding fine-grained query. Thanks to NeuGBI’s sampling methodology, we can quickly see the data trends and continue drilling down.

At this point, we’ve truly gotten to the bottom of this question: “junior positions are contracting” isn’t accurate enough. At least within software development, junior-level positions declined more significantly than entry-level.

Conclusion

Looking back, the entire replication was conducted autonomously by NeuGBI: on 60GB / 300 million U.S. employment records, it produced the same primary conclusion as the Harvard paper — junior positions declined more significantly while senior positions remained relatively stable — and on software development roles specifically, it uncovered a finer conclusion not present in the paper: what truly declined dramatically is junior-level positions, while entry-level and other levels remained relatively stable.

NeuGBI’s value here is also straightforward: it lets us first validate a coarse conclusion, then continue drilling down. One path examines overall employment, computer-related occupations, and major states; another returns to junior vs. senior positions, then drills into specific occupations and seniority levels. For questions where you don’t know where the answer is hiding at the outset, this step-by-step querying capability matters more than any single query.

References

Harvard Business School Paper: Generative AI as Seniority-Biased Technological Change
NeuGBI Theory Paper: A Hypergraph-Based Framework for Exploratory Business Intelligence
NeuG Graph Database: https://github.com/alibaba/neug
NeuGBI Demo: https://github.com/shunyangli/neugbi-demo

NeuG v0.1.2: Your Graph Database Is No Longer a Data Island

Mon, 01 Jun 2026 00:00:00 +0000

NeuG v0.1.2 is out. The theme of this release: making the graph database a native part of the data lake ecosystem. Data lives in OSS — NeuG reads it there. Files have schemas — NeuG infers them. Query results need to go downstream — write them back as Parquet.

Here’s why we built this, and what exactly shipped.

The Biggest Problem with Graph Databases Isn’t Performance

If you’ve ever tried to introduce a graph database into your team — Neo4j, TigerGraph, JanusGraph, or anything else — you’ve had this conversation:

“Our data is on OSS. How do I get it in?” “Download it locally, write a schema definition, then run COPY FROM.” “How do I feed query results to Spark?” “Write a script to export CSV. Nested fields? Flatten them manually.” “Can query results go back to OSS directly? Downstream pipelines pull from there.” “No. Export locally first, then upload with ossutil.”

Each round trip raises the real barrier to adoption. Not because the query language is hard to learn. Not because performance is lacking. Because graph databases don’t fit into your existing data infrastructure.

Data from the lake needs a “format adaptation” to enter the graph. Query results need a “format conversion” to return to the lake. Every adaptation is engineering cost. Every conversion can lose information.

This isn’t a bug in any single product — it’s the default paradigm of graph databases as a category: they treat themselves as the destination for data, not as a compute node in the data flow.

The Design Shift in v0.1.2

NeuG v0.1.2 shipped three features. Individually, each looks like a routine enhancement. Together, they form a clear design statement:

Feature	What it does	What it means
Cloud Object Storage Extension #179	Direct read from S3 / OSS / HTTPS	NeuG goes to where the data lives, instead of demanding data be moved
COPY FROM Without DDL #134	Auto-infer schema from data files	Data defines structure, not humans pre-defining structure
Parquet Export #241	Query results output as Parquet, with write-back to OSS/S3	Graph computation results flow back to the data lake with zero friction

The combined effect: NeuG adapts to your data infrastructure — not the other way around.

The graph database shifts from “destination for data” to “a plug-and-play graph compute layer in the data flow” — read from OSS, run graph analytics, write results back to OSS, with no local disk involvement.

Feature 1: Direct Cloud Storage Read

Why “No Download” Matters More Than “Fast Download”

Reading remote files directly isn’t just about “saving one download step.” In production, it changes the operational model:

No local disk usage: Data stays in OSS / S3. NeuG fetches on demand. No need to provision separate storage for the graph database.
No sync maintenance: When the source updates, your next LOAD FROM reads the new version automatically. No ETL pipeline to keep things in sync.
Cleaner permission isolation: Access control stays in the object storage layer. No need to replicate ACLs inside the graph database.

Usage

v0.1.2 adds the httpfs extension:

-- First-time setup: download and load extensions (one-time)
install httpfs;
install parquet;
load httpfs;
load parquet;

-- HTTPS direct read (simplest, zero config for public data)
LOAD FROM "https://graphscope.oss-cn-beijing.aliyuncs.com/neug/vPerson.parquet"
RETURN * LIMIT 5;

-- OSS scheme (requires credentials and endpoint)
LOAD FROM "oss://graphscope/neug/vPerson.parquet" (
    CREDENTIALS_KIND='Anonymous',
    ENDPOINT_OVERRIDE='oss-cn-beijing.aliyuncs.com'
)
RETURN *;

Same data, three protocols (HTTPS / OSS / S3) — only the URL prefix differs. For typical OSS deployments, configure the endpoint and AK/SK and you’re set.

Feature 2: Schema-on-Read

Graph Databases Finally Catch Up to the Data Lake’s Core Principle

The data lake philosophy is schema-on-read: don’t enforce structure when storing data; infer it when reading. Spark reads Parquet without requiring you to define a table first. DuckDB reads CSV without requiring column type definitions. This is the baseline experience for modern data tools.

Graph databases haven’t always worked this way. Before v0.1.2, importing data into NeuG looked like:

-- First, examine the file's columns and types...
-- Then hand-write DDL:
CREATE NODE TABLE person (
    ID INT64,
    fName STRING,
    gender INT64,
    isStudent BOOLEAN,
    isWorker BOOLEAN,
    age INT64,
    eyeSight DOUBLE,
    -- ... 9 more columns to list one by one ...
    PRIMARY KEY (ID)
);

-- Only then can you import
COPY person FROM "vPerson.parquet";

A 16-column table requires 16 lines of DDL. Get a type wrong (INT64 vs INT32, DATE vs STRING) and the import fails or silently loses precision.

After v0.1.2:

COPY person FROM (
    LOAD FROM "https://graphscope.oss-cn-beijing.aliyuncs.com/neug/vPerson.parquet"
    RETURN *
);

One statement. NeuG automatically infers column names and types from the Parquet schema. The first column becomes the primary key. Table creation and import happen in a single step.

Edges work the same way — just specify which node tables are connected:

COPY meets FROM (
    LOAD FROM "https://graphscope.oss-cn-beijing.aliyuncs.com/neug/eMeets.parquet"
    RETURN *
) (from="person", to="person");

Tested: all 16 columns of vPerson.parquet correctly inferred (INT64 / STRING / BOOL / DOUBLE), 8 vertices imported successfully.

Feature 3: Parquet Export, Direct Write-Back to Cloud

Bidirectional: Data Comes In, and It Goes Back Out

Another long-standing complaint about graph databases: “data goes in but can’t come out.” To feed query results to downstream Spark / pandas / DuckDB, the traditional approach is exporting CSV with manual type mapping. Nested structures (edges, paths) make it worse — you decide how to serialize.

v0.1.2 solves not just “how to export” but “where to export” — COPY TO supports remote paths just like LOAD FROM. Query results can be written back directly to OSS / S3, making the entire pipeline a true cloud-native closed loop.

-- Export graph query results: only socially active young people
COPY (
    MATCH (p:person)-[m:meets]->(friend:person)
    WHERE p.age < 35
    RETURN p.fName AS name, p.age AS age, friend.fName AS met_person, m.location
) TO 'active_young_people.parquet';

-- Same query, results written directly to OSS (zero local disk)
COPY (
    MATCH (p:person)-[m:meets]->(friend:person)
    WHERE p.age < 35
    RETURN p.fName AS name, p.age AS age, friend.fName AS met_person, m.location
) TO "oss://my-bucket/output/active_young_people.parquet" (
    CREDENTIALS_KIND='Explicit',
    OSS_ACCESS_KEY_ID='<your-ak>',
    OSS_ACCESS_KEY_SECRET='<your-sk>',
    ENDPOINT_OVERRIDE='oss-cn-hangzhou.aliyuncs.com'
);

What’s being exported is the result of graph computation — data that has been filtered, joined, and projected. This is the value of graph databases as a compute layer: you analyze using graph patterns (who knows whom, under what conditions), then send conclusions directly back to the data lake for downstream consumption.

The entire pipeline truly runs without touching local disk: read from OSS → graph compute → write results back to OSS. For lightweight compute tasks running in containers, this is critical — no need to mount local volumes.

Full Demo: Cloud to Data Lake, End to End

The following demonstrates the complete flow using NeuG v0.1.2’s Python binding. The dataset is public test data from NeuG’s CI — anyone can reproduce this.

from neug import Database
import tempfile

# 1. Create database
db = Database(db_path=tempfile.mkdtemp(), mode="w")
conn = db.connect()

# 2. Install and load extensions
conn.execute("install httpfs")
conn.execute("install parquet")
conn.execute("load httpfs")
conn.execute("load parquet")

# 3. Remote preview
result = conn.execute('''
    LOAD FROM "https://graphscope.oss-cn-beijing.aliyuncs.com/neug/vPerson.parquet"
    RETURN ID, fName, age, isStudent
    LIMIT 3
''')
for row in result:
    print(row)  # [0, 'Alice', 35, True], [2, 'Bob', 30, True], [3, 'Carol', 45, False]

# 4. One-line table creation + node import (no CREATE NODE TABLE needed)
conn.execute('''
    COPY person FROM (
        LOAD FROM "https://graphscope.oss-cn-beijing.aliyuncs.com/neug/vPerson.parquet"
        RETURN *
    )
''')

# 5. One-line table creation + edge import (no CREATE REL TABLE needed)
conn.execute('''
    COPY meets FROM (
        LOAD FROM "https://graphscope.oss-cn-beijing.aliyuncs.com/neug/eMeets.parquet"
        RETURN *
    ) (from="person", to="person")
''')

# 6. Graph query
result = conn.execute('''
    MATCH (a:person)-[m:meets]->(b:person)
    WHERE a.age > 30
    RETURN a.fName, b.fName, m.location
''')
for row in result:
    print(row)

# 7. Export graph query results to Parquet
conn.execute('''
    COPY (
        MATCH (a:person)-[m:meets]->(b:person)
        WHERE a.age < 35
        RETURN a.fName AS name, a.age AS age, b.fName AS met_person, m.location
    ) TO '/tmp/young_social.parquet'
''')

Read from OSS → auto-create tables → graph query → Parquet export to OSS. Zero DDL written, zero manual data movement. In production, swap the export path to oss:// or s3:// for a fully cloud-native closed loop.

Also Worth Noting

A few more features in v0.1.2:

MERGE clause (#312): MERGE (p:person {ID: 1}) ON CREATE SET p.name = 'Alice' — match if exists, create if not. Idempotent graph writes.
neug-cli autocomplete + syntax highlighting (#373): Major improvement to CLI interactive experience.
QueryResult → pyarrow Table (#270): result.to_arrow() converts graph query results to a pyarrow Table in one call — hand off directly to pandas / polars / DuckDB for further analysis, zero intermediate format conversion.

Full release notes: PR #404.

Try It

pip install neug==0.1.2

Copy-paste the demo code above and run it. For a detailed step-by-step tutorial (including common pitfalls and OSS/S3 credential configuration), see: Data Pipeline Tutorial.

GitHub: https://github.com/alibaba/neug.

Feeding Code Repositories into LLM Wiki: 6x Token Savings in One Knowledge Build

Thu, 28 May 2026 10:00:00 +0000

Andrej Karpathy’s LLM Wiki reshapes the knowledge-base paradigm: no chunking, no vector index. The LLM itself handles create / read / update / delete on the knowledge, and the wiki evolves dynamically with use.

If vector-retrieval knowledge bases are the “interpreter” of knowledge, LLM Wiki is the “compiler.” Raw sources are not re-read on every query — they are compiled into concise wiki summaries, and the LLM answers from those summaries whenever they suffice.

Karpathy’s proposal collected millions of views and thousands of GitHub stars in weeks, sparking a wave of reproductions. But almost every reproduction stops at textual data — papers, technical docs, study notes. Our day-to-day workflow involves frequent competitor-code surveys and architectural comparisons, which prompted a different question: what if we feed the code repository itself into the wiki?

1. Repository-Aware Knowledge Management

Document-level reproductions can ingest an entire file into the context window and let the LLM extract, summarize, and link. Code repositories break this assumption — context blows up, knowledge density is high, and structures are hierarchical. We had to redesign the pipeline to handle code.

1.1 Three-Layer Directory Structure

We keep the canonical LLM Wiki layout — raw / wiki / schema — but each layer carries different semantics when the source is code.

`raw` Directory

Most LLM Wiki implementations store original files in raw for answer authenticity. That works for plain text. It does not work for code: repositories are large and change constantly. For reference, the three databases we benchmarked weigh in at MySQL 6.36 GB · PostgreSQL 958 MB · NeuG 38 MB — well past what we want to drag around inside a knowledge base.

Our solution: use Git submodules. A submodule is just a pointer to a specific branch of a specific repository — the code is downloaded only when the LLM explicitly needs to read the source. Submodules also give incremental updates for free: a git diff between two branches tells us exactly which wiki entries need refreshing, so we never re-ingest the entire repo.

`wiki` Directory

Code repositories are too large to be summarized into a single wiki page. We partition the codebase into modules, and each module gets its own wiki page describing its function. These pages mirror the style of text-derived wikis — abstract, tags, dense reference links — so users can locate facts quickly and the LLM has fewer chances to hallucinate.

Beyond modules, individual code components can also link to concept pages. PostgreSQL and MySQL both implement cost-based optimization; during ingestion, the LLM updates the shared concept page with content from both repositories, forming cross-repository comparisons and associations naturally.

`schema` Directory

The schema directory holds the rules the LLM uses to manage the wiki. Because code repositories have their own quirks around storage, extraction, and querying, we wrote a dedicated prompt set to standardize the pipeline — covered in the next section.

1.2 Knowledge Ingestion

LLM Wiki defines three core operations: Ingest, Query, and Linter. Query and Linter reuse the existing wiki as-is, so the real challenge is producing accurate wikis during Ingest.

A code repository will not fit into the context window the way a plain document does. We surveyed tools like DeepWiki and RepoWiki, then adopted a two-stage generation strategy:

Architecture phase. The LLM scans descriptive metadata — README, top-level docs, directory tree, build/config scripts — to form a high-level understanding of the project. It partitions the repository into modules and creates a placeholder wiki page per module. This stage only fixes module names — no content yet.
Detail phase. With the partitioning settled, the LLM reads the actual code for each module and fills in the detailed wiki content.

This separation lets the LLM autonomously decide how many wiki pages a repository needs based on scale and complexity, keeping each module’s code small enough to fit context. It also gives the user a cheap checkpoint: inspect the module partitioning before paying for the expensive detail phase.

2. Aligning Concepts and Modules

2.1 Concept Alignment

Extracting concepts from raw documents is the heart of building a knowledge base — shared concepts are what turn isolated documents into a graph.

Before LLMs this was hard: “data modification” and “data update” mean the same thing; “quantization trading” and “quantization compression” do not. Modern LLMs, with proper context, can do concept alignment almost perfectly. Across the LLM Wiki implementations we surveyed, the pattern boils down to three rules:

Read every existing concept already in the LLM Wiki.

Summarize 3–5 concepts from the new document.

Allow at most one new concept to be introduced.

Compared to free-form summarization, this disciplines the LLM into matching and updating existing concepts instead of inventing new ones — which is what keeps the wiki from fragmenting into noise.

2.2 Module Alignment

Inspired by the concept-alignment trick, we inject the wiki directories of previously-ingested repositories into the context before parsing a new one. The LLM is told to reference existing module titles when partitioning the new repository.

This turns module partitioning from “fill in the blank” into “multiple choice”. Instead of brainstorming module names, the LLM reverse-checks whether each known module exists in the new codebase, and only invents a new label when the functionality genuinely has no analogue.

2.3 Experimental Comparison

We partitioned PostgreSQL twice — once on its own, once with reference to MySQL and NeuG wikis.

Independent Partitioning	Aligned Partitioning (with Reference)
Processing Pipeline	SQL Frontend Optimizer Execution
Storage	Storage
Transaction	Transaction WAL Recovery
Replication	Replication
Utility & Client Tools	Utilities
Build System	—

Neither result is strictly “wrong” on content. But critical architectural pieces like the Optimizer and WAL rarely have explicit textual descriptions and account for only a small fraction of the file count, so they get drowned out in independent extraction.

In database engineering, Optimizer and WAL are pivotal — surrounded by theory, papers, and operational lore. Surfacing them as separate modules is clearly better. Conversely, support pieces like the Build System are less central and lack analogues in other databases, so dropping them from the wiki is a reasonable trade.

3. Team Collaboration

Karpathy’s framing — and most reproductions — treat LLM Wiki as a personal knowledge base: open an LLM tool inside the wiki directory and chat. That leaves two gaps: invoking the wiki from inside other projects, and sharing the wiki across users.

To let any project talk to the wiki, we packaged it as a skill. We renamed AGENT.md to SKILL.md and dropped the whole folder under .agents/skills/llm-wiki. Any project can now invoke the wiki through the standard skill mechanism:

paper-101.pdf /neug-wiki add this paper to the wiki.

For multi-user access we host the LLM Wiki on GitHub. Setup is one command: cd .agents/skills/llm-wiki && git clone …. Coding tools parse the skill on clone.

GitHub’s read path is easy — users pull the whole wiki and the LLM reads it locally. The write path is where it gets interesting: shared editing brings concurrency conflicts. We resolve this by requiring every update — almost always an ingest — to go through a Pull Request. The full workflow is orchestrated inside SKILL.md so users only confirm at the web interface; everything else is automated.

3.3 Linter and CI Checks

Hosting on GitHub lets us encode wiki rules into the CI pipeline. We split rules into two kinds:

Mandatory rules — must hold on every update; violations block the merge. Examples: link paths must resolve, the raw directory is append-only, file sizes are capped. One check rejects accidentally-uploaded SKILL.md files because they would confuse other skills’ registration.
Advisory rules — judgment calls a rigid rule cannot make. These get surfaced as PR comments for the user to confirm. A healthy PR uploads raw data, summarizes it, extracts concepts, and updates wikis. If something is off — say one raw file produced two wikis — the system flags it and asks for confirmation.

4. Case Study: 6x Token Savings on a Single Query

Competitive-implementation comparison is one of the most common queries we run during technology selection. We selected two well-known databases — MySQL and PostgreSQL — and our team’s open-source NeuG, then asked the LLM Wiki to compare their transaction-management designs side by side.

4.1 Data Preparation

All three repositories were ingested through the full pipeline. Two arms:

Experimental — the LLM answers from the wiki.
Control — the LLM answers by reading the raw code.

4.2 Response Quality

Answers vary in wording; we compare the structural outlines.

Experimental (with wiki):

# Comparative Analysis Report on Transaction Strategies of Three Major Databases
## I. Overall Architecture Overview
## II. MVCC Implementation Comparison
## III. Lock Mechanism Comparison
## IV. Isolation Level Comparison
## V. WAL and Crash Recovery
## VI. Garbage Collection / Space Reclamation
## VII. Featured Mechanisms
## VIII. Summary

Control (raw code only):

# Report on Transaction Strategies of Three Databases
## I. Transaction Architecture Overview
## II. Transaction Isolation Levels
## III. MVCC Implementation Details
## IV. Concurrency Control Mechanisms
## V. WAL (Write-Ahead Logging) Mechanism
## VI. Transaction Commit Process
## VII. Two-Phase Commit (XA / 2PC) and Distributed Transactions
## VIII. Savepoints and Sub-transactions
## IX. Deadlock Detection
## X. Overall Assessment

Both arms surfaced the key modules — MVCC, concurrency control, isolation levels, WAL. The control arm, with code in context, leans toward implementation-level details. The experimental arm, reading the wiki, leans toward architectural framing — but answers the question completely without ever touching the source. From a “technical summary” standpoint, both responses were high-quality.

4.3 Resource Overhead

Since multi-turn context inflates raw token counts and pricing varies by token type, we use final billed credits as the apples-to-apples baseline. Both runs used DeepSeek-V4-Pro with cleared memory.

Group	Context Length	Credit Cost	Money Cost	Execution Time
Experimental	30k	3.11	$0.04	3 min
Control	110k	18.44	$0.27	13 min
Improvement	3.66x	5.92x	6.75x	4.33x

Using the wiki cut token consumption to roughly 1/6. Since generation speed is roughly fixed, lower token consumption translates directly into >4x faster execution. A token-efficient query saves money — and it saves time.

5. Conclusion and Outlook

This LLM Wiki + Code practice is our first attempt at extending wiki-style data management into multi-modal territory. We believe an LLM-managed wiki, like the LLM itself, should be capable of ingesting and emitting data in arbitrary formats. This management style also breaks the boundary between data sources: with the LLM’s understanding doing the bridging, all data — text, code, and beyond — can live in a single unified knowledge base.

Links: NeuG · LLM Wiki (Karpathy)

You and Your LLM Can't Review Thousands of PRs at Once — But a Graph Database Can

Sun, 17 May 2026 10:00:00 +0000

146 open PRs. 83 of them clustered into conflict groups. Some modifying the same function across 5 different branches. This isn’t a hypothetical — it’s the real state of the qwen-code repository on a random Tuesday in May 2026. And qwen-code isn’t even the extreme case — projects like OpenClaw and Hermes routinely carry thousands of open PRs.

Manually reviewing this many PRs is impractical, and hidden conflicts between them are nearly impossible to spot — even for an LLM. This post shows how CodeGraph — a code analysis tool powered by the NeuG embedded graph database — transforms PR management from “intuition-driven” to “data-driven,” using a real-world analysis of qwen-code’s 146 open PRs.

The PR Review Bottleneck: Why Neither You Nor Your LLM Can Keep Up

Picture this: you’re the maintainer of a popular open-source project. You open GitHub in the morning and find 146 PRs waiting for review. They come from contributors around the world — some touch core configuration, some refactor the tool invocation pipeline, some just fix a typo in the docs. Worse still, several of them modify the same function — merging any one could break the others.

Traditional manual review has three fundamental limits:

Scale: you can only review one PR at a time, and 146 doesn’t fit in anyone’s morning.
Association blindness: no human can mentally track which PRs touch the same function, let alone whether PR A and PR B both modify the same underlying logic.
Gut-based risk: “this looks risky” isn’t a metric — there’s no structural risk assessment.

What about using LLMs? Current AI Code Review solutions feed PR diffs one by one to a language model. This works at small scale, but at 146 PRs, token consumption explodes — and more critically, LLMs cannot natively perform cross-PR correlation analysis. Each invocation sees only a single PR’s diff and cannot answer global questions like “which other PRs touch the same function as this one?”

CodeGraph takes a different path: structural analysis via graph database, not only text comprehension via LLM. It indexes a code repository into a queryable knowledge graph — functions, call chains, classes, files, PRs — then runs risk scoring, conflict detection, and blast radius computation as graph queries. Zero token cost, yet capable of answering the global correlation questions that neither humans nor LLMs handle well.

How CodeGraph Works: From Repository to Knowledge Graph

CodeGraph is built on NeuG — an open-source, embedded graph database that speaks Neo4j-compatible Cypher. NeuG runs in-process (no server deployment needed), giving you the expressive power of graph queries with the simplicity of a local database file.

The pr-review workflow consists of two phases:

# Phase 1: Analyze + Detect + Persist
codegraph pr-review prepare --db .codegraph --repo owner/repo

# Phase 2: Label + Comment on GitHub
codegraph pr-review label --db .codegraph --repo owner/repo

During prepare, CodeGraph fetches all open PRs, parses their diffs at function granularity, and writes the results into the graph database — PR nodes, Function nodes, and CHANGES edges connecting them. This data coexists with the existing code structure graph (call relationships, file ownership, class hierarchies), enabling cross-PR and cross-code queries in a unified model.

The key insight: once PRs are modeled as graph nodes connected to the functions they modify, questions like “which PRs conflict?” become simple graph traversals — no pairwise diff comparison needed.

146 PRs in QwenLM/qwen-code: A Real-World Analysis

We ran CodeGraph on QwenLM/qwen-code — an active open-source AI coding assistant project. At the time of analysis (commit 870bdf2a, 2026-05-13), it had 146 open PRs. Here’s what CodeGraph found.

Per-PR Risk Scoring

For each PR, CodeGraph performs structural risk scoring — not by counting diff lines, but by parsing every hunk to identify modified, added, and deleted functions, then computing a composite score from:

Interface change: modifying an abstract class or interface definition (e.g., a Tool base class method signature)
Config file: touching config/schema files (e.g., settings.json, tsconfig.json)
Cross-module: spanning multiple modules (e.g., modifying both cli/ and core/)
Dead code: introducing uncalled functions
Blast radius: the upstream/downstream call scope of modified functions (fan_in × fan_out)

Each PR receives a risk level — LOW / MEDIUM / HIGH / CRITICAL:

Risk distribution: 🔴 CRITICAL: 10 · 🟠 HIGH: 32 · 🟡 MEDIUM: 63 · 🟢 LOW: 40

Cross-PR Conflict Detection

This is CodeGraph’s most distinctive capability. When two PRs modify the same function, they become connected through that Function node in the graph. Finding all such conflicts is a single Cypher query:

MATCH (pr1:PR)-[c1:CHANGES]->(f:Function)<-[c2:CHANGES]-(pr2:PR)
WHERE pr1.id < pr2.id
  AND c1.info IN ['hunk', 'deleted']
  AND c2.info IN ['hunk', 'deleted']
RETURN pr1.id, pr2.id, f.name, f.file_path;

On top of these pairwise conflicts, a Disjoint Set Union (DSU) algorithm computes connected components — each component becomes a “conflict group” of PRs that share modification targets, directly or transitively.

Result: 83 of 146 PRs clustered into 4 conflict groups.

The Three Categories

CodeGraph automatically classified all 146 PRs into actionable categories:

Category	Count	Action
Auto-merge candidates	31	LOW risk, zero conflicts — fast-track merge
Independent review	32	Non-trivial risk but no cross-PR conflicts — review in any order
Conflict groups	83 (4 groups)	Shared function modifications — coordinated review needed

Auto-merge candidates (31 PRs) are the low-hanging fruit — release automation, pure docs, config bumps. These can go through a “lightweight review” track without blocking anything:

#4100 chore(release): v0.15.11 — Release automation, zero risk
#4051 docs: --json-schema structured output user and design docs — Pure documentation

Independent review (32 PRs) carry real risk but can be reviewed in isolation:

#439 feat(tools): Include the new content after edits — CRITICAL, blast radius = 100. Touches both config schema and the tool pipeline, but conflicts with no other PR.

Conflict Group 1 (77 PRs) is the real battleground — the repository’s most complex conflict network, with all PRs connected through shared function modifications:

#3982 refactor(core): collapse three task registries into one — CRITICAL, blast radius = 132, interface changes across 5 files
#4088 feat(cli): add session-scoped /goal command — CRITICAL, blast radius = 140, touching the hooks execution pipeline

The root cause: qwen-code is a CLI + Core + VSCode extension multi-surface project. Functions like loadCliConfig, LoadedSettings.loadSettings, and sendMessageStream are natural convergence points — different contributors touching these hub functions from feature branches inevitably get clustered into one connected component.

Pushing Results to GitHub

CodeGraph doesn’t stop at analysis — it pushes results directly to GitHub via codegraph pr-review label:

Labels each PR with its classification (auto-merge-candidate / independent-review / conflicting-group-N) — visible directly in the PR list.
Posts comments on conflict-group PRs listing specific conflicting PR numbers and shared function names. Contributors see “Your changes potentially conflict with #3455 and #2220 on function loadCliConfig” without maintainers lifting a finger.

A --dry-run mode lets you preview before execution:

codegraph pr-review label --db .codegraph --dry-run

Why Graph, Not LLM?

The natural question: why not just throw all 146 diffs at an LLM?

The answer is structural. LLMs process text sequentially — each invocation sees one PR’s diff in isolation. To detect that #3982 and #4088 both modify loadCliConfig, you’d need to feed all 146 diffs into a single context (impossible at current context lengths) or build an explicit external index (which is exactly what a graph database is).

CodeGraph flips the model: structural analysis first, LLM second. Risk scoring, conflict detection, and blast radius are pure graph queries — zero tokens. The LLM budget gets saved for the PRs that actually need deep semantic reasoning: “Is this refactor safe? Does this new error handling cover all edge cases?” Graph analysis tells you where to look; the LLM tells you what to think about what you find.

This isn’t a replacement for AI Code Review — it’s the upstream infrastructure that makes AI Code Review tractable at scale.

Takeaways

For maintainers of active projects: If your PR queue regularly exceeds what you can mentally model, graph-based conflict detection gives you the “map of the battlefield” before you start reviewing. The 21% auto-merge identification alone justifies the setup.

For teams using AI Code Review tools: CodeGraph is the pre-filter that tells you which PRs to feed to your LLM reviewer and in what order. Don’t spend tokens on PRs that are safe to merge, and don’t review PRs in isolation when they have hidden dependencies.

For anyone building developer tooling: The pattern of “model code as a graph, query relationships as Cypher” is broadly applicable beyond PR review — dead code detection, impact analysis, architectural enforcement, bug root cause tracing. The CodeGraph skill documentation covers the full capabilities.

Try It on Your Own Repository

# Install
pip install codegraph-ai

# Index your repository
codegraph init --repo . --lang auto --commits 500

# Run PR review
codegraph pr-review prepare --db .codegraph --repo owner/repo
codegraph pr-review label --db .codegraph --repo owner/repo --dry-run

The full analysis toolkit — call graphs, dead code, architectural reports, module coupling, bug root cause tracing — is documented in the CodeGraph skill reference. The underlying graph engine NeuG is open-source and designed for embedding into developer tools.

Links: NeuG · CodeGraph · QwenLM/qwen-code

Star Counts Lied to Us: We Gave 74 Open Source Projects a Graph Database Health Check

Mon, 20 Apr 2026 00:00:00 +0000

When you pick an open source project on GitHub, what do you look at first? For most people, the answer is Star count. But can Star count really tell you whether a project is healthy and worth relying on long-term?

Ollama has 250K Stars on GitHub — it’s the hottest local LLM tool out there. If you had to guess its community health ranking, where would you place it?

Answer: #68 out of 74 projects, in the red zone.

Monthly active users dropped from a peak of 1,128 to just 150. Core developers shrank from 102 down to as low as 13, scoring only 8.8 on personnel retention (out of 100) — almost no one stayed. Meanwhile, vLLM in the same space has less than half of Ollama’s Stars but ranks #7 in health, with a maintenance score of 90.6 and four consecutive years of positive talent growth.

A 250K-Star project is less healthy than a 50K-Star one? This isn’t an outlier — it’s a pattern we found repeatedly across 74 projects. Star count measures “how many people passed by,” not “how many people stayed.”

All graph data storage, querying, and analysis in this study were powered by NeuG embedded graph database.

Why We Did This

How do you actually measure the health of an open source community? Stars, Forks, commit frequency — these cross-sectional counts describe “how much happened,” not “whether the community structure is healthy.” Some projects looked their most “prosperous” with great Star and commit numbers, monthly active users exceeding 2,000 — only to drop to single digits two years later. Cross-sectional counts never warned of these collapses.

Especially under the AI wave, hype makes judgment harder. When an LLM repository sees 60K events and tens of thousands of Stars in its first month, every cross-sectional metric screams “boom” — but three months later it may be down to double-digit activity. Hype creates the illusion that “everything is fine,” masking the fact that the collaboration network is unraveling.

We needed an evaluation method that cuts through the hype — not asking “how many events happened,” but asking “whether the collaboration network’s topology is healthy.”

So we did this: we modeled GitHub collaboration behavior as a time-evolving graph network and gave 74 representative open source projects a “health check” across four dimensions: maintainer burnout risk, newcomer onboarding path, community interaction quality, and personnel flow trends. Covering 10 tracks including AI/ML, programming languages, front-end, cloud-native, and big data, spanning 2021-2025 across 60 months. All graph data storage and analysis were powered by NeuG embedded graph database, whose high-performance analytics engine made bulk k-core decomposition, BFS shortest path, and clustering coefficient computation across thousands of monthly graph snapshots possible.

Five Findings You Might Not Expect

Finding 1: The Most-Starred Projects Are Often Not the Healthiest

Ollama (250K Stars, #68), Next.js (130K Stars, #62), React (230K Stars, #48) — these household names rank in the mid-to-lower tier for health. Instead, “quietly doing the work” projects like Elasticsearch (#4), vLLM (#7), and DuckDB (#37, perfect atmosphere score) perform best.

The key signal isn’t “how many people visited,” but “whether the collaboration network is tight and whether talent stays.”

Finding 2: LLM Model Repositories Score Universally Low

baichuan2 (19.6), deepseek-v3 (33.7), yi (28.5), chatglm3 (48.1) — the LLM track averages only 39.9, with 5 out of 6 projects scoring below 50, the lowest among all tracks.

Two notable patterns in the data:

1. “Peak at launch” exponential decay: baichuan2 started with 2,925 events in month one, dropping to just 43 after 12 months; yi went from 5,908 to 239 after 9 months; deepseek-v3 plummeted from a peak of 61,174 to 309 after 13 months. Tool projects like vLLM do the opposite — growing steadily from 3,574 to 12,614, getting more alive the more they’re used.

2. Near-total core team turnover: The initial core developers of all 6 LLM repositories almost entirely turned over. baichuan2 shrank from 41 core contributors to 2, chatglm3 from 20 to 2 — mostly new faces.

A possible explanation: LLM GitHub repositories are closer to “model launch pages” than long-term collaborative community projects — after the model ships, R&D focus naturally shifts elsewhere, and the real community ecosystem (fine-tuning discussions, weight distribution) happens more on platforms like HuggingFace. This also means tens of thousands of Stars and events in the first month reflect attention rather than community stickiness, and when making technology choices, you should be aware of the gap between hype and long-term community activity.

Finding 3: Maintainer Health and Personnel Flow Are the “Twin Pillars”

Across the 74 projects, the two dimensions most correlated with total score are maintainer health (r=0.817) and personnel flow (r=0.755), with newcomer friendliness close behind (r=0.578). Maintainers doing well → talent stays → community is healthy.

Finding 4: The Closer to Infrastructure, the More Cycle-Resistant the Community

The AI ecosystem shows a clear three-tier gradient — infrastructure frameworks (PyTorch/Transformers, avg 59.7) > application orchestration layer (LangChain/AutoGen, 53.2) > model release layer (39.9). Infrastructure has technical moats for protection; the orchestration layer is too easily replaceable, and once the hype fades, people scatter.

Finding 5: “False Prosperity” Has a Mathematical Signature — High Event Volume + Low Clustering Coefficient

LangChain peaked at 2,022 monthly active users and 14,796 monthly events — two years later, MAU dropped to 173 (↓91%). AutoGen’s change was even more dramatic, dropping from 328 to 2 users (↓99.4%), with the collaboration network’s clustering coefficient falling to 0.000. When the Star chart looks fine but the network clustering coefficient keeps declining, it means collaboration relationships are loosening and deserves attention.

Counter-Intuitive Cases: Three Comparisons You Didn’t See Coming

1. Ollama vs vLLM: Ice and Fire in the Same Track

Both projects are LLM inference tools — the former for individual users (local deployment), the latter for enterprises (high-throughput inference).

	Ollama	vLLM
GitHub Star	~250K	~50K
Overall Rank	#68 (36.8)	#7 (76.5)
Maintainer Health	34.0	90.6
Personnel Flow	8.8	84.5
Burnout Risk	High	Healthy
MAU Peak→Trough	1,128 → 150 (-87%)	Peak at 1,306
Core Developers	102 → 24 (once down to 13)	Stable 15-30
Talent Trend	2025 -9.6%, 2026 -83.3%	4 consecutive years of positive growth

What’s Ollama’s challenge? A personnel flow score of 8.8 indicates insufficient contributor retention. The rapid influx of +100% in 2023 turned into a massive outflow of -83.3% by 2026. A maintainer health score of 34.0 reflects significant core team contraction with high burnout risk.

In contrast, vLLM scores 90.6 on maintenance, maintains a stable core developer base of 15-30 people long-term, and has seen positive talent growth for 4 consecutive years (though the growth rate is slowing, it remains positive). This is a textbook case of “healthy sustainable growth.”

Takeaway: This is a noteworthy “hype vs health misalignment” phenomenon — Star count measures “how many people pay attention,” while health measures “how many people keep participating.” A project can have 250K Stars but only 24 core developers — the higher the attention, the easier this gap is to overlook. When making technology choices in the AI track, we recommend checking whether Star growth rate matches core contributor growth rate.

2. OpenClaw vs AutoGPT: After Lightning-Fast Rise, Landing or Crashing?

OpenClaw shot to #1 overall (92.4 points) with just 2 months of data. Let’s put it alongside another project that once had a “lightning-fast rise” — AutoGPT from 2023.

	OpenClaw (2026)	AutoGPT (2023)
Explosion Month Events	100,277	82,469
Explosion Month MAU	3,382	1,985
Explosion Month Core Devs	25	108
Next Month MAU	2,349 (-30.5%)	849 (-57%)

AutoGPT’s story is a successful “landing” case. After its April 2023 explosion (82K events), MAU dropped from 1,985 to 849 (-57%) the next month, and core developers plummeted from 108 to 14 — but it didn’t die. It stabilized at 3,000-6,000 monthly events in 2024-2025, with talent flow recovering from -33% bleeding to +9.4% and +22.4% positive growth. AutoGPT spent two years rebuilding a smaller but healthier community and currently ranks #22.

OpenClaw now stands at the same crossroads. February’s 100K events and 3,382 MAU represent the largest single-month surge among all 74 projects. And OpenClaw has a feature AutoGPT didn’t have: AI Agents deeply embedded in the development workflow. Bots like greptile-apps[bot] and openclaw-barnacle[bot] consistently rank among the top core contributors — a pattern that barely exists in traditional projects. However, OpenClaw’s March MAU dropped to 2,349, a 30.5% decline. Historically, AutoGPT experienced a 57% MAU drop after a similar-scale explosion; while OpenClaw’s decline is milder, it hasn’t escaped this pattern either.

The iron law of “easy to explode, hard to sustain” doesn’t change just because of AI Agents. Among the 74 projects, others with similar explosions include Stable Diffusion WebUI (October 2022 peak of 1,834 MAU → only 30 by end of 2025, near stagnation) and LangChain (September 2023 peak of 2,022 MAU → only 173 by 2025, running on low power). Three paths lie ahead:

Fate	Representative	Key Difference
Successful Landing	AutoGPT	Rebuilt core team after decline, talent returned to positive growth
Low-Power Operation	LangChain	Scale contracted 91%, but collaboration structure remains (clustering coefficient 0.15-0.27)
Near Stagnation	SD WebUI	Maintained by few, collaboration network becoming sparse

Can OpenClaw avoid SD WebUI’s fate and replicate AutoGPT’s “landing”? The key lies in whether it can convert the first-month influx of participants into stable core contributors over the next 6 months.

3. Next.js: Hot Front-End Framework, Atmosphere Dimension Needs Work

Dimension	Next.js	Contrast: DuckDB
Atmosphere Score	21.7 (#73)	100.0 (#1)
Toxicity Rate	Peak 17.68%	0.0%
Overall Rank	#62 (50.2)	#37 (64.5)

Vercel’s flagship project Next.js has 130K+ Stars, and its position in the front-end ecosystem needs no elaboration. But its community atmosphere score is only 21.7 (out of 100), ranking second-to-last among 74 projects.

The main reason is rising negative comment ratio. Using the ToxiCR model, we scanned 6.74 million comments on Next.js and found its toxicity rate rose from ~2% in mid-2022 to 15%, reaching 17.68% in April 2025 — well above the 5% healthy baseline.

By contrast, DuckDB’s toxicity rate has remained at 0.0% with a perfect atmosphere score. DuckDB has only 100-180 monthly active users, but the community has virtually no negative interactions. This is a “small but refined” technology-driven community — modest in scale, but exceptionally high in collaboration quality.

Takeaway: Community scale and community quality are two different dimensions. Large projects without effective community management may see negative interactions affect contributor willingness to participate, creating a vicious cycle.

Method at a Glance: Health Checks on Time-Series Collaboration Graphs

We built three types of monthly graphs from GitHub Archive’s full event data:

Actor-Actor Collaboration Graph: PR merges, code reviews, Issue discussion relationships between developers, used to identify core developers and calculate collaboration density
Actor-Repo Contribution Graph: Developer-to-repository contribution relationships, used to track cross-project talent flow
Actor-Discussion Participation Graph: Developer-to-Issue/PR participation relationships, used to evaluate community atmosphere

We extracted monthly time series from these graphs using topological metrics like weighted degree centrality, k-core decomposition, BFS shortest path, and clustering coefficient, then performed “long-term trend + fluctuation stability” two-layer analysis on the sequences to synthesize four dimension scores (each weighted 25%).

Dimension	What It Measures	Key Graph Metrics
Maintainer Burnout Risk	Whether core maintainers are declining	k-core decomposition, clustering coefficient, Bus Factor
Newcomer Friendliness	Whether newcomers can reach the core circle	BFS shortest path, core unreachable rate
Community Atmosphere	Communication quality and response efficiency	Toxicity detection (ToxiCR), response time, closure rate
Personnel Flow	Whether the talent pool is growing or shrinking	Annual net growth rate trend

Advice for Different Readers

Technology Evaluators: When adopting AI application-layer frameworks, don’t be blinded by Star count. Check Issue closure rate and core contributor count — if core maintainers are fewer than 3, or if there’s been net personnel outflow for 2 consecutive years, evaluate alternatives. Prioritize dependency on infrastructure layers (PyTorch, Elasticsearch) rather than orchestration layers (LangChain, AutoGen).

Community Managers: Shift from “attracting eyeballs” to “filtering core contributors.” Elasticsearch’s counter-cyclical growth proves that a mature project’s core competitiveness isn’t widening the funnel mouth, but optimizing the funnel neck — making capable contributors enter the core layer more smoothly. Watch personnel flow trends — sustained talent outflow over 2-3 years is an earlier warning signal than Star count decline.

Open Source Contributors: Prioritize joining projects with high network clustering coefficient and stable core layers (PyTorch, Rust, Kubernetes). The deep collaboration relationships you build in these projects have far stronger cycle-crossing capability than mastering any single application-layer framework.

This study is based on GitHub Archive full data from 2021-2025, covering 74 projects, 60 months, and 6.74 million comments. All graph data storage and analysis were powered by NeuG embedded graph database. For complete methodology, scoring criteria, and raw data, see the full report.

NeuG: One Engine, Two Modes, Outperforming Established Graph Databases

Sat, 11 Apr 2026 10:00:00 +0000

Graph databases have become essential for applications built on relationships — social networks, knowledge graphs, fraud detection, recommendation engines. But integrating a graph database into your application often means managing a separate server process, dealing with network latency, and wrestling with deployment complexity.

NeuG (pronounced “new-gee”) is a new graph database designed to change that. Following the same philosophy as DuckDB — but for graph data — NeuG is lightweight, minimal in dependencies, and easy to embed. It runs directly inside your application process as an embedded library, supports full Cypher queries, and delivers sub-millisecond latency on complex graph traversals. Currently available for Python (pip install neug), with Node.js and more language bindings on the roadmap.

In this article, we explore what NeuG can do today, how it compares with established graph databases, and why it might be the graph engine your next project needs.

1. Full Cypher Support, Ready Out of the Box

NeuG provides rich Cypher query language support, including variable-length path matching (*1..3), shortest path queries (*SHORTEST, ALL SHORTEST), subqueries (CALL { ... }), multi-label node matching (:POST:COMMENT), and more.

Here is a complete Python example showing the full workflow from schema creation, data import, to querying:

from neug.database import Database

db = Database("social.db")
conn = db.connect()

# Define graph schema
conn.execute("CREATE NODE TABLE Person (id INT64, name STRING, PRIMARY KEY(id))")
conn.execute("CREATE REL TABLE Knows (FROM Person TO Person, since DATE)")

# Insert data
conn.execute("CREATE (:Person {id: 1, name: 'Alice'})")
conn.execute("CREATE (:Person {id: 2, name: 'Bob'})")
conn.execute("CREATE (:Person {id: 3, name: 'Carol'})")
conn.execute("""
    MATCH (a:Person {id:1}), (b:Person {id:2})
    CREATE (a)-[:Knows {since: date('2024-01-15')}]->(b)
""")
conn.execute("""
    MATCH (b:Person {id:2}), (c:Person {id:3})
    CREATE (b)-[:Knows {since: date('2024-06-01')}]->(c)
""")

# Bulk import from CSV
conn.execute("COPY Person FROM 'persons.csv' (HEADER=true)")

# Query: find 2nd-degree friends
result = conn.execute("""
    MATCH (a:Person {name:'Alice'})-[:Knows*1..2]->(f:Person)
    RETURN DISTINCT f.name
""")
print(result.get_bolt_response())

conn.close()
db.close()

NeuG’s Cypher implementation is powered by GOpt’s unified intermediate representation (IR), making it also GQL-ready — when the ISO/GQL standard matures, migration cost will be minimal.

2. Dual-Mode Architecture and TP/AP Capabilities

Most graph databases force you to choose between a heavy client-server deployment and a limited embedded store. NeuG offers both in a single lightweight package — a design unique among graph databases.

Embedded Mode loads NeuG as a library directly into your process, just like SQLite or DuckDB. No daemon, no configuration files, no ports to manage — your graph database lives entirely inside your application. This mode is ideal for data science workflows, ML/AI pipelines, batch analytics, and rapid prototyping.

Service Mode switches from embedded with a single line of code:

uri = db.serve(host="localhost", port=10000, blocking=True)

NeuG then serves over HTTP/RPC with full MVCC transactions and multi-client concurrency, ready for web applications, real-time APIs, and multi-user collaboration.

In both modes, you use the same library, the same data, and the same queries. This seamless transition means you can develop and validate in embedded mode, then expose the same database as a production service — no data migration, no query rewriting.

On the capability side, NeuG covers both transactional (TP) and analytical (AP) workloads. ACID transactions and MVCC concurrency control ensure data consistency in service mode; the columnar execution engine with vectorized processing makes complex multi-hop traversals and subgraph matching equally efficient. NeuG also supports a Postgres/DuckDB-style extension model — load data source plugins on demand with INSTALL JSON; LOAD JSON; (supporting JSON, Parquet, S3/OSS), keeping the core engine lean.

3. Performance

NeuG’s dual-mode architecture lets us benchmark each mode against its natural competitor: embedded mode against LadybugDB on analytical query latency, and service mode against Neo4j on transactional query throughput.

Neo4j is the most widely deployed graph database, operating as a standalone server and the de facto standard for transactional graph workloads.
LadybugDB is an embeddable graph database built on Kùzu. Like NeuG, it runs in-process without a server, making it the closest architectural peer in the embedded graph space.

All tests use the LDBC SNB SF1 dataset (~3M nodes, ~17M edges) on an Apple Silicon Mac.

Embedded Mode: LSQB Benchmark (NeuG vs LadybugDB)

LSQB (Labelled Subgraph Query Benchmark) contains 9 complex subgraph matching queries that lean toward analytical workloads, making it an excellent test for embedded graph database query optimization and execution.

Note on KNOWS Edges: The original LSQB benchmark assumes KNOWS relationships are bidirectional (i.e., if A knows B, then B also knows A). In our tests, we modified all queries involving KNOWS edges to use directed traversal (-[:KNOWS]->). This adjustment allows the same LDBC SNB SF1 dataset to be used for both SNB Interactive and LSQB benchmarks, since the KNOWS relationships in the original LDBC SNB data are unidirectional. This modification does not affect the fairness of evaluating graph database query optimization and execution capabilities.

We compared NeuG (single-threaded) against LadybugDB (Kuzu-based, using the best multi-threaded result for each query):

NeuG wins 8 out of 9 queries with just a single thread. On Q3 (triangle pattern within the same country) it achieves a 287x speedup; on Q2 (two-hop with filtering) a 91x speedup. Even with LadybugDB using up to 8 threads, NeuG single-threaded significantly outperforms on most queries.

On complex analytical queries involving multi-hop joins, triangle patterns, and long paths, NeuG’s graph-native query optimizer demonstrates an overwhelming advantage. Notably, on Q9 — a simpler short-path traversal where multi-threading typically benefits competitors — NeuG still achieves a 1.7x speedup.

Service Mode: LDBC SNB Interactive Benchmark (NeuG vs Neo4j)

LDBC SNB Interactive contains 14 complex read queries (IC1–IC14) covering multi-hop friend finding, shortest paths, and aggregation — typical transactional graph workloads. We ran 4 concurrent clients for 300 seconds, randomly issuing these queries against both NeuG in service mode and Neo4j in server mode.

NeuG achieves 617 QPS — 50.6x the throughput of Neo4j. On latency, NeuG’s P50 is just 3.1ms with a P95 of 20.6ms, while Neo4j’s P95 reaches 1,728ms. Over 300 seconds NeuG successfully executed 185,156 queries with zero failures, demonstrating the stability of its service path under high concurrency.

Why Is NeuG So Fast?

NeuG’s performance advantage comes from three technical foundations.

First, NeuG is built on GraphScope Flex, the graph engine that set the LDBC SNB Interactive Benchmark world record at 80,000+ QPS. This is not a research prototype — it’s the same battle-tested engine proven at enterprise scale at Alibaba. NeuG inherits its optimized CSR storage for cache-friendly traversals, a columnar vectorized execution engine, and efficient memory management.

Second, unlike most graph databases that reuse relational optimizers, NeuG uses GOpt — a graph-native query optimizer. GOpt performs cost-based optimization with graph-specific cardinality estimation, supports pattern decomposition and worst-case optimal join planning, and reasons about graph topology to select efficient execution strategies. This is exactly why NeuG achieves 287x speedup on LSQB Q3 (triangle patterns) and 91x on Q2 (multi-hop filtering) — complex patterns that traditional optimizers struggle with.

Third, in embedded mode NeuG executes in-process with zero serialization overhead. For multi-hop traversal queries, this advantage compounds — each hop that would incur a network round-trip in a client-server model is instead a direct memory access.

Reproducibility

All performance results in this article are independently reproducible. We have published the following resources:

Dataset: The LDBC SNB SF1 dataset is available at https://neug.oss-cn-hangzhou.aliyuncs.com/datasets/ldbc-snb-sf1-lsqb.tar.gz (~282MB)
Benchmark and Tutorial: Benchmark Scripts are in https://github.com/alibaba/neug/tree/main/examples/, tutorial at https://graphscope.io/neug/en/tutorials/benchmark-neug-dual-mode/

Quick Reproduction

Embedded Mode (NeuG vs LadybugDB):

wget https://neug.oss-cn-hangzhou.aliyuncs.com/datasets/ldbc-snb-sf1-lsqb.tar.gz
tar -xzf ldbc-snb-sf1-lsqb.tar.gz
pip install neug real_ladybug
cd neug/examples/lsqb_benchmark
python run_neug_benchmark.py --data-dir social_network-sf1-CsvComposite-StringDateFormatter

Service Mode (NeuG vs Neo4j):

wget https://neug.oss-cn-hangzhou.aliyuncs.com/datasets/ldbc-snb-sf1-lsqb.tar.gz
tar -xzf ldbc-snb-sf1-lsqb.tar.gz
pip install neug neo4j
# Start Neo4j server (optional, for comparison)
docker run -d --name neo4j -p 7687:7687 -e NEO4J_AUTH=neo4j/neo4j123 neo4j:latest
cd neug/examples/ldbc_interactive_benchmark
python run_interactive_benchmark.py --data-dir social_network-sf1-CsvComposite-StringDateFormatter

If you encounter any issues reproducing these results, please report them on GitHub Issues.

What’s Next

NeuG is under active development. The roadmap includes Node.js bindings (AI Agent integration), graph algorithm extensions (Leiden community detection, PageRank, K-Core), data lake support (S3/OSS + Parquet + GraphAr), and vector database extensions for RAG & GraphRAG workflows.

If you’re building applications that need fast, embedded graph queries without the operational overhead of a database server, give NeuG a try.

GitHub: github.com/alibaba/neug Install: pip install neug

Dissecting Claude Code: A Graph-Powered Deep Dive into an AI Coding Agent's Architecture

Thu, 02 Apr 2026 00:00:00 +0000

We indexed 1,906 source files, 9,848 functions, and 22,678 call edges using CodeGraph — a code intelligence skill powered by NeuG, an open-source embedded graph database — to reveal what lies beneath the surface of one of the most ambitious AI coding tools ever built.

The Numbers at a Glance

Metric	Count
Source Files	1,906
Functions	9,848
Call Edges	22,678
Import Edges	15,743
Modules	306
Classes	128

Claude Code is a TypeScript monolith running on Bun + Ink (React for terminals). main.tsx alone exceeds 4,700 lines, and REPL.tsx tops 5,000 lines with 288 outgoing function calls — the highest fan-out of any component in the system.

1. Architecture: Five Layers, One Gravity Well

CodeGraph’s layer discovery reveals an onion-ring architecture with src/utils as the gravitational center. The five layers, from top to bottom:

Entry Points — REPL.tsx (288 calls, 236 imports), main.tsx (275 calls, 163 imports), and print.ts (164 calls, CLI pipeline). These are the three doors into Claude Code.

UI Layer — 131 files including the 113-file components/ directory, PromptInput (105 outgoing calls), the Buddy virtual pet system, and Ink’s custom terminal renderer (52 methods).

Logic Layer — 120 hooks, 80+ slash commands, 42 tools (from the 151-function BashTool to the 1-function SleepTool), and 14 keybinding files.

Service Layer — Analytics (428 callers), MCP protocol (41 files), Permissions (41 files), Bridge remote control (33 files), the Swarm multi-agent system, and 7 task types.

Foundation — The utils/ directory with 299 files absorbing 5,951 incoming calls — 6x more than the next highest module. It’s not a utility folder; it’s the bedrock of reality for this codebase.

The REPL: A Component That Imports the Universe

REPL.tsx imports from 236 files — more than any other file in the codebase. It connects to hooks, components, services, bridge, swarm, voice, cost-tracking, MCP, and keybindings. Yet it has 0 callers (fan-in = 0). It’s the root UI leaf that nobody calls into; it only calls out.

2. Bridge Functions: The Nervous System

A bridge function is one called from the most distinct modules — the glue holding disparate subsystems together. CodeGraph identified these top bridges:

Function	File	Modules Spanned	Total Callers
`logForDebugging`	debug.ts	90	911
`logEvent`	analytics/index.ts	94	428
`logError`	log.ts	77	413
`set`	fileStateCache.ts	64	290
`jsonStringify`	slowOperations.ts	63	248
`isEnvTruthy`	envUtils.ts	47	194
`getGlobalConfig`	config.ts	44	163
`getCwd`	cwd.ts	43	135

logForDebugging is called from 90 out of 306 modules — nearly 1 in 3 modules feeds debug info through this single function. It’s the all-seeing eye of Claude Code.

Notice the file name slowOperations.ts housing jsonStringify (248 callers, 63 modules). The team explicitly wraps JSON operations in a module named “slow” — a candid acknowledgment that serialization is a performance bottleneck they want to track.

The top 3 bridge functions alone account for 1,752 call edges across 261 module-boundaries. Removing any one would fracture the entire system.

3. The Permission Gateway: A Single Chokepoint

One of the most architecturally significant findings: every tool execution flows through a single function — checkPermissionsAndCallTool. It has only 1 caller (streamedCheckPermissionsAndCallTool) but calls 30 different functions spanning telemetry, permission rules, classifier checks, error handling, and analytics.

This is a textbook Policy Enforcement Point (PEP) — a single chokepoint where all 42 tools must pass through for authorization. The downstream functions include:

checkRuleBasedPermissions — evaluates the rule engine
startSpeculativeClassifierCheck — the “YOLO mode” classifier
resolveHookPermissionDecision — hook-based overrides
logEvent + logOTelEvent — dual telemetry (analytics + OpenTelemetry)
startToolSpan / endToolSpan — tracing boundaries
formatZodValidationError — schema validation

This architecture makes it trivially easy to add security policies, audit logging, and rate limiting without modifying any individual tool.

The 151:1 Complexity Ratio

Not all tools are created equal. BashTool has 151 functions (command parsing, sandboxing, permission verification, timeout management, output capture, cross-platform compatibility). SleepTool has 1 function. That’s a 151:1 complexity range across a unified interface.

Tool	Functions	Category
BashTool	151	Shell execution
PowerShellTool	101	Windows shell
AgentTool	91	Multi-agent orchestration
LSPTool	35	Language server
FileEditTool	31	File operations
GrepTool	8	Search
SleepTool	1	Wait

4. The Buddy System: A Gacha Game Inside a Dev Tool

Perhaps the most delightful discovery: Claude Code contains a complete virtual pet system with gacha mechanics, ASCII art animations, and RPG stats.

How Companion Generation Works

Hash the User ID with a salt (friend-2026-401) using hashString()
Seed a Mulberry32 PRNG with the hash value
Roll deterministically for species, rarity, eyes, hat, and stats
Cache the result (the buddy renders every 500ms, can’t re-roll each frame)

The “bones” (species, rarity, stats) are regenerated from hash on every read and never persisted to disk. Only the “soul” (user-chosen name and personality) is saved. This means:

Anti-cheat: Users can’t edit their config to fake a Legendary rarity
Schema-safe: Changes to the species array won’t break saved companions
Deterministic: Same user = same companion forever

The Hex Obfuscation Trick

All 18 species names are encoded as hex character codes to dodge a build-time string filter:

const duck = String.fromCharCode(0x64, 0x75, 0x63, 0x6b)  // "duck"

One species name collides with an internal model codename listed in excluded-strings.txt, so the team encoded ALL species uniformly.

Gacha Rarity Distribution

Rarity	Probability
Common	60%
Uncommon	25%
Rare	10%
Epic	4%
Legendary	1%

Each companion has 5 RPG stats: DEBUGGING, PATIENCE, CHAOS, WISDOM, SNARK. The ASCII sprites are 5 lines tall, 12 columns wide, with 2-3 frames for idle animation. Frame -1 in the 15-frame idle sequence triggers a rare blink animation.

Here’s an actual duck sprite from the codebase:

    __
  <(· )___
   (  ._>
    `--´

CodeGraph’s impact analysis traces the call chain: getCompanion() → CompanionSprite → PromptInput (105 calls) → REPL (288 calls) → Terminal UI.

5. Dream Task: The AI That Dreams While You Code

Claude Code has a background memory consolidation system that literally runs while you work — the AI “dreams” about past sessions to improve its memory.

The Four Gates (Cheapest First)

Before dreaming begins, four gates must pass — ordered by computational cost:

Time Gate (cost: 1 stat() call) — Has 24 hours elapsed since the last dream?
Scan Throttle (cost: memory timestamp check) — Has 10 minutes passed since the last scan?
Session Gate (cost: readdir) — Are there 5+ new sessions to consolidate?
Lock Gate (cost: flock) — Is the file-based mutex free?

This ordering ensures cheap rejections happen first. Most API turns exit at gate 1 with a single stat call.

The Four Phases of Dreaming

Once the gates pass, a restricted subagent forks with read-only Bash permissions (only ls, grep, cat, head, tail):

Orient — ls the memory directory to understand current state
Gather — grep recent session transcripts for patterns and reusable knowledge
Consolidate — Write/update memory files with distilled knowledge
Prune — Keep the memory index under 25KB, entries under 150 characters

The Filesystem Retry Trick

When a dream is killed mid-flight, it calls rollbackConsolidationLock(priorMtime) to rewind the lock file’s modification time. This makes the Time Gate pass again on the next API turn — a retry mechanism built into the filesystem itself.

6. Swarm: Multi-Agent Orchestration

The most architecturally ambitious module: a three-tier backend abstraction for spawning and managing teams of AI agents.

Backend Detection Priority Chain

When a teammate is spawned, the system probes the environment:

Priority	Backend	Lines	Detection Method
P1	TmuxBackend	765	`ORIGINAL_USER_TMUX` env var
P2	ITermBackend	370	`TERM_PROGRAM` + `it2 session list`
P3	InProcessBackend	340	Always available (fallback)

The PaneBackendExecutor adapter wraps tmux/iTerm2 into the same TeammateExecutor interface that InProcessBackend implements directly. This lets the system degrade gracefully: tmux → iTerm2 → in-process.

Why AsyncLocalStorage?

When agents are backgrounded (Ctrl+B), multiple agents run concurrently in the same Node.js process. AppState is a single shared mutable object. If Agent A’s telemetry reads it, it might get Agent B’s context.

AsyncLocalStorage isolates each async execution chain, giving each agent its own identity bubble — without requiring process isolation.

Task ID Design

Each task gets a unique ID: a single-letter prefix + 8 random base-36 characters:

b = bash, a = agent, r = remote, t = teammate, w = workflow, m = monitor, d = dream

This gives instant type identification from the ID string alone, with 36^8 ≈ 2.8 trillion combinations.

7. Cost Tracker: Follow the Money

The cost tracking system handles 7 pricing tiers with dynamic tier selection for Opus 4.6:

Model	Input/1M	Output/1M
Haiku 3.5	$0.80	$4
Sonnet	$3	$15
Opus 4.5	$5	$25
Opus 4/4.1	$15	$75
Opus 4.6 (fast)	$30	$150

getOpus46CostTier(fastMode) dynamically selects the tier based on response_speed — fast mode costs 2x.

The Billing Access Gate

The display logic reveals thoughtful product thinking:

Subscribers (Max/Pro): Costs are hidden (flat rate — showing costs would be confusing)
API-key users with admin/billing roles: Costs are displayed (they’re paying per-token)

This is controlled by hasConsoleBillingAccess(), which checks DISABLE_COST_WARNINGS env, subscriber status, auth tokens, and org/workspace roles.

What Gets Tracked

totalCostUSD — cumulative dollar amount
totalAPIDuration — time spent waiting for API responses
totalToolDuration — time spent executing tools
linesAdded / linesRemoved — code change volume
modelUsage — per-model token breakdown
fpsMetrics — terminal rendering performance

All of this persists to project config and survives session restarts via restoreCostStateForSession().

8. The Loneliest 1,265

CodeGraph found 1,265 isolated functions — functions with zero callers AND zero callees. Out of 9,848 total, that’s 12.8% sitting in complete isolation.

Where do they live?

File	Total Functions	Why Isolated
`bootstrap/state.ts`	212	Getters/setters called dynamically
`sessionStorage.ts`	153	Dynamic property access
`yoga-layout/index.ts`	134	FFI binding layer

bootstrap/state.ts alone has 212 functions — the densest file in the codebase. It’s a global state machine that vends getters and setters for everything from session IDs to model strings. Static analysis can’t see dynamic property access, so they appear “isolated” even though they’re heavily used at runtime.

9. The Class Hierarchy: OOP at the Boundaries

With 128 classes in a primarily functional codebase, where does OOP live?

Class	Methods	Purpose
`Node`	91	Yoga layout FFI binding
`Cursor`	59	Terminal cursor management
`Ink`	52	Custom React terminal renderer
`YogaLayoutNode`	50	Flexbox layout engine
`WebSocketTransport`	30	WebSocket client
`TmuxBackend`	20	Tmux pane management
`CCRClient`	24	API transport layer

The top 4 classes are all rendering infrastructure. The business logic (tools, permissions, cost tracking, buddy) is overwhelmingly functional — closures, hooks, and module-level state. Classes appear only at the boundaries: transport protocols, terminal abstractions, and system backends.

10. Patterns Worth Stealing

Deterministic Generation Without Persistence

The buddy system regenerates bones from hash(userId) every time. Only soul persists. This eliminates schema migration headaches AND prevents config-file cheating.

Gate Ordering by Cost

The dream task checks: in-memory flag → stat a file → list a directory → acquire a lock. Each gate is more expensive than the last, so cheap rejections happen first.

Single-Letter Task ID Prefixes

bash, agent, remote, teammate, workflow, monitor, dream — instant type identification from the ID string alone.

Honest Function Names

getFeatureValue_CACHED_MAY_BE_STALE, slowOperations.ts, writeFileSyncAndFlush_DEPRECATED — names that force callers to understand the tradeoffs.

AsyncLocalStorage for Concurrent Identity

When multiple agents share a process, don’t use global state for identity. Use AsyncLocalStorage to give each async execution chain its own context.

Single Permission Gateway

Route all 42 tools through one function. Adding a new security policy means changing one file, not 42.

Conclusion

Claude Code is not just an AI coding assistant — it’s a miniature operating system for AI agents. Through CodeGraph’s structural analysis, we’ve uncovered:

A utils foundation with 5,951 incoming calls acting as bedrock
A permission system where all 42 tools pass through a single gateway
A virtual pet with gacha mechanics and anti-cheat architecture
An AI that dreams about its memories during idle hours
A three-tier multi-agent orchestration system spanning tmux, iTerm2, and in-process
1,265 isolated functions (12.8%) sitting in silence

The graph never lies. Behind every great product is an architecture that tells its own story — you just need the right tools to read it.

Following 94 Million Relationships Down the Rabbit Hole

Tue, 24 Mar 2026 00:00:00 +0000

We ran an experiment: take a real question, point it at a massive academic knowledge graph, and keep asking follow-ups to see how deep LLM-driven analysis can actually go.

The question: “What impact has the Russia-Ukraine conflict had on Russian academia?”

The data comes from OpenAIRE, one of Europe’s largest open research data platforms: 22 million entities, 94 million relationships, spanning publications, authors, institutions, funding sources, and journals. Too large to feed into any LLM directly — but perfect for structured analysis.

Here’s how the investigation unfolded.

Question 1: Did publication output actually decline?

The obvious starting point. We matched Publication ↔ Organization patterns across the graph and aggregated by country and year to track Russia’s publication trend.

Year	Publications	YoY Change	Cumulative
2020	9,544	-	0%
2021	8,976	-6.0%	-6.0%
2022	8,548	-4.8%	-10.4%
2023	7,732	-9.5%	-19.0%
2024	4,686	-39.4%	-50.9%

A mild decline before the war, accelerating sharply after 2022, nearly halved by 2024.

But is this Russia-specific, or a global trend? We sliced the same analytical structure differently — global publication volume actually increased in 2022. Comparing 2021→2024: Russia dropped 47.8%, far exceeding the US and Germany at 27.6%, while China grew 14.8%.

This isn’t a global downturn. It’s a cliff — and it’s Russia’s alone.

Question 2: Did the funding dry up?

When publication output drops, the first instinct is to follow the money.

We extended our analysis by adding a Funding dimension on top of the existing structure — not starting over, but incrementally expanding what we’d already built.

The result: the European Commission (EC) had been a major funder of Russian research. After 2022, EC completely cut off funding for projects involving Russian institutions — while continuing to fund 26 Ukrainian institutions. This aligns perfectly with the EC’s official policy announcements.

The money did dry up. But does that explain everything?

Question 3: Are journals shutting the door too?

We pivoted to a different angle, adding a Publisher dimension to the structure from Question 1: which journals changed their stance toward Russian papers around 2022?

Finding: EDP Sciences dramatically reduced publications from Russian institutions after 2022. Further investigation revealed that EDP Sciences had publicly declared support for Ukraine. And crucially, their publication volume from other countries didn’t see a comparable drop — this wasn’t a general contraction, it was selective.

Funding was being cut. Publishing channels were narrowing.

Question 4: What happened to the collaboration network?

The previous questions looked at individual factors. We zoomed out to the big picture — querying Author(country=RU) → Publication → Organization to map how Russia’s collaboration with each country changed over time.

Partner Country	2021	2024	Change	Country Type
Turkey	988	16	-98.4%	Non-Western, no sanctions
Germany	656	46	-93.0%	Western, sanctions
USA	3,614	271	-92.5%	Western, sanctions
China	570	112	-80.4%	Non-Western, no sanctions

Wait — the steepest drop is Turkey?

-98.4%. A non-Western country. A country that imposed no sanctions on Russia. And yet the collaboration decline is worse than with the US or Germany.

This breaks a seemingly natural assumption: that the decline is primarily sanctions-driven. If sanctions were the main cause, Western countries should show the steepest drops. But the data tells a different story.

Sanctions are part of the picture, but the deeper shift is systemic: the entire international academic network is distancing itself from Russia. It’s not that certain countries stopped cooperating — Russia is being structurally marginalized from the global research fabric.

Why could we keep asking?

Looking back at these four steps, each finding naturally raised the next question. This “follow the thread” experience feels intuitive — analysis should work this way.

But if you’ve used traditional data analysis tools, you know reality is different. The typical workflow looks like this:

Ask question → get result → want to follow up → realize the data model doesn’t have that dimension → ask a data engineer to redesign the schema → rebuild the wide table → rerun the query → finally see the result → want to follow up again → redesign again → …

Every follow-up costs as much as the first question. Three questions means three full analysis cycles from scratch. So most analyses stop at the surface — not because people don’t want to go deeper, but because going deeper is prohibitively expensive.

Text-to-SQL and NL2BI tools solve the problem of “how to ask the first question more easily” — having an LLM translate natural language into queries. That’s genuinely useful, but it doesn’t solve the follow-up problem. The SQL gets generated, but the underlying data model is still rigid, intermediate results are still discarded, and every new question still requires full recomputation.

What we set out to solve is exactly this: make follow-up questions dramatically cheaper than the first one.

Step	Question	Operation	What was reused
1	Did output decline?	Match Pub↔Org on graph, aggregate by year+country	—
2	Did funding dry up?	Add Funding dimension	Pub-Org structure from Step 1
3	Are journals biased?	Add Publisher dimension	Pub-Org structure from Step 1
4	How much did collaborations drop?	Match Author→Pub→Org, aggregate by country	Underlying data structures

The analytical structure built in Step 1 was reused across all subsequent steps. Each follow-up wasn’t “start over” — it was “take one more step from where we left off.”

How it works under the hood

Keeping the follow-up chain going requires three things simultaneously: the analytical structure must be dynamically extensible, intermediate results must persist, and computation over massive data must be fast enough for interactive use. We tackle each with a specific design.

Hypergraph Data Model: a schema that grows with your questions

Traditional analytical data models (star/snowflake schemas) are designed upfront. You must decide on all dimensions before asking any questions — miss one, and you start over.

We designed a Hypergraph-based data model with a composable operator algebra:

Source: perform pattern matching on the raw data graph to produce a hypergraph — the starting point
Join: add new dimensions to an existing hypergraph — the ability to extend on follow-up
View: materialize intermediate results — the system’s “memory” of what’s been computed
DrillDown / RollUp / Slice / Dice: multidimensional slicing and aggregation on hypergraphs

These operators form a closed algebra: freely composable and chainable. This is what powers the “add a dimension and keep going” experience in the case study above — the schema isn’t predefined, it grows as your curiosity leads.

Unbiased Sampling: sub-second responses over 94 million relationships

Whether you can sustain a chain of follow-ups depends heavily on how long each step takes. If every question takes five minutes to run, you’ll lose patience by Step 3.

But the core operation behind these analyses is subgraph matching — exhaustively computing this over a large-scale graph is prohibitively expensive. Our approach: unbiased sampling. Instead of enumerating all matches, we uniformly sample a subset and use statistical methods to estimate aggregates. “Unbiased” means the estimates don’t systematically skew high or low.

For analytical tasks, this trade-off is remarkably effective: what matters is “did something change, which direction, who changed most” — not whether every number is exact to the last unit. In practice, COUNT estimates have an average error rate of just 0.27% — trend-level conclusions stay perfectly intact while computation cost drops by one to two orders of magnitude.

NeuG: co-located storage and computation

NeuG is our graph storage engine. The key design decision: sampling-based subgraph matching algorithms are implemented directly in the storage layer, so computation executes where the data lives — no need to shuttle data to a separate system.

Many systems don’t bottleneck on the operators themselves, but on data movement. NeuG’s principle: the best optimization is not moving data at all.

For the formal definitions of the hypergraph data model, operator design, and theoretical guarantees of the sampling algorithms, see our paper: A Hypergraph-Based Framework for Exploratory Business Intelligence.

Benchmarks

Beyond the real-world case study above, we ran systematic evaluations on LDBC Social Network Benchmark datasets (SF1, SF3, SF10) across 13 analytical queries of varying complexity.

On the largest scale, SF10 (~10GB), ExBI was the only system to complete all 13 queries. Neo4j completed 7; MySQL completed 2.

Performance:

vs. Neo4j: 16.21x average speedup, up to 146.25x
vs. MySQL: 46.67x average speedup, up to 230.53x

Speed doesn’t come at the cost of accuracy — COUNT average error rate is 0.27%, MAX error is near zero.

For the full experimental design and analysis, see the paper.

Try it yourself

We integrated this analytical capability into OpenClaw by adding NeuG-related data loading and analysis skills. The case study above was conducted entirely in this environment.

NeuG: https://github.com/alibaba/neug
NeuG data loading and analysis skills: https://github.com/Louyk14/neug/tree/main/skills
Docker (code + OpenAIRE dataset, ready to use): https://hub.docker.com/r/shunyangli/neugbi
Paper: A Hypergraph-Based Framework for Exploratory Business Intelligence
Full analysis report: Impact of the Russia-Ukraine War on Russian Academia

We X-Rayed OpenClaw with a Graph Database — Here's the Technical Debt We Found

Thu, 19 Mar 2026 00:00:00 +0000

When a codebase grows to 21,000 functions connected by 35,000 call edges, code review alone isn’t enough. We ran OpenClaw through NeuG’s graph database and surfaced structural problems that traditional tools can’t see.

Why a Graph Perspective?

OpenClaw is a popular AI gateway project — 18,000+ commits, 6,000+ TypeScript files, and 21,000+ functions, with new versions shipping almost daily. In our previous post, we looked at OpenClaw from the user’s perspective and found that the heartbeat mechanism was consuming far more tokens than anyone expected.

This time, we go deeper. From a developer’s perspective: what structural debt has fast iteration quietly accumulated?

Traditional code review and static analysis tools catch a lot, but they struggle to answer structural questions:

Which functions are “time bombs” — where a bug would cause maximum blast radius?
Which code is dead weight, confusing new contributors who think it’s still active?
Which module is the system’s structural hub — touch it, and half the codebase shakes?

The answers to these questions live in relationships, not in lines of code.

The Approach: Turn the Codebase into a Graph

We built a code knowledge graph using NeuG (graph database) and zvec (vector database):

Nodes: functions, classes, modules, files
Edges: call relationships, import dependencies, commit history

Here’s what the graph looks like at scale:

Type	Count
Source files	6,062
Functions	21,057
Call edges	35,761
Import edges	25,883
Classes	233
Modules	318

21,000 functions connected by 35,000 call edges. Once the graph is built, hard structural questions become Cypher queries.

The full analysis pipeline is packaged as CodeGraph Skill — open source and fully reproducible.

Finding 1: High-Risk Functions

Defining Risk Score

We define a risk score for each function to quantify its potential blast radius:

Risk Score = fan_in × fan_out

fan-in: how many functions call it (dependency pressure)
fan-out: how many functions it calls (dependency complexity)

The product represents the blast radius — how much of the system breaks if this function has a bug.

Detecting fan-out is straightforward; most IDE tools can do it. But fan-in requires a full-codebase scan — exactly where a graph database shines. This is one of the built-in capabilities of our CodeGraph Skill: it ships with a hotspots() method that handles the full-graph traversal for you, so finding the highest-risk functions across the entire codebase is a single call:

functions = cs.hotspots(topk=30)

Top 5 by risk score:

Function	File	fan-in	fan-out	Risk Score
startGatewayServer	src/gateway/server.impl.ts	10	103	1030
createConfigIO	src/config/io.ts	18	56	1008
runEmbeddedPiAgent	src/agents/pi-embedded-runner/run.ts	14	67	938
loadOpenClawPlugins	src/plugins/loader.ts	20	36	720
runCronIsolatedAgentTurn	src/cron/isolated-agent/run.ts	11	60	660

These functions sit at the core of critical execution paths. startGatewayServer bootstraps the entire service; createConfigIO generates all configuration; runEmbeddedPiAgent orchestrates embedded Pi Agent execution.

Here’s an interesting observation about createConfigIO: configuration parsing rarely gets prioritized for test coverage — it “seems safe.” But with 18 callers and 56 outgoing calls, by structural risk metrics it’s one of the most dangerous functions in the entire codebase.

Filtering for the Real High-Risk Core

We wrote a custom Cypher query to filter out low-level utility functions (high fan-in, low fan-out) and focus on functions that are simultaneously heavily depended upon and highly complex:

MATCH (f:Function)-[:CALLS]->(g:Function)
WITH f, COUNT(DISTINCT g) as fan_out
MATCH (f)<-[:CALLS]-(h:Function)
WITH f, fan_out, COUNT(DISTINCT h) as fan_in
WHERE fan_in > 10 AND fan_out > 10
RETURN f.name, fan_in, fan_out
LIMIT 100

The result was surprising: only 17 functions in the entire codebase meet both criteria. Every one of them is a critical, high-risk node:

Function	File	fan-in	fan-out	Risk Score
startGatewayServer	src/gateway/server.impl.ts	10	103	1030
createConfigIO	src/config/io.ts	18	56	1008
runEmbeddedPiAgent	src/agents/pi-embedded-runner/run.ts	14	67	938
loadOpenClawPlugins	src/plugins/loader.ts	20	36	720
runCronIsolatedAgentTurn	src/cron/isolated-agent/run.ts	11	60	660
getReplyFromConfig	src/auto-reply/reply/get-reply.ts	20	24	480
runMessageAction	src/infra/outbound/message-action-runner.ts	21	22	462
loadPluginManifestRegistry	src/plugins/manifest-registry.ts	22	16	352
createOpenClawTools	src/agents/openclaw-tools.ts	10	24	240
fetchWithSsrFGuard	src/infra/net/fetch-guard.ts	24	10	240
resolveCommandSecretRefsViaGateway	src/cli/command-secret-gateway.ts	15	14	210
fetchRemoteMedia	src/media/fetch.ts	18	11	198
loadModelCatalog	src/agents/model-catalog.ts	18	11	198
resolveApiKeyForProvider	src/agents/model-auth.ts	13	15	195
start	src/gateway/client.ts	12	13	156
resolveAuthProfileOrder	src/agents/auth-profiles/order.ts	11	11	121
createClackPrompter	src/wizard/clack-prompter.ts	10	11	110

These 17 functions are the structural pressure points of the system. Any change to them deserves extra scrutiny.

Finding Zombie Functions

We plotted the fan-in distribution across all functions as a histogram. NeuG’s Python SDK makes this straightforward — query results flow directly into pandas or matplotlib. The distribution reveals that most functions have fan-in and fan-out values below 2, meaning OpenClaw’s code is largely linear: most functions sit in a simple “one-in, one-out” position in the call chain.

What stands out: over 20% of all functions have fan-in = 0 — they are never called by anything. After filtering out legitimate entry points and framework hooks, more than 2,000 functions remain that are true zombie code.

Here’s a concrete example. assertPublicHostname in src/infra/net/ssrf.ts has zero callers in the current codebase. A git bisect reveals:

Commit 5bd550: the function is introduced and actively used
Commit b62355: its logic is migrated to resolvePinnedHostname
The old function is never deleted — likely out of caution — and silently becomes a zombie

The real harm isn’t the wasted lines. It’s what happens next: a new contributor finds the function, assumes it’s still relevant, builds something on top of it, and then spends hours debugging a dead code path.

In a project with OpenClaw’s pace of change, a 20% zombie function rate is a significant cognitive tax on everyone who reads the code.

Finding 2: The Over-Coupled Module

We analyzed the 20 most recent bug reports. Of 57 root-cause function candidates, 24 — 42% — pointed to src/agents.

To understand why, we ran a Cypher query to inspect agents’ call relationships with the rest of the system:

MATCH (m1:Module {name: 'agents'})<-[:BELONGS_TO]-(f1:File)-[:DEFINES_FUNC]->(func1:Function)
MATCH (func1)-[:CALLS]->(func2:Function)<-[:DEFINES_FUNC]-(f2:File)-[:BELONGS_TO]->(m2:Module)
WHERE m2.name <> 'agents'
RETURN m2.name, count(*) as call_count
ORDER BY call_count DESC
LIMIT 10

The numbers below show bidirectional call counts between modules (a→b = calls from module a into module b):

agents <-> reply              a→b=19    b→a=117
agents <-> infra              a→b=108   b→a=15
agents <-> pi-embedded-runner a→b=19    b→a=88
agents <-> tools              a→b=38    b→a=55
agents <-> plugins            a→b=45    b→a=40
agents <-> gateway            a→b=9     b→a=60
agents <-> src                a→b=35    b→a=31
agents <-> models             a→b=1     b→a=63
agents <-> sessions           a→b=46    b→a=5
agents <-> auth-profiles      a→b=24    b→a=12

In total, 30+ modules have bidirectional dependencies with agents. The reply module calls into agents 117 times; pi-embedded-runner calls it 88 times; models calls it 63 times. The vast majority of the system’s business logic flows through this one module.

We visualized the reply->agents call relationships using neug-ui:

agents is the structural hub of the system. A single-line change there can ripple across half the codebase. The fact that 42% of bugs trace back to this module isn’t bad luck — it’s an architectural consequence.

What Graph Analysis Reveals That Other Tools Don’t

Code review catches function-level issues. Static analysis handles syntax and import-level concerns. Graph analysis fills the gap between them — cross-file, cross-module structural problems:

Traditional tools can tell you:

Which line has a syntax error
Which function has high cyclomatic complexity

NeuG can tell you:

How many functions are affected by a single change
Which code is dead but still confusing contributors
Which module is the structural load-bearing wall of the system

When a codebase outgrows what human reviewers can hold in their heads, structural problems can only be found through structural queries.

Three Recommendations for OpenClaw

Based on this analysis, the three highest-leverage improvements:

Decouple src/agents: With 30+ modules in bidirectional dependency, start by defining clear interface boundaries between agents and its heaviest callers (reply, pi-embedded-runner, models). Even partial decoupling would significantly reduce blast radius.
Prioritize test coverage for high-risk functions: Functions like createConfigIO and startGatewayServer are structurally critical but likely undertested. A bug there propagates everywhere. Cover them first.
Clean up zombie functions: 2,000+ dead functions are actively misleading contributors. Running a graph-based dead code analysis and removing confirmed zombies is a high-ROI cleanup with low risk.

Analysis powered by NeuG graph database and CodeScope code analysis engine. Full index completed in approximately 4 minutes.

NeuG v0.1.3: The Era of Graph Computing Inside the Graph Database

What NeuG Has Always Been Missing: Graph Computing

GDS: 9 Algorithms, One API

Algorithm Overview

Unified Calling Convention

Performance: LDBC Graphalytics Benchmark

Temporary Graphs: Load External Data, Compute, Don’t Pollute

In Practice: Discovering New Concepts from Code Changes

Scenario

Code Graph and Delta Data

Analysis Flow

Analysis Conclusions

Also Worth Noting

Try It

How does AI affect U.S. Employment? Replicating Harvard Business School Research with NeuGBI

Where Current LLM-Based Analytics Fall Short

NeuGBI’s Query Logic

1. U.S. Employment Before and After AI

1.1 Impact on Different Industries

1.2 State-Level Breakdown

2. AI’s Impact on Junior vs. Senior Positions

2.1 Drilling Deeper: Which Seniority Level Declined More in Software Development?

Conclusion

References

NeuG v0.1.2: Your Graph Database Is No Longer a Data Island

The Biggest Problem with Graph Databases Isn’t Performance

The Design Shift in v0.1.2

Feature 1: Direct Cloud Storage Read

Why “No Download” Matters More Than “Fast Download”

Usage

Feature 2: Schema-on-Read

Graph Databases Finally Catch Up to the Data Lake’s Core Principle

Feature 3: Parquet Export, Direct Write-Back to Cloud

Bidirectional: Data Comes In, and It Goes Back Out

Full Demo: Cloud to Data Lake, End to End

Also Worth Noting

Try It

Feeding Code Repositories into LLM Wiki: 6x Token Savings in One Knowledge Build

1. Repository-Aware Knowledge Management

1.1 Three-Layer Directory Structure

raw Directory

wiki Directory

schema Directory

1.2 Knowledge Ingestion

2. Aligning Concepts and Modules

2.1 Concept Alignment

2.2 Module Alignment

2.3 Experimental Comparison

3. Team Collaboration

3.1 Cross-Project Sharing

3.2 Cross-User Sharing

3.3 Linter and CI Checks

4. Case Study: 6x Token Savings on a Single Query

4.1 Data Preparation

4.2 Response Quality

4.3 Resource Overhead

5. Conclusion and Outlook

You and Your LLM Can't Review Thousands of PRs at Once — But a Graph Database Can

The PR Review Bottleneck: Why Neither You Nor Your LLM Can Keep Up

How CodeGraph Works: From Repository to Knowledge Graph

146 PRs in QwenLM/qwen-code: A Real-World Analysis

Per-PR Risk Scoring

Cross-PR Conflict Detection

The Three Categories

Pushing Results to GitHub

Why Graph, Not LLM?

Takeaways

Try It on Your Own Repository

Star Counts Lied to Us: We Gave 74 Open Source Projects a Graph Database Health Check

Why We Did This

Five Findings You Might Not Expect

Finding 1: The Most-Starred Projects Are Often Not the Healthiest

Finding 2: LLM Model Repositories Score Universally Low

Finding 3: Maintainer Health and Personnel Flow Are the “Twin Pillars”

Finding 4: The Closer to Infrastructure, the More Cycle-Resistant the Community

Finding 5: “False Prosperity” Has a Mathematical Signature — High Event Volume + Low Clustering Coefficient

Counter-Intuitive Cases: Three Comparisons You Didn’t See Coming

1. Ollama vs vLLM: Ice and Fire in the Same Track

2. OpenClaw vs AutoGPT: After Lightning-Fast Rise, Landing or Crashing?

3. Next.js: Hot Front-End Framework, Atmosphere Dimension Needs Work

`raw` Directory

`wiki` Directory

`schema` Directory