From Files to Foundations
The Filesystem-First Revolution in Agentic AI — and Releasing LIOTHIL v2
Independent teams at Vercel, LlamaIndex, Anthropic, and across the open-source community are arriving at the same architectural conclusion, apparently without coordinating: the filesystem is the right universal interface between AI agents and the world.
This is not a framework announcement. It is a design philosophy rediscovered. The Unix operating system solved the same problem fifty years ago — too many special-purpose interfaces, each with its own protocol — by collapsing everything into files accessible through open, read, write, close. That abstraction survived decades of hardware evolution. Now it is being reapplied to AI agents, which face an analogous explosion of tools, APIs, memory systems, and retrieval mechanisms.
The evidence is specific:
Vercel removed 80% of their agent’s custom tools and replaced them with bash + filesystem access. Execution time dropped from 274.8 seconds to 77.4 seconds. Success rate went from 80% to 100%. Token usage fell 37%.
LlamaIndex published “Files Are All You Need,” identifying three patterns — context storage, external retrieval, and skills — all converging on file-based abstractions.
Andrej Karpathy published the LLM Wiki pattern: a persistent, file-based knowledge system where LLMs maintain markdown wikis instead of re-retrieving from raw sources.
Deepak Babu Piskala formalized the historical throughline in an arXiv paper tracing Unix’s “everything is a file” to agentic AI’s “files are all you need.”
Meanwhile, production implementations are appearing: just-bash (Vercel’s virtual filesystem for agents), TigerFS (Timescale’s PostgreSQL-as-filesystem), and GitNexus (code intelligence exposed through filesystem abstractions). Virtual filesystems are being built at Mintlify and elsewhere to give agents structured access to documentation, databases, and the open web.
What follows is a synthesis of all of this research — where it came from, what it means, and what we built on top of it.
The Core Thesis
The central claim: agents perform better when their primary interface to the world is a filesystem rather than a collection of specialized tools.
This runs against instinct. The conventional approach involves defining dozens or hundreds of discrete tools — one for database queries, another for API calls, another for file operations, another for search. Each tool carries its own schema, its own error handling, and its own footprint in the agent’s context window. The agent must learn when to use which tool, how to chain them, and how to recover from failures at each junction.
The filesystem-first alternative inverts this. Instead of teaching the agent fifty tools, you teach it six operations: ls, cat, grep, find, echo >, and bash. Then you expose everything the agent needs — data, APIs, configuration, documentation, memory — as files in a navigable directory tree. The agent explores, reads, writes, and executes. The filesystem becomes a uniform interface that hides the heterogeneity of the underlying systems.
Three properties make this work.
LLMs already understand filesystems. Every coding model has been trained on millions of examples of filesystem operations. cat README.md, grep -r "error" src/, echo "result" > output.json — these are among the most frequently occurring patterns in training data. An LLM does not need to learn a novel tool schema to use a filesystem. It already knows how.
Filesystems are composable. Unix pipes, redirects, and shell scripting let agents chain operations without any framework-level orchestration. grep "TODO" *.md | wc -l is a two-tool pipeline that requires zero framework code. The shell is the orchestration layer, and agents are already fluent in it.
Filesystems are self-documenting. An agent can ls a directory to discover what’s available, cat a file to understand its contents, and find to locate what it needs. There is no separate “tool discovery” mechanism required — the filesystem is the discovery mechanism.
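The "six operations" surface can be reduced to a single shell tool. The following is a minimal sketch under stated assumptions: `run_shell` and `ShellResult` are hypothetical names invented for illustration, not part of any framework mentioned here, and a real deployment would need sandboxing that this sketch omits.

```python
import subprocess
import tempfile
from dataclasses import dataclass


@dataclass
class ShellResult:
    stdout: str
    stderr: str
    exit_code: int


def run_shell(command: str, cwd: str, timeout_s: float = 10.0) -> ShellResult:
    """Run one shell command inside the agent's working directory."""
    # shell=True hands the string to the system shell, so pipes and
    # redirects work without any orchestration code on our side.
    # Assumption: the caller has already sandboxed `cwd`.
    proc = subprocess.run(
        command,
        shell=True,
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return ShellResult(proc.stdout, proc.stderr, proc.returncode)


with tempfile.TemporaryDirectory() as workdir:
    # Write a file, discover it with ls, then count matches with a pipeline.
    run_shell("printf 'TODO one\\nfine\\nTODO two\\n' > notes.md", workdir)
    listing = run_shell("ls", workdir)
    todo_count = run_shell('grep "TODO" *.md | wc -l', workdir)
    print(listing.stdout.strip())
    print(todo_count.stdout.strip())
```

The agent-facing schema here is one string parameter. Discovery, reading, and composition all happen inside the command itself rather than in separate tool definitions.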
The Unix Lineage
The intellectual foundation is documented in Deepak Babu Piskala’s January 2026 paper, “From ‘Everything is a File’ to ‘Files Are All You Need’” (arXiv:2601.11672).
In the early 1970s at Bell Labs, Ken Thompson and Dennis Ritchie represented everything (devices, processes, network connections, data) as files. A hard drive and a terminal would both respond to the same operations: open, read, write, close. Three properties made the abstraction endure. Composability: because programs shared the file interface, outputs could feed directly into other programs’ inputs. Uniformity: developers who learned how files worked could apply that knowledge broadly, and new devices could be added without changing the fundamental model. Persistence: the core abstraction survived decades of hardware evolution because it operated at the right level of generality.
Piskala traces a critical intermediate step: the DevOps era’s “everything as code” movement. Infrastructure definitions moved from console configuration to declarative Terraform files. Deployment workflows moved from runbooks to YAML in GitHub Actions. The pattern: replace heterogeneous, special-purpose interfaces with text-based representations that can be versioned, diffed, tested, and composed. In every domain, the text file won.
The paper then identifies three patterns through which the file abstraction is returning for AI systems: conversation history storage (files as persistent memory across sessions), context retrieval via files (filesystem traversal outperforming naive semantic search on small-to-medium collections), and skills replacing tools (markdown files describing capabilities, loaded on demand rather than baked into programmatic schemas).
As the paper puts it:
“The agent really only has access to a filesystem and ~5-10 tools: CLI, text editor, code interpreter, web fetch. And this is just as general, if not more general, than an agent with 100+ MCP tools.”
The Evidence
Vercel: “We Removed 80% of Our Agent’s Tools”
Vercel’s data query agent d0 tells the story in four numbers:
Execution time: 274.8s to 77.4s (3.5x faster)
Success rate: 80% to 100%
Token usage: ~102,000 to ~61,000 (37% reduction)
Steps to complete: ~12 to ~7 (42% fewer)
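The four figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the rounded numbers quoted in this article, so small disagreements with the reported percentages are to be expected.

```python
# Rounded before/after figures as quoted in the list above.
before = {"seconds": 274.8, "tokens": 102_000, "steps": 12}
after = {"seconds": 77.4, "tokens": 61_000, "steps": 7}


def reduction_pct(old: float, new: float) -> float:
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100


speedup = before["seconds"] / after["seconds"]
token_cut = reduction_pct(before["tokens"], after["tokens"])
step_cut = reduction_pct(before["steps"], after["steps"])

print(f"speedup:         {speedup:.2f}x")
print(f"token reduction: {token_cut:.1f}%")
print(f"step reduction:  {step_cut:.1f}%")
```

With these rounded inputs the speedup comes out at about 3.55x and the step reduction at about 42%, matching the list. The token reduction comes out at about 40% rather than 37%. A plausible explanation, offered as an assumption rather than something the source states, is that the reported percentage was computed from unrounded token counts.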
The original architecture relied on heavy prompt engineering, hand-coded retrieval systems, and specialized tools for schema lookup, query validation, error recovery, and result formatting. Each edge case required patches, and model updates necessitated recalibration. They were “solving problems the model could handle on its own.”
The replacement gave Claude direct access to their semantic layer through bash commands — grep, cat, ls, find. The data layer was already structured as YAML, Markdown, and JSON files. They let the agent read them directly.
Every metric improved. The authors attribute this to “addition by subtraction”: every custom tool represents a choice made for the model, sometimes suboptimally. Removing the intermediary tools eliminated a layer of accidental complexity.
Their recommendation: start with minimal architecture and add complexity only when necessary. Investment in documentation and clear data structure outweighs clever tooling.
Source: Vercel Blog — “We Removed 80% of Our Agent’s Tools”
LlamaIndex: “Files Are All You Need”
Jerry Liu’s January 2026 post identifies three use cases where files have emerged as the dominant abstraction:
Long-running conversation history: Claude Code’s CLAUDE.md pattern, Cursor’s compacted chat files, and Dex Horthy’s structured workflow, which writes research.md and plan.md files that subsequent phases consume.
External context retrieval: filesystem tools combining with semantic search to “dynamically traverse context for complex queries,” with file-based search alone outperforming naive semantic search on small-to-medium collections.
Skills as files: Anthropic’s October 2025 introduction of “skills,” markdown files that teach agents capabilities on demand, versionable and diffable because they are just files.
Simon Willison observed that skills may eventually replace MCP entirely as a capability distribution mechanism.
Source: LlamaIndex Blog — “Files Are All You Need”
Andrej Karpathy: The LLM Wiki Pattern
Karpathy approaches the filesystem pattern from a different angle: persistent, LLM-maintained knowledge bases that compile and synthesize information rather than re-retrieving raw sources on every query.
The LLM Wiki has a three-layer architecture:
Raw sources (immutable): Articles, papers, images, data files. The ground truth.
The wiki (dynamic): LLM-generated markdown pages with summaries, entity pages, and cross-references. The compiled knowledge.
The schema (configuration): A document specifying how the wiki should be structured and what workflows govern it.
Three operations maintain the wiki. Ingest processes new sources by reading them, writing summaries, updating relevant pages, and appending to an operation log. Query searches wiki pages, synthesizes answers with citations, and optionally files valuable analyses back as new pages. Lint health-checks the wiki for contradictions, stale claims, orphaned pages, missing cross-references, and knowledge gaps.
The key insight: the tedious maintenance work — updating cross-references, noting contradictions, maintaining consistency — becomes computationally cheap for LLMs. Humans curate sources and ask questions. LLMs handle the bookkeeping that causes traditional knowledge bases to stagnate.
The entire system is files. Markdown for content, a log file for history, an index for navigation. Standard tools — Obsidian for visualization, Git for version control, BM25/vector search for retrieval — all work out of the box because the substrate is the filesystem.
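Because the wiki is plain markdown, part of the lint operation can be sketched in a few lines. The sketch below assumes Obsidian-style `[[page]]` links and a flat directory of `.md` pages with an `index.md` entry point. Both are illustrative assumptions, and `lint_wiki` is a hypothetical helper, not code from Karpathy's gist. It covers only broken cross-references and orphaned pages, not contradictions or stale claims, which need an LLM pass.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of [[target]], [[target|alias]] or [[target#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def lint_wiki(wiki_dir: Path, index_name: str = "index") -> dict[str, list[str]]:
    """Report links to missing pages and pages with no inbound links."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.glob("*.md")}
    inbound: dict[str, set[str]] = {name: set() for name in pages}
    broken: list[str] = []
    for name, text in pages.items():
        for target in {m.strip() for m in WIKILINK.findall(text)}:
            if target not in pages:
                broken.append(f"{name} -> {target}")
            elif target != name:  # a self-link does not count as inbound
                inbound[target].add(name)
    orphans = [n for n, sources in inbound.items() if not sources and n != index_name]
    return {"broken_links": sorted(broken), "orphans": sorted(orphans)}


with tempfile.TemporaryDirectory() as tmp:
    wiki = Path(tmp)
    (wiki / "index.md").write_text("See [[unix]] and [[vercel]].\n", encoding="utf-8")
    (wiki / "unix.md").write_text("Lineage continues in [[plan9]].\n", encoding="utf-8")
    (wiki / "vercel.md").write_text("Benchmark notes.\n", encoding="utf-8")
    (wiki / "stray.md").write_text("Nothing links here.\n", encoding="utf-8")
    report = lint_wiki(wiki)
    print(report)
```

In this toy wiki the report flags `unix -> plan9` as a broken link and `stray` as an orphan. The structural checks are cheap file operations, which leaves the LLM's budget for the semantic ones.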
Source: Karpathy — LLM Wiki Pattern
Nine Months In: The Production Case Study
The preceding sources describe the pattern in theory and controlled benchmarks. The Falco research environment demonstrates it under sustained, adversarial production load — and it is the reason LIOTHIL exists.
Falco uses Claude Code (Anthropic’s CLI agent) with a filesystem-first architecture spanning nine months, 60+ conversations, and three concurrent research streams: computational analysis of a 24-item corpus, a temporal navigation system integrating five cultural frameworks, and sacred language scholarship including a full-length book manuscript. The entire agent infrastructure is files. CLAUDE.md provides agent identity and domain knowledge at session start. .claude/rules/ contains 14 domain rule files that Claude Code auto-loads alongside it — roughly 3,700 lines of combined context injecting specialized knowledge: transliteration protocols, evidence grading criteria, editorial watchlists, tradition-specific attribution tables. .claude/skills/ holds 12 markdown skill files teaching repeatable workflows. .claude/agents/ defines 24 specialized agent roles, each a markdown file describing responsibilities and available tools. registers.json serves as the complete interpretive database — the filesystem-as-database pattern at its most literal.
The output: 252 verified analytical values, 36 tier-0 correspondences, 8 tracked structural patterns across the full corpus, all produced through an agent that navigates its world via ls, cat, grep, and file writes. No vector database. No custom retrieval pipeline.
What makes this a case study rather than a demo is that the architecture was tested by failure. Over nine months, the environment survived a complete specification rewrite (the temporal navigation system’s core architecture was rebuilt from scratch), three major analytical pivots (including the discovery that the entire analytical framework needed to account for a breathing instruction encoded in the source material), and a full codebase rebuild where every engine module was rewritten — all without restructuring the agent’s interface. The files changed. The directory conventions changed. The agent’s relationship to the filesystem did not. A new conversation reads CLAUDE.md, discovers rules/ and skills/, and has full operational context in seconds. No embedding recomputation. No API calls. No warm-up period.
The pattern also scaled in a dimension that benchmarks do not capture: institutional memory. Sixty conversations worth of findings, corrections, failure modes, and hard-won operational lessons accumulated in files — session notes, a persistent memory document, a volatile status file, editorial watchlists tracking known error patterns. Each new session inherited everything the previous sessions had learned. The agent did not start fresh. It read what came before.
In February 2026, after seven months of building and validating this architecture, it became clear that the patterns were general. The directory structure, the file conventions, the layered knowledge injection system — none of it was specific to the research domain. The domain content was specific. The scaffold was universal. That realization became LIOTHIL.
LIOTHIL: From Pattern to Scaffold
LIOTHIL is an open-source tool that generates filesystem-first research environments for Claude Code. It takes the architecture described above — the one that survived nine months of production use — and makes it reproducible in minutes.
The tool works through an interactive interview. You answer questions about your project’s domain, your research methodology, your epistemic standards, your team’s conventions, and what your agent needs to know. LIOTHIL takes those answers and generates a complete environment: identity file, rule files, agent definitions, skill templates, directory structure, memory infrastructure, session state management. The output is not a framework you import. It is not a dependency you maintain. It is files on disk — the same files Claude Code already knows how to read. You own every generated file. Edit them, delete the ones that do not apply, add your own.
LIOTHIL v1 captured approximately 80% of what the Falco environment proved necessary. It generated the knowledge architecture: the CLAUDE.md identity with epistemic standards and domain knowledge; .claude/rules/ files encoding domain-specific protocols; .claude/agents/ definitions for specialized sub-agents; .claude/skills/ templates for repeatable workflows; and the directory structure for sources, results, and working files. This was the structural layer — what the agent knows and how it operates.
What v1 missed was the runtime infrastructure — the operational machinery that keeps a long-running environment functional across dozens of sessions. The Falco environment had evolved this layer organically over months. It was not part of the initial design. It emerged from necessity: sessions that ran out of context before checkpointing, institutional knowledge that drifted because there was no canonical location for it, security patterns that were learned the hard way after a credentials file nearly made it into version control.
LIOTHIL v2, released today, completes the pattern. It adds four operational components that v1 lacked.
memory/MEMORY.md provides persistent cross-session memory with typed entries. Each entry carries a category — user preference, feedback, project state, reference — and a date stamp. The file has a soft capacity limit (150 lines) and a hard limit (175 lines), with extraction protocols for when accumulated knowledge needs to be moved to sub-files. This is the institutional memory layer: the agent reads it at session start and knows what happened in every previous session. The typing system prevents the file from becoming an unstructured dump.
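A typed, capacity-limited memory file can be sketched as follows. The 150 and 175 line limits and the four categories come from the description above. The entry format (`- [category] date: text`), the category spellings, and the `append_memory` helper are assumptions made for illustration and may not match what LIOTHIL actually generates.

```python
import tempfile
from datetime import date
from pathlib import Path

# Category spellings are assumed; the article names the four types in prose.
CATEGORIES = {"user-preference", "feedback", "project-state", "reference"}
SOFT_LIMIT, HARD_LIMIT = 150, 175


def append_memory(path: Path, category: str, text: str, on: date) -> str:
    """Append one typed, dated entry and report whether extraction is due."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    if len(lines) >= HARD_LIMIT:
        raise RuntimeError("hard limit reached: extract entries to a sub-file first")
    lines.append(f"- [{category}] {on.isoformat()}: {text}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # At or past the soft limit the write succeeds, but the caller is warned.
    return "extract-soon" if len(lines) >= SOFT_LIMIT else "ok"


with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "MEMORY.md"
    first_status = append_memory(
        memory, "feedback", "Prefer ISO dates in notes.", date(2026, 2, 1)
    )
    first_line = memory.read_text(encoding="utf-8").splitlines()[0]
    print(first_status, first_line)
```

Rejecting unknown categories at write time is what keeps the file from drifting into the unstructured dump the typing system is meant to prevent.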
STATUS.md provides volatile session state — what workstream is active, what was accomplished last session, what comes next, what is blocked. Where MEMORY.md is permanent and grows over the life of the project, STATUS.md is rewritten at the end of every session. It is the handover note between your current self and your next self, mediated through the agent.
.claude/settings.local.json configures project-level settings including a statusline that displays your agent’s identity name in the Claude Code interface. This is a small thing that matters more than it should: it anchors every session. The agent is not a generic assistant. It is the configured intelligence for this specific project, and the name in the statusline says so.
A session checkpoint skill automates the end-of-session protocol — capturing findings to disk, rewriting STATUS.md, and generating a handover block — before context exhaustion destroys unwritten work. This was one of the hardest operational lessons from the Falco environment: context windows are finite, and an agent that runs out of context mid-task loses everything it has not written to a file. The checkpoint skill makes the save point explicit and repeatable.
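The checkpoint protocol can be approximated in code. In LIOTHIL the skill is a markdown file the agent follows, so this is only a sketch of the same three steps. The `checkpoint` function, the `session-notes.md` file name, and the STATUS.md section titles are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def checkpoint(project: Path, active: str, done: list[str], next_steps: list[str],
               blocked: list[str], findings: list[str]) -> str:
    """Persist findings, rewrite STATUS.md, and return a handover line."""
    # 1. Append findings first: this is the work context exhaustion would destroy.
    with (project / "session-notes.md").open("a", encoding="utf-8") as fh:
        fh.writelines(f"- {item}\n" for item in findings)

    # 2. Build the new volatile status from scratch (it is rewritten, not appended).
    def section(title: str, items: list[str]) -> str:
        body = "\n".join(f"- {i}" for i in items) or "- (none)"
        return f"## {title}\n{body}\n"

    status = "\n".join([
        "# STATUS", "",
        f"Active workstream: {active}", "",
        section("Accomplished last session", done),
        section("Next", next_steps),
        section("Blocked", blocked),
    ])

    # 3. Write to a temp file and swap it in, so a crash never leaves a half-written file.
    tmp_path = project / "STATUS.md.tmp"
    tmp_path.write_text(status, encoding="utf-8")
    os.replace(tmp_path, project / "STATUS.md")

    upcoming = next_steps[0] if next_steps else "nothing queued"
    return f"HANDOVER: resume '{active}'; next: {upcoming}"


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    checkpoint(project, "corpus analysis", ["parsed items 1-8"],
               ["parse items 9-16"], [], ["item 3 needs re-check"])
    handover = checkpoint(project, "manuscript edit", ["chapter 2 draft"],
                          ["chapter 3 outline"], ["awaiting source scan"],
                          ["chapter 2 cites item 3"])
    status_text = (project / "STATUS.md").read_text(encoding="utf-8")
    notes_text = (project / "session-notes.md").read_text(encoding="utf-8")
    print(handover)
```

After two checkpoints, the notes file holds findings from both sessions while STATUS.md describes only the latest one. That asymmetry mirrors the permanent versus volatile split described above.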
LIOTHIL v2 also generates an enhanced .gitignore with patterns for secrets, credentials, and environment files. This is the security layer that prevents the kind of near-miss the Falco environment experienced with credentials early in its history.
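One way to add such patterns without disturbing an existing .gitignore is an idempotent merge. The pattern list below is an assumed example of common secret-file patterns, not LIOTHIL's actual output, and `ensure_ignored` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed examples of secret-bearing files; adjust to the project at hand.
DEFAULT_PATTERNS = [".env", ".env.*", "*.pem", "*.key", "credentials*.json"]


def ensure_ignored(gitignore: Path, patterns: list[str]) -> list[str]:
    """Append any patterns not already present; return the ones added."""
    existing = gitignore.read_text(encoding="utf-8").splitlines() if gitignore.exists() else []
    present = {line.strip() for line in existing}
    missing = [p for p in patterns if p not in present]
    if missing:
        gitignore.write_text("\n".join(existing + missing) + "\n", encoding="utf-8")
    return missing


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".gitignore"
    target.write_text("node_modules/\n.env\n", encoding="utf-8")
    added_first = ensure_ignored(target, DEFAULT_PATTERNS)
    added_second = ensure_ignored(target, DEFAULT_PATTERNS)
    print(added_first, added_second)
```

The second call adds nothing, so the helper can run on every scaffold regeneration without duplicating lines or reordering entries the user wrote by hand.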
The philosophical point is worth stating plainly: the scaffold builder itself embodies the filesystem-first pattern. LIOTHIL does not install a runtime. It does not start a server. It does not configure a database. It generates files. Markdown files, JSON files, directory trees. The output is the architecture. You can read every file it creates, edit any of them, delete the ones you do not need, and add your own. The tool gets out of the way because the architecture is the filesystem, and the filesystem is already there.
Virtual Filesystems in Production
Two practitioners are pushing the abstraction further, mounting external services as navigable file trees.
Dens Sumesh at Mintlify built a virtual filesystem for Mintlify’s AI assistant that exposes documentation, API references, and other knowledge sources as files. The agent navigates documentation the same way it navigates code — through ls, cat, and grep. No special “documentation search” tool is needed; the filesystem is the search interface.
Arlan is working on turning the entire web into a filesystem — mounting web pages, APIs, and data sources as files that agents can read and traverse using standard Unix tools. The ambition is to eliminate the boundary between “local files” and “remote resources” from the agent’s perspective.
Both approaches embody the same principle: if you can mount it, you don’t need a tool for it.
The Architectural Pattern
The Universal Interface
Synthesizing across all of these sources, the filesystem-first agent architecture has a clear shape:
+--------------------------------------------------+
|                   AGENT (LLM)                    |
|     Knows: ls, cat, grep, find, echo >, bash     |
+--------------------------------------------------+
                         |
                read / write / exec
                         |
+--------------------------------------------------+
|                VIRTUAL FILESYSTEM                |
|                                                  |
|  /context/   - conversation history, memory      |
|  /data/      - databases, APIs (mounted)         |
|  /docs/      - documentation, specifications     |
|  /skills/    - capability definitions (md)       |
|  /workspace/ - agent's working directory         |
|  /tools/     - scripts, utilities                |
|  /output/    - results, artifacts                |
+--------------------------------------------------+
                         |
               mount / bridge / proxy
                         |
+--------------------------------------------------+
|                UNDERLYING SYSTEMS                |
|     PostgreSQL | REST APIs | Git repos | Web     |
|    Cloud storage | MCP servers | Local files     |
+--------------------------------------------------+
The filesystem layer acts as a uniform adapter between the agent and heterogeneous backend systems. The agent sees only files and directories. The filesystem layer handles translation to whatever the underlying system requires.
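The adapter role of the middle layer can be sketched with a toy in-memory virtual filesystem. Every class name here (`VirtualFS`, `DictBackend`, `TableBackend`) is hypothetical, and the sketch supports only single-level mount points. It is meant to show the shape of the idea, not how just-bash or TigerFS are implemented.

```python
from typing import Protocol


class Backend(Protocol):
    def names(self) -> list[str]: ...
    def read(self, name: str) -> str: ...


class DictBackend:
    """In-memory stand-in for documents or memory files."""

    def __init__(self, files: dict[str, str]) -> None:
        self._files = files

    def names(self) -> list[str]:
        return sorted(self._files)

    def read(self, name: str) -> str:
        return self._files[name]


class TableBackend:
    """Stand-in for a database mount: each table reads as CSV text."""

    def __init__(self, tables: dict[str, list[tuple]]) -> None:
        self._tables = tables

    def names(self) -> list[str]:
        return sorted(f"{t}.csv" for t in self._tables)

    def read(self, name: str) -> str:
        rows = self._tables[name.removesuffix(".csv")]
        return "\n".join(",".join(map(str, row)) for row in rows)


class VirtualFS:
    """Routes ls/cat calls to whichever backend is mounted at the path prefix."""

    def __init__(self) -> None:
        self._mounts: dict[str, Backend] = {}

    def mount(self, prefix: str, backend: Backend) -> None:
        self._mounts["/" + prefix.strip("/")] = backend

    def ls(self, path: str = "/") -> list[str]:
        path = "/" + path.strip("/")
        if path == "/":
            return sorted(m.lstrip("/") + "/" for m in self._mounts)
        return self._mounts[path].names()

    def cat(self, path: str) -> str:
        prefix, _, name = path.rpartition("/")
        return self._mounts[prefix].read(name)


fs = VirtualFS()
fs.mount("/docs", DictBackend({"README.md": "Start here."}))
fs.mount("/data", TableBackend({"orders": [("region", "revenue"), ("US", 1200)]}))
print(fs.ls("/"), fs.ls("/data"))
print(fs.cat("/data/orders.csv"))
```

From the caller's side, a document store and a database table are indistinguishable: both are listed with `ls` and read with `cat`. Adding a new backend means implementing two methods, not defining a new tool schema.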
What This Replaces
The traditional agent architecture requires the agent to juggle many distinct interfaces. Each has a filesystem-first replacement:
Vector search for semantic retrieval becomes grep + find over structured files.
Database query tools become a mounted database via TigerFS.
API call tools become mounted API responses or curl.
Memory tools become cat and echo > on memory files.
Web search tools become mounted web content.
Document parsing tools become pre-parsed markdown files in the filesystem.
Schema lookup tools become ls and cat on schema files.
Each replacement eliminates a tool schema from the agent’s context window, a set of error handling logic from the framework, and a decision point from the agent’s planning process. The Vercel results confirm this is not theoretical — it measurably improves performance.
In concrete terms:
# Traditional: four tool calls, each with schema overhead
schema = agent.call("schema_lookup", table="orders")
sql = agent.call("query_builder", filter="region=US", agg="sum(revenue)")
agent.call("query_validator", sql=sql)
agent.call("result_formatter", format="markdown")
# Filesystem-first: the same task
cat /data/schema/orders.yml
grep -r "revenue" /data/queries/
echo "SELECT region, SUM(revenue) FROM orders GROUP BY region" > /workspace/query.sql
bash /tools/run-query.sh /workspace/query.sql > /workspace/result.md
The first approach requires four distinct tool schemas, four sets of error handling, and four decision points. The second uses operations the LLM has seen millions of times in training data. The filesystem version is longer in characters but shorter in cognitive load — for the model and for the developer maintaining it.
The Three Pillars
The pattern rests on three pillars, identified by Liu and confirmed across all sources.
Files as Context. Agent memory, conversation history, project knowledge, and accumulated state all live as files. The agent reads them to recover context and writes to them to persist new knowledge. Karpathy’s wiki pattern is the fullest expression of this pillar, with compiled knowledge pages, cross-references, and a lint process to maintain consistency.
Files as Search. Instead of embedding-based retrieval from a vector store, the agent searches by traversing the filesystem. grep -r "error handling" docs/ is a search. find . -name "*.yaml" -newer last_check is a temporal query. ls data/customers/by-region/ is a faceted browse. The filesystem’s hierarchical structure is the index.
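For readers who want to see what these shell queries do mechanically, here is a sketch of the first two in plain Python. `grep_r` does fixed-string matching (closer to `grep -rF` than to regex grep) and `find_newer` mirrors `find -name ... -newer`. Both function names are illustrative.

```python
import os
import tempfile
from pathlib import Path


def grep_r(root: Path, needle: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for every line containing needle."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), lineno, line))
    return hits


def find_newer(root: Path, pattern: str, reference: Path) -> list[str]:
    """Return files matching pattern that were modified after the reference file."""
    cutoff = reference.stat().st_mtime
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(pattern)
        if p.is_file() and p.stat().st_mtime > cutoff
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "api.md").write_text("Overview\nNotes on error handling\n", encoding="utf-8")
    (root / "old.yaml").write_text("a: 1\n", encoding="utf-8")
    (root / "new.yaml").write_text("b: 2\n", encoding="utf-8")
    marker = root / "last_check"
    marker.write_text("", encoding="utf-8")
    # Set modification times explicitly so the example is deterministic.
    os.utime(root / "old.yaml", (1_000, 1_000))
    os.utime(marker, (2_000, 2_000))
    os.utime(root / "new.yaml", (3_000, 3_000))
    hits = grep_r(root, "error handling")
    newer = find_newer(root, "*.yaml", marker)
    print(hits, newer)
```

Both functions are linear scans over the tree. That is why they need no index to build or refresh, and also why the approach stops scaling once collections grow large.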
Files as Skills. New capabilities are taught to the agent by dropping markdown files into a skills directory. A skill file describes how to accomplish a task — what tools to use, what patterns to follow, what edge cases to watch for. The agent reads the skill, gains the capability, and discards it from context when done. This is “soft” tooling: capabilities defined in natural language rather than code.
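The load-on-demand behaviour can be sketched as a two-step lookup: keep a one-line summary of each skill in context, and read the full file only when a task calls for it. The file layout assumed here (a heading followed by a one-line summary) and the helper names are illustrative. Anthropic's actual skill format is not reproduced.

```python
import tempfile
from pathlib import Path


def skill_index(skills_dir: Path) -> dict[str, str]:
    """Map each skill name to its first non-heading, non-empty line."""
    index = {}
    for path in sorted(skills_dir.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                index[path.stem] = line
                break
    return index


def load_skill(skills_dir: Path, name: str) -> str:
    """Read the full skill body, to be placed in context only while needed."""
    return (skills_dir / f"{name}.md").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp)
    (skills / "checkpoint.md").write_text(
        "# Checkpoint\n\nSave findings and rewrite STATUS.md before context runs out.\n\n"
        "## Steps\n1. Write findings to disk.\n2. Rewrite STATUS.md.\n",
        encoding="utf-8",
    )
    (skills / "ingest.md").write_text(
        "# Ingest\n\nSummarize a new source and update related pages.\n",
        encoding="utf-8",
    )
    index = skill_index(skills)
    body = load_skill(skills, "checkpoint")
    print(index)
    print(len(body), "characters loaded on demand")
```

The index costs one line per skill no matter how long the skill bodies grow, so the directory can hold many capabilities without crowding the context window.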
Where This Leads
For Teams Building Agents Today
Four priorities emerge from the research.
Invest in data structure, not tool counts. Well-organized, well-documented files are more valuable than sophisticated retrieval mechanisms. Vercel’s success came not from a better agent architecture but from having a clean semantic layer already expressed as YAML and Markdown. The quality of your file tree is the quality of your agent.
Start with bash, add tools reluctantly. Give the agent filesystem access and a shell first. Only add specialized tools when you can demonstrate that the filesystem approach is insufficient for a specific use case. Every tool you add is complexity you maintain and a constraint you impose on the model.
Use skills over MCP where possible. If a capability can be taught through a markdown file, it should be. MCP tools are appropriate for capabilities that genuinely require programmatic access — real-time data streams, authenticated API calls, stateful interactions. But many “tools” are instructions the agent could follow if given in natural language. Write the markdown file instead.
Try the scaffold. If you want to see what this architecture looks like in practice, LIOTHIL generates a complete filesystem-first environment from an interactive interview. It takes minutes, produces only files, and you own the result completely. Fork it, edit it, delete what you do not need. The architecture is the filesystem, and the filesystem is already yours.
The Convergence Direction
The various projects and patterns documented here are converging toward a single architectural vision: the agent’s operating system is a filesystem. Memory is files. Knowledge is files. Skills are files. Data access is file mounts. The agent’s “tools” are the six Unix operations it already knows.
This convergence is being driven by a practical observation: LLMs are already better at using filesystems than they are at using custom tools. They have been trained on orders of magnitude more filesystem interaction examples than MCP tool usage examples. Fighting this statistical reality with ever-more-sophisticated tool frameworks is swimming upstream.
The logical endpoint — still some distance away — is that an agent’s environment is fully described by its filesystem tree. Deploying an agent becomes mounting the right directories. Configuring an agent becomes editing files in its skill and context directories. Debugging an agent becomes reading its output and log files. The entire lifecycle is file operations, end to end.
Open Questions
Several challenges remain. Non-plaintext documents (PDFs, images, video) still require parsing before they become file-navigable; LlamaIndex’s LlamaParse addresses this, but it is not yet seamless. Multi-agent coordination on shared filesystem state is still maturing, though TigerFS addresses it with database-backed atomicity.
The pattern also has hard limits worth naming. Real-time streaming data (market feeds, sensor telemetry, event buses, live dashboards) requires persistent connections and push semantics; files are snapshots and do not model these well. Filesystem search works well for small-to-medium collections, but collections exceeding ~100K documents genuinely need indexing infrastructure, which starts to look like vector search again; grep over 100,000 files is not a retrieval strategy, it is a denial-of-service attack on your own system. Authenticated stateful API sessions (OAuth flows, WebSocket connections, sessions requiring maintained connection state) cannot be reduced to file reads without losing the statefulness that makes them work. In these cases, purpose-built tools remain the right answer. The filesystem-first principle is not “files for everything”; it is “files first, tools when files genuinely cannot do the job.”
These are engineering challenges, not architectural ones. The pattern is sound. The implementations are catching up.
Sources and Resources
Academic Papers
Piskala, D.B. (2026). “From ‘Everything is a File’ to ‘Files Are All You Need’: How Unix Philosophy Informs the Design of Agentic AI Systems.” arXiv:2601.11672.
Blog Posts and Articles
Liu, J. (2026). “Files Are All You Need.” LlamaIndex Blog.
Vercel Engineering (2026). “We Removed 80% of Our Agent’s Tools.” Vercel Blog.
Karpathy, A. (2026). “LLM Wiki Pattern.” GitHub Gist.
Vercel Engineering (2026). “Build Knowledge Agents Without Embeddings.” Vercel Blog.
Open Source Projects
LIOTHIL — Generate filesystem-first research environments for Claude Code. The scaffold builder described in this article.
just-bash — Virtual bash environment with in-memory filesystem for AI agents.
TigerFS — Mount PostgreSQL as a filesystem.
GitNexus — Zero-server code intelligence engine with MCP integration.
semtools — LlamaIndex filesystem tools for semantic search.
Related Reading
Horthy, D. “Advanced Context Engineering for Coding Agents.”
Mistele, K. “Writing a Good Claude.md.”
Willison, S. “Claude Skills vs MCP Analysis.”
Anthropic Engineering. “How We Built Our Multi-Agent Research System.”


