FartOS Framework · BUTTFART

    ______           __  ____  _____
   / ____/___ ______/ /_/ __ \/ ___/
  / /_  / __ `/ ___/ __/ / / /\__ \ 
 / __/ / /_/ / /  / /_/ /_/ /___/ / 
/_/    \__,_/_/   \__/\____//____/

Production-grade behavioral learning for agentic systems.

A TypeScript learning layer that wraps around existing agent runtimes. Not a replacement. A behavioral substrate. Push it any JSON - conversations, analytics, support tickets, on-chain stats - and it judges what the data means with models, remembers what is true without leaking across scopes, and proves what works with a bandit before promoting it. What comes out the other side is an instruction set shaped by evidence, gated by confidence, and validated under real statistics.

Download .zip

fartos/README.md

# FartOS

A learning + coordination layer for AI agents. Feed it any JSON, and it learns
what's true, proves what works, and improves your agent's instructions over time.

**Models do the thinking. The framework does the bookkeeping.**

## What this is

Bolt FartOS onto any agent. It ingests any data (chat, metrics, on-chain, tickets),
lets models judge what it means (no string matching), remembers what's true (scoped,
never leaking), split-tests behavior with a multi-arm bandit, promotes the winners,
and flags a human when it's unsure. It runs fully offline out of the box — no API key,
no database — and every piece is swappable.

**Not a chatbot. Not a wrapper. A learning engine.**

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│                      EVOLVE (main API)                     │
├────────────────────────────────────────────────────────────┤
│  MIDDLEWARE — the thinking (every judgment is a model call)│
│  intent · sentiment · contradiction · distill · narrate · embed
├────────────────────────────────────────────────────────────┤
│  SUBSTRATE — the bookkeeping (deterministic)               │
│  scope · lessons · semantic recall · confidence · bandit   │
├────────────────────────────────────────────────────────────┤
│  ORCHESTRATION            │  COMMUNICATION (actuation)      │
│  durable queue · workers  │  promote · escalate · webhook · │
│  agents · coordinator     │  code-edit (a broken sender     │
│                           │  never crashes the loop)        │
└────────────────────────────────────────────────────────────┘
```

## Install

```bash
npm install @buttfart_os/fartos
# optional: a hosted model provider
npm install @buttfart_os/provider-anthropic   # or @buttfart_os/provider-openai
```

`@buttfart_os/fartos` ships an in-memory store and a local embedder, so it runs with zero
external services.

## Quick start

```typescript
import { Evolve } from '@buttfart_os/fartos';
import { AnthropicProvider } from '@buttfart_os/provider-anthropic';

const evolve = new Evolve({
  provider: new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY! }),
  defaultScope: { tenant: 'my-app' },
});

// 1) Ingest ANY JSON — non-blocking, no adapter required.
await evolve.ingest(
  { userMessage: 'please keep replies short and warm' },
  { scope: { tenant: 'my-app', user: 'user-123' }, type: 'feedback' },
);

// 2) A worker processes the queue: judge → reflect → reconcile → learn.
await evolve.drain();

// 3) Serve evolved instructions (base prompt + learned, active lessons).
const { instructions } = await evolve.getInstructions({
  scope: { tenant: 'my-app', user: 'user-123' },
  baseInstructions: 'You are a helpful customer support agent.',
});
```

Want it fully offline (tests, CI, demos)? Swap the provider for the built-in
`MockProvider` and use the default `LocalEmbedder` — no keys, deterministic output.
See `examples/closed-loop/run.js` for an end-to-end run.

## Split-test behavior (multi-arm bandit)

```typescript
const exp = await evolve.experiments.create({
  scope: { tenant: 'my-app' },
  hypothesis: 'a warmer tone lifts replies',
  arms: [
    { label: 'control', instruction: 'reply normally' },
    { label: 'warm', instruction: 'reply warmly and concisely' },
  ],
  minSamples: 100,
  primaryMetric: 'reply_rate',
  guardrails: [{ name: 'unsubscribe', lowerIsBetter: true, tolerance: 0.05 }],
});

const a = await evolve.experiments.assign(exp.id, viewerId);     // stable per viewer
await evolve.experiments.recordOutcome(exp.id, viewerId, {       // push your metric
  primary: 0.42,
  guardrails: { unsubscribe: 0.01 },
});

const why = await evolve.experiments.explain(exp.id);            // plain-language rationale
```

Real statistics decide the winner (Welford variance, Welch's t-test, two-proportion
z-test); a guardrail can veto a "win"; the winner auto-promotes into instructions and
losers retire.

## API

```typescript
evolve.ingest(json, opts)            // Feed ANY JSON (non-blocking; enqueues)
evolve.drain(max?)                   // Worker: process queued events into lessons
evolve.queueDepth()                  // How many events are waiting
evolve.recall(query, opts)           // Semantic search over learned lessons
evolve.getInstructions(opts)         // Base prompt + active lessons (+ served arm)
evolve.promoteScope(lessonId, scope) // Share a corroborated lesson up-scope
evolve.maintain(opts)                // Evict weak lessons (cadence is yours)
evolve.registerActuator(actuator)    // Add an outbound channel (webhook, etc.)
evolve.recordedErrors()              // Inspect any degraded middleware calls

evolve.experiments.create(cfg)       // Start a multi-arm experiment
evolve.experiments.assign(id, key)   // Stable arm assignment for a key
evolve.experiments.recordOutcome(id, key, metrics)  // Push a success metric
evolve.experiments.explain(id)       // Read why a variant won (plain language)
```

## Core concepts

### Ingest — any data, no adapter
`ingest(json)` accepts any shape. A `context` object is first-class ("what this data
means"). Numeric/on-chain blobs are canonicalized into an embeddable surface, so even
data with no text is recallable. A math-only cost gate throttles high-volume numeric
streams so you don't pay for redundant model calls.

### Intelligence — model-driven, zero strings
Intent, sentiment, contradiction, distillation, and narration are each a model call
with a typed, validated output. Recall is semantic (vector similarity), never
`.includes()`. Every judgment unit is independently configurable — its model, role,
prompt, and on/off switch — via the `middlewares` config, not code. Sentiment is off by default.

### Memory — earned, scoped, decaying
Learned lessons gain confidence by **repetition**, climbing a ladder (`candidate →
testing → active`); only `active` lessons are served. Retrieval is by meaning. Weak
lessons are evicted. Everything is scoped `global > tenant > project > user > session
> agent` with zero cross-scope leakage — nothing becomes global by accident.

### Experimentation — prove before trust
A multi-arm bandit (epsilon-greedy / UCB / Thompson; explore-vs-exploit is a dial)
with real statistics and guardrail vetoes. Winners auto-promote into instructions;
the agent can read why in plain language.

### Communication — act and escalate
Decisions emit typed effects routed to pluggable actuators: promote into the
instruction store (pull), notify a webhook/email (your sender), or apply a code edit
(your writer). The framework never touches the network or filesystem itself, and a
broken sender can never crash the loop. Low-confidence judgments escalate to a human.

### Persistence — in-memory by default, swap to durable
Out of the box everything is in-memory. To persist, implement the small `LessonRepo`,
`VectorStore`, `QueueStore`, and `BanditStore` interfaces and pass them in. Back the
queue and stores with a shared service to run multiple workers. See `EXTENDING.md`.

## Packages

| Package | Description |
|---------|-------------|
| `@buttfart_os/fartos` | The learning engine (in-memory defaults, local embedder, runs offline) |
| `@buttfart_os/provider-anthropic` | Claude provider |
| `@buttfart_os/provider-openai` | GPT provider |

> No adapter packages needed — `ingest(json)` takes raw data of any shape directly.
> Persistence is bring-your-own: implement `LessonRepo` / `VectorStore` / `QueueStore` /
> `BanditStore` to swap the in-memory defaults for a durable backend (see `EXTENDING.md`).

## Extending

Bring your own data, storage, model, metric, and sender — swap any piece without
forking core. See **`EXTENDING.md`** for: custom middleware, custom storage, a custom
outbound channel, and pushing a metric. See **`DECISIONS.md`** for the design
rationale, and **`GOAL-TESTS.md`** for the capability → test coverage map.

## License

MIT

Explorer83

FartOS

The Architecture

Ingest

Push any JSON and the framework never fetches it for you. evolve.ingest normalizes the blob, merges scope over your default, keeps an optional context object first-class, and returns before any model runs. No adapter required. The work happens later, off the hot path.

Judgment Gate

Models do the thinking, so spending one on every tick of a numeric stream is waste. A gate decides which events earn a model call using math and structure alone - z-score moves against a per-scope baseline, dedup, and a sample rate. No keywords, ever. It is replaceable.

Middleware

Every act of judgment is a model call, never a keyword list. Six units ship - intent, sentiment, contradiction, distill, embed, narrate - each with a role, a prompt template, a JSON schema, and its own model set in config. Reflection is the drain loop, not a unit. Add your own in one registry call.

Memory

One unified Lesson record carries the memory type and a status of candidate, testing, active, or retired. Confidence is earned by repetition through a reconcile step, not lifted from a model. Old, weak memory decays and is evicted on maintain. In-memory by default; durable when you supply a store.

Semantic Recall

Retrieval is by meaning, not substring. The embed middleware turns text into a vector and a VectorStore returns the nearest lessons, scope-filtered. A plain in-memory cosine store ships as the zero-dependency default; real vector databases plug into the same interface.

Scope Architecture

Six isolation levels: global, tenant, project, user, session, agent. Every read and write routes through the enforcer, and an empty scope is never a read wildcard. Cross-scope leakage is structural impossibility, and sharing upward is explicit and guarded.

The Bandit

Behavior is proven by a multi-arm bandit, not a fixed A and B. Policy is pluggable between epsilon-greedy, UCB, and Thompson; exploration is a dial. Welford stats and a real two-sample test back every decision, and a guardrail veto blocks a win that damages an adjacent metric.

Communication

The core decides and emits a typed Effect; actuators you configure carry it out. The default writes a promotion into the active instruction set. A webhook actuator notifies, a code-edit actuator applies. A throwing sender never breaks the loop, and actuation is idempotent.

Orchestration

A coordination of many agents, not one object. An AgentRegistry holds first-class agents bound to scope subtrees; a Coordinator handles handoff, spawn, and report. ingest enqueues to a queue and a worker drains it, so model calls never block the hot path. Point the queue and stores at a shared backend to scale across processes.

Explain and Escalate

An agent can read why a variant won: explain assembles the statistics deterministically and narrate only phrases them, so no number is hallucinated. When confidence drops below the floor or a contradiction stays unresolved, an escalation fires to the human channel you wired.

Configuration

Whatever can be flexible must be. Lesson repo, vector store, queue, bandit store, providers, embedder, bandit policy, actuators, and middlewares are all bring-your-own behind one interface each, passed to the constructor. A zero-config instance still boots on sensible in-memory built-ins.

Offline by Default

Tests and examples run with no key and no network. A deterministic MockProvider and a LocalEmbedder ship in the testing module, the clock and id generator are injectable, and the closed-loop example evolves instructions, records a metric trend, and promotes an arm offline.