1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
# FartOS
A learning + coordination layer for AI agents. Feed it any JSON, and it learns
what's true, proves what works, and improves your agent's instructions over time.
**Models do the thinking. The framework does the bookkeeping.**
## What this is
Bolt FartOS onto any agent. It ingests any data (chat, metrics, on-chain, tickets),
lets models judge what it means (no string matching), remembers what's true (scoped,
never leaking), split-tests behavior with a multi-arm bandit, promotes the winners,
and flags a human when it's unsure. It runs fully offline out of the box — no API key,
no database — and every piece is swappable.
**Not a chatbot. Not a wrapper. A learning engine.**
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│ EVOLVE (main API) │
├────────────────────────────────────────────────────────────┤
│ MIDDLEWARE — the thinking (every judgment is a model call)│
│ intent · sentiment · contradiction · distill · narrate · embed
├────────────────────────────────────────────────────────────┤
│ SUBSTRATE — the bookkeeping (deterministic) │
│ scope · lessons · semantic recall · confidence · bandit │
├────────────────────────────────────────────────────────────┤
│ ORCHESTRATION │ COMMUNICATION (actuation) │
│ durable queue · workers │ promote · escalate · webhook · │
│ agents · coordinator │ code-edit (a broken sender │
│ │ never crashes the loop) │
└────────────────────────────────────────────────────────────┘
```
## Install
```bash
npm install @buttfart_os/fartos
# optional: a hosted model provider
npm install @buttfart_os/provider-anthropic # or @buttfart_os/provider-openai
```
`@buttfart_os/fartos` ships an in-memory store and a local embedder, so it runs with zero
external services.
## Quick start
```typescript
import { Evolve } from '@buttfart_os/fartos';
import { AnthropicProvider } from '@buttfart_os/provider-anthropic';
const evolve = new Evolve({
provider: new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY! }),
defaultScope: { tenant: 'my-app' },
});
// 1) Ingest ANY JSON — non-blocking, no adapter required.
await evolve.ingest(
{ userMessage: 'please keep replies short and warm' },
{ scope: { tenant: 'my-app', user: 'user-123' }, type: 'feedback' },
);
// 2) A worker processes the queue: judge → reflect → reconcile → learn.
await evolve.drain();
// 3) Serve evolved instructions (base prompt + learned, active lessons).
const { instructions } = await evolve.getInstructions({
scope: { tenant: 'my-app', user: 'user-123' },
baseInstructions: 'You are a helpful customer support agent.',
});
```
Want it fully offline (tests, CI, demos)? Swap the provider for the built-in
`MockProvider` and use the default `LocalEmbedder` — no keys, deterministic output.
See `examples/closed-loop/run.js` for an end-to-end run.
## Split-test behavior (multi-arm bandit)
```typescript
const exp = await evolve.experiments.create({
scope: { tenant: 'my-app' },
hypothesis: 'a warmer tone lifts replies',
arms: [
{ label: 'control', instruction: 'reply normally' },
{ label: 'warm', instruction: 'reply warmly and concisely' },
],
minSamples: 100,
primaryMetric: 'reply_rate',
guardrails: [{ name: 'unsubscribe', lowerIsBetter: true, tolerance: 0.05 }],
});
const a = await evolve.experiments.assign(exp.id, viewerId); // stable per viewer
await evolve.experiments.recordOutcome(exp.id, viewerId, { // push your metric
primary: 0.42,
guardrails: { unsubscribe: 0.01 },
});
const why = await evolve.experiments.explain(exp.id); // plain-language rationale
```
Real statistics decide the winner (Welford variance, Welch's t-test, two-proportion
z-test); a guardrail can veto a "win"; the winner auto-promotes into instructions and
losers retire.
## API
```typescript
evolve.ingest(json, opts) // Feed ANY JSON (non-blocking; enqueues)
evolve.drain(max?) // Worker: process queued events into lessons
evolve.queueDepth() // How many events are waiting
evolve.recall(query, opts) // Semantic search over learned lessons
evolve.getInstructions(opts) // Base prompt + active lessons (+ served arm)
evolve.promoteScope(lessonId, scope) // Share a corroborated lesson up-scope
evolve.maintain(opts) // Evict weak lessons (cadence is yours)
evolve.registerActuator(actuator) // Add an outbound channel (webhook, etc.)
evolve.recordedErrors() // Inspect any degraded middleware calls
evolve.experiments.create(cfg) // Start a multi-arm experiment
evolve.experiments.assign(id, key) // Stable arm assignment for a key
evolve.experiments.recordOutcome(id, key, metrics) // Push a success metric
evolve.experiments.explain(id) // Read why a variant won (plain language)
```
## Core concepts
### Ingest — any data, no adapter
`ingest(json)` accepts any shape. A `context` object is first-class ("what this data
means"). Numeric/on-chain blobs are canonicalized into an embeddable surface, so even
data with no text is recallable. A math-only cost gate throttles high-volume numeric
streams so you don't pay for redundant model calls.
### Intelligence — model-driven, zero strings
Intent, sentiment, contradiction, distillation, and narration are each a model call
with a typed, validated output. Recall is semantic (vector similarity), never
`.includes()`. Every judgment unit is independently configurable — its model, role,
prompt, and on/off switch — via the `middlewares` config, not code. Sentiment is off by default.
### Memory — earned, scoped, decaying
Learned lessons gain confidence by **repetition**, climbing a ladder (`candidate →
testing → active`); only `active` lessons are served. Retrieval is by meaning. Weak
lessons are evicted. Everything is scoped `global > tenant > project > user > session
> agent` with zero cross-scope leakage — nothing becomes global by accident.
### Experimentation — prove before trust
A multi-arm bandit (epsilon-greedy / UCB / Thompson; explore-vs-exploit is a dial)
with real statistics and guardrail vetoes. Winners auto-promote into instructions;
the agent can read why in plain language.
### Communication — act and escalate
Decisions emit typed effects routed to pluggable actuators: promote into the
instruction store (pull), notify a webhook/email (your sender), or apply a code edit
(your writer). The framework never touches the network or filesystem itself, and a
broken sender can never crash the loop. Low-confidence judgments escalate to a human.
### Persistence — in-memory by default, swap to durable
Out of the box everything is in-memory. To persist, implement the small `LessonRepo`,
`VectorStore`, `QueueStore`, and `BanditStore` interfaces and pass them in. Back the
queue and stores with a shared service to run multiple workers. See `EXTENDING.md`.
## Packages
| Package | Description |
|---------|-------------|
| `@buttfart_os/fartos` | The learning engine (in-memory defaults, local embedder, runs offline) |
| `@buttfart_os/provider-anthropic` | Claude provider |
| `@buttfart_os/provider-openai` | GPT provider |
> No adapter packages needed — `ingest(json)` takes raw data of any shape directly.
> Persistence is bring-your-own: implement `LessonRepo` / `VectorStore` / `QueueStore` /
> `BanditStore` to swap the in-memory defaults for a durable backend (see `EXTENDING.md`).
## Extending
Bring your own data, storage, model, metric, and sender — swap any piece without
forking core. See **`EXTENDING.md`** for: custom middleware, custom storage, a custom
outbound channel, and pushing a metric. See **`DECISIONS.md`** for the design
rationale, and **`GOAL-TESTS.md`** for the capability → test coverage map.
## License
MIT