# The Slingshot Method — Build Manual

> Run a huge skill library at ~1% of the context cost. Carry almost nothing, fetch anything in milliseconds, let a local model do the routing for free.
>
> This is the hands-on version. If you can run a terminal and write a little JavaScript, you can build the whole thing in an afternoon.

---

## The idea in one breath

An AI "skill" is just a markdown file: a **name**, a **description**, and **instructions**. The catch: for the model to even *know a skill exists*, that name + description sits in its context **every single turn** — whether you use it or not.

So 271 skills "always on" = **~14,651 tokens of dead weight on every message**. Most of it for tools you won't touch this session. Pure drag.

The slingshot: **don't keep them loaded.** Keep them as files on disk (cost: zero). When a task comes in, a tiny local pipeline figures out *which* skill is needed and reads in only that one, only then. The deciding is done by a model running on **your own machine** — so the routing itself costs nothing.

Three layers, cheapest first:

| Layer | What it does | Cost |
|---|---|---|
| **L1 · keyword** | match the prompt against skill *names* in plain code | ~0, instant |
| **L2 · local brain** | on a keyword miss, a local Ollama model decides by *meaning* | 0 API tokens |
| **L3 · fetch** | read in only the one chosen skill's full text, only now | 1 of N, briefly |

---

## Prerequisites

```bash
# 1. Node (any recent version)
node -v        # v18+

# 2. Ollama — the local model runtime (free, runs on your machine)
#    https://ollama.com/download
ollama --version

# 3. Pull two models: one for embeddings (semantic search), one for reasoning
ollama pull nomic-embed-text     # ~274 MB — turns text into vectors
ollama pull qwen2.5:7b           # ~4.7 GB — reads a task and picks skills
#    (lighter box? use llama3.2:3b. Beefier? qwen2.5:14b reasons better.)
```

That's it. No API keys. No cloud. Nothing rented.

---

## Step 1 — A skills folder

Each skill is a folder with a `SKILL.md`. The top of the file is YAML frontmatter with a **name** and a **description** — that description is what the router reads.

```
~/skills/
  rust-patterns/SKILL.md
  django-tdd/SKILL.md
  react-performance/SKILL.md
  ...
```

```markdown
---
name: react-performance
description: React and Next.js performance — memoization, list virtualization,
  bundle splitting, avoiding re-renders, Core Web Vitals.
---

# React performance

When the UI feels slow… (the actual playbook the agent follows)
```

You can write your own, or grab a pack (this method was built on the 271-skill ECC library). The point: **plain files, on disk, costing nothing until read.**

---

## Step 2 — Index them (`index.json`)

One tiny script walks the folder and pulls each skill's name + description + path into a flat list. This index is the only thing the search ever scans.

```js
// build-index.mjs
import fs from 'node:fs';
import path from 'node:path';
import os from 'node:os';

const ROOT = path.join(os.homedir(), 'skills');
const out = [];

function frontmatter(md) {
  const m = md.match(/^---\n([\s\S]*?)\n---/);          // grab the YAML block
  const fm = {};
  if (m) for (const line of m[1].split('\n')) {
    const i = line.indexOf(':');
    if (i > 0) fm[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return fm;
}

for (const dir of fs.readdirSync(ROOT)) {
  const p = path.join(ROOT, dir, 'SKILL.md');
  if (!fs.existsSync(p)) continue;
  const fm = frontmatter(fs.readFileSync(p, 'utf8'));
  out.push({ name: fm.name || dir, description: fm.description || '', path: p });
}

fs.writeFileSync(path.join(ROOT, 'index.json'), JSON.stringify(out, null, 2));
console.log(`indexed ${out.length} skills`);
```

```bash
node build-index.mjs     # writes ~/skills/index.json
```

---

## Step 3 — L1: keyword search (`find.mjs`)

Score the prompt's words against each skill's *name*. High precision, instant, no model. This catches the obvious case ("django tests" -> `django-tdd`), which is most of them.

```js
// find.mjs  ->  node find.mjs "django tests"
import fs from 'node:fs'; import path from 'node:path'; import os from 'node:os';
const idx = JSON.parse(fs.readFileSync(path.join(os.homedir(),'skills','index.json'),'utf8'));
const q = process.argv.slice(2).join(' ').toLowerCase();
const terms = q.split(/\W+/).filter(t => t.length > 2);

const hits = idx.map(e => {
  const name = e.name.toLowerCase();
  let s = 0;
  for (const t of terms) { if (name === t) s += 12; else if (name.includes(t)) s += 6; }
  return { ...e, s };
}).filter(e => e.s > 0).sort((a,b) => b.s - a.s).slice(0, 3);

console.log(hits.length ? hits.map(h => `${h.name}  -> ${h.path}`).join('\n') : 'no keyword match');
```

The miss it can't fix: *"make my page load faster"* shares **no word** with `react-performance`. That's what L2 is for.

---

## Step 4 — L2: the local brain (`route.mjs`)

Two moves, both on Ollama:

1. **Embed** every skill description once (cache the vectors). Then embed the prompt and find the closest by *meaning* — "load faster" lands near "performance" even with no shared word.
2. **Reason**: hand the top ~12 candidates to a chat model (`qwen2.5`) and ask it to pick the ones that genuinely apply. The model *decides*.

```js
// route.mjs  ->  node route.mjs --reindex   (once)
//            ->  node route.mjs "make my page load faster"
import fs from 'node:fs'; import path from 'node:path'; import os from 'node:os';
const ROOT = path.join(os.homedir(), 'skills');
const O = 'http://127.0.0.1:11434';

const embed = async (text) =>
  (await (await fetch(`${O}/api/embeddings`,
    { method:'POST', body: JSON.stringify({ model:'nomic-embed-text', prompt:text }) })).json()).embedding;

const cos = (a,b) => { let d=0,x=0,y=0; for (let i=0;i<a.length;i++){d+=a[i]*b[i];x+=a[i]*a[i];y+=b[i]*b[i];} return d/(Math.sqrt(x)*Math.sqrt(y)); };

if (process.argv.includes('--reindex')) {                    // embed all skills, cache
  const idx = JSON.parse(fs.readFileSync(path.join(ROOT,'index.json'),'utf8'));
  const out = [];
  for (const e of idx) out.push({ ...e, vec: await embed(`${e.name}: ${e.description}`) });
  fs.writeFileSync(path.join(ROOT,'vectors.json'), JSON.stringify(out));
  console.log(`embedded ${out.length}`); process.exit(0);
}

const task = process.argv.slice(2).join(' ');
const vecs = JSON.parse(fs.readFileSync(path.join(ROOT,'vectors.json'),'utf8'));
const qv = await embed(task);
const shortlist = vecs.map(e => ({ ...e, s: cos(qv, e.vec) }))
                      .sort((a,b)=>b.s-a.s).slice(0, 12);     // top 12 by meaning

// let qwen pick which actually apply
const list = shortlist.map((c,i)=>`${i+1}. ${c.name} - ${c.description}`).join('\n');
const res = await (await fetch(`${O}/api/chat`, { method:'POST', body: JSON.stringify({
  model: 'qwen2.5:7b', stream: false, options: { temperature: 0 },
  messages: [
    { role:'system', content:'Pick ONLY skills that genuinely apply. Reply with a JSON array of names, e.g. ["react-performance"] or [].' },
    { role:'user', content:`TASK:\n${task}\n\nCANDIDATES:\n${list}\n\nJSON array of names:` },
  ],
})).json()).message.content;

const names = JSON.parse((res.match(/\[[\s\S]*?\]/) || ['[]'])[0]);
const picks = names.map(n => shortlist.find(c => c.name === n)).filter(Boolean);
console.log(picks.length ? picks.map(p => `${p.name}  -> ${p.path}`).join('\n') : 'router: none needed');
```

```bash
node route.mjs --reindex                       # one-time, ~40s
node route.mjs "make my page load faster"      # -> react-performance
```

> The reasoning runs entirely on your hardware. Every routing decision is compute you **don't rent**. That's the whole trick — intelligence offloaded to local silicon.

---

## Step 5 — Wire it into your agent

Most agent harnesses (Claude Code, etc.) let you run a script **before each prompt** (a `UserPromptSubmit` hook). That script is the slingshot in one place:

```
keyword pass  ->  hit?  -> surface it (instant)
              ->  miss? -> ask the local brain  -> surface what it returns
```

The hook prints the chosen skill's path; the agent reads that one `SKILL.md` and follows it. Pseudocode for the hook:

```js
// skill-hook.mjs  — reads the prompt on stdin, prints any matching skill
const prompt = readStdin();
let hits = keywordSearch(prompt);                 // L1, instant
if (!hits.length && isSubstantive(prompt))
  hits = await localRoute(prompt);                // L2, on miss only
if (hits.length)
  print(`Relevant skills — read the SKILL.md if it fits:\n` +
        hits.map(h => `  • ${h.name}  -> ${h.path}`).join('\n'));
// nothing matched -> print nothing. Zero noise, zero cost.
```

Register it as the harness's pre-prompt hook. Done. Now every message auto-surfaces the right skill, and you only ever pay for the one you actually open.

---

## Test it

```bash
node find.mjs  "django tests"                 # L1 -> django-tdd        (instant)
node route.mjs "my app feels sluggish"        # L2 -> react-performance (~1-2s, local)
node route.mjs "what's the weather"           # -> none needed          (silent)
```

If keyword catches it, you never even wake the model. If it misses, the local brain reasons it out. If nothing fits, nothing fires.

---

## Why it works (the numbers)

| | Tokens / turn | What you carry |
|---|---|---|
| **Stack all 271** | **14,651** | every description, every turn, mostly unused |
| **Slingshot** | **161** | 2 always-on + one tiny routing rule |

**~98.9% lighter.** Same firepower. On a flat-rate plan that's not a smaller bill — it's **range**: sessions run far deeper before they choke, turns are snappier, and the library can grow to thousands without the baseline moving. The router's thinking is free because it's yours.

---

## Troubleshooting

- **`route.mjs` errors / hangs** -> is Ollama running? `ollama list` should show your models. Start it: `ollama serve`.
- **Bad picks** -> try a stronger router model: `qwen2.5:14b`. Or widen the shortlist (top 12 -> 20).
- **Too slow on every prompt** -> only call `route.mjs` on a *keyword miss* (as in Step 5), and skip it for short/chit-chat prompts. Keep L1 instant; L2 is the exception, not the rule.
- **New skills not found** -> re-run `build-index.mjs` then `route.mjs --reindex`.

---

*Built by Synthera. Same firepower, 1% of the weight.*