# Token Optimization for AI Agents
VulpineOS reduces the tokens required for an agent to understand a web page from tens of thousands down to a few thousand, cutting cost and latency while improving accuracy.
## The Problem
A typical e-commerce page produces ~50,000 tokens when serialized as a full accessibility tree. Most agents hit context window limits or waste budget on structural noise. VulpineOS applies four optimization layers.
## Viewport Pruning
Only nodes visible in the current viewport (plus a small buffer) are included in snapshots. Off-screen content is excluded entirely.
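Conceptually, the pruning step is an intersection test between each node's bounding box and the viewport expanded by the buffer. The sketch below is illustrative only; the node shape (`{ id, rect }`) and viewport shape are hypothetical, not VulpineOS's internal representation:

```javascript
// Illustrative viewport-pruning filter. The node shape ({ id, rect }) and
// viewport shape ({ x, y, width, height }) are hypothetical.
function pruneToViewport(nodes, viewport, buffer = 200) {
  const top = viewport.y - buffer;
  const bottom = viewport.y + viewport.height + buffer;
  const left = viewport.x - buffer;
  const right = viewport.x + viewport.width + buffer;

  // Keep any node whose bounding box intersects the buffered viewport.
  return nodes.filter(({ rect }) =>
    rect.y < bottom && rect.y + rect.height > top &&
    rect.x < right && rect.x + rect.width > left
  );
}

// A node 100px below the fold survives thanks to the 200px buffer;
// a node far down the page is dropped.
const nodes = [
  { id: 1, rect: { x: 0, y: 100, width: 300, height: 50 } },
  { id: 2, rect: { x: 0, y: 900, width: 300, height: 50 } },
  { id: 3, rect: { x: 0, y: 5000, width: 300, height: 50 } },
];
const visible = pruneToViewport(nodes, { x: 0, y: 0, width: 1280, height: 800 });
// visible: nodes 1 and 2 only
```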
- Full page AX tree: ~50,000 tokens
- Viewport-pruned: ~3,000-5,000 tokens (90-94% reduction)

The viewport buffer is configurable — by default it extends 200px beyond the visible area to capture elements the user is about to scroll into:

```json
{ "method": "Page.getOptimizedDOM", "params": { "viewportBuffer": 200 } }
```

## Result Caching
Consecutive snapshots of the same page are diffed internally. If the DOM hasn’t changed since the last snapshot, VulpineOS returns a cache hit flag instead of re-serializing the entire tree.
```jsonc
// First call: full snapshot (3,200 tokens)
{ "v": 1, "nodes": [...], "cached": false }

// Second call, no changes: cache hit (12 tokens)
{ "v": 1, "cached": true, "hash": "a1b2c3" }
```

Cache invalidation happens automatically on navigation, DOM mutations, or scroll position changes beyond the viewport buffer.
## Incremental Snapshots
When the page has changed partially (e.g., a dropdown opened, a form field updated), VulpineOS sends only the delta:
```json
{
  "v": 1,
  "incremental": true,
  "added": [[3, "li", "New option", {"idx": 7}]],
  "removed": [7, 12],
  "modified": [{"idx": 4, "name": "Updated text"}]
}
```

Incremental snapshots are typically 70-95% smaller than full snapshots, depending on the scope of the change:
| Scenario | Full snapshot | Incremental | Reduction |
|---|---|---|---|
| Dropdown open | 3,200 tokens | 420 tokens | 87% |
| Form field update | 3,200 tokens | 180 tokens | 94% |
| Tab switch | 3,200 tokens | 890 tokens | 72% |
| Page navigation | 3,200 tokens | N/A (full) | 0% |
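On the client side, a cached node list can be patched with such a delta instead of re-parsing a full snapshot. The sketch below assumes nodes are addressed by their `idx` field and that added tuples are `[depth, role, name, {idx}]`; that reading is an interpretation of the example above, not a confirmed spec:

```javascript
// Illustrative delta application over a locally cached flat node list.
function applyDelta(nodes, delta) {
  const byIdx = new Map(nodes.map(n => [n.idx, n]));

  // Drop removed nodes first, so a removed idx can be re-added below.
  for (const idx of delta.removed ?? []) byIdx.delete(idx);

  // Merge field-level updates into existing nodes.
  for (const mod of delta.modified ?? []) {
    const node = byIdx.get(mod.idx);
    if (node) Object.assign(node, mod);
  }

  // Insert added nodes (assumed tuple shape: [depth, role, name, {idx}]).
  for (const [depth, role, name, meta] of delta.added ?? []) {
    byIdx.set(meta.idx, { idx: meta.idx, depth, role, name });
  }

  return [...byIdx.values()].sort((a, b) => a.idx - b.idx);
}

const cached = [
  { idx: 4, depth: 2, role: 'textbox', name: 'Old text' },
  { idx: 7, depth: 3, role: 'li', name: 'Stale option' },
];
const updated = applyDelta(cached, {
  added: [[3, 'li', 'New option', { idx: 7 }]],
  removed: [7],
  modified: [{ idx: 4, name: 'Updated text' }],
});
// updated: idx 4 renamed, idx 7 replaced by the new li
```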
## Tool Call Batching
Multiple tool calls in a single agent turn are batched into one Juggler round-trip, reducing protocol overhead:
```jsonc
// Agent requests 3 actions — batched into one message
{
  "batch": [
    { "method": "Page.getOptimizedDOM" },
    { "method": "Page.screenshot", "params": { "clip": true } },
    { "method": "Page.evaluate", "params": { "expression": "document.title" } }
  ]
}
```

Batching saves ~200ms of round-trip latency per additional tool call and avoids redundant page state serialization.
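A client can implement batching by queueing calls during the agent's turn and flushing them as one message. This is an illustrative sketch; `send` stands in for the real transport and `BatchingClient` is not part of the VulpineOS API:

```javascript
// Illustrative client-side batching: queue tool calls, flush once per turn.
class BatchingClient {
  constructor(send) {
    this.send = send;   // (message) => responses; transport is assumed
    this.queue = [];
  }

  call(method, params) {
    this.queue.push(params ? { method, params } : { method });
  }

  flush() {
    const message = { batch: this.queue };
    this.queue = [];
    return this.send(message);   // one round-trip for all queued calls
  }
}

// Usage: three calls, one round-trip.
const sent = [];
const client = new BatchingClient(msg => { sent.push(msg); return []; });
client.call('Page.getOptimizedDOM');
client.call('Page.screenshot', { clip: true });
client.call('Page.evaluate', { expression: 'document.title' });
client.flush();
// sent now holds a single message with a 3-entry batch
```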
## End-to-End Example
A typical agent interaction with a search results page:
```
Step 1 — Initial snapshot:     4,100 tokens (viewport-pruned + optimized DOM)
Step 2 — Click search result:      0 tokens (action only)
Step 3 — New page snapshot:    3,400 tokens (full, new page)
Step 4 — Scroll down:            620 tokens (incremental, new viewport content)
Step 5 — Extract data:             0 tokens (cached, no DOM change)
─────────────────────────────────────────
Total:                         8,120 tokens
```

Without optimization: ~200,000 tokens (5 full AX tree dumps)
Savings: 96%

## Configuration
```js
const context = await browser.newContext({
  firefoxUserPrefs: {
    'vulpineos.dom_export.enabled': true,
    'vulpineos.dom_export.viewport_pruning': true,
    'vulpineos.dom_export.caching': true,
    'vulpineos.dom_export.incremental': true,
  }
})
```

## See also
- Token-Optimized DOM Export — compressed array-of-tuples format
- Cost Tracking — per-agent token usage and budget limits
- MCP Browser Tools — 36 tools for AI agent browser control