Data & storage

Where things live, who can read what, and the rules the agent follows so customer-facing pages never accidentally see internal data.

Three places things live

1. Your workspace (~/workspace/)

The filesystem the agent runs in. Markdown drafts, generated assets, custom skills, jobs, scratch space, and these docs all live here.

  • Owner: the agent Linux user (the agent itself).
  • Reachable from chat. NOT reachable from the Console process or the public site process.
  • Two conventional paths:
      • ~/workspace/uploads/: files you paste into chat land here.
      • ~/workspace/downloads/: files the agent registers for download via the workspace UI. Served by nginx at a gated URL.

2. PostgreSQL (three databases)

Persistent state for anything structured. Never SQLite, never JSON-on-disk for application state.

Database          Who owns it   Agent   Console   Public site
console_db        console       R/W     R/W       (blocked)
site_db           site          R/W     R/W       R/W
console_site_db   shared        R/W     R/W       R/W

What goes where:

  • console_db — internal-only state. Customer lists, deal stages, internal scoring tables, audit logs. The public site cannot reach it; the REVOKE is at the database role layer, not just convention.
  • site_db — content the public site renders. Pre-computed reports, doc pages, anything that should be cheap to read on a page render. The Console writes here, the site reads.
  • console_site_db — an explicit cross-surface channel. Console writes a moderation queue; site renders approved entries; site writes back interaction signals; Console reads them.
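
As a reading aid, the access table above can be restated as a small lookup. This is only an illustrative sketch: the real enforcement is the Postgres role grants, and the surface labels (agent, console, site) and both helper functions are names chosen here, not platform APIs.

```python
# Access matrix transcribed from the table above.
# None means the surface's role has no grant on that database.
ACCESS = {
    "console_db":      {"agent": "rw", "console": "rw", "site": None},
    "site_db":         {"agent": "rw", "console": "rw", "site": "rw"},
    "console_site_db": {"agent": "rw", "console": "rw", "site": "rw"},
}


def can_access(surface: str, database: str) -> bool:
    """Return True if the surface has any grant on the database."""
    try:
        return ACCESS[database][surface] is not None
    except KeyError:
        raise ValueError(
            f"unknown surface or database: {surface!r}, {database!r}"
        ) from None


def databases_for(surface: str) -> list[str]:
    """List the databases a surface may connect to, in table order."""
    return [db for db in ACCESS if can_access(surface, db)]
```

Calling databases_for("site") returns only site_db and console_site_db, which is the blast-radius limit described in this section.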

3. Memory and progress (markdown)

Two special files in your workspace:

  • ~/workspace/.memory.md — read at the start of every chat. Use "remember this: X" to append.
  • ~/workspace/progress.md — change log. The agent appends after meaningful changes; you can read it to reconstruct what happened.

These are markdown on purpose: you can grep them, version them, edit them by hand, and grep them again.

Why three databases instead of one

Two reasons.

  1. Surface separation. A public-site visitor could, in principle, find a way to query the database role the site authenticates as. If that role only has access to site_db, the blast radius of any leak is limited to data you already decided was public.
  2. Clarity of intent. When the agent writes a Console app, it knows to use console_db. When it writes a site page, site_db. When it builds a cross-surface workflow, console_site_db. The "should this table be public?" question is answered by where the table lives, so it cannot be gotten wrong by accident.

The agent does not try to bend these rules. If you ask "expose this Console page publicly," it will tell you it needs to move the data through console_site_db and rebuild the read path on the site, rather than reach into console_db from the site.

Working with the database

The pattern, every time:

  • SQLAlchemy session imported from src/db.py (per-app shared module).
  • Pydantic models guard every input and every API boundary.
  • Raw psycopg2 only when SQLAlchemy is overkill (read-only token-gated lookups, simple scripts).
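
A minimal sketch of how these pieces might fit together, assuming SQLAlchemy and Pydantic are installed. The contents of src/db.py are not shown in these docs, so make_session_factory, DealIn, parse_deal, insert_deal, and the deals table with its columns are all hypothetical names used for illustration.

```python
from pydantic import BaseModel, Field, ValidationError
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# Assumption: leaving the host empty makes the driver use the local Unix
# socket, which fits the peer-auth setup described in these docs.
CONSOLE_DB_URL = "postgresql+psycopg2:///console_db"


def make_session_factory(url: str = CONSOLE_DB_URL) -> sessionmaker:
    """Roughly what a per-app src/db.py could expose (hypothetical helper)."""
    engine = create_engine(url, pool_pre_ping=True)
    return sessionmaker(bind=engine)


class DealIn(BaseModel):
    """Hypothetical input model for a row in an internal deals table."""
    customer: str = Field(min_length=1, max_length=200)
    stage: str = Field(min_length=1, max_length=50)
    score: int = Field(ge=0, le=100)


def parse_deal(payload: dict) -> DealIn:
    """Validate untrusted input before it gets near the database."""
    return DealIn(**payload)


def insert_deal(session, deal: DealIn) -> None:
    """Insert a validated deal using bound parameters, not string formatting."""
    session.execute(
        text(
            "INSERT INTO deals (customer, stage, score) "
            "VALUES (:customer, :stage, :score)"
        ),
        {"customer": deal.customer, "stage": deal.stage, "score": deal.score},
    )
    session.commit()
```

In this sketch the Pydantic model rejects bad input with a ValidationError before any SQL runs, so the database layer only ever sees values that passed the boundary check.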

You will see CREATE TABLE IF NOT EXISTS statements at the top of new apps. The agent treats schema as code: each app declares its own tables. There is no platform migration system; new columns are added with ALTER TABLE and the agent logs the change.

What about ClickHouse

ClickHouse is available locally for analytics workloads where PostgreSQL would be slow: time-series rollups over millions of rows, columnar aggregations, that kind of thing. The agent reads the ClickHouse skill before using it, picks it deliberately, and tells you why. Default remains PostgreSQL.

Where things do NOT live

  • No platform-managed object storage. If you want to store a 50 MB PDF or a video, save it as a file in ~/workspace/downloads/ (gated by nginx). A BYTEA column in Postgres is an option too, but only for small artifacts.
  • No environment variables for secrets. Credentials live in the typed secret store, scoped to surfaces and connectors. The agent never copies them into a file or a database column.
  • No "platform settings" page beyond what the workspace UI exposes. Schedules, modes, approvals are all explicit toggles.

Data lifecycle

  • Files you create live until you delete them. The agent does not garbage-collect your workspace.
  • Database rows live until you (or an explicit cleanup job) remove them. The agent writes cleanup jobs when you ask: "delete vsg_runs rows older than 90 days every Sunday."
  • Memory and progress files grow over time. The agent occasionally suggests pruning when memory gets noisy.
  • Connector audit logs are retained by the platform; you can browse them but cannot delete them.

Under the hood. Each Postgres role authenticates via Unix-socket peer auth (the OS user matches the role name). There is no password to leak, no connection string to misconfigure. Grants and REVOKEs are explicit and visible. The system_db database exists for platform internals (the api-proxy, the connectors service, the bridge); no surface ever connects to it.

Last updated 2026-05-29