Repository layout

The on-disk shape of a tori archive: the directory tree, what each file is, and the manifest fields.

A capture writes one self-contained repository. Everything it produces, records, media, views, styling, and the manifest, lives under a single root, and every internal reference is a relative path, so the folder is movable and opens with no network.

Where it lands

The root is <out>/x/<slug>, where <out> is -o/--out (default $HOME/data/tori, or $TORI_OUT). The slug encodes the target kind:

Target	Slug
Profile `karpathy`	`karpathy`
Tweet `20`	`status-20`
Thread `20`	`thread-20`
Search `from:nasa`	`search-from-nasa`
Likes of `karpathy`	`likes-karpathy`
List `123`	`list-123`
Bookmarks	`bookmarks`

The tree

A profile capture of karpathy looks like this:

$HOME/data/tori/x/karpathy/
├── tweets/                  # canonical records, the source of truth
│   ├── 1745...json          # tweets/<id>.json, one per tweet
│   └── 1745...raw.json      # the untouched upstream payload, beside it
├── html/                    # rendered inert per-tweet pages
│   └── 1745...html
├── threads/                 # reconstructed conversations
│   ├── 1740...html
│   └── 1740...md
├── md/                      # rendered per-tweet Markdown
│   └── 1745...md
├── media/                   # localised media, bucketed by type
│   ├── photo/
│   ├── video/
│   ├── gif/
│   ├── avatar/
│   └── banner/
├── _assets/
│   └── tori.css             # the one stylesheet the HTML views share
├── index.html               # the browsable archive home
├── README.md                # the Markdown index
├── profile.json             # the captured profile
└── manifest.json            # the repository index

Key points:

JSON is the source of truth. Each tweet is tweets/<id>.json, written the instant it arrives. The id is a snowflake string used verbatim, so the path is a pure function of the id and a re-capture overwrites the same file. A .raw.json sits beside it with the untouched upstream payload.
Views are derived. html/, md/, threads/, index.html, and README.md are all rebuilt from the JSON by the renderer. Delete them and tori render <repo> recreates them with no network.
Media is localised and deduped. Files go under media/<type>/, named by the media key plus a short hash of the source URL. Two renditions never collide, and one photo shared across many tweets resolves to a single file.
A standalone tweet versus a thread. A tweet with no surrounding conversation is rendered as a single page (html/<id>.html, md/<id>.md); a multi-tweet conversation is rendered as one page under threads/.

The manifest

manifest.json is the first file tori info, tori add, and tori render read. Its record-bearing fields are sorted so a re-capture of the same content writes a byte-identical manifest; the only wall-clock values live in the capture entries.

Field	Meaning
`service`	The source service, always `x`
`target`	What the repo archives: `kind`, `ref`, optional `user_id` and `query`
`tiers_used`	The access tiers that served records (`syndication`, `guest`, `session`)
`tweets`	Total records held
`media`	Count of media items localised (`status` == `local`)
`threads`	Number of reconstructed conversations
`range`	The `oldest` and `newest` captured tweet timestamps
`captures`	One entry per run: `at` (the stamp), `added`, and `tier`
`media_index`	Every media item with its `key`, `type`, `path`, `source`, and `status`
`tori_version`	The tori version that wrote the repo
`schema`	The on-disk layout version, for future migration

Each media item's status is one of local (on disk), unavailable (could not be fetched), stream-only (needs an external --tool like yt-dlp), or skipped. The index is the archive being honest about exactly what is and is not localised.