Standards and Conventions¶

This document defines the technical standards for tools to interoperate within this automation stack.

Taxonomy¶

The knowledge base uses a stable set of top-level categories. Do not create new top-level sections unless strictly necessary.

Category	Location	What belongs here
AI & Knowledge	`tools/ai_knowledge/`	General AI tools, knowledge management, LLM products
Frameworks	`tools/frameworks/`	Libraries for building LLM apps (LangChain, LlamaIndex, etc.)
Providers	`tools/providers/`	Companies offering LLM APIs or managed AI services
Agents	`tools/agents/`	Agent frameworks and autonomous AI tools
Orchestration	`tools/orchestration/`	Workflow automation, multi-agent routing, pipeline tools
Infrastructure	`tools/infrastructure/`	Inference engines, vector DBs, serving stacks, quantisation
Benchmarking	`tools/benchmarking/`	Eval frameworks, benchmarks, leaderboards
Development & Ops	`tools/development_ops/`	AI-assisted coding tools and IDEs
Patterns	`knowledge_base/patterns/`	Recurring design patterns (RAG, tool calling, routing, etc.)
Playbooks	`playbooks/`	Step-by-step workflow guides

Deduplication Rules¶

One canonical page per tool/framework/provider. All other mentions must link to that canonical page.
Before creating a new page, search the repo for the tool name, URL, and common aliases.
If a source maps to an existing page, update that page rather than creating a new one.
Merge duplicates rather than creating parallel pages.

Source Classification Tags¶

Items in new-sources.md use these tags: tool · framework · provider · paper/article · tutorial/guide · benchmark/eval · infrastructure · analysis

Naming Conventions¶

Tags (Paperless): kebab-case. Lowercase only. Prefix status tags with s: and category tags with c:.
Workflows (n8n): [Trigger Source] -> [Primary Action]. Example: IMAP -> Paperless Intake.
Prompts: Versioned using SemVer. Store as Markdown files in reference-implementations/llm-prompts/.

Document Lifecycle States¶

Ingested: Raw file received by the system.
OCRed: Searchable layer added (via OCRmyPDF).
Classified: Assigned a document type and category tags.
Actioned: Any extracted tasks/events have been synced to external systems.
Archived: Document is moved to long-term storage or its final tag state.

Minimal Metadata Schema¶

Every document processed by AI should attempt to populate: - extraction_date: ISO8601 of when AI ran. - source_origin: Email, Scan, Webhook. - action_required: Boolean. - due_date: If applicable. - confidence_score: 0.0 to 1.0 (from LLM).

Interoperability¶

Data Format: All cross-tool communication should prefer JSON.
Dates: Always use ISO8601 with UTC offsets.
IDs: Use the internal ID of the source system (e.g. Paperless document_id) in the metadata of the destination system (e.g. GCal event description).

What "Done" Means¶

An automated flow is considered "done" when: 1. The primary action is completed (Event created/Task synced). 2. The source document is updated with a processed or actioned tag. 3. No errors were logged in the orchestration engine (n8n). 4. If a critical failure occurred, a notification was sent to a human review channel.

AI-Authored Documentation Metadata (Required)¶

For AI-authored updates to knowledge pages (docs/tools/, docs/services/, docs/knowledge_base/, docs/architecture/, docs/playbooks/, and docs/reference-implementations/), include:

Last reviewed: ISO date (YYYY-MM-DD)
Confidence: high, medium, or low
Sources / References: at least one URL

Recommended section format:

## Sources / References
- [Official docs](https://example.com)

## Contribution Metadata
- Last reviewed: 2026-02-25
- Confidence: medium

These requirements are enforced by scripts/check_docs_contract.py in pull-request CI.