Triumvirate Schema & Corpus Adapter Interface — Design Document v1

Purpose: Define the universal input contract (Triumvirate) and the adapter interface that translates any regulated domain's corpus into that contract. This is the architectural boundary between client-specific content and the engine.

Predecessor: engine-blueprint-v1.md (codebase audit, 2026-03-09)

Frame of reference: The Gate whitepaper. UCCA's engine is a deterministic processing, verification, and production engine. It does not hold client IP. It does not claim domain expertise. Every regulated domain operates under the same structural pattern: three interconnected legislative instruments. Those three instruments — the Triumvirate — are the engine's universal input specification.

Date: 2026-03-09

Status: Design document (decisions and rationale). No code.

Table of Contents

1. Architectural Context
2. The Triumvirate Schema
   2.1 Design Principles
   2.2 Instrument 1: Outcome Specification
   2.3 Instrument 2: Compliance Ruleset
   2.4 Instrument 3: Credential Map
   2.5 Triumvirate Envelope
   2.6 Cross-Referencing Between Instruments
   2.7 Schema Versioning
3. The Corpus Adapter Interface
   3.1 Design Principles
   3.2 Interface Methods
   3.3 Validation Contract
   3.4 Error Model
   3.5 Audit Log Specification
   3.6 Adapter Registration & Lifecycle
4. TGA as First Adapter — Mapping Proof
5. Request Flow in Detail
6. Decisions & Rationale
7. Open Questions

1. Architectural Context

The full request flow, as specified:

External call
     │
     ▼
Cloudflare edge
  • IP allowlist (per-client)
  • Time window enforcement
  • Rate limiting
  • Zero Trust auth (Cloudflare Access)
     │
     ▼
api.[client].ucca.online
  • Per-client isolated subdomain
  • Terraform-managed DNS + Worker
  • Client-specific API key / mTLS
     │
     ▼
Corpus Adapter (per-client instance)
  • Translation: raw corpus → Triumvirate
  • Validation: structural + semantic
  • Malformation gate: nothing invalid passes
  • Audit log: every submission recorded
     │
     ▼
Engine
  • Receives ONLY validated Triumvirate
  • Never sees raw client corpus
  • Deterministic processing
  • Returns CCO
     │
     ▼
CCO returned to client

Three isolation boundaries exist:

1. Network boundary: Cloudflare edge. Client never touches the engine directly.
2. Data boundary: the adapter. Client corpus format stays in adapter scope. Engine only sees Triumvirate.
3. World boundary: per-client subdomain, database, audit namespace. Inherited from the surfaces architecture.

The Triumvirate is the contract between boundary 2 and boundary 3. If the adapter produces a valid Triumvirate, the engine will process it. If it doesn't, the engine refuses. There is no partial acceptance, no best-effort parsing, no "we'll figure it out."

2. The Triumvirate Schema

2.1 Design Principles

P1: Legislative, not educational. The Triumvirate describes legislative instruments — formal specifications that define what is required in a regulated domain. The engine is not a course builder. Courses are one output format. The Triumvirate must be agnostic to output format.

P2: Declarative, not procedural. The Triumvirate says what must be true, not how to make it true. "Practitioners must demonstrate safe patient handling" is an outcome specification. "Show a video about patient handling" is an output decision the engine makes downstream.

P3: Observable and assessable. Every outcome, rule, and credential boundary must be expressed in terms that can be observed, measured, or verified. If it can't be assessed, it can't be certified. Vague aspirational statements are not valid Triumvirate content.

P4: Structurally complete. A valid Triumvirate must contain all three instruments. An Outcome Specification without a Compliance Ruleset is a wish list. A Compliance Ruleset without a Credential Map is unenforceable. Partial Triumvirates are rejected at the adapter gate.

P5: Source-annotated. Every element in the Triumvirate carries a provenance reference back to the source legislative instrument, clause, and version. The Triumvirate is a structured representation of a source document — not a replacement for it.

P6: Immutable per version. A Triumvirate version, once accepted by the engine, is frozen. Changes to the source legislation produce a new Triumvirate version, not an in-place mutation. The engine processes against a specific version and the CCO records which version it was processed against.
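Principle P4 lends itself to a mechanical check. The following is a minimal Python sketch, assuming the envelope is a plain dict shaped like the structure in section 2.5; the helper name `missing_instruments` is hypothetical and not part of any existing codebase.

```python
REQUIRED_INSTRUMENTS = ("outcome_specification", "compliance_ruleset", "credential_map")


def missing_instruments(envelope: dict) -> list:
    """Return the instruments that are absent or empty in the envelope."""
    instruments = envelope.get("instruments") or {}
    return [name for name in REQUIRED_INSTRUMENTS if not instruments.get(name)]


# A partial Triumvirate: only Instrument 1 is present.
partial = {"instruments": {"outcome_specification": {"outcomes": ["..."]}}}
print(missing_instruments(partial))
```

A non-empty result would correspond to rejection at the adapter gate.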

2.2 Instrument 1: Outcome Specification

What it answers: What does competent performance look like in this domain?

This is the instrument the current engine already partially models. In TGA, it maps to units of competency with elements and performance criteria. In other domains, it maps to whatever formal instrument defines observable outcomes — standards of practice, competency frameworks, operational specifications.

Structure:

outcome_specification:
  instrument_ref:
    source_system        # e.g. "TGA", "FAA", "GMC", "OSHA"
    instrument_id        # e.g. "CHCCOM005", "14 CFR Part 61"
    instrument_title
    version              # e.g. "R2", "Amendment 5"
    effective_date       # ISO 8601
    source_uri           # canonical URL or document reference
    jurisdiction         # e.g. "AU", "US", "UK", "international"

  domain:
    domain_code          # UCCA domain code (mapped by adapter)
    domain_name
    operational_context   # prose: what this domain covers

  outcomes[]:            # ordered list
    outcome_id           # unique within this instrument (e.g. "1", "E1")
    title                # human-readable
    description          # optional: fuller explanation
    source_ref           # clause/section in source document

    criteria[]:          # what observable evidence satisfies this outcome
      criterion_id       # unique within outcome (e.g. "1.1")
      description        # the assessable statement
      source_ref         # clause/section in source document
      evidence_type      # "performance" | "knowledge" | "product"

  evidence_requirements:
    knowledge_evidence[]   # what the practitioner must know
      statement
      source_ref
    performance_evidence[] # what the practitioner must demonstrate
      statement
      source_ref
    conditions[]           # conditions under which evidence must be gathered
      statement
      source_ref

Design decisions:

outcomes[] replaces the TGA-specific elements[]. "Outcome" is the universal term — what must be achieved. TGA calls them "elements," FAA calls them "areas of operation," OSHA calls them "performance requirements." The adapter translates.
criteria[] replaces performance_criteria[]. Broadened to include knowledge and product evidence types, not just performance observation.
evidence_requirements is a top-level section (not embedded per-outcome) because many frameworks define evidence holistically across all outcomes. TGA has knowledge_evidence and performance_evidence at the unit level, not per-element.
source_ref on every node. The engine must be able to trace any Triumvirate element back to the exact clause in the source legislation. This is non-negotiable for CCO provenance.
jurisdiction is explicit. The same domain (e.g., healthcare) operates under different legislative frameworks in different jurisdictions. The Triumvirate must declare which jurisdiction's instrument it represents.
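To make the "source_ref on every node" decision concrete, here is one possible in-memory shape for outcomes and criteria, sketched in Python. The class names, the helper `nodes_missing_source_ref`, and the example values are illustrative assumptions; frozen dataclasses are used only to echo principle P6.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Criterion:
    criterion_id: str
    description: str
    source_ref: str
    evidence_type: str  # "performance" | "knowledge" | "product"


@dataclass(frozen=True)
class Outcome:
    outcome_id: str
    title: str
    source_ref: str
    criteria: Tuple[Criterion, ...]
    description: Optional[str] = None


def nodes_missing_source_ref(outcomes) -> list:
    """List every outcome or criterion whose source_ref is empty."""
    missing = []
    for outcome in outcomes:
        if not outcome.source_ref:
            missing.append(f"outcome {outcome.outcome_id}")
        for criterion in outcome.criteria:
            if not criterion.source_ref:
                missing.append(f"criterion {criterion.criterion_id}")
    return missing


# Placeholder content; titles and source_ref strings are illustrative only.
example = Outcome(
    outcome_id="1",
    title="Example outcome",
    source_ref="Element 1",
    criteria=(
        Criterion("1.1", "Example assessable statement", "PC 1.1", "performance"),
        Criterion("1.2", "Another assessable statement", "", "performance"),
    ),
)
print(nodes_missing_source_ref([example]))
```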

2.3 Instrument 2: Compliance Ruleset

What it answers: What must operators do to remain compliant within this domain?

This is the instrument the current engine does NOT model at all. In Australian VET, it maps to the Standards for RTOs (SRTOs 2025) — the regulatory framework that tells training providers what they must do. In other domains: aviation has the FARs, healthcare has accreditation standards, nuclear has regulatory requirements.

The Compliance Ruleset is not about practitioner competence (that's Instrument 1). It's about operator obligations — the rules imposed on the entity that deploys or manages the practitioners.

Structure:

compliance_ruleset:
  instrument_ref:
    source_system
    instrument_id        # e.g. "SRTOs 2025", "14 CFR Part 121"
    instrument_title
    version
    effective_date
    source_uri
    jurisdiction
    regulator            # the authority that enforces this ruleset

  rules[]:               # ordered by clause hierarchy
    rule_id              # unique within instrument
    clause_ref           # e.g. "Clause 1.1", "§121.135"
    title
    requirement_text     # the actual regulatory requirement (verbatim or structured)
    source_ref

    obligation_type      # "mandatory" | "conditional" | "guidance"

    applies_to[]         # what this rule constrains
      target_type        # "operator" | "practitioner" | "assessor" | "output"
      scope              # optional: further scoping (e.g. "initial assessment only")

    evidence_of_compliance:
      description        # what constitutes evidence this rule is met
      evidence_type      # "documentation" | "system" | "audit" | "observation"
      retention_period   # how long evidence must be kept (ISO 8601 duration or null)

    cross_references[]:
      target_instrument  # "outcome_specification" | "credential_map"
      target_id          # the outcome_id or credential_id this rule relates to
      relationship       # "constrains" | "requires" | "modifies"

Design decisions:

obligation_type distinguishes between things you MUST do, things you must do IF a condition applies, and things that are recommended practice. This matters for CCO construction — mandatory rules become hard gates, guidance becomes advisory content.
applies_to[] identifies who the rule constrains. A rule that applies to the "operator" (e.g., "RTOs must maintain records") is different from one that applies to the "practitioner" (e.g., "assessors must hold the unit they assess"). The engine needs this to route rules correctly.
evidence_of_compliance defines what proof looks like. This directly feeds into CCO assessment criteria.
cross_references[] links rules to outcomes and credentials. A compliance rule that says "assessment must cover all elements" links to specific outcomes in Instrument 1. These links are explicit, not inferred.
The rules are stored at clause-level granularity, not as prose blocks. The adapter is responsible for parsing the source legislation into clause-level rules. This is the hardest part of adapter construction and where the domain expertise is.
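The routing described in these decisions could look roughly like the Python sketch below. The function names are hypothetical, and the treatment of "conditional" rules as their own bucket is an assumption; the text only specifies what happens to mandatory and guidance rules.

```python
def partition_rules(rules):
    """Group rule_ids by how the engine is expected to treat them."""
    bucket_for = {
        "mandatory": "hard_gates",     # stated in the design decisions
        "guidance": "advisory",        # stated in the design decisions
        "conditional": "conditional",  # assumption: held until the condition is evaluated
    }
    buckets = {"hard_gates": [], "conditional": [], "advisory": []}
    for rule in rules:
        # An unknown obligation_type raises KeyError instead of being guessed.
        buckets[bucket_for[rule["obligation_type"]]].append(rule["rule_id"])
    return buckets


def rules_for(rules, target_type):
    """Return rule_ids whose applies_to[] includes the given target_type."""
    return [
        rule["rule_id"]
        for rule in rules
        if any(t["target_type"] == target_type for t in rule.get("applies_to", []))
    ]


# Illustrative rules only; the IDs do not come from any real instrument.
rules = [
    {"rule_id": "R1", "obligation_type": "mandatory",
     "applies_to": [{"target_type": "operator"}]},
    {"rule_id": "R2", "obligation_type": "guidance",
     "applies_to": [{"target_type": "assessor"}]},
    {"rule_id": "R3", "obligation_type": "conditional",
     "applies_to": [{"target_type": "operator"}, {"target_type": "output"}]},
]
print(partition_rules(rules))
print(rules_for(rules, "operator"))
```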

2.4 Instrument 3: Credential Map

What it answers: Who is authorised to do what, and under what conditions?

This instrument defines the credential architecture of the domain — what qualifications exist, what they certify the holder to do, and what conditions apply to their issuance, maintenance, and revocation.

In Australian VET, this maps to the AQF levels, qualification packaging rules, and the relationship between units of competency and formal qualifications. In aviation, it maps to pilot certificates and type ratings. In healthcare, it maps to licences and scopes of practice.

Structure:

credential_map:
  instrument_ref:
    source_system
    instrument_id        # e.g. "AQF", "14 CFR Part 61"
    instrument_title
    version
    effective_date
    source_uri
    jurisdiction

  credentials[]:
    credential_id        # unique within instrument
    title                # e.g. "Certificate III in Individual Support"
    credential_type      # "qualification" | "licence" | "endorsement" |
                         # "unit_credential" | "skill_set" | "rating"
    level                # framework level if applicable (e.g. AQF 3, NQF 5)
    source_ref

    scope_of_practice:
      description        # what the holder is authorised to do
      limitations[]      # explicit restrictions
      conditions[]       # conditions that must be maintained

    requirements:
      required_outcomes[] # outcome_ids from Instrument 1 that must be achieved
        outcome_ref      # references outcome_specification.outcomes[].outcome_id
        requirement_type # "core" | "elective" | "prerequisite"

      required_credentials[] # other credentials required as prerequisites
        credential_ref   # references another credential_id
        relationship     # "prerequisite" | "corequisite" | "supersedes"

      compliance_rules[] # compliance rule_ids from Instrument 2 that govern issuance
        rule_ref         # references compliance_ruleset.rules[].rule_id
        applicability    # "issuance" | "maintenance" | "renewal"

    lifecycle:
      validity_period    # ISO 8601 duration, or null if perpetual
      renewal_requirements[]
        description
        source_ref
      revocation_conditions[]
        description
        source_ref

    packaging_rules:     # how outcomes combine into this credential
      minimum_total      # minimum number of outcomes required
      core_required      # number of core outcomes required
      elective_required  # number of elective outcomes required
      grouping_rules[]   # domain-specific packaging constraints
        description
        source_ref

Design decisions:

credential_type is an enumerated field, not free text. The types listed cover the major categories across all regulatory frameworks we've examined. New types can be added to the enum as new domains are onboarded.
scope_of_practice is critical for CCO construction. The CCO's operational domain definition comes directly from this — what the credential holder is authorised to do.
requirements.required_outcomes[] creates the formal link between credentials and outcomes. In TGA, this is the packaging rules for qualifications (X core units + Y elective units). The adapter maps this explicitly.
lifecycle captures validity, renewal, and revocation — the three temporal dimensions from the whitepaper. Every CCO inherits its lifecycle constraints from the Credential Map.
packaging_rules accommodates the diversity of credential construction rules across frameworks. TGA has specific core/elective/group packaging. Aviation has category/class/type hierarchy. Healthcare has progressive scope models. The structure is flexible enough for all of these.
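As an illustration of how packaging_rules and required_outcomes[] might be checked against each other, consider the Python sketch below. The three consistency rules it applies are one assumed interpretation of "packaging rules add up"; the helper name and sample credential are hypothetical.

```python
def packaging_problems(credential: dict) -> list:
    """Compare listed outcomes against the credential's packaging counts."""
    rules = credential["packaging_rules"]
    required = credential["requirements"]["required_outcomes"]
    core = sum(1 for r in required if r["requirement_type"] == "core")
    elective = sum(1 for r in required if r["requirement_type"] == "elective")

    problems = []
    if core < rules["core_required"]:
        problems.append(f"core: {core} listed, {rules['core_required']} required")
    if elective < rules["elective_required"]:
        problems.append(f"elective: {elective} listed, {rules['elective_required']} required")
    if rules["core_required"] + rules["elective_required"] < rules["minimum_total"]:
        problems.append("core_required + elective_required is below minimum_total")
    return problems


# Illustrative credential: two core outcomes and one elective are listed,
# but the packaging rules ask for two electives.
credential = {
    "packaging_rules": {"minimum_total": 4, "core_required": 2, "elective_required": 2},
    "requirements": {"required_outcomes": [
        {"outcome_ref": "A", "requirement_type": "core"},
        {"outcome_ref": "B", "requirement_type": "core"},
        {"outcome_ref": "C", "requirement_type": "elective"},
    ]},
}
print(packaging_problems(credential))
```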

2.5 Triumvirate Envelope

The three instruments are submitted together in an envelope that establishes identity, version, and provenance:

triumvirate:
  schema_version         # "1.0" — the Triumvirate schema version
  triumvirate_id         # UUID, assigned by the adapter
  created_at             # ISO 8601 UTC timestamp
  adapter_id             # which adapter produced this Triumvirate
  adapter_version        # adapter software version
  client_id              # the client/world this Triumvirate belongs to

  source_corpus:
    corpus_id            # adapter-assigned identifier for the source material
    corpus_title         # human-readable name
    corpus_version       # version of the source legislation
    corpus_hash          # SHA-256 of the source material as received
    received_at          # when the adapter received the corpus

  instruments:
    outcome_specification: { ... }   # Instrument 1
    compliance_ruleset: { ... }      # Instrument 2
    credential_map: { ... }          # Instrument 3

  validation:
    structural_valid     # boolean — adapter's structural validation passed
    semantic_valid       # boolean — adapter's semantic validation passed
    validation_timestamp # when validation was performed
    validation_errors[]  # empty if valid; populated if rejected
    validation_warnings[] # non-blocking observations

  provenance:
    adapter_signature    # adapter's cryptographic signature over this envelope
    submission_id        # unique ID for this submission attempt

Design decisions:

corpus_hash — the adapter hashes the raw corpus material before translation. This creates an immutable reference point. If the same corpus is submitted again, the hash proves identity. If the corpus changes, the hash proves it's different.
validation fields are populated by the adapter BEFORE submission to the engine. The engine performs its own validation, but the adapter's self-assessment is part of the audit trail.
adapter_signature — the adapter signs the envelope. This is the first half of the dual-signature model from the whitepaper. The adapter (as processor) warrants that the Triumvirate was constructed according to standard methodology. The domain expert's co-signature comes later in the CCO issuance flow.
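The hashing and signing steps can be sketched with the Python standard library. The "sha256:" prefix follows the audit log example in section 3.5. The signature scheme is not specified in this document, so HMAC-SHA256 over a canonical JSON form is used here purely as a stand-in; the function names and the key are hypothetical.

```python
import hashlib
import hmac
import json


def corpus_hash(raw_corpus: bytes) -> str:
    """SHA-256 of the raw corpus exactly as received, before any translation."""
    return "sha256:" + hashlib.sha256(raw_corpus).hexdigest()


def sign_envelope(envelope: dict, key: bytes) -> str:
    """Sign the envelope, excluding any signature it already carries."""
    provenance = {k: v for k, v in envelope.get("provenance", {}).items()
                  if k != "adapter_signature"}
    unsigned = {**envelope, "provenance": provenance}
    # Sorted keys and fixed separators give a stable byte representation.
    canonical = json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


raw = b"raw corpus bytes as received"
envelope = {
    "schema_version": "1.0",
    "source_corpus": {"corpus_hash": corpus_hash(raw)},
    "provenance": {"submission_id": "example-submission"},
}
signature = sign_envelope(envelope, key=b"demo-key-not-for-production")
print(envelope["source_corpus"]["corpus_hash"][:15], len(signature))
```

A production adapter would more likely use an asymmetric signature tied to the signing key declared at registration; that choice is outside what this sketch shows.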

2.6 Cross-Referencing Between Instruments

The three instruments are not independent — they form a connected graph:

Outcome Specification          Compliance Ruleset          Credential Map
─────────────────────         ──────────────────          ───────────────
outcomes[].outcome_id    ←──  rules[].cross_references
                              (target: outcome_specification)

outcomes[].outcome_id    ←──────────────────────────────  credentials[].requirements
                                                          .required_outcomes[].outcome_ref

                              rules[].rule_id        ←──  credentials[].requirements
                                                          .compliance_rules[].rule_ref

Integrity constraints enforced at validation:

Every outcome_ref in Credential Map must resolve to an existing outcome_id in Outcome Specification.
Every rule_ref in Credential Map must resolve to an existing rule_id in Compliance Ruleset.
Every cross_references[].target_id in Compliance Ruleset must resolve to an existing ID in the referenced instrument.
No dangling references. No circular dependencies between credentials (prerequisite chains must be acyclic).

If any integrity constraint fails, the Triumvirate is rejected. The adapter is responsible for ensuring referential integrity before submission.

2.7 Schema Versioning

The Triumvirate schema is versioned independently of the engine, adapters, and client corpora.

Schema version 1.0 is the initial release.
Minor versions (1.1, 1.2) add optional fields or new enum values. Existing Triumvirates remain valid.
Major versions (2.0) change required fields or structural shape. Existing Triumvirates may need re-translation.
The engine declares which schema versions it accepts. An adapter producing a schema version the engine doesn't support is rejected at the gate.
Schema version negotiation is explicit — there is no implicit upgrade or downgrade.
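Explicit version negotiation could be as simple as a set-membership test. In this Python sketch the accepted-version set and both function names are hypothetical; the rejection code comes from the taxonomy in section 3.4.

```python
ENGINE_ACCEPTED_VERSIONS = frozenset({"1.0", "1.1"})  # hypothetical engine declaration


def check_schema_version(schema_version, accepted=ENGINE_ACCEPTED_VERSIONS):
    """Return None if accepted, else a rejection code. No upgrade or downgrade is attempted."""
    return None if schema_version in accepted else "TRD_SCHEMA_VERSION_UNSUPPORTED"


def is_major_change(old, new):
    """True when the major component differs, so re-translation may be needed."""
    return old.split(".")[0] != new.split(".")[0]


print(check_schema_version("1.1"))
print(check_schema_version("2.0"))
print(is_major_change("1.2", "2.0"))
```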

3. The Corpus Adapter Interface

3.1 Design Principles

A1: The adapter is the liability boundary. The adapter translates client corpus material into Triumvirate format. The adapter warrants that the translation is structurally and semantically faithful to the source. If the adapter produces garbage, the CCO is garbage. The adapter's quality IS the product quality for that client.

A2: The adapter never touches the engine directly. The adapter produces a Triumvirate. The engine consumes a Triumvirate. They communicate through the schema, not through shared memory, shared state, or direct function calls. In production, they run in different processes, potentially on different infrastructure.

A3: One adapter per corpus type, one instance per client. The TGA adapter knows how to translate Australian VET training packages. The FAA adapter would know how to translate US aviation regulations. Each client gets their own adapter instance with their own configuration, audit namespace, and state. Two different RTOs both use the TGA adapter, but they are separate instances with separate audit trails.

A4: Adapters are deterministic. Given the same corpus input, the same adapter version must produce the same Triumvirate output. If the adapter uses LLM extraction (as the current TGA pipeline does for PDF parsing), the extracted result is cached and the cache is the deterministic reference point — not the LLM call.

A5: Adapters fail loudly. If the adapter cannot translate a corpus element, it does not guess, skip, or substitute. It records the failure, includes it in the validation errors, and refuses to produce a Triumvirate. Partial translation is not a valid outcome.

A6: The adapter owns the corpus format. The engine never needs to understand TGA unit codes, AQF levels, FAA FARs, or any domain-specific format. That knowledge lives entirely within the adapter. If a new corpus format needs support, a new adapter is written — the engine doesn't change.
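Principle A4 relies on a cache acting as the deterministic reference point. One way to picture this is the Python sketch below, in which a deliberately unstable function stands in for an LLM call. The class name, the in-memory dict, and the cache key of (adapter version, corpus hash) are assumptions; a real adapter would need persistent storage.

```python
import hashlib
import itertools


class ExtractionCache:
    """Stores the first extraction result per (adapter_version, corpus hash)."""

    def __init__(self):
        self._store = {}

    def get_or_extract(self, adapter_version, raw_corpus, extract):
        key = (adapter_version, hashlib.sha256(raw_corpus).hexdigest())
        if key not in self._store:
            # The only non-deterministic step; it runs once per key.
            self._store[key] = extract(raw_corpus)
        return self._store[key]


# Stand-in for an LLM call: returns a different answer on every invocation.
_counter = itertools.count(1)


def flaky_extract(raw):
    return {"attempt": next(_counter), "size": len(raw)}


cache = ExtractionCache()
first = cache.get_or_extract("1.0.0", b"raw pdf bytes", flaky_extract)
second = cache.get_or_extract("1.0.0", b"raw pdf bytes", flaky_extract)
print(first == second)
```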

3.2 Interface Methods

The Corpus Adapter interface defines the following abstract methods that every conformant adapter must implement:

`identify(corpus_payload) → CorpusIdentity`

Examines raw corpus material and returns identification metadata without performing full translation. Used for deduplication, version detection, and pre-flight validation.

Returns:
- corpus_id: adapter-assigned identifier
- corpus_type: what kind of corpus this is (e.g. "tga_unit", "tga_qualification", "far_part")
- corpus_version: version of the source material
- corpus_hash: SHA-256 of the raw payload
- source_system: which regulatory system (e.g. "TGA", "FAA")
- jurisdiction: which jurisdiction
- is_supported (boolean): can this adapter translate this corpus type?
- requires_enrichment (boolean): does the adapter need to fetch additional data?

Rationale: Identification is cheap. Before committing to full translation (which may involve network calls, LLM extraction, or expensive parsing), the adapter should be able to answer: "Do I recognise this? Can I handle it? Have I seen it before?"

`ingest(corpus_payload, options) → IngestionResult`

Performs the raw corpus acquisition phase. Downloads, fetches, extracts, or otherwise obtains the full corpus material needed for translation. This is where network calls to upstream sources (training.gov.au, FAA databases, etc.) happen.

Parameters:
- corpus_payload: the raw input (could be a code, a URL, a document, a file path)
- options: adapter-specific configuration:
  - refresh: force re-acquisition even if cached
  - skip_network: use only local/cached data
  - enrichment_level: how much supplementary data to acquire

Returns:
- raw_corpus: the complete raw corpus material
- supplementary_data: any additional data acquired (prerequisites, packaging rules, etc.)
- acquisition_log: timestamped log of what was fetched and from where
- cache_status: "hit" | "miss" | "stale" | "refreshed"

Rationale: Ingestion is separated from translation because they have different failure modes, different caching strategies, and different cost profiles. Ingestion might fail due to network issues, upstream rate limits, or missing source material. Translation fails due to structural problems in the corpus. Keeping them separate means a network failure during ingestion doesn't corrupt a partially-translated Triumvirate.

`translate(raw_corpus, supplementary_data) → TranslationResult`

The core method. Translates raw corpus material into Triumvirate format.

Parameters:
- raw_corpus: from ingest()
- supplementary_data: from ingest()

Returns:
- triumvirate: the complete Triumvirate envelope (or null if translation failed)
- translation_log: detailed log of every mapping decision
- unmapped_elements[]: source elements the adapter could not map (with reasons)
- confidence: adapter's self-assessment, one of "full" | "partial" | "degraded"
- warnings[]: non-fatal observations

Rationale: Translation is deterministic. The same inputs produce the same outputs. The translation_log is the audit trail that proves HOW the adapter arrived at each Triumvirate element — which source clause mapped to which outcome, which regulatory requirement mapped to which rule. This log is retained permanently and is part of the CCO's provenance chain.

`validate(triumvirate) → ValidationResult`

Validates a complete Triumvirate against the schema and the adapter's domain-specific semantic rules.

Two-phase validation:

Phase 1, structural validation (schema-driven):
- All required fields present
- All data types correct
- All enums contain valid values
- All arrays non-empty where required
- Triumvirate envelope metadata complete

Phase 2, semantic validation (adapter-specific):
- All cross-references resolve
- No dangling outcome references in Credential Map
- No dangling rule references in Credential Map
- No circular credential prerequisites
- Domain-specific coherence checks (e.g., TGA: every element has at least one performance criterion; packaging rules add up)

Returns:
- structural_valid: boolean
- semantic_valid: boolean
- errors[]: blocking issues (any error = rejection)
- warnings[]: non-blocking observations
- coverage_report: summary of what percentage of the source corpus was mapped

Rationale: Validation is the gate. If validation fails, the Triumvirate does not reach the engine. Period. The adapter calls validate() on its own output before submitting. The engine calls its own schema validation on receipt. Both must pass. This is defense in depth — the adapter validates for correctness, the engine validates for conformance.

`audit_entry(event) → void`

Records an audit event to the adapter's isolated audit log.

Event types:
- corpus_received: raw corpus arrived
- ingestion_started / ingestion_completed / ingestion_failed
- translation_started / translation_completed / translation_failed
- validation_started / validation_passed / validation_failed
- triumvirate_submitted: Triumvirate sent to engine
- triumvirate_accepted / triumvirate_rejected: engine's response

Rationale: Every adapter interaction is logged. The audit log is per-client, per-adapter-instance. It never mixes with other clients' logs. It is append-only. It is the evidence trail for the processor's liability claim: "We followed the standard methodology, here is the record."

`version() → AdapterVersionInfo`

Returns the adapter's identity and capability declaration.

Returns:
- adapter_id: unique identifier for this adapter type
- adapter_version: semantic version
- source_system: which regulatory system this adapter handles
- supported_corpus_types[]: what corpus formats it can translate
- triumvirate_schema_version: which Triumvirate schema version it produces
- capabilities: feature flags (e.g., supports_enrichment, supports_incremental_update)
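Taken together, the six methods could be expressed as an abstract base class. The Python sketch below is one possible rendering, not a prescribed implementation: the class name `CorpusAdapter` and the use of plain mappings for result types are assumptions made for brevity.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping, Optional


class CorpusAdapter(ABC):
    """Every conformant adapter implements all six methods."""

    @abstractmethod
    def identify(self, corpus_payload: Any) -> Mapping[str, Any]: ...

    @abstractmethod
    def ingest(self, corpus_payload: Any,
               options: Optional[Mapping[str, Any]] = None) -> Mapping[str, Any]: ...

    @abstractmethod
    def translate(self, raw_corpus: Any, supplementary_data: Any) -> Mapping[str, Any]: ...

    @abstractmethod
    def validate(self, triumvirate: Mapping[str, Any]) -> Mapping[str, Any]: ...

    @abstractmethod
    def audit_entry(self, event: Mapping[str, Any]) -> None: ...

    @abstractmethod
    def version(self) -> Mapping[str, Any]: ...


class IncompleteAdapter(CorpusAdapter):
    """Implements only version(); Python refuses to instantiate it."""

    def version(self):
        return {"adapter_id": "example", "adapter_version": "0.0.1"}


try:
    IncompleteAdapter()
except TypeError as exc:
    print("not conformant:", exc)
```

Using an abstract base class means a non-conformant adapter fails at construction time rather than at first use, which is in keeping with principle A5.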

3.3 Validation Contract

The validation contract is strict and non-negotiable:

The adapter MUST call validate() on every Triumvirate it produces. An unvalidated Triumvirate cannot be submitted.
The engine MUST perform its own independent schema validation. The engine does not trust the adapter's validation result — it verifies independently.
Any structural error is a hard rejection. No partial acceptance.
Semantic errors are hard rejections for the adapter's validate(). The engine only performs structural validation — it has no domain knowledge to perform semantic checks.
Warnings are recorded but do not block submission. A warning might be: "Source legislation has an unusual structure — manual review recommended."
Validation results are immutable and part of the Triumvirate envelope. Once validation is performed, the results are stamped into the envelope and become part of the audit trail.
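The contract above can be read as a two-stage gate. The Python sketch below shows one way the stages might be chained, assuming the validators are passed in as callables; `submit`, the status strings, and the sample validators are hypothetical.

```python
def submit(triumvirate, adapter_validate, engine_schema_check):
    """Return (status, stamped_envelope). Any error at either stage is a hard rejection."""
    result = adapter_validate(triumvirate)
    # Stamp the adapter's result into a new envelope; the input is left untouched.
    stamped = {**triumvirate, "validation": {
        "structural_valid": result["structural_valid"],
        "semantic_valid": result["semantic_valid"],
        "validation_errors": list(result["errors"]),
        "validation_warnings": list(result["warnings"]),
    }}
    passed = result["structural_valid"] and result["semantic_valid"] and not result["errors"]
    if not passed:
        return "rejected_by_adapter", stamped
    # The engine does not trust the stamp; it re-checks structure and returns its own errors.
    if engine_schema_check(stamped):
        return "rejected_by_engine", stamped
    return "accepted", stamped


# Warnings are recorded but do not block submission.
ok_result = {"structural_valid": True, "semantic_valid": True, "errors": [],
             "warnings": ["manual review recommended"]}
status, envelope = submit({"schema_version": "1.0"}, lambda t: ok_result, lambda t: [])
print(status, envelope["validation"]["validation_warnings"])
```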

3.4 Error Model

Adapter errors are structured, not free-text:

AdapterError:
  error_code           # machine-readable (e.g. "VAL_MISSING_REQUIRED_FIELD")
  severity             # "fatal" | "error" | "warning" | "info"
  phase                # "ingestion" | "translation" | "validation"
  source_element       # what in the source corpus caused the issue (if identifiable)
  triumvirate_element  # what in the Triumvirate is affected (if applicable)
  message              # human-readable description
  remediation          # what the client or adapter maintainer can do to fix it
  timestamp            # ISO 8601 UTC

Error code taxonomy:

| Prefix | Phase | Examples |
|---|---|---|
| `ING_` | Ingestion | `ING_NETWORK_FAILURE`, `ING_UPSTREAM_404`, `ING_CACHE_CORRUPT` |
| `TRN_` | Translation | `TRN_UNMAPPABLE_ELEMENT`, `TRN_AMBIGUOUS_CRITERION`, `TRN_MISSING_EVIDENCE` |
| `VAL_` | Validation | `VAL_MISSING_REQUIRED_FIELD`, `VAL_DANGLING_REFERENCE`, `VAL_CIRCULAR_DEPENDENCY` |
| `TRD_` | Triumvirate-level | `TRD_INCOMPLETE_INSTRUMENTS`, `TRD_SCHEMA_VERSION_UNSUPPORTED` |
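A structured error of this shape might be modelled as follows in Python. The class mirrors the AdapterError fields listed above; the rule that only "fatal" and "error" severities block submission is an assumption drawn from the validation contract, and the sample values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX_CATEGORY = {
    "ING_": "Ingestion",
    "TRN_": "Translation",
    "VAL_": "Validation",
    "TRD_": "Triumvirate-level",
}
BLOCKING_SEVERITIES = {"fatal", "error"}  # assumption: warning and info do not block


@dataclass(frozen=True)
class AdapterError:
    error_code: str
    severity: str
    phase: str
    message: str
    remediation: str
    timestamp: str
    source_element: Optional[str] = None
    triumvirate_element: Optional[str] = None

    def __post_init__(self):
        # Reject codes that fall outside the documented prefix taxonomy.
        if self.error_code[:4] not in PREFIX_CATEGORY:
            raise ValueError(f"unknown error code prefix: {self.error_code}")

    @property
    def category(self) -> str:
        return PREFIX_CATEGORY[self.error_code[:4]]

    @property
    def blocks_submission(self) -> bool:
        return self.severity in BLOCKING_SEVERITIES


err = AdapterError(
    error_code="VAL_DANGLING_REFERENCE",
    severity="error",
    phase="validation",
    message="outcome_ref does not resolve",
    remediation="check the referenced outcome_id",
    timestamp="2026-03-09T14:30:00.000Z",
)
print(err.category, err.blocks_submission)
```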

3.5 Audit Log Specification

Each adapter instance maintains an append-only audit log in its isolated namespace.

Log format: JSONL (one JSON object per line), matching the existing cognitive cost ledger pattern.

Required fields per entry:

{
  "ts_utc": "2026-03-09T14:30:00.000Z",
  "adapter_id": "tga-v1",
  "client_id": "rtopacks",
  "submission_id": "uuid",
  "event_type": "translation_completed",
  "corpus_id": "CHCCOM005_R2",
  "corpus_hash": "sha256:...",
  "triumvirate_id": "uuid",
  "duration_ms": 1234,
  "outcome": "success",
  "detail": { ... }     // event-specific payload
}

Retention: Audit logs are retained indefinitely. They are part of the processor's liability evidence and the CCO's provenance chain. They belong to the world — if a client disconnects, their audit logs are exportable as part of their world data.

Storage: Per-world database (following the existing two-tier data isolation architecture). Adapter audit data goes to the world's database, not to ops-db. Platform telemetry about adapter performance (error rates, latency) goes to ops-db.
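Writing a JSONL entry is a one-line append per event. The Python sketch below writes to any text stream and checks that the required keys are present (values may be null where an ID does not exist yet, which is an assumption). The function name is hypothetical, and a plain stream stands in for the per-world database described above.

```python
import io
import json

REQUIRED_FIELDS = (
    "ts_utc", "adapter_id", "client_id", "submission_id", "event_type",
    "corpus_id", "corpus_hash", "triumvirate_id", "duration_ms", "outcome", "detail",
)


def append_audit_entry(stream, entry: dict) -> None:
    """Append one JSON object as a single line; never rewrites earlier lines."""
    missing = [name for name in REQUIRED_FIELDS if name not in entry]
    if missing:
        raise ValueError(f"audit entry missing fields: {missing}")
    stream.write(json.dumps(entry, sort_keys=True) + "\n")


log = io.StringIO()  # a file opened with mode "a" would behave the same way
append_audit_entry(log, {
    "ts_utc": "2026-03-09T14:30:00.000Z",
    "adapter_id": "tga-v1",
    "client_id": "rtopacks",
    "submission_id": "example-submission",
    "event_type": "corpus_received",
    "corpus_id": "CHCCOM005_R2",
    "corpus_hash": "sha256:...",
    "triumvirate_id": None,  # not yet assigned at this stage
    "duration_ms": 0,
    "outcome": "success",
    "detail": {},
})
print(len(log.getvalue().splitlines()))
```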

3.6 Adapter Registration & Lifecycle

Adapters are registered with the engine before they can submit Triumvirates.

Registration declares:
- Adapter identity (id, version, source system)
- Supported corpus types
- Triumvirate schema version produced
- Client binding (which client/world this instance serves)
- Validation capabilities
- Signing key (for adapter signature on Triumvirate envelope)

Lifecycle states:
- registered: declared but not yet active
- active: accepting corpus submissions and producing Triumvirates
- suspended: temporarily disabled (e.g., client payment lapse)
- deprecated: being replaced by a newer adapter version
- revoked: permanently disabled (e.g., quality failure)

When a client disconnects, their adapter instance is suspended and eventually revoked. The audit trail persists.
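The lifecycle can be pictured as a small state machine. This document names the states but not the full set of permitted transitions, so the table in the Python sketch below is an assumption, apart from the disconnect path (suspended, then revoked) and the permanence of revocation.

```python
# Assumed transition table; only the disconnect path and the finality of
# "revoked" are taken directly from the text.
ALLOWED_TRANSITIONS = {
    "registered": {"active", "revoked"},
    "active": {"suspended", "deprecated", "revoked"},
    "suspended": {"active", "revoked"},
    "deprecated": {"revoked"},
    "revoked": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal lifecycle transition: {current} -> {target}")
    return target


# Client disconnect: the instance is suspended first and revoked later.
state = "active"
state = transition(state, "suspended")
state = transition(state, "revoked")
print(state)
```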

4. TGA as First Adapter — Mapping Proof

The TGA adapter is the first concrete implementation. This section proves the Triumvirate schema accommodates everything TGA currently provides, and identifies what TGA doesn't provide (which the Triumvirate still requires).

Instrument 1: Outcome Specification — TGA Mapping

| Triumvirate Field | TGA Source | Notes |
|---|---|---|
| `instrument_ref.source_system` | `"TGA"` | Hardcoded per adapter |
| `instrument_ref.instrument_id` | Unit code (e.g. `"CHCCOM005"`) | |
| `instrument_ref.instrument_title` | Unit title | |
| `instrument_ref.version` | Release (e.g. `"R2"`) | |
| `instrument_ref.effective_date` | `release_date` from TGA data | |
| `instrument_ref.source_uri` | `training.gov.au/Training/Details/{code}` | |
| `instrument_ref.jurisdiction` | `"AU"` | All TGA is Australian |
| `domain.domain_code` | Resolved via `training_package_domain_map` | Existing logic, reusable |
| `outcomes[]` | `elements[]` | Direct mapping: element → outcome |
| `outcomes[].outcome_id` | `element_number` | |
| `outcomes[].title` | Element title | |
| `outcomes[].criteria[]` | `performance_criteria[]` | |
| `outcomes[].criteria[].criterion_id` | `number` (e.g. `"1.1"`) | |
| `outcomes[].criteria[].description` | Criterion description | |
| `outcomes[].criteria[].evidence_type` | `"performance"` | TGA criteria are all performance-based |
| `evidence_requirements.knowledge_evidence` | `knowledge_evidence[]` | Direct mapping |
| `evidence_requirements.performance_evidence` | `performance_evidence[]` | Direct mapping |

Verdict: Instrument 1 maps cleanly. Every field in the current TGA JSON has a home. The application field from TGA maps to domain.operational_context.
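The element-to-outcome mapping in the table can be illustrated with a short Python function. The input keys `element_number`, `performance_criteria`, and `number` come from the table; the keys `code`, `title`, and `description`, the `source_ref` format, and the sample unit content are assumptions made for this sketch.

```python
def elements_to_outcomes(unit: dict) -> list:
    """Map TGA elements and performance criteria to Triumvirate outcomes and criteria."""
    code = unit["code"]
    outcomes = []
    for element in unit["elements"]:
        number = str(element["element_number"])
        outcomes.append({
            "outcome_id": number,
            "title": element["title"],
            "source_ref": f"{code} element {number}",  # assumed reference format
            "criteria": [
                {
                    "criterion_id": pc["number"],
                    "description": pc["description"],
                    "source_ref": f"{code} PC {pc['number']}",  # assumed reference format
                    "evidence_type": "performance",  # TGA criteria are all performance-based
                }
                for pc in element["performance_criteria"]
            ],
        })
    return outcomes


# Placeholder unit content; titles and descriptions are not real TGA text.
unit = {
    "code": "CHCCOM005",
    "elements": [{
        "element_number": 1,
        "title": "Example element title",
        "performance_criteria": [
            {"number": "1.1", "description": "Example criterion"},
            {"number": "1.2", "description": "Another example criterion"},
        ],
    }],
}
outcomes = elements_to_outcomes(unit)
print(outcomes[0]["outcome_id"], [c["criterion_id"] for c in outcomes[0]["criteria"]])
```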

Instrument 2: Compliance Ruleset — TGA Mapping

| Triumvirate Field | TGA Source | Notes |
|---|---|---|
| `instrument_ref.instrument_id` | `"SRTOs 2025"` | Standards for RTOs |
| Rules at clause level | SRTOs clauses | NOT YET INGESTED |

Verdict: The TGA adapter needs to ingest the Standards for RTOs (SRTOs 2025) at clause level. This is already on the UCCA roadmap as a near-term priority ("Ingest the three VET legislative instruments at clause level"). The adapter framework gives this work a clear destination. Until the SRTOs are ingested, the TGA adapter can produce a minimal Compliance Ruleset that declares the governing instrument without clause-level rules — but the engine should flag this as a warning ("Compliance Ruleset contains instrument reference only, no clause-level rules").

Instrument 3: Credential Map — TGA Mapping¶

| Triumvirate Field | TGA Source | Notes |
| --- | --- | --- |
| `instrument_ref.instrument_id` | `"AQF"` | Australian Qualifications Framework |
| `credentials[]` | Qualifications (e.g. `"BSB50420"`) | From training.gov.au |
| `credential_type` | `"qualification"` or `"skill_set"` | TGA has both |
| `level` | AQF level (1-10) | |
| `requirements.required_outcomes[]` | Packaging rules (core/elective units) | PARTIALLY MODELED |
| `lifecycle.validity_period` | Generally perpetual for TGA qualifications | |
| `packaging_rules` | TGA qualification packaging (X core + Y elective from groups) | PARTIALLY MODELED |

Verdict: The qualification/credential layer is partially modeled in the current engine (`build_qualification_run_v0.py` and `seed_qualification_stub_v0.py` exist). The TGA adapter needs to formalize this into Credential Map format. The AQF level mapping already exists in `ucca_domains.py` and can be reused.

What TGA Doesn't Provide (and What the Adapter Does About It)¶

| Missing from TGA data | Adapter Strategy |
| --- | --- |
| Clause-level compliance rules | Phase 2: SRTOs ingestion (already roadmapped) |
| Evidence retention periods | Default to TGA regulatory requirement (7 years) |
| Credential revocation conditions | Extract from AQF/ASQA documentation |
| Scope of practice descriptions | Derive from unit application text + qualification descriptor |
| Cross-references between instruments | Build from packaging rules + SRTOs references to units |

5. Request Flow in Detail¶

Expanding the high-level flow into concrete adapter operations:

1. CLIENT CALL
   POST https://api.rtopacks.ucca.online/v1/corpus/submit
   Headers: Authorization: Bearer <client-api-key>
   Body: { "corpus_type": "tga_unit", "payload": { "code": "CHCCOM005" } }

2. CLOUDFLARE EDGE
   ✓ IP allowlist check
   ✓ Rate limit check
   ✓ Time window check
   ✓ Zero Trust auth (Cloudflare Access JWT validation)
   → Forward to Worker

3. API WORKER (api.rtopacks.ucca.online)
   Receives request
   Validates API key → resolves client_id
   Loads adapter instance for this client
   Calls adapter.identify(payload)
   ├── If not supported → 400 response with error
   └── If supported → continues

4. ADAPTER: INGEST
   adapter.ingest(payload, options)
   ├── Check cache: tga_data/CHCCOM005.json exists? → cache hit
   ├── If miss: download PDF from training.gov.au
   ├── If miss: LLM extraction → structured JSON
   ├── Fetch supplementary: prerequisites, packaging rules
   ├── Record audit_entry(ingestion_completed)
   └── Return raw_corpus + supplementary_data

5. ADAPTER: TRANSLATE
   adapter.translate(raw_corpus, supplementary_data)
   ├── Map TGA elements → Triumvirate outcomes
   ├── Map TGA performance_criteria → Triumvirate criteria
   ├── Map TGA knowledge/performance_evidence → Triumvirate evidence_requirements
   ├── Resolve domain via training_package_domain_map
   ├── Build Instrument 1 (Outcome Specification)
   ├── Build Instrument 2 (Compliance Ruleset) — currently minimal
   ├── Build Instrument 3 (Credential Map) — from qualification data
   ├── Assemble Triumvirate envelope
   ├── Compute corpus_hash
   ├── Record audit_entry(translation_completed)
   └── Return triumvirate + translation_log

6. ADAPTER: VALIDATE
   adapter.validate(triumvirate)
   ├── Phase 1: structural validation against schema
   ├── Phase 2: semantic validation (cross-references, completeness)
   ├── If failed → Record audit_entry(validation_failed), return error
   ├── Stamp validation results into Triumvirate envelope
   ├── Sign Triumvirate envelope with adapter key
   └── Record audit_entry(validation_passed)

7. SUBMIT TO ENGINE
   engine.process(validated_triumvirate)
   ├── Engine performs independent schema validation
   ├── If rejected → adapter records audit_entry(triumvirate_rejected)
   ├── Engine processes against Triumvirate
   ├── Engine produces CCO
   └── Return CCO to adapter

8. RESPONSE TO CLIENT
   adapter records audit_entry(triumvirate_accepted)
   API Worker returns CCO to client
   200 OK with CCO payload
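Steps 3 to 8 of the flow above can be sketched as one orchestration function. This is a simplified illustration under several assumptions: `StubAdapter` stands in for a real adapter, `identify` returns a plain boolean rather than a `CorpusIdentity`, the engine signals rejection by raising `ValueError`, and the 422 status for validation failure or engine rejection is a placeholder (the flow specifies only the 400 and 200 cases).

```python
from dataclasses import dataclass, field
from typing import Any, Callable

INSTRUMENT_KEYS = ("outcome_specification", "compliance_ruleset", "credential_map")


@dataclass
class StubAdapter:
    """Stand-in for a real adapter; each method only mimics the step's contract."""

    audit_log: list[str] = field(default_factory=list)

    def identify(self, payload: dict[str, Any]) -> bool:
        return "code" in payload

    def ingest(self, payload: dict[str, Any], options: dict[str, Any]) -> tuple[dict, dict]:
        self.audit_entry("ingestion_completed")
        return {"code": payload["code"]}, {}

    def translate(self, raw_corpus: dict, supplementary_data: dict) -> dict[str, Any]:
        self.audit_entry("translation_completed")
        return {key: {"source": raw_corpus["code"]} for key in INSTRUMENT_KEYS}

    def validate(self, triumvirate: dict[str, Any]) -> bool:
        passed = all(key in triumvirate for key in INSTRUMENT_KEYS)
        self.audit_entry("validation_passed" if passed else "validation_failed")
        return passed

    def audit_entry(self, event: str) -> None:
        self.audit_log.append(event)


def handle_submission(
    adapter: StubAdapter,
    payload: dict[str, Any],
    engine_process: Callable[[dict[str, Any]], dict[str, Any]],
) -> tuple[int, dict[str, Any]]:
    """Run steps 3 to 8 of the flow and return (HTTP status, response body)."""
    if not adapter.identify(payload):
        return 400, {"error": "unsupported corpus"}
    raw_corpus, supplementary = adapter.ingest(payload, {})
    triumvirate = adapter.translate(raw_corpus, supplementary)
    if not adapter.validate(triumvirate):
        return 422, {"error": "validation failed"}  # status code is an assumption
    try:
        cco = engine_process(triumvirate)  # engine re-validates independently
    except ValueError as exc:  # assumed rejection signal
        adapter.audit_entry("triumvirate_rejected")
        return 422, {"error": str(exc)}
    adapter.audit_entry("triumvirate_accepted")
    return 200, cco


adapter = StubAdapter()
status, body = handle_submission(adapter, {"code": "CHCCOM005"}, lambda t: {"cco": "stub"})
```

After the call, `adapter.audit_log` holds one entry per completed stage, in the order the flow lists them.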

6. Decisions & Rationale¶

D1: Three instruments, not two or four.¶

Decision: The Triumvirate is exactly three instruments. Not "outcomes plus rules" (two). Not "outcomes plus rules plus credentials plus assessment" (four).

Rationale: Every regulated domain we've examined operates under exactly three interconnected specifications:

What good looks like (Outcome Specification)
What operators must do (Compliance Ruleset)
Who is authorised to do what (Credential Map)

Assessment is not a fourth instrument — it is an activity governed BY all three. The CCO itself IS the assessment framework. Adding assessment as a separate instrument would conflate the input (legislative specification) with the output (certification).

D2: The adapter translates, the engine processes. Never the reverse.¶

Decision: The engine never sees raw corpus material. The engine never needs to understand TGA codes, AQF levels, or any domain-specific format.

Rationale: This is the core architectural principle from the whitepaper: "The engine does not claim domain expertise — the domain owner knows their domain." If the engine understood TGA, it would need to understand FAA, GMC, OSHA, etc. That doesn't scale and it creates liability confusion. The adapter is where domain expertise lives. The engine is domain-agnostic.

D3: Validation is two-phase: structural (engine) + semantic (adapter).¶

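Decision: The adapter validates semantics (do the cross-references resolve? are the packaging rules coherent?). The engine validates structure (is this a valid Triumvirate schema? are all required fields present?). The adapter also runs its own structural check before signing (step 6 of the request flow), but the engine repeats that check and does not rely on the adapter's result.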

Rationale: The engine can't validate semantics because it has no domain knowledge. It doesn't know whether "3 core + 2 elective" is a valid packaging rule for TGA — only the TGA adapter knows that. But the engine CAN validate that the Triumvirate conforms to the schema, which it must do independently because it doesn't trust any external system.
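The split can be illustrated with two small checks. This is a sketch under assumptions: the envelope keys `outcome_specification`, `compliance_ruleset`, and `credential_map` are hypothetical names, and the semantic check treats `requirements.required_outcomes[]` as a list of `outcome_id` values, which is one possible reading of the schema.

```python
from typing import Any

REQUIRED_INSTRUMENTS = ("outcome_specification", "compliance_ruleset", "credential_map")


def structural_errors(triumvirate: dict[str, Any]) -> list[str]:
    """Engine-side check: needs no domain knowledge, only the schema shape."""
    return [
        f"missing instrument: {name}"
        for name in REQUIRED_INSTRUMENTS
        if name not in triumvirate
    ]


def semantic_errors(triumvirate: dict[str, Any]) -> list[str]:
    """Adapter-side check: every outcome a credential requires must exist."""
    outcomes = triumvirate["outcome_specification"].get("outcomes", [])
    known_ids = {outcome["outcome_id"] for outcome in outcomes}
    errors = []
    for credential in triumvirate["credential_map"].get("credentials", []):
        required = credential.get("requirements", {}).get("required_outcomes", [])
        for outcome_id in required:
            if outcome_id not in known_ids:
                errors.append(f"unresolved outcome reference: {outcome_id}")
    return errors


triumvirate = {
    "outcome_specification": {"outcomes": [{"outcome_id": "1"}]},
    "compliance_ruleset": {},
    "credential_map": {
        "credentials": [{"requirements": {"required_outcomes": ["1", "2"]}}]
    },
}
```

The sample passes the structural check yet fails the semantic one, which is exactly the class of defect the engine cannot catch on its own.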

D4: The Compliance Ruleset is a first-class instrument, not metadata.¶

Decision: Compliance rules are a full Triumvirate instrument with the same structural rigour as outcomes and credentials.

Rationale: This is the instrument the current engine doesn't model at all, and it's the one that matters most for CCO construction. The CCO's mandatory procedural checks, prohibition boundaries, and escalation triggers all come from the Compliance Ruleset. Without it, the CCO is just a competency description — which is what we have now, and which is insufficient. The SRTOs ingestion (already roadmapped) is the work that populates this instrument for TGA.

D5: Source references on every element, not just at the top.¶

Decision: Every outcome, criterion, rule, and credential carries a source_ref pointing to the exact clause in the source legislation.

Rationale: CCO provenance requires traceability to the source. "This CCO was processed against SRTOs 2025" is useful but insufficient. "This prohibition boundary was derived from SRTOs 2025 Clause 1.8(2)(a)" is what a regulator needs to see. Source references also protect UCCA's liability — we can prove exactly which legislative clause drove each decision in the CCO.

D6: Adapter instances are per-client, not shared.¶

Decision: Two RTOs that both use the TGA adapter get separate adapter instances with separate audit trails.

Rationale: Follows the existing world isolation architecture. When Client A disconnects, their adapter instance and audit trail are frozen as a complete unit. Client B's adapter is unaffected. No cross-contamination. This is the same principle as per-world databases — "each world's database must be independently freezable, exportable, and deletable."

D7: Triumvirate schema is versioned independently.¶

Decision: The Triumvirate schema version is separate from the engine version, adapter version, and client corpus version.

Rationale: These four things change at different rates. The schema changes rarely (it's a standard). Adapters change when corpus formats change. The engine changes when processing logic changes. Client corpora change when legislation is updated. Coupling their versions would create unnecessary release coordination.

D8: The adapter MAY use LLM extraction, but the result must be deterministic.¶

Decision: Adapters can use LLM-powered extraction (as the current TGA PDF processor does), but the extracted result must be cached and the cache is the deterministic reference point.

Rationale: Some corpus formats (TGA PDFs, scanned regulatory documents) are not machine-readable without extraction. LLM extraction is the practical solution. But LLM calls are non-deterministic — the same PDF might extract slightly differently on two calls. The solution: extract once, cache the result, use the cache as the canonical source. The cached extraction becomes the "raw corpus" that the adapter translates deterministically. The extraction itself is logged in the audit trail with the LLM model, call ID, and response hash.
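The extract-once pattern can be sketched as follows. This is a minimal illustration, assuming a JSON file cache keyed by unit code; `load_or_extract`, `fake_extract`, the `extraction_completed` event name, and the metadata keys are hypothetical. The hash here covers the cached serialization and stands in for the response hash mentioned above.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Callable

Extractor = Callable[[str], tuple[dict[str, Any], dict[str, str]]]


def load_or_extract(
    code: str,
    cache_dir: Path,
    extract: Extractor,
    audit_log: list[dict[str, str]],
) -> dict[str, Any]:
    """Return the cached extraction for `code`, extracting at most once."""
    cache_path = cache_dir / f"{code}.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))

    data, call_meta = extract(code)  # non-deterministic step, runs only on a miss
    serialized = json.dumps(data, sort_keys=True, ensure_ascii=False)
    cache_path.write_text(serialized, encoding="utf-8")
    audit_log.append(
        {
            "event": "extraction_completed",  # hypothetical event name
            "model": call_meta["model"],
            "call_id": call_meta["call_id"],
            "response_hash": hashlib.sha256(serialized.encode("utf-8")).hexdigest(),
        }
    )
    return json.loads(serialized)


calls: list[str] = []


def fake_extract(code: str) -> tuple[dict[str, Any], dict[str, str]]:
    """Stand-in for the LLM call; records each invocation."""
    calls.append(code)
    return {"code": code}, {"model": "example-model", "call_id": "call-1"}


audit_log: list[dict[str, str]] = []
with tempfile.TemporaryDirectory() as tmp:
    first = load_or_extract("CHCCOM005", Path(tmp), fake_extract, audit_log)
    second = load_or_extract("CHCCOM005", Path(tmp), fake_extract, audit_log)
```

The second call is served from the cache, so the extractor runs once and the audit log gains a single entry.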

7. Open Questions¶

These are questions that need answers before implementation begins. They do not block this design document — they block the implementation.

Q1: Minimum viable Compliance Ruleset for TGA¶

The SRTOs 2025 ingestion is roadmapped but not complete. What is the minimum Compliance Ruleset the TGA adapter can produce today? Options:

(a) Instrument reference only, no clause-level rules. Engine accepts with warning.
(b) A small set of universal TGA compliance rules (e.g., "assessment must cover all elements", "evidence must be gathered in accordance with assessment plan") hardcoded in the adapter.
(c) Block TGA adapter from producing Triumvirates until SRTOs ingestion is complete.

Recommendation: Option (a) initially, with (b) as a near-term improvement. Option (c) is too conservative — the Outcome Specification alone has value and the current engine proves it.

DECIDED: Option (a). Instrument reference only, no clause-level rules. Engine accepts with warning. Do not block on SRTOs ingestion. — Tim, 2026-03-09

Q2: Credential Map scope for unit-level processing¶

The current engine processes at unit level (one TGA unit → one course/module). Credentials (qualifications) are aggregations of units. Should the TGA adapter:

(a) Produce a Credential Map only when processing at qualification level.
(b) Always produce a minimal Credential Map that declares which credentials this unit contributes to.
(c) Make Credential Map optional for unit-level processing.

Recommendation: Option (b). The unit-to-credential relationship is valuable context for CCO construction even at unit level. The adapter should always declare "this outcome contributes to Credential X as a core/elective requirement."

Q3: How does the adapter handle superseded legislation?¶

When TGA supersedes a unit (e.g., CHCCOM004 → CHCCOM005), the adapter receives the new unit. What happens to:

Existing Triumvirates built from the old unit?
CCOs processed against the old Triumvirate?
Credentials that reference the old unit?

Recommendation: The old Triumvirate version persists (immutable). The adapter produces a new Triumvirate with the new unit, annotated with the supersession relationship. The engine can then produce a new CCO that supersedes the old one. CCO lifecycle management (expiry, renewal) handles the transition. This mirrors the existing supersedes / superseded_by fields in the SourceUnit model.

Q4: Adapter testing and certification¶

Before a new adapter (e.g., FAA adapter) can submit Triumvirates to the production engine, how is it validated?

Recommendation: Adapter certification suite — a set of canonical test corpora with known-correct Triumvirate outputs. The adapter must produce Triumvirates that match the expected output up to structural equivalence. This is the adapter equivalent of the existing canary test suite for run bundles. Details deferred to implementation.
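Structural equivalence is not defined in this document. One possible reading, sketched here purely as an assumption, is equality after removing fields that legitimately differ between runs. The `VOLATILE_KEYS` set and both function names are hypothetical.

```python
from typing import Any

# Hypothetical envelope fields that legitimately differ between runs.
VOLATILE_KEYS = frozenset({"generated_at", "signature"})


def strip_volatile(value: Any) -> Any:
    """Recursively drop volatile keys so only structure and content remain."""
    if isinstance(value, dict):
        return {
            key: strip_volatile(item)
            for key, item in value.items()
            if key not in VOLATILE_KEYS
        }
    if isinstance(value, list):
        return [strip_volatile(item) for item in value]
    return value


def structurally_equivalent(actual: Any, expected: Any) -> bool:
    """Compare two Triumvirates while ignoring volatile fields and key order."""
    return strip_volatile(actual) == strip_volatile(expected)


expected = {"outcomes": [{"outcome_id": "1"}], "signature": "aaa"}
actual = {"signature": "bbb", "outcomes": [{"outcome_id": "1"}]}
```

List order is still significant in this sketch; whether outcome or criterion order should matter is one of the details left to implementation.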

Q5: Enrichment boundary¶

The current TGA pipeline enriches data by scraping training.gov.au for prerequisites, companions, and supplementary data. How much enrichment is the adapter allowed to do?

Recommendation: The adapter may enrich ONLY from the same source system (TGA adapter enriches from training.gov.au). Cross-system enrichment (e.g., TGA adapter fetching data from ASQA) requires explicit configuration and audit logging. The principle: the adapter's domain is its source system. Going outside that domain is an escalation, not a default.
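The boundary could be enforced with a host check before any enrichment fetch. This is a sketch under assumptions: `enrichment_allowed` and its parameters are hypothetical, `asqa.gov.au` is used only as an illustrative extra host, and the audit logging that cross-system enrichment requires is not shown.

```python
from urllib.parse import urlsplit


def enrichment_allowed(
    url: str,
    source_host: str,
    extra_hosts: frozenset[str] = frozenset(),
) -> bool:
    """Allow a fetch only from the source system or explicitly configured hosts."""
    host = (urlsplit(url).hostname or "").lower()
    allowed = {source_host.lower(), *(h.lower() for h in extra_hosts)}
    # Accept the host itself or any of its subdomains, never a lookalike suffix.
    return any(host == h or host.endswith("." + h) for h in allowed)


same_system = enrichment_allowed(
    "https://training.gov.au/Training/Details/CHCCOM005", "training.gov.au"
)
cross_system = enrichment_allowed("https://www.asqa.gov.au/", "training.gov.au")
```

Cross-system access stays off unless a host is added to `extra_hosts`, which keeps the escalation an explicit configuration decision.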

This document is the architectural specification for the Triumvirate schema and Corpus Adapter interface. Implementation follows after review.

Version History¶

| Version | Date | Change | Author |
| --- | --- | --- | --- |
| 1.0 | 2026-03-09 | Initial creation | Claude Code |
