UCCA Engine Blueprint v1

Purpose: Structural audit of ucca-engine to plan the refactor from a TGA-specific course generator into a corpus-agnostic CCO processing engine.

Frame of reference: The Gate whitepaper — UCCA's engine processes any regulated domain's legislative framework through the Triumvirate (Outcome Specification, Compliance Ruleset, Credential Map), producing Certified Capability Objects (CCOs). TGA/Australian VET is one client implementation, not the world.

Date: 2026-03-09

Status: Read-only audit; no files modified


1. Structural Map

Repository Layout

ucca-engine/
├── backend/                          ← Core application (FastAPI + SQLAlchemy)
│   ├── app/
│   │   ├── main.py                   ← FastAPI app ("UCCA Engine" v0.1.0), health check
│   │   ├── db.py                     ← PostgreSQL via SQLAlchemy, SessionLocal
│   │   ├── models/
│   │   │   ├── core.py               ← Framework, Course, Module, Artifact, UCCAUnit models
│   │   │   ├── models_additions.py   ← UCCADomain, SourceUnit, CourseSourceMap, UCCACodeSequence
│   │   │   └── training_package.py   ← TrainingPackage, TrainingPackageDomainMap
│   │   ├── routers/
│   │   │   └── internal.py           ← Admin API: coverage endpoint, admin key auth
│   │   └── utils/
│   │       ├── ucca_domains.py       ← Domain taxonomy (22 domains) + TGA→domain resolution
│   │       ├── course_artifacts.py   ← Artifact metadata derivation + upsert
│   │       ├── provenance_v0.py      ← Provenance block management
│   │       └── run_resolution_v0.py  ← Run bundle resolution (STRICT/PERMISSIVE modes)
│   ├── alembic/                      ← Database migrations (PostgreSQL)
│   ├── scripts/                      ← 24 CLI tools (see below)
│   ├── run_bundle/
│   │   └── validate_run_bundle_v1.py ← Contract v1 validator
│   ├── .env                          ← Local dev config (DATABASE_URL, admin key)
│   └── requirements.txt              ← Python deps (FastAPI, SQLAlchemy, Anthropic SDK, etc.)
├── generator/                        ← Content generation layer (LLM-powered)
│   ├── ucca_config.py                ← Branding, pricing, AI model config, terminology tables
│   ├── tga_packages.py              ← 54 TGA package registry with industry context
│   ├── us_course_generator.py       ← USCourseGenerator class — main LLM content generation
│   ├── generate_complete_course.py  ← Master orchestrator (Steps 4–6)
│   ├── tga_scraper.py               ← HTML scraper for training.gov.au
│   ├── tga_pdf_processor.py         ← PDF download + LLM extraction from TGA PDFs
│   ├── learnworlds_generator.py     ← Marketing copy generation (LLM)
│   ├── pictory_generator.py         ← Sales video script generation (LLM)
│   └── folder_manager.py            ← Course folder structure creation
├── tga_data/                         ← Cached TGA unit JSON (extracted from PDFs)
├── tga_meta/                         ← TGA unit metadata JSON
├── tga_pdfs/                         ← Downloaded TGA PDF documents
├── courses/                          ← Generated course output folders (20+)
├── human_runs/                       ← Archived run bundles
├── oscal/                            ← OSCAL compliance framework (SSP, NIST catalog, schema)
├── web/                              ← Preview web interface
├── viewer/                           ← Course viewer
├── docs/                             ← Documentation
├── ucca                              ← CLI wrapper (bash, 9 commands)
├── docker-compose.yml                ← Local PostgreSQL 16
├── Dockerfile.preview                ← Preview server container
└── requirements.txt                  ← Root-level Python deps

Module Interconnections

                    ┌──────────────────┐
                    │   CLI Wrapper    │  (ucca bash script)
                    │   ./ucca run     │
                    └────────┬─────────┘
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
   ┌──────────────┐  ┌────────────┐  ┌──────────────┐
   │  bootstrap_  │  │  validate_ │  │  inspect_    │
   │  from_tga_   │  │  run_      │  │  run_bundle  │
   │  unit.py     │  │  bundle_v2 │  │  _v2.py      │
   └──────┬───────┘  └────────────┘  └──────────────┘
    ┌─────┼────────────────────────┐
    ▼     ▼                        ▼
┌────────────┐  ┌──────────────┐  ┌────────────────────┐
│ create_    │  │ generate_    │  │ ingest_course.py   │
│ course_    │  │ complete_    │  │ (DB write layer)   │
│ spec.py    │  │ course.py    │  └────────┬───────────┘
└────────────┘  └──────┬───────┘           │
                       │                    ▼
          ┌────────────┼───────┐    ┌──────────────┐
          ▼            ▼       ▼    │  SQLAlchemy   │
   ┌────────────┐  ┌──────┐  ┌──┐  │  Models       │
   │ us_course_ │  │picto │  │LW│  │  (core.py +   │
   │ generator  │  │ry    │  │  │  │  additions)   │
   └──────┬─────┘  └──────┘  └──┘  └──────────────┘
   ┌────────────┐
   │ tga_pdf_   │ → training.gov.au
   │ processor  │
   └────────────┘
   ┌────────────┐
   │ tga_       │ → training.gov.au
   │ packages   │
   └────────────┘

Key Scripts (24 total; 23 listed below)

| Script | Purpose | Status |
|---|---|---|
| bootstrap_from_tga_unit.py | Full pipeline: metadata → DB → spec → generate → ingest | Working, primary entry point |
| create_course_spec.py | Create CourseSpec JSON from DB course record | Working |
| ingest_course.py | Write generated course data into DB | Working |
| pipeline_generate_and_ingest.py | Full pipeline orchestration | Working |
| validate_run_bundle_v2.py | Contract v2 validation | Working |
| validate_run_bundle_v2_batch.py | Batch validation | Working |
| validate_run_bundle_v2_qualification.py | Qualification-specific validation | Working |
| build_qualification_run_v0.py | Build qualification run bundles | Working |
| build_qualification_linked_run_v0.py | Build linked qualification runs | Working |
| export_scorm_12.py | SCORM 1.2 export | Working |
| generate_content_html.py | HTML content generation | Working |
| generate_preview.py | Preview HTML generation | Working |
| sync_training_packages.py | Import TGA packages into DB | Working |
| sync_training_package_domain_map.py | Sync domain mappings | Working |
| sync_training_package_us_meta.py | Sync US-facing metadata | Working |
| inspect_run_bundle_v2.py | Inspect run bundle structure | Working |
| run_canary_suite.py | Canary test execution | Working |
| canary_contract_v2.py | Canary suite runner | Working |
| seed_qualification_stub_v0.py | Seed stub qualifications | Working |
| reconcile_training_packages_v0.py | Package reconciliation | Working |
| serve_remote_preview.py | Preview server | Working |
| _course_data_repair_helpers.py | Data repair utilities | Utility |
| check_import_roots.py | Import path verification | Utility |

2. TGA Dependency Audit

Every location where Australian VET or TGA-specific assumptions are hardcoded. These are the extraction points for the refactor.

CRITICAL — Schema-Level TGA Assumptions

| File | Lines | What | Severity |
|---|---|---|---|
| backend/app/models/models_additions.py | 56–107 | SourceUnit model: source_system defaults to "TGA"; field names package, version, status, supersedes, superseded_by mirror the TGA lifecycle. Comment says "TGA lifecycle fields" (line 69) | CRITICAL |
| backend/app/models/models_additions.py | 110–157 | SourceUnitDependency: depends_on_system defaults to "TGA" (line 130), source defaults to "tga" (line 135). Docstring says "TGA" (line 112) | CRITICAL |
| backend/app/models/training_package.py | 15–63 | Entire TrainingPackage + TrainingPackageDomainMap models are a TGA-only concept. Fields tga_name, nrt_flag, tp_developer, regulator are all TGA terminology | CRITICAL |
| backend/app/models/core.py | 55–75 | FrameworkUnit.source_code / source_release: designed around TGA unit code patterns but named generically | MODERATE |

CRITICAL — Business Logic TGA Hardcoding

| File | Lines | What | Severity |
|---|---|---|---|
| backend/app/utils/ucca_domains.py | 198–228 | get_domain_for_tga_package(): function name, parameter name, error messages all say "TGA". Queries training_package_domain_map, which is TGA-specific | CRITICAL |
| backend/app/utils/ucca_domains.py | 235–256 | get_level_from_tga_code(): extracts AQF level digit from TGA unit code format. AQF is an Australian-specific qualifications framework. Regex assumes [A-Z]{3,5}\d{4,6}[A-Z]? pattern | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 1–1025 | Entire file is TGA-specific. Function name, CLI args, URL construction (training.gov.au/Training/Details/), metadata fetching, PDF downloading, field extraction: all TGA | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 67 | _TGA_UNIT_CODE_RE = re.compile(r"\b[A-Z]{3,5}\d{4,6}[A-Z]?\b") (hardcoded TGA unit code regex) | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 404 | details_url = f"https://training.gov.au/Training/Details/{tga_unit_code}" (hardcoded upstream URL) | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 406–408 | _infer_pkg(): infers 3-letter package code from TGA unit code prefix | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 656 | source_system == "TGA" hardcoded in DB queries | CRITICAL |
| backend/scripts/create_course_spec.py | 39–54 | _default_target_job_for_package(): hardcoded fallback job titles per TGA package code (CPC→"Construction Trades Assistant", CHC→"Home Health Aide") | HIGH |
| backend/scripts/create_course_spec.py | 102–121 | Spec structure has source.tga_unit_code key, which is not generic | HIGH |
| backend/scripts/ingest_course.py | 34 | from app.utils.ucca_domains import get_domain_for_tga_package (TGA-specific import) | HIGH |
| backend/scripts/sync_training_packages.py | all | Entire file syncs TGA training packages from tga_packages.py into DB | CRITICAL |
| backend/scripts/sync_training_package_domain_map.py | all | Syncs TGA package→domain mappings | CRITICAL |
| backend/scripts/sync_training_package_us_meta.py | all | Syncs TGA US-facing metadata | CRITICAL |
| backend/scripts/reconcile_training_packages_v0.py | all | Reconciles TGA training packages | CRITICAL |
| backend/run_bundle/validate_run_bundle_v1.py | 8, 13–20 | Required files include inputs/tga_unit.json; run_meta requires tga_code | HIGH |
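
The unit-code regex and the prefix-based package inference flagged above can be exercised in isolation. The sketch below copies the pattern as transcribed in this audit; `infer_package` and `matches_unit_pattern` are hypothetical stand-ins (the body of `_infer_pkg()` is not reproduced in this document), and CHC33021 is only an illustrative input. One thing that may be worth checking against the source file: as transcribed, the pattern requires 4 to 6 digits, so it does not match the nine-character example code CHCCCS023 used elsewhere in this audit.

```python
import re

# Pattern copied from the audit table (bootstrap_from_tga_unit.py, line 67).
_TGA_UNIT_CODE_RE = re.compile(r"\b[A-Z]{3,5}\d{4,6}[A-Z]?\b")


def infer_package(code: str) -> str:
    """Hypothetical stand-in for _infer_pkg(): take the 3-letter prefix."""
    prefix = code.strip().upper()[:3]
    if len(prefix) != 3 or not prefix.isalpha():
        raise ValueError(f"cannot infer package from {code!r}")
    return prefix


def matches_unit_pattern(code: str) -> bool:
    """True if the transcribed regex finds a match anywhere in the string."""
    return _TGA_UNIT_CODE_RE.search(code) is not None


if __name__ == "__main__":
    for sample in ("CHC33021", "CHCCCS023"):
        print(sample, infer_package(sample), matches_unit_pattern(sample))
```

In a corpus-agnostic engine, both the pattern and the prefix rule would presumably live inside the TGA adapter rather than in shared utilities.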

CRITICAL — Generator Layer TGA Hardcoding

| File | Lines | What | Severity |
|---|---|---|---|
| generator/tga_packages.py | 1–731 | Entire file: 54 TGA training packages with Australian-specific industry context, roles, concepts. This IS the TGA client implementation | CRITICAL |
| generator/tga_scraper.py | 1–230 | Entire file: scrapes training.gov.au HTML for unit structure | CRITICAL |
| generator/tga_pdf_processor.py | 1–444 | Entire file: downloads PDFs from training.gov.au/TrainingComponentFiles/, extracts TGA unit data using TGA-specific prompt template | CRITICAL |
| generator/tga_pdf_processor.py | 31 | self.base_url = "https://training.gov.au/TrainingComponentFiles" | CRITICAL |
| generator/tga_pdf_processor.py | 225–278 | LLM extraction prompt hardcodes TGA JSON schema (code, title, application, elements, performance_criteria, knowledge_evidence, performance_evidence, status, version, supersedes, superseded_by, release_date) | CRITICAL |
| generator/tga_pdf_processor.py | 337 | unit_data["source_url"] = f"https://training.gov.au/Training/Details/{tga_code}" | CRITICAL |
| generator/us_course_generator.py | 45 | from tga_packages import get_package_context (direct TGA dependency) | CRITICAL |
| generator/us_course_generator.py | 657–673 | _get_package_context_safe(): calls TGA package registry | HIGH |
| generator/us_course_generator.py | 675–687 | _authority_header(): references "TGA Unit Code" in every AI prompt | HIGH |
| generator/us_course_generator.py | 693–735 | generate_complete_course(): expects tga_data dict with TGA structure (code, elements, application) | CRITICAL |
| generator/us_course_generator.py | 872–876 | Ledger records unit_id as "authoritative TGA unit code" | MODERATE |
| generator/us_course_generator.py | 1032 | CQ-02A hardcoded to UCCA-CARE-100-002 specifically | LOW (spike) |
| generator/us_course_generator.py | 1049 | CQ-01 hardcoded to UCCA-CARE-100-002 specifically | LOW (spike) |
| generator/generate_complete_course.py | 66–71 | _spec_mode_values() extracts tga_unit_code from spec | HIGH |
| generator/generate_complete_course.py | 107 | meta["tga_unit_code"] (TGA-specific meta key) | HIGH |
| generator/generate_complete_course.py | 593–658 | _normalize_course_data() stamps tga_unit_code into metadata | HIGH |
| generator/generate_complete_course.py | 602 | m.setdefault("tga_unit_code", tga_code) on every module | HIGH |
| generator/folder_manager.py | 70–71 | pdf_url = f"https://training.gov.au/TrainingComponentFiles/CHC/{tga_code}_R2.pdf" (hardcoded CHC package and R2 version) | HIGH |
| generator/folder_manager.py | 130–175 | Course info template references TGA mapping, training.gov.au URL | MODERATE |
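
The hardcoded CHC package and R2 revision in folder_manager.py contrast with the revision-aware download described elsewhere in this audit (R5 down to R1). A minimal sketch of deriving the candidate URLs from the unit code follows. `candidate_pdf_urls` is a hypothetical helper, not a function in the repository; the base URL and path shape are taken from the audit, and no network request is made.

```python
from typing import List

# Base URL as recorded in the audit (tga_pdf_processor.py, line 31).
BASE_URL = "https://training.gov.au/TrainingComponentFiles"


def candidate_pdf_urls(tga_code: str, max_revision: int = 5) -> List[str]:
    """Return PDF URLs to try, newest revision first (R5 down to R1).

    Hypothetical helper: the package comes from the 3-letter code prefix
    instead of being fixed to "CHC", and the revision is not fixed to R2.
    """
    code = tga_code.strip().upper()
    package = code[:3]
    return [
        f"{BASE_URL}/{package}/{code}_R{n}.pdf"
        for n in range(max_revision, 0, -1)
    ]


if __name__ == "__main__":
    for url in candidate_pdf_urls("CHCCCS023"):
        print(url)
```

A caller would try each URL in order and stop at the first one that resolves, which is the behaviour the ingestion stage is described as having.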

MODERATE — Configuration & Branding TGA Assumptions

| File | Lines | What | Severity |
|---|---|---|---|
| generator/ucca_config.py | 13 | DEFAULT_TARGET_JOB = "Home Health Aide" (US healthcare assumption) | MODERATE |
| generator/ucca_config.py | 82–110 | INDUSTRY_TERMINOLOGY: Australian→US translation tables (healthcare, automotive, construction) | HIGH |
| generator/ucca_config.py | 112–126 | US_CONTEXT: US healthcare regulations/certifications | MODERATE |
| generator/ucca_config.py | 137–156 | MARKETING: healthcare-specific value proposition, trust signals, target audience | MODERATE |
| generator/ucca_config.py | 144 | "Developed by vocational education experts" (VET-specific claim) | MODERATE |

LOW — Data Files (TGA Content)

| Location | What |
|---|---|
| tga_data/*.json | 30+ cached TGA unit JSON files |
| tga_meta/*.json | 10+ TGA metadata files |
| tga_pdfs/*.pdf | Downloaded TGA PDF documents |
| courses/ | 20+ generated courses, all from TGA units |
| human_runs/ | Archived run bundles, all TGA-based |

Summary: TGA Dependency Density

| Layer | TGA References | Severity |
|---|---|---|
| Database schema (models) | 4 models, 15+ fields | CRITICAL |
| Business logic (utils) | 2 functions, hardcoded | CRITICAL |
| Bootstrap pipeline | Entire file (1025 lines) | CRITICAL |
| Generator layer | 5 files, ~2500 lines | CRITICAL |
| Spec/validation | 4 files, ~200 lines | HIGH |
| Configuration | 1 file, ~80 lines | MODERATE |
| Data files | 3 directories | LOW (content, not code) |

3. Data Flow Diagram

Current Pipeline: TGA Unit → Course Output

┌─────────────────────────────────────────────────────────────────┐
│                    CORPUS INGESTION                              │
│                                                                  │
│  Input: TGA Unit Code (e.g., "CHCCCS023")                       │
│                                                                  │
│  1. training.gov.au/Training/Details/{code}                      │
│     → HTML scrape for title, prereqs                             │
│                                                                  │
│  2. training.gov.au/TrainingComponentFiles/{PKG}/{code}_R{n}.pdf │
│     → PDF download (tries R5→R1)                                 │
│                                                                  │
│  3. LLM extraction (Claude Sonnet) on PDF                        │
│     → Structured JSON: {code, title, application, elements[],    │
│        performance_criteria[], knowledge_evidence[],             │
│        performance_evidence[], status, version}                   │
│                                                                  │
│  Output: tga_data/{code}.json + tga_meta/{code}.json             │
└──────────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    DATABASE BOOTSTRAP                             │
│                                                                  │
│  1. Resolve TGA package code (3-letter prefix from unit code)    │
│  2. Look up package→domain mapping in DB (hard fail if missing)  │
│  3. Create/update SourceUnit row (source_system="TGA")           │
│  4. Create/update UCCAUnit row (canonical, level-neutral)        │
│  5. Create UCCAUnitSourceMap (current/superseded)                │
│  6. Allocate UCCA code: UCCA-{DOMAIN}-{LEVEL}-{SEQ}             │
│  7. Create/update Course row with ucca_code                      │
│  8. Create CourseUnitMap + CourseSourceMap                        │
│                                                                  │
│  Output: Course record in PostgreSQL with all linkages           │
└──────────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    SPEC CREATION                                 │
│                                                                  │
│  Input: UCCA code from DB                                        │
│  Reads: Course + SourceUnit from DB                              │
│                                                                  │
│  Creates CourseSpec JSON:                                         │
│  {                                                               │
│    spec_version: 1,                                              │
│    course: { ucca_code, title, domain, level },                  │
│    source: { tga_unit_code, source_unit_version },               │
│    generation: { target_job }                                    │
│  }                                                               │
│                                                                  │
│  Output: backend/data_outbox/{UCCA_CODE}_spec.json               │
└──────────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    CONTENT GENERATION (Step 4)                    │
│                                                                  │
│  Input: CourseSpec + TGA unit data                                │
│  Engine: USCourseGenerator (Claude Sonnet via Anthropic SDK)     │
│                                                                  │
│  For each element in tga_data.elements[]:                        │
│    1. Build prompt with:                                         │
│       - Authority header (TGA code + UCCA code)                  │
│       - Contract v0 formatting rules                             │
│       - Element title + performance criteria                     │
│       - Package industry context (from tga_packages.py)          │
│    2. Call LLM → JSON module response                            │
│    3. Normalize: title, learning_outcomes, video_script,         │
│       quiz_questions, content_html                               │
│    4. Apply CQ-01/CQ-02A spikes (UCCA-CARE-100-002 only)       │
│    5. Assert no Python repr leaks in content_html                │
│                                                                  │
│  Also generates:                                                 │
│    - Course description (LLM)                                    │
│    - Learning outcomes (LLM)                                     │
│    - Target job inference (LLM, if not in spec)                  │
│                                                                  │
│  Best-effort capture: prompts/, ai/ directories                  │
│  Best-effort cost ledger: cognitive_cost.jsonl                   │
│                                                                  │
│  Output: course_data dict with modules[] + artifacts[]           │
└──────────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    ARTIFACT GENERATION (Steps 5-6)               │
│                                                                  │
│  Step 5: Pictory script (LLM)                                    │
│    → marketing.video.promo.script artifact                       │
│                                                                  │
│  Step 6: Course page copy (LLM)                                  │
│    → marketing.ucca.course_description artifact                  │
│    → marketing.ucca.course_outline artifact                      │
│                                                                  │
│  Output: course_data dict updated with artifact entries           │
└──────────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    FILE OUTPUT                                    │
│                                                                  │
│  courses/{UCCA_CODE}_{title}/                                    │
│  ├── 00_course_info.txt                                          │
│  ├── 01_source_materials/                                        │
│  │   ├── {UCCA_CODE}_TGA-{code}.json                            │
│  │   └── {UCCA_CODE}_TGA-{code}_R{n}.pdf                        │
│  ├── 02_generated_content/                                       │
│  │   ├── {UCCA_CODE}_course_data.json                            │
│  │   └── {UCCA_CODE}_workbook.md                                 │
│  ├── 03_job_aids/                                                │
│  ├── 04_marketing/                                               │
│  │   └── {UCCA_CODE}_pictory_script.txt                          │
│  └── 05_learnworlds/                                             │
│      └── {UCCA_CODE}_upload_checklist.txt                        │
└──────────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    DATABASE INGEST                                │
│                                                                  │
│  Input: TGA JSON + course_data JSON                              │
│                                                                  │
│  1. Parse TGA JSON → FrameworkUnit + Elements + Criteria         │
│  2. Parse course_data → CourseModules + Artifacts                │
│  3. Link modules to framework elements (ModuleElementMap)        │
│  4. Store dependency edges (SourceUnitDependency)                │
│  5. Upsert artifacts with derived metadata                       │
│  6. Post-ingest assertion: module count > 0                      │
│                                                                  │
│  Output: Complete course in PostgreSQL                           │
└──────────────────────────┬──────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    EXPORT / VALIDATION                            │
│                                                                  │
│  Run bundles: meta/manifest.json + meta/run_meta.json            │
│  Validators: Contract v1, v2, v2-batch, v2-qualification         │
│  Exports: SCORM 1.2, Preview HTML, Content HTML                  │
│  Canary suite: automated contract compliance tests               │
└─────────────────────────────────────────────────────────────────┘
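
The SPEC CREATION stage above writes a small JSON document per course. A minimal sketch of assembling it, using only the keys shown in the diagram, is given here. `build_course_spec` and `spec_path` are hypothetical names, the value types (for example an integer level) are assumptions, and the real create_course_spec.py may populate further fields.

```python
import json
from typing import Any, Dict


def build_course_spec(
    ucca_code: str,
    title: str,
    domain: str,
    level: int,
    tga_unit_code: str,
    source_unit_version: str,
    target_job: str,
) -> Dict[str, Any]:
    """Assemble a CourseSpec dict with the keys shown in the diagram."""
    return {
        "spec_version": 1,
        "course": {
            "ucca_code": ucca_code,
            "title": title,
            "domain": domain,
            "level": level,
        },
        # The TGA-specific key below is the one the audit flags as not generic.
        "source": {
            "tga_unit_code": tga_unit_code,
            "source_unit_version": source_unit_version,
        },
        "generation": {"target_job": target_job},
    }


def spec_path(ucca_code: str) -> str:
    """Output location as listed in the diagram."""
    return f"backend/data_outbox/{ucca_code}_spec.json"


if __name__ == "__main__":
    spec = build_course_spec(
        "UCCA-CARE-100-002", "Example title", "CARE", 100,
        "CHCCCS023", "R2", "Home Health Aide",
    )
    print(spec_path(spec["course"]["ucca_code"]))
    print(json.dumps(spec, indent=2))
```

Generalizing this spec would mostly mean replacing the `source` block with a corpus-neutral pair such as a source system name plus a source reference, which is a design decision left to the refactor.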

What Would Change for Corpus-Agnostic CCO Processing

The current pipeline is:

TGA PDF → LLM extraction → TGA JSON → LLM generation → Course → DB

The target pipeline (per whitepaper) would be:

Corpus Document → Triumvirate Parser → Structured Spec → CCO Processor → CCO → Signed Output

The Triumvirate is:

  1. Outcome Specification: what good looks like (currently: TGA elements + performance criteria)
  2. Compliance Ruleset: what operators must do (currently: not modeled)
  3. Credential Map: who is authorised to do what (currently: not modeled)
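
The three legs of the Triumvirate could be represented as plain data containers while the formal schema is still undecided. The sketch below is an assumption throughout: every class and field name is hypothetical, and the whitepaper may define richer structures than lists of strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OutcomeSpecification:
    """What good looks like (today: TGA elements + performance criteria)."""
    outcomes: List[str]


@dataclass(frozen=True)
class ComplianceRuleset:
    """What operators must do (not modeled in the current engine)."""
    rules: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class CredentialMap:
    """Who is authorised to do what (not modeled in the current engine)."""
    authorisations: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Triumvirate:
    outcome_specification: OutcomeSpecification
    compliance_ruleset: ComplianceRuleset
    credential_map: CredentialMap

    def missing_legs(self) -> List[str]:
        """Names of legs that carry no content yet."""
        legs = {
            "outcome_specification": self.outcome_specification.outcomes,
            "compliance_ruleset": self.compliance_ruleset.rules,
            "credential_map": self.credential_map.authorisations,
        }
        return [name for name, items in legs.items() if not items]


if __name__ == "__main__":
    current = Triumvirate(
        OutcomeSpecification(["Element 1: example outcome"]),
        ComplianceRuleset(),
        CredentialMap(),
    )
    print(current.missing_legs())
```

A check like `missing_legs()` makes the present gap explicit: a corpus ingested through today's pipeline would fill only the first leg.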


4. Gap Assessment

What Is Working End to End

| Capability | Status | Notes |
|---|---|---|
| TGA PDF download from training.gov.au | Working | Revision-aware (R5→R1), local-first caching |
| TGA PDF → structured JSON extraction | Working | LLM-powered, cached in tga_data/ |
| TGA HTML scraping (title, prereqs) | Working | Fallback to PDF extraction |
| Database bootstrap (SourceUnit → UCCAUnit → Course) | Working | Transaction-safe code allocation |
| CourseSpec creation from DB | Working | Deterministic, DB-authoritative |
| LLM content generation (modules, outcomes, descriptions) | Working | Claude Sonnet, with capture + cost ledger |
| Course folder structure creation | Working | Organized 5-subfolder layout |
| Database ingest (course + modules + artifacts) | Working | Idempotent artifact upsert, element mapping |
| Run bundle validation (Contract v1 + v2) | Working | Strict mode with preflight gate |
| SCORM 1.2 export | Working | Full package generation |
| Preview HTML generation | Working | Server-side rendering |
| Canary test suite | Working | Automated contract compliance |
| UCCA code allocation (domain-level-sequence) | Working | Transaction-safe sequences |
| UCCAUnit identity layer (level-neutral) | Working | Clean abstraction |
| Artifact metadata derivation | Working | Deterministic, intent/audience/content_type |
| Provenance block management | Working | Idempotent manifest updates |
| Run resolution (STRICT/PERMISSIVE) | Working | RUN_REF.json authoritative |
| Cognitive cost ledger | Working | JSONL, per-run + global |
| Capture system (prompt/response) | Working | Best-effort, zero-impact |
| Domain taxonomy (22 domains) | Working | Seeded, queryable |
| Training package registry (54 packages) | Working | Complete with industry context |
| Package → domain mapping | Working | Explicit, no fallbacks |
| Qualification run bundles | Working | Build + validate |
| Batch run validation | Working | Contract v2 |

What Is Incomplete or Scaffolded

| Capability | Status | Notes |
|---|---|---|
| FastAPI web application | Scaffolded | Only health check + 1 admin endpoint. No public API. No web UI. |
| Alembic migrations | Present but unclear | env.py exists, migration history not audited |
| Docker deployment | Partial | docker-compose for local dev only. Dockerfile.preview exists. No production deployment. |
| Workbook generation | Conditional | Only runs if generator has generate_workbook_markdown method (not always present) |
| LearnWorlds integration | Checklist only | Generates upload checklist, no actual API integration |
| Pictory integration | Script only | Generates video script text, no actual Pictory API integration |
| Target job resolution | Temporary fallback | Marked UCCA_TEMP_FALLBACK_TARGET_JOB_DELETE_ME, hardcoded per-package |
| CQ-01 / CQ-02A quality controls | Spike | Hardcoded to UCCA-CARE-100-002 only, not generalized |
| Source unit dependencies | Data capture only | Stored but "NOT enforced by the engine" (line 113, models_additions.py) |
| OSCAL compliance | Minimal | Single SSP test file with one control (AC-1). Schema + catalog present. |
| Multi-source-system support | Designed but unused | source_system field exists on SourceUnit, but all code assumes "TGA" |
| US terminology translation | Partial | Tables exist for healthcare/automotive/construction only |
| Video/media production | Not started | Thumbnail suggestions in comments only |
| Certificate generation | Template only | Text template in ucca_config.py, no actual PDF/image generation |
| Student enrollment/completion tracking | Not present | No student-facing models or logic |

What Exists in Name Only

| Capability | Status | Notes |
|---|---|---|
| Corpus-agnostic processing | Not started | Every code path assumes TGA. No abstraction layer for alternative corpora. |
| CCO (Certified Capability Object) | Not started | The whitepaper concept has zero implementation. No CCO model, no signing, no distribution format. |
| Triumvirate input specification | Not started | Only the first leg (Outcome Specification = TGA elements/criteria) is partially modeled. Compliance Ruleset and Credential Map are not present. |
| Cryptographic signing / dual-key co-signing | Not started | No cryptographic infrastructure at all. |
| CCO expiry / revocation | Not started | No temporal validity on any output. |
| Trust tier classification | Not started | No risk tier model. |
| Operational telemetry loop | Not started | The cognitive cost ledger tracks LLM costs, not CCO operational telemetry as described in the whitepaper. |
| UCCO open standard format | Not started | No formal schema or specification. |
| Multi-domain support | Architecturally possible | UCCADomain taxonomy is generic, SourceUnit has source_system field, but zero non-TGA implementation exists. |
| Production API | Not started | FastAPI app has no production endpoints. |
| Client onboarding | Not started | No mechanism for a new domain/client to bring their own corpus. |

Architecture Assessment: Current vs Target

| Dimension | Current State | Target (per whitepaper) |
|---|---|---|
| Input | TGA unit code (string) | Any corpus document conforming to Triumvirate specification |
| Ingestion | Scrape/download from training.gov.au | Client uploads or API delivers corpus documents |
| Schema | TGA-specific (elements, performance_criteria, knowledge_evidence) | Universal Triumvirate (Outcome Specification, Compliance Ruleset, Credential Map) |
| Processing | LLM-generated course content | Deterministic verification + structured processing |
| Output | Course data (modules, artifacts, marketing copy) | Certified Capability Object (CCO) |
| Integrity | No cryptographic guarantees | Dual-key co-signing, hash fingerprinting, immutability |
| Lifecycle | Static output files | Expiry, renewal, revocation |
| Trust model | Implicit (UCCA generates, UCCA trusts) | Explicit tiers with domain expert co-certification |
| Provenance | Run metadata + cost ledger | Full cryptographic audit trail |
| Distribution | File system (courses/, run bundles) | Compiled, encrypted CCO packages |
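
To make the Integrity row concrete, here is a minimal sketch of hash fingerprinting plus two-party co-signing using only the Python standard library. It is an illustration under stated assumptions, not the whitepaper's scheme: HMAC with shared secrets stands in for real asymmetric signatures, and the function names, envelope keys, and the "issuer"/"expert" roles are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def fingerprint(payload: Dict[str, Any]) -> str:
    """SHA-256 hex digest over a canonical JSON encoding of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign(digest: str, key: bytes) -> str:
    """HMAC-SHA256 tag; a stand-in for a real asymmetric signature."""
    return hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()


def co_sign(payload: Dict[str, Any], issuer_key: bytes, expert_key: bytes) -> Dict[str, str]:
    """Produce an envelope carrying the fingerprint and both tags."""
    digest = fingerprint(payload)
    return {
        "fingerprint": digest,
        "issuer_sig": sign(digest, issuer_key),
        "expert_sig": sign(digest, expert_key),
    }


def verify(payload: Dict[str, Any], envelope: Dict[str, str],
           issuer_key: bytes, expert_key: bytes) -> bool:
    """True only if the payload is unchanged and both tags check out."""
    digest = fingerprint(payload)
    return (
        hmac.compare_digest(envelope["fingerprint"], digest)
        and hmac.compare_digest(envelope["issuer_sig"], sign(digest, issuer_key))
        and hmac.compare_digest(envelope["expert_sig"], sign(digest, expert_key))
    )


if __name__ == "__main__":
    cco = {"ucca_code": "UCCA-CARE-100-002", "modules": 3}
    env = co_sign(cco, b"issuer-secret", b"expert-secret")
    print(verify(cco, env, b"issuer-secret", b"expert-secret"))
```

Sorting keys before hashing means two payloads with the same content produce the same fingerprint regardless of key order; a production design would additionally need key management and a signature scheme in which verifiers do not hold signing keys.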

What Can Be Preserved in the Refactor

The following architectural primitives are sound and reusable:

  1. UCCADomain taxonomy — generic, not TGA-specific
  2. UCCAUnit identity layer — level-neutral canonical identifiers
  3. UCCA code allocation — transaction-safe sequence generation
  4. Artifact metadata system — deterministic derivation of intent/audience/content_type
  5. Provenance/capture architecture — best-effort, zero-impact, vendor-replaceable
  6. Run bundle contract system — validation, preflight gates, canary suites
  7. Run resolution (STRICT/PERMISSIVE) — portable RUN_REF.json pattern
  8. Cognitive cost ledger — append-only JSONL with cumulative tracking
  9. CourseSpec contract — DB-authoritative, spec-driven generation
  10. Course → Module → Artifact hierarchy — clean public product model
  11. SourceUnit → UCCAUnit → Course mapping — the abstraction is right, just needs generalization of source_system
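
Because the UCCA code is one of the primitives worth preserving, a small parser/formatter helps show what the identifier carries. The sketch assumes the layout UCCA-{DOMAIN}-{LEVEL}-{SEQ} with an uppercase alphabetic domain, a numeric level, and a sequence zero-padded to three digits, inferred from the single example UCCA-CARE-100-002. `UCCACode` and `parse_ucca_code` are hypothetical names, and the transaction-safe allocation itself is not modeled.

```python
import re
from typing import NamedTuple

_UCCA_CODE_RE = re.compile(r"^UCCA-([A-Z]+)-(\d+)-(\d+)$")


class UCCACode(NamedTuple):
    domain: str
    level: int
    seq: int

    def __str__(self) -> str:
        # Assumption: sequence is zero-padded to three digits.
        return f"UCCA-{self.domain}-{self.level}-{self.seq:03d}"


def parse_ucca_code(text: str) -> UCCACode:
    """Split a UCCA code into its parts, or raise ValueError."""
    match = _UCCA_CODE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a UCCA code: {text!r}")
    return UCCACode(match.group(1), int(match.group(2)), int(match.group(3)))


if __name__ == "__main__":
    code = parse_ucca_code("UCCA-CARE-100-002")
    print(code.domain, code.level, code.seq)
    print(str(code))
```

Nothing in this identifier refers to TGA, which is consistent with treating it as corpus-neutral.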

What Must Be Created New

  1. Corpus Adapter interface — abstract base class for ingesting any corpus (TGA becomes one adapter)
  2. Triumvirate schema — formal specification for the three legislative instrument types
  3. CCO model — data model for the Certified Capability Object
  4. Cryptographic layer — signing, verification, key management
  5. CCO lifecycle — expiry, renewal, revocation registry
  6. Trust tier framework — risk classification model
  7. Domain expert registration — co-signer identity and credential management
  8. Production API — FastAPI endpoints for corpus submission, CCO retrieval, verification
  9. Client isolation — per-world/per-domain data boundaries (mirrors the surfaces architecture)
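
The Corpus Adapter interface in item 1 could start as a small abstract base class, with TGA as its first concrete adapter. The following is a sketch only: the class names, the two methods, and the shape of the raw record (a dict with an `elements` list whose items have a `title`) are assumptions made for illustration, and the toy adapter reads from memory rather than training.gov.au.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class CorpusAdapter(ABC):
    """Hypothetical interface: one subclass per source corpus."""

    # Value that would be stored in SourceUnit.source_system.
    source_system: str = ""

    @abstractmethod
    def fetch(self, unit_ref: str) -> Dict[str, Any]:
        """Return the raw unit record for a corpus-specific reference."""

    @abstractmethod
    def outcomes(self, raw: Dict[str, Any]) -> List[str]:
        """Map a raw record onto outcome statements."""


class InMemoryTGAAdapter(CorpusAdapter):
    """Toy adapter backed by a dict instead of a live upstream source."""

    source_system = "TGA"

    def __init__(self, units: Dict[str, Dict[str, Any]]) -> None:
        self._units = units

    def fetch(self, unit_ref: str) -> Dict[str, Any]:
        return self._units[unit_ref]

    def outcomes(self, raw: Dict[str, Any]) -> List[str]:
        return [element["title"] for element in raw.get("elements", [])]


if __name__ == "__main__":
    adapter = InMemoryTGAAdapter(
        {"CHCCCS023": {"code": "CHCCCS023",
                       "elements": [{"title": "Example element"}]}}
    )
    print(adapter.source_system, adapter.outcomes(adapter.fetch("CHCCCS023")))
```

With an interface of this kind, the engine core would depend only on `CorpusAdapter`, and the scraping, PDF download, and unit-code rules catalogued in the dependency audit would move behind the TGA implementation.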

Appendix: File-by-File TGA Reference Count

Quick grep results for density of TGA references:

| File | tga refs (lowercase) | TGA refs (uppercase) | training.gov.au refs |
|---|---|---|---|
| bootstrap_from_tga_unit.py | 120+ | 40+ | 2 |
| tga_packages.py | 80+ | 60+ | 1 |
| tga_pdf_processor.py | 40+ | 20+ | 3 |
| tga_scraper.py | 15+ | 10+ | 2 |
| us_course_generator.py | 20+ | 15+ | 0 |
| generate_complete_course.py | 30+ | 10+ | 0 |
| ingest_course.py | 50+ | 20+ | 0 |
| models_additions.py | 10+ | 8 | 0 |
| training_package.py | 5 | 3 | 0 |
| ucca_domains.py | 8 | 6 | 0 |
| create_course_spec.py | 8 | 3 | 0 |
| folder_manager.py | 10+ | 5 | 1 |
| ucca_config.py | 0 | 0 | 0 |

Total: 400+ TGA-specific references across the codebase.


This document is the foundation for planning the refactor. The next step is to design the abstraction layer that separates the universal engine primitives from the TGA client implementation, so that TGA becomes adapters/tga/ and the core engine processes any Triumvirate-conformant corpus.

Version History

| Version | Date | Change | Author |
|---|---|---|---|
| 1.0 | 2026-03-09 | Initial creation | Claude Code |