UCCA Engine Blueprint v1
Purpose: Structural audit of ucca-engine to plan the refactor from a TGA-specific course generator into a corpus-agnostic CCO processing engine.
Frame of reference: The Gate whitepaper — UCCA's engine processes any regulated domain's legislative framework through the Triumvirate (Outcome Specification, Compliance Ruleset, Credential Map), producing Certified Capability Objects (CCOs). TGA/Australian VET is one client implementation, not the world.
Date: 2026-03-09
Status: Read-only audit — no files modified
1. Structural Map
Repository Layout
ucca-engine/
├── backend/ ← Core application (FastAPI + SQLAlchemy)
│ ├── app/
│ │ ├── main.py ← FastAPI app ("UCCA Engine" v0.1.0), health check
│ │ ├── db.py ← PostgreSQL via SQLAlchemy, SessionLocal
│ │ ├── models/
│ │ │ ├── core.py ← Framework, Course, Module, Artifact, UCCAUnit models
│ │ │ ├── models_additions.py ← UCCADomain, SourceUnit, CourseSourceMap, UCCACodeSequence
│ │ │ └── training_package.py ← TrainingPackage, TrainingPackageDomainMap
│ │ ├── routers/
│ │ │ └── internal.py ← Admin API: coverage endpoint, admin key auth
│ │ └── utils/
│ │ ├── ucca_domains.py ← Domain taxonomy (22 domains) + TGA→domain resolution
│ │ ├── course_artifacts.py ← Artifact metadata derivation + upsert
│ │ ├── provenance_v0.py ← Provenance block management
│ │ └── run_resolution_v0.py ← Run bundle resolution (STRICT/PERMISSIVE modes)
│ ├── alembic/ ← Database migrations (PostgreSQL)
│ ├── scripts/ ← 24 CLI tools (see below)
│ ├── run_bundle/
│ │ └── validate_run_bundle_v1.py ← Contract v1 validator
│ ├── .env ← Local dev config (DATABASE_URL, admin key)
│ └── requirements.txt ← Python deps (FastAPI, SQLAlchemy, Anthropic SDK, etc.)
│
├── generator/ ← Content generation layer (LLM-powered)
│ ├── ucca_config.py ← Branding, pricing, AI model config, terminology tables
│ ├── tga_packages.py ← 54 TGA package registry with industry context
│ ├── us_course_generator.py ← USCourseGenerator class — main LLM content generation
│ ├── generate_complete_course.py ← Master orchestrator (Steps 4–6)
│ ├── tga_scraper.py ← HTML scraper for training.gov.au
│ ├── tga_pdf_processor.py ← PDF download + LLM extraction from TGA PDFs
│ ├── learnworlds_generator.py ← Marketing copy generation (LLM)
│ ├── pictory_generator.py ← Sales video script generation (LLM)
│ └── folder_manager.py ← Course folder structure creation
│
├── tga_data/ ← Cached TGA unit JSON (extracted from PDFs)
├── tga_meta/ ← TGA unit metadata JSON
├── tga_pdfs/ ← Downloaded TGA PDF documents
├── courses/ ← Generated course output folders (20+)
├── human_runs/ ← Archived run bundles
├── oscal/ ← OSCAL compliance framework (SSP, NIST catalog, schema)
├── web/ ← Preview web interface
├── viewer/ ← Course viewer
├── docs/ ← Documentation
├── ucca ← CLI wrapper (bash, 9 commands)
├── docker-compose.yml ← Local PostgreSQL 16
├── Dockerfile.preview ← Preview server container
└── requirements.txt ← Root-level Python deps
Module Interconnections
ucca CLI wrapper (bash script, ./ucca run)
├── bootstrap_from_tga_unit.py
│ ├── create_course_spec.py
│ ├── generate_complete_course.py
│ │ ├── us_course_generator
│ │ │ └── tga_pdf_processor → training.gov.au
│ │ ├── pictory_generator
│ │ └── learnworlds_generator (LW)
│ └── ingest_course.py (DB write layer)
│   └── SQLAlchemy models (core.py + additions)
├── validate_run_bundle_v2
└── inspect_run_bundle_v2.py
tga_packages → training.gov.au
Key Scripts (24 in backend/scripts/, 23 listed)
| Script | Purpose | Status |
|---|---|---|
| bootstrap_from_tga_unit.py | Full pipeline: metadata → DB → spec → generate → ingest | Working, primary entry point |
| create_course_spec.py | Create CourseSpec JSON from DB course record | Working |
| ingest_course.py | Write generated course data into DB | Working |
| pipeline_generate_and_ingest.py | Full pipeline orchestration | Working |
| validate_run_bundle_v2.py | Contract v2 validation | Working |
| validate_run_bundle_v2_batch.py | Batch validation | Working |
| validate_run_bundle_v2_qualification.py | Qualification-specific validation | Working |
| build_qualification_run_v0.py | Build qualification run bundles | Working |
| build_qualification_linked_run_v0.py | Build linked qualification runs | Working |
| export_scorm_12.py | SCORM 1.2 export | Working |
| generate_content_html.py | HTML content generation | Working |
| generate_preview.py | Preview HTML generation | Working |
| sync_training_packages.py | Import TGA packages into DB | Working |
| sync_training_package_domain_map.py | Sync domain mappings | Working |
| sync_training_package_us_meta.py | Sync US-facing metadata | Working |
| inspect_run_bundle_v2.py | Inspect run bundle structure | Working |
| run_canary_suite.py | Canary test execution | Working |
| canary_contract_v2.py | Canary suite runner | Working |
| seed_qualification_stub_v0.py | Seed stub qualifications | Working |
| reconcile_training_packages_v0.py | Package reconciliation | Working |
| serve_remote_preview.py | Preview server | Working |
| _course_data_repair_helpers.py | Data repair utilities | Utility |
| check_import_roots.py | Import path verification | Utility |
2. TGA Dependency Audit
This section lists every location where Australian VET or TGA-specific assumptions are hardcoded. These are the extraction points for the refactor.
CRITICAL — Schema-Level TGA Assumptions
| File | Lines | What | Severity |
|---|---|---|---|
| backend/app/models/models_additions.py | 56–107 | SourceUnit model: source_system defaults to "TGA"; the fields package, version, status, supersedes, and superseded_by mirror the TGA lifecycle. A comment reads "TGA lifecycle fields" (line 69). | CRITICAL |
| backend/app/models/models_additions.py | 110–157 | SourceUnitDependency: depends_on_system defaults to "TGA" (line 130) and source defaults to "tga" (line 135). The docstring says "TGA" (line 112). | CRITICAL |
| backend/app/models/training_package.py | 15–63 | Entire TrainingPackage + TrainingPackageDomainMap models: a TGA-only concept. The fields tga_name, nrt_flag, tp_developer, and regulator are all TGA terminology. | CRITICAL |
| backend/app/models/core.py | 55–75 | FrameworkUnit.source_code / source_release: named generically but designed around TGA unit code patterns | MODERATE |
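One way to remove the schema-level "TGA" defaults is to make the corpus identifier mandatory, so that no row can silently inherit TGA. The sketch below uses plain dataclasses instead of the repository's SQLAlchemy models so it stays self-contained. The class names SourceUnitRecord and SourceUnitDependencyRecord are hypothetical, as are depends_on_code and the use of source_code on a SourceUnit (the name is borrowed from FrameworkUnit). The remaining field names come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceUnitRecord:
    """Hypothetical corpus-neutral stand-in for the SourceUnit model."""

    # No default: every caller must name the corpus the unit came from.
    source_system: str
    source_code: str
    package: Optional[str] = None
    # Generic lifecycle fields. TGA maps onto them; other corpora may leave them unset.
    version: Optional[str] = None
    status: Optional[str] = None
    supersedes: Optional[str] = None
    superseded_by: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.source_system.strip():
            raise ValueError("source_system must be a non-empty corpus identifier")
        if not self.source_code.strip():
            raise ValueError("source_code must be non-empty")


@dataclass(frozen=True)
class SourceUnitDependencyRecord:
    """Hypothetical stand-in for SourceUnitDependency without a "TGA" default."""

    unit: SourceUnitRecord
    depends_on_system: str
    depends_on_code: str

    def __post_init__(self) -> None:
        if not self.depends_on_system.strip():
            raise ValueError("depends_on_system must be a non-empty corpus identifier")

    @property
    def is_cross_corpus(self) -> bool:
        # True when a unit depends on a unit from a different corpus.
        return self.depends_on_system != self.unit.source_system
```

If source_system is omitted, construction fails with a TypeError. Existing TGA rows would simply pass source_system="TGA" explicitly.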
CRITICAL — Business Logic TGA Hardcoding
| File | Lines | What | Severity |
|---|---|---|---|
| backend/app/utils/ucca_domains.py | 198–228 | get_domain_for_tga_package(): the function name, parameter name, and error messages all say "TGA". It queries training_package_domain_map, which is TGA-specific. | CRITICAL |
| backend/app/utils/ucca_domains.py | 235–256 | get_level_from_tga_code(): extracts the AQF level digit from the TGA unit code format. AQF (Australian Qualifications Framework) is Australia-specific. The regex assumes a [A-Z]{3,5}\d{4,6}[A-Z]? pattern. | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 1–1025 | The entire file is TGA-specific: function name, CLI args, URL construction (training.gov.au/Training/Details/), metadata fetching, PDF downloading, field extraction | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 67 | _TGA_UNIT_CODE_RE = re.compile(r"\b[A-Z]{3,5}\d{4,6}[A-Z]?\b"): hardcoded TGA unit code regex | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 404 | details_url = f"https://training.gov.au/Training/Details/{tga_unit_code}": hardcoded upstream URL | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 406–408 | _infer_pkg(): infers the 3-letter package code from the TGA unit code prefix | CRITICAL |
| backend/scripts/bootstrap_from_tga_unit.py | 656 | source_system == "TGA" hardcoded in DB queries | CRITICAL |
| backend/scripts/create_course_spec.py | 39–54 | _default_target_job_for_package(): hardcoded fallback job titles per TGA package code (CPC→"Construction Trades Assistant", CHC→"Home Health Aide") | HIGH |
| backend/scripts/create_course_spec.py | 102–121 | The spec structure uses a source.tga_unit_code key instead of a generic one | HIGH |
| backend/scripts/ingest_course.py | 34 | from app.utils.ucca_domains import get_domain_for_tga_package: TGA-specific import | HIGH |
| backend/scripts/sync_training_packages.py | all | The entire file syncs TGA training packages from tga_packages.py into the DB | CRITICAL |
| backend/scripts/sync_training_package_domain_map.py | all | Syncs TGA package→domain mappings | CRITICAL |
| backend/scripts/sync_training_package_us_meta.py | all | Syncs US-facing metadata for TGA packages | CRITICAL |
| backend/scripts/reconcile_training_packages_v0.py | all | Reconciles TGA training packages | CRITICAL |
| backend/run_bundle/validate_run_bundle_v1.py | 8, 13–20 | Required files include inputs/tga_unit.json; run_meta requires tga_code | HIGH |
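The TGA-specific code handling (_TGA_UNIT_CODE_RE, _infer_pkg(), get_level_from_tga_code()) could move behind a per-corpus registry keyed by source_system. In the sketch below, UNIT_CODE_PATTERNS, parse_unit_code(), and register_unit_code_pattern() are hypothetical names, and the "TGA" entry reuses the pattern quoted above verbatim.

As quoted, that pattern requires four to six digits after three to five letters. It therefore does not match CHCCCS023 (six letters, three digits), the example unit code in the data flow diagram in section 3. It is worth confirming whether the regex was quoted exactly, or whether the code really rejects current-format TGA codes.

```python
import re
from typing import Dict, Optional

# Hypothetical registry: one unit-code pattern per corpus (source_system).
# The "TGA" entry is the pattern quoted from bootstrap_from_tga_unit.py, line 67.
UNIT_CODE_PATTERNS: Dict[str, re.Pattern] = {
    "TGA": re.compile(r"\b[A-Z]{3,5}\d{4,6}[A-Z]?\b"),
}


def register_unit_code_pattern(source_system: str, regex: str) -> None:
    """Let a corpus adapter contribute its own unit-code format."""
    UNIT_CODE_PATTERNS[source_system] = re.compile(regex)


def parse_unit_code(source_system: str, text: str) -> Optional[str]:
    """Return the first unit code in text for the given corpus, or None."""
    try:
        pattern = UNIT_CODE_PATTERNS[source_system]
    except KeyError:
        # Fail loudly instead of silently assuming TGA.
        raise ValueError(f"no unit-code pattern registered for {source_system!r}") from None
    match = pattern.search(text)
    return match.group(0) if match else None
```

Under the quoted pattern, parse_unit_code("TGA", "CHCCCS023") returns None. An unregistered corpus raises ValueError rather than falling back to TGA, which follows the "hard fail if missing" behavior the bootstrap step already uses for domain mappings.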
CRITICAL — Generator Layer TGA Hardcoding
| File | Lines | What | Severity |
|---|---|---|---|
| generator/tga_packages.py | 1–731 | Entire file: 54 TGA training packages with Australian-specific industry context, roles, and concepts. This IS the TGA client implementation. | CRITICAL |
| generator/tga_scraper.py | 1–230 | Entire file: scrapes training.gov.au HTML for unit structure | CRITICAL |
| generator/tga_pdf_processor.py | 1–444 | Entire file: downloads PDFs from training.gov.au/TrainingComponentFiles/ and extracts TGA unit data using a TGA-specific prompt template | CRITICAL |
| generator/tga_pdf_processor.py | 31 | self.base_url = "https://training.gov.au/TrainingComponentFiles" | CRITICAL |
| generator/tga_pdf_processor.py | 225–278 | The LLM extraction prompt hardcodes the TGA JSON schema (code, title, application, elements, performance_criteria, knowledge_evidence, performance_evidence, status, version, supersedes, superseded_by, release_date) | CRITICAL |
| generator/tga_pdf_processor.py | 337 | unit_data["source_url"] = f"https://training.gov.au/Training/Details/{tga_code}" | CRITICAL |
| generator/us_course_generator.py | 45 | from tga_packages import get_package_context: direct TGA dependency | CRITICAL |
| generator/us_course_generator.py | 657–673 | _get_package_context_safe(): calls the TGA package registry | HIGH |
| generator/us_course_generator.py | 675–687 | _authority_header(): references "TGA Unit Code" in every AI prompt | HIGH |
| generator/us_course_generator.py | 693–735 | generate_complete_course(): expects a tga_data dict with TGA structure (code, elements, application) | CRITICAL |
| generator/us_course_generator.py | 872–876 | The ledger records unit_id as the "authoritative TGA unit code" | MODERATE |
| generator/us_course_generator.py | 1032 | CQ-02A hardcoded to UCCA-CARE-100-002 specifically | LOW (spike) |
| generator/us_course_generator.py | 1049 | CQ-01 hardcoded to UCCA-CARE-100-002 specifically | LOW (spike) |
| generator/generate_complete_course.py | 66–71 | _spec_mode_values() extracts tga_unit_code from the spec | HIGH |
| generator/generate_complete_course.py | 107 | meta["tga_unit_code"]: TGA-specific meta key | HIGH |
| generator/generate_complete_course.py | 593–658 | _normalize_course_data() stamps tga_unit_code into metadata | HIGH |
| generator/generate_complete_course.py | 602 | m.setdefault("tga_unit_code", tga_code) on every module | HIGH |
| generator/folder_manager.py | 70–71 | pdf_url = f"https://training.gov.au/TrainingComponentFiles/CHC/{tga_code}_R2.pdf": hardcoded CHC package and R2 version | HIGH |
| generator/folder_manager.py | 130–175 | The course info template references the TGA mapping and the training.gov.au URL | MODERATE |
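The folder_manager.py URL, with its hardcoded CHC package and R2 revision, is the clearest case where the pipeline's own revision-aware download logic could be reused. The sketch below builds candidate PDF URLs from the base URL and the {PKG}/{code}_R{n}.pdf path pattern shown in this audit, newest revision first. This is the same R5→R1 order the data-flow section describes.

The names infer_package_code() and candidate_pdf_urls() are hypothetical. infer_package_code() follows the documented behavior of _infer_pkg() (three-letter prefix), not its actual source. Only URL construction is shown; no network calls are made.

```python
from typing import List

# Base URL as quoted from generator/tga_pdf_processor.py, line 31.
TGA_COMPONENT_FILES = "https://training.gov.au/TrainingComponentFiles"


def infer_package_code(unit_code: str) -> str:
    """Take the three-letter package prefix (documented _infer_pkg() behavior)."""
    prefix = unit_code[:3].upper()
    if len(prefix) != 3 or not prefix.isalpha():
        raise ValueError(f"cannot infer package code from {unit_code!r}")
    return prefix


def candidate_pdf_urls(unit_code: str, max_revision: int = 5) -> List[str]:
    """Return PDF URLs to try, newest revision first (R5 down to R1 by default)."""
    if max_revision < 1:
        raise ValueError("max_revision must be at least 1")
    package = infer_package_code(unit_code)
    return [
        f"{TGA_COMPONENT_FILES}/{package}/{unit_code}_R{revision}.pdf"
        for revision in range(max_revision, 0, -1)
    ]
```

For CHCCCS023, the first candidate is https://training.gov.au/TrainingComponentFiles/CHC/CHCCCS023_R5.pdf and the last ends in _R1.pdf. Keeping this inside a TGA adapter would let folder_manager.py record whichever URL actually resolved, instead of rebuilding one with fixed values.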
MODERATE — Configuration & Branding TGA Assumptions
| File | Lines | What | Severity |
|---|---|---|---|
| generator/ucca_config.py | 13 | DEFAULT_TARGET_JOB = "Home Health Aide": US healthcare assumption | MODERATE |
| generator/ucca_config.py | 82–110 | INDUSTRY_TERMINOLOGY: Australian→US translation tables (healthcare, automotive, construction) | HIGH |
| generator/ucca_config.py | 112–126 | US_CONTEXT: US healthcare regulations/certifications | MODERATE |
| generator/ucca_config.py | 137–156 | MARKETING: healthcare-specific value proposition, trust signals, target audience | MODERATE |
| generator/ucca_config.py | 144 | "Developed by vocational education experts": VET-specific claim | MODERATE |
LOW — Data Files (TGA Content)
| Location | What |
|---|---|
| tga_data/*.json | 30+ cached TGA unit JSON files |
| tga_meta/*.json | 10+ TGA metadata files |
| tga_pdfs/*.pdf | Downloaded TGA PDF documents |
| courses/ | 20+ generated courses, all from TGA units |
| human_runs/ | Archived run bundles, all TGA-based |
Summary: TGA Dependency Density
| Layer | TGA References | Severity |
|---|---|---|
| Database schema (models) | 4 models, 15+ fields | CRITICAL |
| Business logic (utils) | 2 functions, hardcoded | CRITICAL |
| Bootstrap pipeline | Entire file (1025 lines) | CRITICAL |
| Generator layer | 5 files, ~2500 lines | CRITICAL |
| Spec/validation | 4 files, ~200 lines | HIGH |
| Configuration | 1 file, ~80 lines | MODERATE |
| Data files | 3 directories | LOW (content, not code) |
3. Data Flow Diagram
Current Pipeline: TGA Unit → Course Output
┌─────────────────────────────────────────────────────────────────┐
│ CORPUS INGESTION │
│ │
│ Input: TGA Unit Code (e.g., "CHCCCS023") │
│ │
│ 1. training.gov.au/Training/Details/{code} │
│ → HTML scrape for title, prereqs │
│ │
│ 2. training.gov.au/TrainingComponentFiles/{PKG}/{code}_R{n}.pdf │
│ → PDF download (tries R5→R1) │
│ │
│ 3. LLM extraction (Claude Sonnet) on PDF │
│ → Structured JSON: {code, title, application, elements[], │
│ performance_criteria[], knowledge_evidence[], │
│ performance_evidence[], status, version} │
│ │
│ Output: tga_data/{code}.json + tga_meta/{code}.json │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DATABASE BOOTSTRAP │
│ │
│ 1. Resolve TGA package code (3-letter prefix from unit code) │
│ 2. Look up package→domain mapping in DB (hard fail if missing) │
│ 3. Create/update SourceUnit row (source_system="TGA") │
│ 4. Create/update UCCAUnit row (canonical, level-neutral) │
│ 5. Create UCCAUnitSourceMap (current/superseded) │
│ 6. Allocate UCCA code: UCCA-{DOMAIN}-{LEVEL}-{SEQ} │
│ 7. Create/update Course row with ucca_code │
│ 8. Create CourseUnitMap + CourseSourceMap │
│ │
│ Output: Course record in PostgreSQL with all linkages │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ SPEC CREATION │
│ │
│ Input: UCCA code from DB │
│ Reads: Course + SourceUnit from DB │
│ │
│ Creates CourseSpec JSON: │
│ { │
│ spec_version: 1, │
│ course: { ucca_code, title, domain, level }, │
│ source: { tga_unit_code, source_unit_version }, │
│ generation: { target_job } │
│ } │
│ │
│ Output: backend/data_outbox/{UCCA_CODE}_spec.json │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CONTENT GENERATION (Step 4) │
│ │
│ Input: CourseSpec + TGA unit data │
│ Engine: USCourseGenerator (Claude Sonnet via Anthropic SDK) │
│ │
│ For each element in tga_data.elements[]: │
│ 1. Build prompt with: │
│ - Authority header (TGA code + UCCA code) │
│ - Contract v0 formatting rules │
│ - Element title + performance criteria │
│ - Package industry context (from tga_packages.py) │
│ 2. Call LLM → JSON module response │
│ 3. Normalize: title, learning_outcomes, video_script, │
│ quiz_questions, content_html │
│ 4. Apply CQ-01/CQ-02A spikes (UCCA-CARE-100-002 only) │
│ 5. Assert no Python repr leaks in content_html │
│ │
│ Also generates: │
│ - Course description (LLM) │
│ - Learning outcomes (LLM) │
│ - Target job inference (LLM, if not in spec) │
│ │
│ Best-effort capture: prompts/, ai/ directories │
│ Best-effort cost ledger: cognitive_cost.jsonl │
│ │
│ Output: course_data dict with modules[] + artifacts[] │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ARTIFACT GENERATION (Steps 5-6) │
│ │
│ Step 5: Pictory script (LLM) │
│ → marketing.video.promo.script artifact │
│ │
│ Step 6: Course page copy (LLM) │
│ → marketing.ucca.course_description artifact │
│ → marketing.ucca.course_outline artifact │
│ │
│ Output: course_data dict updated with artifact entries │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ FILE OUTPUT │
│ │
│ courses/{UCCA_CODE}_{title}/ │
│ ├── 00_course_info.txt │
│ ├── 01_source_materials/ │
│ │ ├── {UCCA_CODE}_TGA-{code}.json │
│ │ └── {UCCA_CODE}_TGA-{code}_R{n}.pdf │
│ ├── 02_generated_content/ │
│ │ ├── {UCCA_CODE}_course_data.json │
│ │ └── {UCCA_CODE}_workbook.md │
│ ├── 03_job_aids/ │
│ ├── 04_marketing/ │
│ │ └── {UCCA_CODE}_pictory_script.txt │
│ └── 05_learnworlds/ │
│ └── {UCCA_CODE}_upload_checklist.txt │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DATABASE INGEST │
│ │
│ Input: TGA JSON + course_data JSON │
│ │
│ 1. Parse TGA JSON → FrameworkUnit + Elements + Criteria │
│ 2. Parse course_data → CourseModules + Artifacts │
│ 3. Link modules to framework elements (ModuleElementMap) │
│ 4. Store dependency edges (SourceUnitDependency) │
│ 5. Upsert artifacts with derived metadata │
│ 6. Post-ingest assertion: module count > 0 │
│ │
│ Output: Complete course in PostgreSQL │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ EXPORT / VALIDATION │
│ │
│ Run bundles: meta/manifest.json + meta/run_meta.json │
│ Validators: Contract v1, v2, v2-batch, v2-qualification │
│ Exports: SCORM 1.2, Preview HTML, Content HTML │
│ Canary suite: automated contract compliance tests │
└─────────────────────────────────────────────────────────────────┘
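The run-bundle contracts are worth keeping, but validate_run_bundle_v1.py hardcodes inputs/tga_unit.json and a tga_code key. Below is a minimal sketch of a contract-driven check that assumes the contract can be expressed as data.

The bundle layout (meta/manifest.json, meta/run_meta.json) comes from the export step above. RunBundleContract, validate_run_bundle(), the inputs/source_unit.json path, and the source_system/source_code keys are hypothetical corpus-neutral replacements.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


@dataclass(frozen=True)
class RunBundleContract:
    """Hypothetical data-driven contract: required files and run_meta keys."""

    required_files: Tuple[str, ...] = (
        "meta/manifest.json",
        "meta/run_meta.json",
        "inputs/source_unit.json",  # corpus-neutral replacement for inputs/tga_unit.json
    )
    # Corpus-neutral replacement for the tga_code requirement.
    required_run_meta_keys: Tuple[str, ...] = ("source_system", "source_code")


def validate_run_bundle(root: Path, contract: RunBundleContract) -> List[str]:
    """Return human-readable errors; an empty list means the bundle passes."""
    errors: List[str] = []
    for rel in contract.required_files:
        if not (root / rel).is_file():
            errors.append(f"missing required file: {rel}")

    run_meta_path = root / "meta" / "run_meta.json"
    if run_meta_path.is_file():
        try:
            run_meta = json.loads(run_meta_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            return errors + [f"run_meta.json is not valid JSON: {exc}"]
        if not isinstance(run_meta, dict):
            return errors + ["run_meta.json must contain a JSON object"]
        for key in contract.required_run_meta_keys:
            if not run_meta.get(key):
                errors.append(f"run_meta missing required key: {key}")
    return errors
```

A TGA-compatible v1 contract would pass inputs/tga_unit.json in required_files and ("tga_code",) as required_run_meta_keys. Existing bundles could then keep validating while the new contract is rolled out.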
What Would Change for Corpus-Agnostic CCO Processing
The current pipeline is:
TGA PDF → LLM extraction → TGA JSON → LLM generation → Course → DB
The target pipeline (per the whitepaper) would be:
Corpus Document → Triumvirate Parser → Structured Spec → CCO Processor → CCO → Signed Output
The Triumvirate is:
- Outcome Specification: what good looks like (currently: TGA elements + performance criteria)
- Compliance Ruleset: what operators must do (currently: not modeled)
- Credential Map: who is authorised to do what (currently: not modeled)
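The sketch below shows what a corpus-neutral Triumvirate input could look like as typed data. It assumes each leg is a list of identified clauses that can be traced back to the source instrument. Every class and field name here is hypothetical, since the formal Triumvirate schema is still listed as work to be created. Current TGA data would populate only the first leg, matching the "currently" notes above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Clause:
    """One traceable requirement taken from a source instrument (hypothetical)."""

    clause_id: str   # identifier within the source document
    text: str        # the requirement as written
    source_ref: str  # pointer back to the corpus document (section, page, URL)


@dataclass
class Triumvirate:
    """Hypothetical container for the three legs of a corpus."""

    outcome_specification: List[Clause] = field(default_factory=list)  # what good looks like
    compliance_ruleset: List[Clause] = field(default_factory=list)     # what operators must do
    credential_map: List[Clause] = field(default_factory=list)         # who is authorised to do what

    def missing_legs(self) -> List[str]:
        """Name the legs that have no clauses yet, in a fixed order."""
        legs = {
            "outcome_specification": self.outcome_specification,
            "compliance_ruleset": self.compliance_ruleset,
            "credential_map": self.credential_map,
        }
        return [name for name, clauses in legs.items() if not clauses]

    def is_complete(self) -> bool:
        return not self.missing_legs()
```

A Triumvirate built only from TGA elements and performance criteria would report missing_legs() == ["compliance_ruleset", "credential_map"]. That makes the gap described above checkable rather than implicit.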
4. Gap Assessment
What Is Working End to End
| Capability | Status | Notes |
|---|---|---|
| TGA PDF download from training.gov.au | Working | Revision-aware (R5→R1), local-first caching |
| TGA PDF → structured JSON extraction | Working | LLM-powered, cached in tga_data/ |
| TGA HTML scraping (title, prereqs) | Working | Fallback to PDF extraction |
| Database bootstrap (SourceUnit → UCCAUnit → Course) | Working | Transaction-safe code allocation |
| CourseSpec creation from DB | Working | Deterministic, DB-authoritative |
| LLM content generation (modules, outcomes, descriptions) | Working | Claude Sonnet, with capture + cost ledger |
| Course folder structure creation | Working | Organized 5-subfolder layout |
| Database ingest (course + modules + artifacts) | Working | Idempotent artifact upsert, element mapping |
| Run bundle validation (Contract v1 + v2) | Working | Strict mode with preflight gate |
| SCORM 1.2 export | Working | Full package generation |
| Preview HTML generation | Working | Server-side rendering |
| Canary test suite | Working | Automated contract compliance |
| UCCA code allocation (domain-level-sequence) | Working | Transaction-safe sequences |
| UCCAUnit identity layer (level-neutral) | Working | Clean abstraction |
| Artifact metadata derivation | Working | Deterministic, intent/audience/content_type |
| Provenance block management | Working | Idempotent manifest updates |
| Run resolution (STRICT/PERMISSIVE) | Working | RUN_REF.json authoritative |
| Cognitive cost ledger | Working | JSONL, per-run + global |
| Capture system (prompt/response) | Working | Best-effort, zero-impact |
| Domain taxonomy (22 domains) | Working | Seeded, queryable |
| Training package registry (54 packages) | Working | Complete with industry context |
| Package → domain mapping | Working | Explicit, no fallbacks |
| Qualification run bundles | Working | Build + validate |
| Batch run validation | Working | Contract v2 |
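UCCA code allocation is one of the primitives to keep. The sketch below formats codes as UCCA-{DOMAIN}-{LEVEL}-{SEQ}, as in the bootstrap step, and hands out the next sequence number per (domain, level) pair. It is an in-process illustration only. The repository's allocator is described as transaction-safe (see UCCACodeSequence in models_additions.py), whereas this sketch relies on a threading.Lock.

Zero-padding SEQ to three digits is an assumption inferred from the example UCCA-CARE-100-002. format_ucca_code() and InMemoryCodeAllocator are hypothetical names.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Tuple


def format_ucca_code(domain: str, level: int, seq: int) -> str:
    """Build UCCA-{DOMAIN}-{LEVEL}-{SEQ}; three-digit SEQ padding is an assumption."""
    if not domain.isalpha():
        raise ValueError(f"domain must be alphabetic, got {domain!r}")
    if level < 0 or seq < 1:
        raise ValueError("level must be >= 0 and seq must be >= 1")
    return f"UCCA-{domain.upper()}-{level}-{seq:03d}"


class InMemoryCodeAllocator:
    """Hands out increasing sequence numbers per (domain, level).

    Stand-in for a database sequence: safe across threads in one process only.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allocate(self, domain: str, level: int) -> str:
        key = (domain.upper(), level)
        with self._lock:
            seq = self._last[key] + 1
            code = format_ucca_code(domain, level, seq)  # validate before the counter moves
            self._last[key] = seq
        return code
```

format_ucca_code("CARE", 100, 2) reproduces UCCA-CARE-100-002. A sequence above 999 would widen to four digits rather than fail, which the real allocator may handle differently.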
What Is Incomplete or Scaffolded
| Capability | Status | Notes |
|---|---|---|
| FastAPI web application | Scaffolded | Only health check + 1 admin endpoint. No public API. No web UI. |
| Alembic migrations | Present but unclear | env.py exists; migration history not audited |
| Docker deployment | Partial | docker-compose for local dev only. Dockerfile.preview exists. No production deployment. |
| Workbook generation | Conditional | Only runs if the generator has a generate_workbook_markdown method (not always present) |
| LearnWorlds integration | Checklist only | Generates an upload checklist; no actual API integration |
| Pictory integration | Script only | Generates video script text; no actual Pictory API integration |
| Target job resolution | Temporary fallback | Marked UCCA_TEMP_FALLBACK_TARGET_JOB_DELETE_ME, hardcoded per package |
| CQ-01 / CQ-02A quality controls | Spike | Hardcoded to UCCA-CARE-100-002 only, not generalized |
| Source unit dependencies | Data capture only | Stored but "NOT enforced by the engine" (line 113, models_additions.py) |
| OSCAL compliance | Minimal | Single SSP test file with one control (AC-1). Schema + catalog present. |
| Multi-source-system support | Designed but unused | source_system field exists on SourceUnit, but all code assumes "TGA" |
| US terminology translation | Partial | Tables exist for healthcare/automotive/construction only |
| Video/media production | Not started | Thumbnail suggestions in comments only |
| Certificate generation | Template only | Text template in ucca_config.py; no actual PDF/image generation |
| Student enrollment/completion tracking | Not present | No student-facing models or logic |
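Source unit dependencies are captured but not enforced. If the engine were to enforce them, one option is to treat each (source_system, code) pair as a graph node. The engine would then check that every prerequisite is a known unit and that the graph has no cycles.

The sketch below uses the standard library's graphlib.TopologicalSorter (Python 3.9+). check_dependencies() is a hypothetical helper, and the edge format is an assumption rather than the SourceUnitDependency schema.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple

UnitKey = Tuple[str, str]  # (source_system, code)


def check_dependencies(
    known_units: Iterable[UnitKey],
    edges: Dict[UnitKey, Set[UnitKey]],
) -> Tuple[List[UnitKey], List[str]]:
    """Return (processing order, problems).

    edges maps a unit to the units it depends on. The order covers every unit
    that appears in edges, with prerequisites first; it is empty if a cycle exists.
    """
    known = set(known_units)
    problems: List[str] = []
    for unit, prereqs in edges.items():
        for prereq in sorted(prereqs):
            if prereq not in known:
                problems.append(f"{unit[0]}:{unit[1]} depends on unknown {prereq[0]}:{prereq[1]}")
    try:
        order = list(TopologicalSorter(edges).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes forming the detected cycle.
        problems.append(f"dependency cycle: {exc.args[1]}")
        order = []
    return order, problems
```

Keying nodes by (source_system, code) rather than by code alone means cross-corpus prerequisites stay unambiguous once more than one adapter exists.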
What Exists in Name Only
| Capability | Status | Notes |
|---|---|---|
| Corpus-agnostic processing | Not started | Every code path assumes TGA. No abstraction layer for alternative corpora. |
| CCO (Certified Capability Object) | Not started | The whitepaper concept has zero implementation. No CCO model, no signing, no distribution format. |
| Triumvirate input specification | Not started | Only the first leg (Outcome Specification = TGA elements/criteria) is partially modeled. Compliance Ruleset and Credential Map are not present. |
| Cryptographic signing / dual-key co-signing | Not started | No cryptographic infrastructure at all. |
| CCO expiry / revocation | Not started | No temporal validity on any output. |
| Trust tier classification | Not started | No risk tier model. |
| Operational telemetry loop | Not started | The cognitive cost ledger tracks LLM costs, not CCO operational telemetry as described in the whitepaper. |
| UCCO open standard format | Not started | No formal schema or specification. |
| Multi-domain support | Architecturally possible | UCCADomain taxonomy is generic and SourceUnit has a source_system field, but no non-TGA implementation exists. |
| Production API | Not started | FastAPI app has no production endpoints. |
| Client onboarding | Not started | No mechanism for a new domain/client to bring their own corpus. |
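None of the CCO lifecycle exists yet. The sketch below illustrates what "temporal validity" and revocation could mean in code. It assumes a CCO carries an identifier, an issue time, and an expiry time, and that revocations are kept as a set of revoked identifiers. The CCOValidity fields, the CCOStatus values, and cco_status() are all hypothetical; the whitepaper's actual model is not specified in this audit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import AbstractSet, Optional


class CCOStatus(Enum):
    NOT_YET_VALID = "not_yet_valid"
    VALID = "valid"
    EXPIRED = "expired"
    REVOKED = "revoked"


@dataclass(frozen=True)
class CCOValidity:
    """Hypothetical lifecycle fields a CCO could carry."""

    cco_id: str
    issued_at: datetime
    expires_at: datetime

    def __post_init__(self) -> None:
        if self.issued_at.tzinfo is None or self.expires_at.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        if self.expires_at <= self.issued_at:
            raise ValueError("expires_at must be after issued_at")


def cco_status(
    cco: CCOValidity, revoked_ids: AbstractSet[str], now: Optional[datetime] = None
) -> CCOStatus:
    """Revocation overrides every time-based state; a CCO is expired from expires_at onward."""
    if now is None:
        now = datetime.now(timezone.utc)
    elif now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    if cco.cco_id in revoked_ids:
        return CCOStatus.REVOKED
    if now < cco.issued_at:
        return CCOStatus.NOT_YET_VALID
    if now >= cco.expires_at:
        return CCOStatus.EXPIRED
    return CCOStatus.VALID
```

Under this design, renewal would mean issuing a new CCO with a fresh expires_at rather than mutating the old one. That keeps issued objects immutable, in line with the target integrity model.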
Architecture Assessment: Current vs Target
| Dimension | Current State | Target (per whitepaper) |
|---|---|---|
| Input | TGA unit code (string) | Any corpus document conforming to the Triumvirate specification |
| Ingestion | Scrape/download from training.gov.au | Client uploads or API delivers corpus documents |
| Schema | TGA-specific (elements, performance_criteria, knowledge_evidence) | Universal Triumvirate (Outcome Specification, Compliance Ruleset, Credential Map) |
| Processing | LLM-generated course content | Deterministic verification + structured processing |
| Output | Course data (modules, artifacts, marketing copy) | Certified Capability Object (CCO) |
| Integrity | No cryptographic guarantees | Dual-key co-signing, hash fingerprinting, immutability |
| Lifecycle | Static output files | Expiry, renewal, revocation |
| Trust model | Implicit (UCCA generates, UCCA trusts) | Explicit tiers with domain expert co-certification |
| Provenance | Run metadata + cost ledger | Full cryptographic audit trail |
| Distribution | File system (courses/, run bundles) | Compiled, encrypted CCO packages |
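The Integrity row pairs hash fingerprinting with dual-key co-signing. The sketch below shows the shape of that idea using only the standard library. It computes a SHA-256 fingerprint over canonical JSON, then adds two independent keyed signatures, one for UCCA and one for a domain-expert co-signer, and both must verify.

HMAC is a swapped-in stand-in here. It is symmetric, so anyone who can verify can also sign; a real co-signing scheme would use asymmetric signatures (for example Ed25519) with separate public keys. The function names, the envelope keys, and the canonicalization rules (sorted keys, compact separators, UTF-8) are assumptions.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def canonical_bytes(payload: Dict[str, Any]) -> bytes:
    """Deterministic JSON encoding so equal payloads hash identically."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def fingerprint(payload: Dict[str, Any]) -> str:
    """SHA-256 hex digest of the canonical encoding."""
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()


def sign(fingerprint_hex: str, key: bytes) -> str:
    # HMAC-SHA256: a symmetric stand-in for a real asymmetric signature.
    return hmac.new(key, fingerprint_hex.encode("ascii"), hashlib.sha256).hexdigest()


def _same(a: str, b: str) -> bool:
    # Constant-time comparison on bytes, so non-ASCII tampered values cannot raise.
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def co_sign(payload: Dict[str, Any], ucca_key: bytes, expert_key: bytes) -> Dict[str, str]:
    fp = fingerprint(payload)
    return {"fingerprint": fp, "ucca_sig": sign(fp, ucca_key), "expert_sig": sign(fp, expert_key)}


def verify_co_signed(
    payload: Dict[str, Any], envelope: Dict[str, str], ucca_key: bytes, expert_key: bytes
) -> bool:
    """The payload must still hash to the recorded fingerprint and both signatures must match."""
    fp = fingerprint(payload)
    if not _same(fp, envelope.get("fingerprint", "")):
        return False
    ucca_ok = _same(sign(fp, ucca_key), envelope.get("ucca_sig", ""))
    expert_ok = _same(sign(fp, expert_key), envelope.get("expert_sig", ""))
    return ucca_ok and expert_ok
```

Because the fingerprint covers the canonical encoding, reordering keys does not change it, but any change to a value does. Either signer's key alone is not enough to produce a verifying envelope.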
What Can Be Preserved in the Refactor
The following architectural primitives are sound and reusable:
- UCCADomain taxonomy — generic, not TGA-specific
- UCCAUnit identity layer — level-neutral canonical identifiers
- UCCA code allocation — transaction-safe sequence generation
- Artifact metadata system — deterministic derivation of intent/audience/content_type
- Provenance/capture architecture — best-effort, zero-impact, vendor-replaceable
- Run bundle contract system — validation, preflight gates, canary suites
- Run resolution (STRICT/PERMISSIVE) — portable RUN_REF.json pattern
- Cognitive cost ledger — append-only JSONL with cumulative tracking
- CourseSpec contract — DB-authoritative, spec-driven generation
- Course → Module → Artifact hierarchy — clean public product model
- SourceUnit → UCCAUnit → Course mapping: the abstraction is right and only needs source_system generalized
What Must Be Created New
- Corpus Adapter interface — abstract base class for ingesting any corpus (TGA becomes one adapter)
- Triumvirate schema — formal specification for the three legislative instrument types
- CCO model — data model for the Certified Capability Object
- Cryptographic layer — signing, verification, key management
- CCO lifecycle — expiry, renewal, revocation registry
- Trust tier framework — risk classification model
- Domain expert registration — co-signer identity and credential management
- Production API — FastAPI endpoints for corpus submission, CCO retrieval, verification
- Client isolation — per-world/per-domain data boundaries (mirrors the surfaces architecture)
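The Corpus Adapter interface is what would let TGA become adapters/tga/. The sketch below uses abc.ABC and assumes an adapter's job is to fetch a source record by code and map it onto the three Triumvirate legs as plain dicts. CorpusAdapter, its method names, and the InMemoryAdapter used for illustration are all hypothetical. A TGA adapter would wrap the existing scraper and PDF-processor logic behind fetch_unit(), and map elements and performance criteria into the outcome specification.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class CorpusAdapter(ABC):
    """Hypothetical contract every corpus (TGA or otherwise) would implement."""

    # Value that would be written to SourceUnit.source_system for this corpus.
    source_system: str = ""

    @abstractmethod
    def fetch_unit(self, code: str) -> Dict[str, Any]:
        """Return the raw structured record for one unit of the corpus."""

    @abstractmethod
    def to_triumvirate(self, raw: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
        """Map a raw record onto the three Triumvirate legs."""

    def ingest(self, code: str) -> Dict[str, Any]:
        """Shared step: fetch, map, and tag the result with the corpus identifier."""
        if not self.source_system:
            raise NotImplementedError("adapter must set source_system")
        raw = self.fetch_unit(code)
        return {
            "source_system": self.source_system,
            "source_code": code,
            "triumvirate": self.to_triumvirate(raw),
        }


class InMemoryAdapter(CorpusAdapter):
    """Illustrative adapter backed by a dict, e.g. for tests."""

    source_system = "DEMO"

    def __init__(self, records: Dict[str, Dict[str, Any]]) -> None:
        self._records = records

    def fetch_unit(self, code: str) -> Dict[str, Any]:
        try:
            return self._records[code]
        except KeyError:
            raise LookupError(f"{self.source_system} has no unit {code!r}") from None

    def to_triumvirate(self, raw: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
        return {
            "outcome_specification": list(raw.get("outcomes", [])),
            "compliance_ruleset": list(raw.get("rules", [])),
            "credential_map": list(raw.get("credentials", [])),
        }
```

Keeping ingest() on the base class means the core engine calls one method regardless of corpus. Only fetching and mapping vary per adapter.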
Appendix: File-by-File TGA Reference Count
Quick grep counts of TGA references per file, with lowercase tga and uppercase TGA matches counted separately:
| File | tga refs (lowercase) | TGA refs (uppercase) | training.gov.au refs |
|---|---|---|---|
| bootstrap_from_tga_unit.py | 120+ | 40+ | 2 |
| tga_packages.py | 80+ | 60+ | 1 |
| tga_pdf_processor.py | 40+ | 20+ | 3 |
| tga_scraper.py | 15+ | 10+ | 2 |
| us_course_generator.py | 20+ | 15+ | 0 |
| generate_complete_course.py | 30+ | 10+ | 0 |
| ingest_course.py | 50+ | 20+ | 0 |
| models_additions.py | 10+ | 8 | 0 |
| training_package.py | 5 | 3 | 0 |
| ucca_domains.py | 8 | 6 | 0 |
| create_course_spec.py | 8 | 3 | 0 |
| folder_manager.py | 10+ | 5 | 1 |
| ucca_config.py | 0 | 0 | 0 |
Total: 400+ TGA-specific references across the codebase (the per-file lower bounds above sum to roughly 396 tga, 200 TGA, and 9 training.gov.au matches).
This document is the foundation for planning the refactor. The next step is to design the abstraction layer that separates the universal engine primitives from the TGA client implementation, so that TGA becomes adapters/tga/ and the core engine processes any Triumvirate-conformant corpus.
Version History
| Version | Date | Change | Author |
|---|---|---|---|
| 1.0 | 2026-03-09 | Initial creation | Claude Code |