Designing a Constraint-Driven AI Agent for Hardware Specification Validation
The Plausibility Trap
Language models predict the next plausible token — they do not compute. In hardware configuration, this distinction is critical. A CPU has a fixed TDP. A GPU has a fixed length. A chassis has a fixed cooler clearance. These are physical facts, not interpretations.
A vanilla LLM asked to configure a system will produce output that sounds architecturally sound. It will pair a 180W CPU with a 65W cooler. It will recommend a GPU that doesn't fit the chassis. It will undersize the PSU for peak draw. Not occasionally — systematically, with full confidence. This is the Plausibility Trap: high fluency masking zero correctness.
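The failures above are not matters of judgment; each one is plain arithmetic over fixed component facts. A minimal sketch of the kind of deterministic check a vanilla LLM skips (the function name and figures are illustrative, not real SKU data):

```python
# Illustrative sketch: hard physical constraints reduce to arithmetic.
# validate_build and all numbers below are hypothetical, not PCBO internals.

def validate_build(cpu_tdp_w, cooler_capacity_w, gpu_length_mm, chassis_clearance_mm):
    """Return a list of hard constraint violations (empty list = valid build)."""
    violations = []
    if cooler_capacity_w < cpu_tdp_w:
        violations.append(f"cooler rated {cooler_capacity_w}W < CPU TDP {cpu_tdp_w}W")
    if gpu_length_mm > chassis_clearance_mm:
        violations.append(f"GPU {gpu_length_mm}mm exceeds chassis clearance {chassis_clearance_mm}mm")
    return violations

# The fluent-but-wrong pairing from the text: a 180W CPU on a 65W cooler,
# plus a GPU that physically does not fit the chassis.
print(validate_build(180, 65, 304, 280))
# → ['cooler rated 65W < CPU TDP 180W', 'GPU 304mm exceeds chassis clearance 280mm']
```

A model predicting plausible tokens has no mechanism that forces these comparisons to run; a solver does nothing else.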
Existing configurators (PCPartPicker and similar) solve basic compatibility — socket match, form factor. They don't compute thermal envelopes, don't reason about workload-specific tradeoffs, and don't produce an explanation of why a component was chosen or rejected. They lack context. LLMs have context but lack correctness. The problem required combining both.
Boundaries established before implementation
Before writing a line of code, I established what a valid solution must satisfy. These constraints shaped every architectural decision that followed.
Three-layer constraint pipeline
The system executes three layers in sequence for every generation cycle. The key insight: constraints are not a post-processing filter — they are a precondition of output. The LLM cannot exit the loop until the full specification passes mathematical validation.
Natural language → Intent + Proposal → 5000+ SKU graph → Generate → Validate → Traced output
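The loop at the center of the pipeline can be sketched as follows. This is a toy rendering under stated assumptions: `propose_build`, `FACTS`, and `check_constraints` are hypothetical stand-ins for the LLM layer, the ontology lookup, and the constraint solver, and the component figures are illustrative.

```python
# Minimal sketch of the generate → validate loop. All names and numbers are
# hypothetical stand-ins, not PCBO v2's actual interfaces or SKU data.

FACTS = {  # ontology layer: deterministic per-SKU facts
    "cpu-x": {"tdp_w": 180},
    "cooler-small": {"capacity_w": 65},
    "cooler-big": {"capacity_w": 220},
}

def propose_build(intent, feedback):
    """Stand-in for the LLM layer: regenerates with a bigger cooler on feedback."""
    cooler = "cooler-big" if feedback else "cooler-small"
    return ["cpu-x", cooler]

def check_constraints(build, facts):
    """Solver layer: every violation is a computed fact, not an interpretation."""
    cpu, cooler = build
    if facts[cooler]["capacity_w"] < facts[cpu]["tdp_w"]:
        return [f"{cooler} capacity below CPU TDP {facts[cpu]['tdp_w']}W"]
    return []

def configure(intent, max_iters=5):
    feedback = []
    for _ in range(max_iters):
        build = propose_build(intent, feedback)      # generate
        facts = {sku: FACTS[sku] for sku in build}   # fact lookup
        feedback = check_constraints(build, facts)   # validate
        if not feedback:
            return build                             # only exit: zero violations
    raise RuntimeError(f"no valid build after {max_iters} iterations: {feedback}")

print(configure("quiet 180W build"))  # → ['cpu-x', 'cooler-big']
```

The point the structure enforces: the only way out of the loop is an empty violation list, so validation is a precondition of output rather than a post-processing filter.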
Every build produces a structured Decision Artifact — the mandatory trace output that makes the system's reasoning auditable without reading source code. Each node records not only the chosen component but the computed facts behind every rejection.
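One plausible shape for a single artifact node, sketched as a dataclass. The field names and the derived capacity figure are illustrative assumptions, not PCBO v2's actual schema; the TDP, cooler rating, and ambient values come from the text.

```python
# Hypothetical shape of one Decision Artifact node; all field names and the
# derived 220W requirement are illustrative, not PCBO v2's real schema.
from dataclasses import dataclass, field

@dataclass
class DecisionNode:
    component: str       # which slot this node decides, e.g. "cooler"
    chosen_sku: str      # the component that passed validation
    rejected: list       # [(sku, reason)] — every rejection carries its math
    checks: dict = field(default_factory=dict)  # computed values behind the choice

node = DecisionNode(
    component="cooler",
    chosen_sku="tower-air-220w",
    rejected=[("compact-65w", "capacity 65W < CPU TDP 180W at +32C ambient")],
    checks={"cpu_tdp_w": 180, "ambient_c": 32, "required_capacity_w": 220},
)
print(node.component, node.chosen_sku)
```

Because rejections are stored alongside the winner, an auditor can replay the arithmetic for any node without touching the solver.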
Silent build at +32°C ambient · $1,500 budget
Comparative test: design a thermally stable, acoustically quiet PC within a fixed budget under elevated ambient temperature. Air cooling only. Scored across four dimensions: thermal validation, compatibility, decision trace, and acoustic compliance.
| System | Score | Thermal | Compatibility | Trace | Notes |
|---|---|---|---|---|---|
| PCBO v2 | 23/25 | ✓ Pass | ✓ All 7 | ✓ Complete | −2 pts: ambient risk note insufficiently specific |
| Manual (Senior Engineer) | 19/25 | ✓ Pass | ✓ Pass | ✗ None | Budget exceeded by ~$80 after adequate cooling was added |
| Claude 3 (vanilla) | 14/25 | ✗ No calc | ~ Partial | ✗ None | Better component awareness, PSU undersized for peak draw |
| GPT-4 (vanilla) | 11/25 | ✗ Mismatch | ✗ GPU OOB | ✗ None | CPU/cooler TDP mismatch, GPU clearance not verified |
The architecture is the transferable asset
The constraint-driven pattern — LLM proposal → ontology lookup → constraint solver → looped regeneration → decision artifact — is domain-agnostic. PCBO v2 is the first implementation. The same architecture applies to any domain where correctness is a physical or logical fact, not an interpretation.
→ LLM Layer — intent extraction + component proposal
→ Deterministic Ontology — URI-addressable fact lookup
→ Constraint Solver — generate / validate / reject loop
→ Decision Artifact — mandatory trace output
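The domain-agnostic claim can be made concrete as a set of interfaces that any implementation would satisfy. A sketch using Python protocols — every name here is hypothetical, since the text does not show PCBO v2's actual interfaces:

```python
# Sketch of the four layers as structural interfaces; all names are
# hypothetical illustrations, not PCBO v2's real code.
from typing import Protocol, runtime_checkable

@runtime_checkable
class Proposer(Protocol):   # LLM layer: intent extraction + component proposal
    def propose(self, intent: str, feedback: list) -> list: ...

@runtime_checkable
class Ontology(Protocol):   # deterministic, URI-addressable fact lookup
    def facts(self, uri: str) -> dict: ...

@runtime_checkable
class Solver(Protocol):     # generate / validate / reject loop
    def violations(self, proposal: list, facts: dict) -> list: ...

@runtime_checkable
class Tracer(Protocol):     # mandatory decision-artifact output
    def record(self, proposal: list, violations: list) -> None: ...

# A PC-building ontology is just one binding of the same interface.
class PcOntology:
    def facts(self, uri):
        return {"tdp_w": 180}  # illustrative fixed fact

print(isinstance(PcOntology(), Ontology))  # → True
```

Swapping the domain means swapping the bindings — a different fact graph and a different constraint set — while the proposal/validation/trace contract stays fixed.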