Eugene Kaizin · Case Studies · CS-001
CS-001 · Complete · Solo

Designing a Constraint-Driven AI Agent for Hardware Specification Validation

AI Systems · Constraint Solver · Knowledge Graph · Deterministic Ontology · LLM Integration · Decision Artifact · 23/25 benchmark · ±5% thermal accuracy
Executive Summary
Domain
PC hardware configuration in constraint-heavy physical domains
Core Problem
LLMs produce plausible but physically invalid hardware configurations
Primary Constraint
Constraints must block generation — not filter output after the fact
Architecture Focus
Deterministic ontology layer + iterative constraint solver loop
Scale
5,000+ SKU knowledge graph · 200 physical assemblies validated
Outcome
23/25 benchmark · ±5% thermal accuracy · <30 sec per build
Problem Space

The Plausibility Trap

Language models predict the next plausible token — they do not compute. In hardware configuration, this distinction is critical. A CPU has a fixed TDP. A GPU has a fixed length. A chassis has a fixed cooler clearance. These are physical facts, not interpretations.

A vanilla LLM asked to configure a system will produce output that sounds architecturally sound. It will pair a 180W CPU with a 65W cooler. It will recommend a GPU that doesn't fit the chassis. It will undersize the PSU for peak draw. Not occasionally — systematically, with full confidence. This is the Plausibility Trap: high fluency masking zero correctness.

Existing configurators (PC Part Picker and similar) solve basic compatibility — socket match, form factor. They don't compute thermal envelopes, don't reason about workload-specific tradeoffs, and don't produce an explanation of why a component was chosen or rejected. They lack context. LLMs have context but lack correctness. The problem required combining both.

Design Constraints

Boundaries established before implementation

Before writing a line of code, I established what a valid solution must satisfy. These constraints shaped every architectural decision that followed.

01 The agent must not be able to output a physically invalid configuration. Not "rarely" — never within the validated knowledge graph.
02 Every component selection must trace to a specific parameter in a specific datasheet — not to model weights.
03 The system must explain rejected alternatives — not just present a final answer. Decisions must be auditable.
04 Full build validation must complete in under 60 seconds. Target: under 30.
05 The architecture must be domain-agnostic — replicable to other constraint-heavy domains without rebuilding from scratch.
06 Scope is limited to components with published datasheets. No-name parts and custom liquid cooling loops are explicitly out of scope — reliability requires verified data.
Architecture

Three-layer constraint pipeline

The system executes three layers in sequence for every generation cycle. The key insight: constraints are not a post-processing filter — they are a precondition of output. The LLM cannot exit the loop until the full specification passes mathematical validation.
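The exit condition described here can be sketched as a small loop. This is an illustrative sketch, not PCBO v2's actual code — the function names and the toy thermal rule are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Violation:
    constraint_class: str  # e.g. "thermal", "dimensional"
    component: str         # the proposed part that triggered the failure
    margin: float          # measured value of the checked quantity
    threshold: float       # the limit it was checked against

def solve(propose, lookup, validate, max_rounds=10):
    """Constraints gate output: nothing is returned until every class passes."""
    feedback = []
    for _ in range(max_rounds):
        proposal = propose(feedback)                       # Layer 1: LLM proposal
        facts = {name: lookup(name) for name in proposal}  # Layer 2: ontology
        feedback = validate(proposal, facts)               # Layer 3: solver
        if not feedback:
            return proposal, facts                         # full pass -> exit
    raise RuntimeError("no valid configuration within the round budget")

# Toy demo: the first proposal undersizes the cooler (180W for a 105W CPU
# misses a 2.0x margin), and the structured Violation drives a corrected retry.
FACTS = {"cpu": {"tdp_w": 105},
         "cooler_a": {"rated_tdp_w": 180},
         "cooler_b": {"rated_tdp_w": 250}}

def propose(feedback):
    return ["cpu", "cooler_b" if feedback else "cooler_a"]

def validate(proposal, facts):
    cooler = proposal[1]
    margin = facts[cooler]["rated_tdp_w"] / facts["cpu"]["tdp_w"]
    if margin < 2.0:
        return [Violation("thermal", cooler, margin, 2.0)]
    return []

final, _ = solve(propose, FACTS.__getitem__, validate)
```

The LLM stands in as `propose`; the key property is that `solve` has no code path that returns an unvalidated configuration.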

Architecture diagram (interactive version in development) — three-layer constraint pipeline:

User Input (natural language) → LLM Layer (intent + proposal) → Ontology (5,000+ SKU graph) → Solver (generate → validate) → Artifact (traced output)
Layer 1 — LLM
Intent Extraction & Component Proposal
Parses natural language input: budget ceiling, use case, fixed components, soft preferences (acoustic class, size). Produces a structured component proposal — not a text response.
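A minimal sketch of what such a structured proposal might look like. The field names are assumptions; the values echo the benchmark build described later in this case study:

```python
# Hypothetical shape of the Layer 1 output: a machine-checkable proposal,
# not free text. Field names are illustrative, not from PCBO v2.
proposal = {
    "intent": {
        "budget_usd": 1500,
        "use_case": "silent workstation",
        "ambient_c": 32,
        "fixed": {"cpu": "amd-ryzen-7-7700x"},
        "soft_prefs": {"acoustic_class": "silent"},
    },
    "components": {
        "cpu": "amd-ryzen-7-7700x",
        "cpu_cooler": "bequiet-dark-rock-4",
        "gpu": None,  # unresolved slots are filled in later rounds
    },
}
```

Because every value is a typed field rather than prose, Layers 2 and 3 can check it without any text parsing.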
Layer 2 — Deterministic Ontology
Semantic Fact-Checking Against 5,000+ SKU Knowledge Graph
Each proposed component is looked up by URI against its physical parameters: socket type, TDP envelope, dimensional footprint, PCIe lane requirements, voltage phase. No hallucination is possible at this layer — the layer is not probabilistic.
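A minimal sketch of that deterministic lookup, assuming a URI-keyed store. The URI scheme and field names are invented for illustration; the values mirror the datasheet facts cited in the decision artifact below:

```python
# Sketch of the Layer 2 ontology: exact datasheet values keyed by URI.
# URI scheme and field names are illustrative, not PCBO v2's actual schema.
ONTOLOGY = {
    "sku:amd/ryzen-7-7700x": {"socket": "AM5", "tdp_w": 105},
    "sku:bequiet/dark-rock-4": {
        "rated_tdp_w": 250,
        "height_mm": 162.8,
        "width_mm": 135.4,
    },
}

def lookup(uri: str) -> dict:
    """Exact, deterministic lookup: an unknown URI is an error, never a guess."""
    if uri not in ONTOLOGY:
        raise KeyError(f"{uri} is not in the validated knowledge graph")
    return ONTOLOGY[uri]
```

The failure mode is the point: a component outside the graph raises an error instead of falling back to a probabilistic estimate.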
Layer 3 — Constraint Solver
Generate → Validate → Reject Loop
If the proposal violates any constraint class, the solver constructs a structured error — which constraint failed, by what margin, which component triggered it — and injects it back into LLM context. The model rebuilds. The loop exits only on a full pass across all seven constraint classes.
Architecture Decisions
Decision 01
Constraints as a generation precondition, not a filter
Reason
Post-generation filtering catches errors but doesn't prevent them. The model can still generate invalid reasoning paths that contaminate the output even when the final answer is corrected.
Tradeoff
30–40 sec validation vs. 5–10 sec for vanilla LLM. Accepted: the cost of a wrong answer in hardware procurement far exceeds the time delta.
Alternative rejected: output validation layer → generates plausible-sounding invalid paths internally before correction
Decision 02
URI-addressable knowledge graph over model memory
Reason
Physical parameters are facts, not distributions. A model that "knows" a component's TDP is making a probabilistic guess. The ontology holds the exact value from the manufacturer datasheet.
Tradeoff
Requires maintaining the SKU database. Components without confirmed datasheets are excluded. Reliability requires data — this is a feature, not a limitation.
Alternative rejected: RAG over PDF datasheets → extraction errors propagate as valid facts
Decision 03
Structured error injection over simple retry
Reason
A retry without structured feedback causes random substitution. The model needs to know exactly which constraint failed, by what margin, and which component triggered it.
Tradeoff
Error message format required 6 iterations to converge. Too vague → random substitution. Too specific → over-optimization breaks other constraints.
Alternative rejected: simple regeneration prompt → model makes random substitutions without targeted correction
Decision Artifact

Every build produces a structured Decision Artifact — the mandatory trace output that makes the system's reasoning auditable without reading source code. Below is the cooler selection node from the benchmark build. Toggle between raw terminal output and structured card view.

decision_artifact.log — cooler_selection
# ─── DECISION ARTIFACT ─── NODE: cpu_cooler ───────────────
COMPONENT: CPU Cooler
SELECTED:  be quiet! Dark Rock 4
RATIONALE:
  rated_tdp          250W
  cpu_tdp            105W (AMD Ryzen 7 7700X)
  margin_factor      2.38×   ✓ above 2.0× threshold at +32°C ambient
  height_mm          162.8mm
  chassis_clearance  165.0mm
  fit_margin         2.2mm   ⚠ within spec — flagged, no adjustment room
  ram_conflict       none    ✓ width 135.4mm clears slot 1
REJECTED:
  Noctua NH-D15
    → height: 165.0mm  chassis_clearance: 165.0mm  margin: 0mm  [FAIL: dimensional]
  Thermalright Peerless Assassin 120 SE
    → tdp: 220W  margin_at_32c: 1.76×  threshold: 2.0×  [FAIL: thermal]
RESIDUAL RISK:
  ⚠ fit_margin 2.2mm — verify clearance after assembly before closing panel
  ⚠ if case panel bows under transport, contact is possible
# ─── VALIDATION: PASS ──────────────────────────────────────
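The margin arithmetic in the artifact can be reproduced directly. A sketch of the two checks (the 1.76× figure for the rejected Thermalright involves an ambient derating whose formula is not shown in this case study, so it is omitted here):

```python
# Reproducing two checks from the artifact above. Function names are
# illustrative; the numbers are the datasheet values the artifact cites.
def thermal_margin(rated_tdp_w: float, cpu_tdp_w: float) -> float:
    return rated_tdp_w / cpu_tdp_w

def fit_margin_mm(clearance_mm: float, height_mm: float) -> float:
    return clearance_mm - height_mm

dark_rock_margin = round(thermal_margin(250, 105), 2)  # 2.38x, clears 2.0x
dark_rock_fit = round(fit_margin_mm(165.0, 162.8), 1)  # 2.2mm, flagged as tight
nh_d15_fit = fit_margin_mm(165.0, 165.0)               # 0mm -> dimensional fail
```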
Benchmark

Silent build at +32°C ambient · $1,500 budget

Comparative test: design a thermally stable, acoustically quiet PC within a fixed budget under elevated ambient temperature. Air cooling only. Scored across four dimensions: thermal validation, compatibility, decision trace, and acoustic compliance.

System | Score | Thermal | Compatibility | Trace | Notes
PCBO v2 | 23/25 | ✓ Pass | ✓ All 7 | ✓ Complete | −2 pts: ambient risk note insufficiently specific
Manual (Senior Engineer) | 19/25 | ✓ Pass | ✓ Pass | ✗ None | Budget exceeded ~$80 after adequate cooling added
Claude 3 (vanilla) | 14/25 | ✗ No calc | ~ Partial | ✗ None | Better component awareness, PSU undersized for peak draw
GPT-4 (vanilla) | 11/25 | ✗ Mismatch | ✗ GPU OOB | ✗ None | CPU/cooler TDP mismatch, GPU clearance not verified
What I Learned
Error message format is the hardest engineering problem in the loop
The constraint solver itself is essentially a rule engine — straightforward to build. The hard part was designing the error message that re-enters the LLM context. Too vague: the model makes random substitutions. Too specific: the model over-optimizes for the failing constraint and breaks others. Six iterations to converge. The format that worked encodes: constraint class, failing component, margin delta, and the constraint threshold — nothing more.
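A sketch of that four-field payload — serialization format and field names are assumptions, not the converged format itself:

```python
# Illustrative error payload: exactly four fields, nothing more.
import json

def constraint_error(constraint_class, component, margin, threshold):
    return json.dumps({
        "constraint_class": constraint_class,  # which rule class failed
        "component": component,                # which part triggered it
        "margin": margin,                      # measured value at failure
        "threshold": threshold,                # the limit it had to meet
    })

# The Thermalright rejection from the decision artifact, as loop feedback:
msg = constraint_error("thermal", "Thermalright Peerless Assassin 120 SE",
                       1.76, 2.0)
```

Anything beyond these four fields risks the over-optimization failure mode described above; anything less degrades to random substitution.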
Vanilla LLMs fail not from ignorance but from the absence of uncertainty signals
GPT-4 proposed a thermally mismatched cooler with full confidence. There was no hesitation, no caveat, no signal that the model was uncertain. The solver creates that signal artificially: constraint failure operationalizes uncertainty and forces a rebuild rather than a confident wrong answer. This is why the architecture works — it manufactures epistemic humility where none exists natively.
Knowing your system's validity boundary is as important as knowing what's inside it
Custom liquid cooling loops and extreme overclocking were deliberately excluded — not because the architecture can't support them, but because the reliability guarantee only holds where the knowledge graph has verified data. Extending scope without verified data would convert a reliable system into a plausible-sounding one. This is a design principle, not a limitation.
Replicability

The architecture is the transferable asset

The constraint-driven pattern — LLM proposal → ontology lookup → constraint solver → looped regeneration → decision artifact — is domain-agnostic. PCBO v2 is the first implementation. The same architecture applies to any domain where correctness is a physical or logical fact, not an interpretation.

Electrical Engineering
PCB component selection against power rail, thermal, and EMC constraints. Same solver loop, different ontology.
Infrastructure Sizing
Server configuration against rack unit, power draw, and thermal density constraints for datacenter design.
Procurement Validation
Vendor proposals validated against specification compliance requirements. Replaces manual audit with traced output.
Structural Specification
Material selection against load, weight, and cost constraints. Decision artifact provides mandatory engineering trace.
Pattern Core
User Input (natural language constraints)
LLM Layer — intent extraction + component proposal
Deterministic Ontology — URI-addressable fact lookup
Constraint Solver — generate / validate / reject loop
Decision Artifact — mandatory trace output
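One way to express the transferable core: the loop is fixed, and only the three domain hooks vary. A hedged sketch — the interface names are illustrative, not PCBO v2's API:

```python
from typing import Protocol

class Domain(Protocol):
    """The three hooks a new domain supplies; the loop itself never changes."""
    def propose(self, intent: str, feedback: list) -> dict: ...   # LLM layer
    def lookup(self, sku: str) -> dict: ...                       # ontology
    def validate(self, proposal: dict, facts: dict) -> list: ...  # solver rules

def run(domain: Domain, intent: str, max_rounds: int = 10) -> dict:
    feedback: list = []
    for _ in range(max_rounds):
        proposal = domain.propose(intent, feedback)
        facts = {s: domain.lookup(s) for s in proposal["components"].values()}
        feedback = domain.validate(proposal, facts)
        if not feedback:
            return {"proposal": proposal, "facts": facts}
    raise RuntimeError("unsatisfiable within the round budget")

class _PCDomain:  # toy stand-in for the PC-hardware hooks
    def propose(self, intent, feedback):
        return {"components": {"cpu": "sku:amd/ryzen-7-7700x"}}
    def lookup(self, sku):
        return {"tdp_w": 105}
    def validate(self, proposal, facts):
        return []

result = run(_PCDomain(), "silent build, $1,500, +32°C ambient")
```

Porting to PCB selection or rack sizing means writing a new `Domain`, not a new system.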
This is one of several projects I'm documenting
Available to walk through the architecture, the tradeoffs, and what this approach could look like applied to your domain.