Eugene Kaizin · Case Studies · CS-001
CS-001 · Complete · Solo

Designing a Constraint-Driven AI Agent for Hardware Specification Validation

AI Systems · Constraint Solver · Knowledge Graph · Deterministic Ontology · LLM Integration · Decision Artifact · 23/25 benchmark · ±5% thermal accuracy
Executive Summary
Domain
PC hardware configuration in constraint-heavy physical domains
Core Problem
LLMs produce plausible but physically invalid hardware configurations
Primary Constraint
Constraints must block generation — not filter output after the fact
Architecture Focus
Deterministic ontology layer + iterative constraint solver loop
Scale
5,000+ SKU knowledge graph · 200 physical assemblies validated
Outcome
23/25 benchmark · ±5% thermal accuracy · <30 sec per build
Problem Space

The Plausibility Trap

Language models predict the next plausible token — they do not compute. In hardware configuration, this distinction is critical. A CPU has a fixed TDP. A GPU has a fixed length. A chassis has a fixed cooler clearance. These are physical facts, not interpretations.

A vanilla LLM asked to configure a system will produce output that sounds architecturally sound. It will pair a 180W CPU with a 65W cooler. It will recommend a GPU that doesn't fit the chassis. It will undersize the PSU for peak draw. Not occasionally — systematically, with full confidence. This is the Plausibility Trap: high fluency masking zero correctness.

Existing configurators (PC Part Picker and similar) solve basic compatibility — socket match, form factor. They don't compute thermal envelopes, don't reason about workload-specific tradeoffs, and don't produce an explanation of why a component was chosen or rejected. They lack context. LLMs have context but lack correctness. The problem required combining both.

Design Constraints

Boundaries established before implementation

Before writing a line of code, I established what a valid solution must satisfy. These constraints shaped every architectural decision that followed.

01 The agent must not be able to output a physically invalid configuration. Not "rarely" — never within the validated knowledge graph.
02 Every component selection must trace to a specific parameter in a specific datasheet — not to model weights.
03 The system must explain rejected alternatives — not just present a final answer. Decisions must be auditable.
04 Full build validation must complete in under 60 seconds. Target: under 30.
05 The architecture must be domain-agnostic — replicable to other constraint-heavy domains without rebuilding from scratch.
06 Scope is limited to components with published datasheets. No-name parts and custom liquid cooling loops are explicitly out of scope — reliability requires verified data.
Architecture

Three-layer constraint pipeline

The system executes three layers in sequence for every generation cycle. The key insight: constraints are not a post-processing filter — they are a precondition of output. The LLM cannot exit the loop until the full specification passes mathematical validation.
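The exit condition described here can be sketched as a small loop. This is an illustrative sketch, not PCBO v2's actual code — the function names and the toy thermal rule are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Violation:
    constraint_class: str  # e.g. "thermal", "dimensional"
    component: str         # the proposed part that triggered the failure
    margin: float          # measured value of the checked quantity
    threshold: float       # the limit it was checked against

def solve(propose, lookup, validate, max_rounds=10):
    """Constraints gate output: nothing is returned until every class passes."""
    feedback = []
    for _ in range(max_rounds):
        proposal = propose(feedback)                       # Layer 1: LLM proposal
        facts = {name: lookup(name) for name in proposal}  # Layer 2: ontology
        feedback = validate(proposal, facts)               # Layer 3: solver
        if not feedback:
            return proposal, facts                         # full pass -> exit
    raise RuntimeError("no valid configuration within the round budget")

# Toy demo: the first proposal undersizes the cooler (180W for a 105W CPU
# misses a 2.0x margin), and the structured Violation drives a corrected retry.
FACTS = {"cpu": {"tdp_w": 105},
         "cooler_a": {"rated_tdp_w": 180},
         "cooler_b": {"rated_tdp_w": 250}}

def propose(feedback):
    return ["cpu", "cooler_b" if feedback else "cooler_a"]

def validate(proposal, facts):
    cooler = proposal[1]
    margin = facts[cooler]["rated_tdp_w"] / facts["cpu"]["tdp_w"]
    if margin < 2.0:
        return [Violation("thermal", cooler, margin, 2.0)]
    return []

final, _ = solve(propose, FACTS.__getitem__, validate)
```

The LLM stands in as `propose`; the key property is that `solve` has no code path that returns an unvalidated configuration.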

Architecture diagram (interactive version in development) — three-layer constraint pipeline:

User Input (natural language) → LLM Layer (intent + proposal) → Ontology (5,000+ SKU graph) → Solver (generate → validate) → Artifact (traced output)
Layer 1 — LLM
Intent Extraction & Component Proposal
Parses natural language input: budget ceiling, use case, fixed components, soft preferences (acoustic class, size). Produces a structured component proposal — not a text response.
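A minimal sketch of what such a structured proposal might look like. The field names are assumptions; the values echo the benchmark build described later in this case study:

```python
# Hypothetical shape of the Layer 1 output: a machine-checkable proposal,
# not free text. Field names are illustrative, not from PCBO v2.
proposal = {
    "intent": {
        "budget_usd": 1500,
        "use_case": "silent workstation",
        "ambient_c": 32,
        "fixed": {"cpu": "amd-ryzen-7-7700x"},
        "soft_prefs": {"acoustic_class": "silent"},
    },
    "components": {
        "cpu": "amd-ryzen-7-7700x",
        "cpu_cooler": "bequiet-dark-rock-4",
        "gpu": None,  # unresolved slots are filled in later rounds
    },
}
```

Because every value is a typed field rather than prose, Layers 2 and 3 can check it without any text parsing.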
Layer 2 — Deterministic Ontology
Semantic Fact-Checking Against 5,000+ SKU Knowledge Graph
Each proposed component is looked up by URI against its physical parameters: socket type, TDP envelope, dimensional footprint, PCIe lane requirements, voltage phase. No hallucination is possible at this layer — the layer is not probabilistic.
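A minimal sketch of that deterministic lookup, assuming a URI-keyed store. The URI scheme and field names are invented for illustration; the values mirror the datasheet facts cited in the decision artifact below:

```python
# Sketch of the Layer 2 ontology: exact datasheet values keyed by URI.
# URI scheme and field names are illustrative, not PCBO v2's actual schema.
ONTOLOGY = {
    "sku:amd/ryzen-7-7700x": {"socket": "AM5", "tdp_w": 105},
    "sku:bequiet/dark-rock-4": {
        "rated_tdp_w": 250,
        "height_mm": 162.8,
        "width_mm": 135.4,
    },
}

def lookup(uri: str) -> dict:
    """Exact, deterministic lookup: an unknown URI is an error, never a guess."""
    if uri not in ONTOLOGY:
        raise KeyError(f"{uri} is not in the validated knowledge graph")
    return ONTOLOGY[uri]
```

The failure mode is the point: a component outside the graph raises an error instead of falling back to a probabilistic estimate.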
Layer 3 — Constraint Solver
Generate → Validate → Reject Loop
If the proposal violates any constraint class, the solver constructs a structured error — which constraint failed, by what margin, which component triggered it — and injects it back into LLM context. The model rebuilds. The loop exits only on a full pass across all seven constraint classes.
Architecture Decisions
Decision 01
Constraints as a generation precondition, not a filter
Reason
Post-generation filtering catches errors but doesn't prevent them. The model can still generate invalid reasoning paths that contaminate the output even when the final answer is corrected.
Tradeoff
30–40 sec validation vs. 5–10 sec for vanilla LLM. Accepted: the cost of a wrong answer in hardware procurement far exceeds the time delta.
Alternative rejected: output validation layer → generates plausible-sounding invalid paths internally before correction
Decision 02
URI-addressable knowledge graph over model memory
Reason
Physical parameters are facts, not distributions. A model that "knows" a component's TDP is making a probabilistic guess. The ontology holds the exact value from the manufacturer datasheet.
Tradeoff
Requires maintaining the SKU database. Components without confirmed datasheets are excluded. Reliability requires data — this is a feature, not a limitation.
Alternative rejected: RAG over PDF datasheets → extraction errors propagate as valid facts
Decision 03
Structured error injection over simple retry
Reason
A retry without structured feedback causes random substitution. The model needs to know exactly which constraint failed, by what margin, and which component triggered it.
Tradeoff
Error message format required 6 iterations to converge. Too vague → random substitution. Too specific → over-optimization breaks other constraints.
Alternative rejected: simple regeneration prompt → model makes random substitutions without targeted correction
Decision Artifact

Every build produces a structured Decision Artifact — the mandatory trace output that makes the system's reasoning auditable without reading source code. Below is the cooler selection node from the benchmark build. Toggle between raw terminal output and structured card view.

decision_artifact.log — cooler_selection
# ─── DECISION ARTIFACT ─── NODE: cpu_cooler ───────────────
COMPONENT: CPU Cooler
SELECTED:  be quiet! Dark Rock 4
RATIONALE:
  rated_tdp          250W
  cpu_tdp            105W (AMD Ryzen 7 7700X)
  margin_factor      2.38×   ✓ above 2.0× threshold at +32°C ambient
  height_mm          162.8mm
  chassis_clearance  165.0mm
  fit_margin         2.2mm   ⚠ within spec — flagged, no adjustment room
  ram_conflict       none    ✓ width 135.4mm clears slot 1
REJECTED:
  Noctua NH-D15
    → height: 165.0mm  chassis_clearance: 165.0mm  margin: 0mm  [FAIL: dimensional]
  Thermalright Peerless Assassin 120 SE
    → tdp: 220W  margin_at_32c: 1.76×  threshold: 2.0×  [FAIL: thermal]
RESIDUAL RISK:
  ⚠ fit_margin 2.2mm — verify clearance after assembly before closing panel
  ⚠ if case panel bows under transport, contact is possible
# ─── VALIDATION: PASS ──────────────────────────────────────
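The margin arithmetic in the artifact can be reproduced directly. A sketch of the two checks (the 1.76× figure for the rejected Thermalright involves an ambient derating whose formula is not shown in this case study, so it is omitted here):

```python
# Reproducing two checks from the artifact above. Function names are
# illustrative; the numbers are the datasheet values the artifact cites.
def thermal_margin(rated_tdp_w: float, cpu_tdp_w: float) -> float:
    return rated_tdp_w / cpu_tdp_w

def fit_margin_mm(clearance_mm: float, height_mm: float) -> float:
    return clearance_mm - height_mm

dark_rock_margin = round(thermal_margin(250, 105), 2)  # 2.38x, clears 2.0x
dark_rock_fit = round(fit_margin_mm(165.0, 162.8), 1)  # 2.2mm, flagged as tight
nh_d15_fit = fit_margin_mm(165.0, 165.0)               # 0mm -> dimensional fail
```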
Benchmark

Silent build at +32°C ambient · $1,500 budget

Comparative test: design a thermally stable, acoustically quiet PC within a fixed budget under elevated ambient temperature. Air cooling only. Scored across four dimensions: thermal validation, compatibility, decision trace, and acoustic compliance.

System | Score | Thermal | Compatibility | Trace | Notes
PCBO v2 | 23/25 | ✓ Pass | ✓ All 7 | ✓ Complete | −2 pts: ambient risk note insufficiently specific
Manual (Senior Engineer) | 19/25 | ✓ Pass | ✓ Pass | ✗ None | Budget exceeded ~$80 after adequate cooling added
Claude 3 (vanilla) | 14/25 | ✗ No calc | ~ Partial | ✗ None | Better component awareness, PSU undersized for peak draw
GPT-4 (vanilla) | 11/25 | ✗ Mismatch | ✗ GPU OOB | ✗ None | CPU/cooler TDP mismatch, GPU clearance not verified
What I Learned
Error message format is the hardest engineering problem in the loop
The constraint solver itself is essentially a rule engine — straightforward to build. The hard part was designing the error message that re-enters the LLM context. Too vague: the model makes random substitutions. Too specific: the model over-optimizes for the failing constraint and breaks others. Six iterations to converge. The format that worked encodes: constraint class, failing component, margin delta, and the constraint threshold — nothing more.
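A sketch of that four-field payload — serialization format and field names are assumptions, not the converged format itself:

```python
# Illustrative error payload: exactly four fields, nothing more.
import json

def constraint_error(constraint_class, component, margin, threshold):
    return json.dumps({
        "constraint_class": constraint_class,  # which rule class failed
        "component": component,                # which part triggered it
        "margin": margin,                      # measured value at failure
        "threshold": threshold,                # the limit it had to meet
    })

# The Thermalright rejection from the decision artifact, as loop feedback:
msg = constraint_error("thermal", "Thermalright Peerless Assassin 120 SE",
                       1.76, 2.0)
```

Anything beyond these four fields risks the over-optimization failure mode described above; anything less degrades to random substitution.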
Vanilla LLMs fail not from ignorance but from the absence of uncertainty signals
GPT-4 proposed a thermally mismatched cooler with full confidence. There was no hesitation, no caveat, no signal that the model was uncertain. The solver creates that signal artificially: constraint failure operationalizes uncertainty and forces a rebuild rather than a confident wrong answer. This is why the architecture works — it manufactures epistemic humility where none exists natively.
Knowing your system's validity boundary is as important as knowing what's inside it
Custom liquid cooling loops and extreme overclocking were deliberately excluded — not because the architecture can't support them, but because the reliability guarantee only holds where the knowledge graph has verified data. Extending scope without verified data would convert a reliable system into a plausible-sounding one. This is a design principle, not a limitation.
Replicability

The architecture is the transferable asset

The constraint-driven pattern — LLM proposal → ontology lookup → constraint solver → looped regeneration → decision artifact — is domain-agnostic. PCBO v2 is the first implementation. The same architecture applies to any domain where correctness is a physical or logical fact, not an interpretation.

Electrical Engineering
PCB component selection against power rail, thermal, and EMC constraints. Same solver loop, different ontology.
Infrastructure Sizing
Server configuration against rack unit, power draw, and thermal density constraints for datacenter design.
Procurement Validation
Vendor proposals validated against specification compliance requirements. Replaces manual audit with traced output.
Structural Specification
Material selection against load, weight, and cost constraints. Decision artifact provides mandatory engineering trace.
Pattern Core
User Input (natural language constraints)
LLM Layer — intent extraction + component proposal
Deterministic Ontology — URI-addressable fact lookup
Constraint Solver — generate / validate / reject loop
Decision Artifact — mandatory trace output
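One way to express the transferable core: the loop is fixed, and only the three domain hooks vary. A hedged sketch — the interface names are illustrative, not PCBO v2's API:

```python
from typing import Protocol

class Domain(Protocol):
    """The three hooks a new domain supplies; the loop itself never changes."""
    def propose(self, intent: str, feedback: list) -> dict: ...   # LLM layer
    def lookup(self, sku: str) -> dict: ...                       # ontology
    def validate(self, proposal: dict, facts: dict) -> list: ...  # solver rules

def run(domain: Domain, intent: str, max_rounds: int = 10) -> dict:
    feedback: list = []
    for _ in range(max_rounds):
        proposal = domain.propose(intent, feedback)
        facts = {s: domain.lookup(s) for s in proposal["components"].values()}
        feedback = domain.validate(proposal, facts)
        if not feedback:
            return {"proposal": proposal, "facts": facts}
    raise RuntimeError("unsatisfiable within the round budget")

class _PCDomain:  # toy stand-in for the PC-hardware hooks
    def propose(self, intent, feedback):
        return {"components": {"cpu": "sku:amd/ryzen-7-7700x"}}
    def lookup(self, sku):
        return {"tdp_w": 105}
    def validate(self, proposal, facts):
        return []

result = run(_PCDomain(), "silent build, $1,500, +32°C ambient")
```

Porting to PCB selection or rack sizing means writing a new `Domain`, not a new system.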
This is one of several projects I'm documenting
Available to walk through the architecture, the tradeoffs, and what this approach could look like applied to your domain.