PatMatch: Fix Industrial Parts Lookup

Kaman Industrial's 2019 parts matching system costs $47K/year in manual lookups. AI could cut that 10x.

What This Is

PatMatch is a specialized parts matching system built in 2019 for Kaman Industrial. It helps sales reps match customer part descriptions ("SKF sealed bearing 6203") to catalog items (SKF 6203-2RS1).

The current system is a C# Excel add-in with 50+ hand-written regex rules. It works 85% of the time. The other 15% requires manual database lookups — and that's expensive.

Context: This is an internal modernization project, not a customer-facing product.

Data: 50K+ parts catalog, 18K brand relationships, years of user corrections stored.

Users: Kaman Industrial sales reps processing RFQs (requests for quote).

The Problem: Brittle Pattern Matching

The current regex-based system fails on variations that humans handle easily. Here are real examples from the code:

// Test case 1: Word order matters (fails)
✓ "DODGE bearing 126220" → DODGE 126220
✗ "bearing DODGE 126220" → Can't find brand

// Test case 2: Embedded brands (fails)
✓ "FLANGE BLOCK, 2-1/2 BORE DODGE 126220"
✗ "FLANGE BLOCK, 2-1/2 BORE XXXDODGEYYY 126220" → No word boundary

// Test case 3: Typos (fails completely)
✓ "6203RS" → Matches 6203-RS
✗ "62O3RS" → Typo (O vs 0) = no match

// Test case 4: Measurement edge case (inconsistent)
"4-3V33.5 QD" → Sometimes "43V335QD", want "43V3350QD"
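
The word-order and typo failures reproduce in a few lines. A minimal sketch in Python (the production rules are C#, and this pattern is illustrative, not one of the 50+ shipped rules):

import re

# Hypothetical rule in the spirit of the 2019 patterns: brand anchored at the
# start, exact spelling, digits-only part number.
rule = re.compile(r"^DODGE\b.*?(\d+)$")

for desc in ("DODGE bearing 126220", "bearing DODGE 126220"):
    m = rule.match(desc)
    print(desc, "->", m.group(1) if m else "NO MATCH")
# DODGE bearing 126220 -> 126220
# bearing DODGE 126220 -> NO MATCH

# Typos defeat exact character classes too: the letter O is not the digit 0.
suffix = re.compile(r"^(\d+)-?(RS|ZZ)$")
print(bool(suffix.match("6203RS")))   # True
print(bool(suffix.match("62O3RS")))   # False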

What this costs:

• Sales reps manually search database for failed lookups

• ~50 minutes per day per sales rep wasted on edge cases

• $47,600/year in lost productivity (measured 2019-2026)

Why It Breaks

The 2019 system uses hand-written regex patterns stored in a database. Each brand has custom rules:

-- Database table: PatternMatcher_RegEx
Brand   MatchPattern                 ReplacePattern
SKF     ^(\d+)-(JEM|RS|ZZ)$          $1$2    // Remove hyphens
NTN     ^(\d+)(LLU|VV)$              $1$2    // Normalize suffixes
DOD     ^(\d+)-(\d+)V(\d+\.\d+)$     ...     // Dodge sheaves

The problem: Every new part format requires a new regex rule. After 6 years in production, we have 50+ brand-specific patterns — and they still miss edge cases.

  • Typos break exact string matching
  • Word order must be exact ("DODGE bearing" ≠ "bearing DODGE")
  • Decimal handling is fragile ("33.5" sometimes becomes "335" instead of "3350")
  • No fuzzy matching or confidence scores
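
For context, the matching mechanism itself is plain regex substitution driven by that table. A sketch of the lookup loop in Python (the production code is C#; the loop shape and function name here are assumptions):

import re

# Rows mirroring the PatternMatcher_RegEx table above.
RULES = [
    ("SKF", r"^(\d+)-(JEM|RS|ZZ)$", r"\1\2"),   # remove hyphens
    ("NTN", r"^(\d+)(LLU|VV)$",     r"\1\2"),   # normalize suffixes
]

def normalize_part(brand: str, part: str) -> str:
    """Apply the first matching brand rule; fall through unchanged."""
    for rule_brand, match_pat, replace_pat in RULES:
        if rule_brand == brand and re.match(match_pat, part):
            return re.sub(match_pat, replace_pat, part)
    return part

print(normalize_part("SKF", "6203-RS"))   # 6203RS
print(normalize_part("SKF", "62O3RS"))    # 62O3RS unchanged: typo, no rule fires

Every new suffix or brand format means another row in that table and another deploy, which is the maintenance treadmill the bullets above describe.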

The AI Solution

Use AI to understand part descriptions semantically

• LLMs (GPT-4o-mini/Claude Haiku) extract brand + part number from unstructured text

• Vector embeddings handle typos, word order, abbreviations automatically

• Target: 95% accuracy (vs 85% today), with confidence scores

What changes: The matching engine (regex → AI embeddings)

What stays: Excel interface, database structure, user workflow

// Same examples, AI-powered parsing
✓ "DODGE bearing 126220"
✓ "bearing DODGE 126220" → Brand: DOD, Part: 126220 (95% confident)
✓ "XXXDODGEYYY 126220" → Extracts "DODGE" from context
✓ "62O3RS" → Matches 6203RS (typo corrected via vector similarity)
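
A minimal sketch of the parsing step, assuming the OpenAI Python SDK in JSON mode (the prompt wording, output schema, and confidence field are illustrative, not a settled design):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = ('Extract the manufacturer brand and part number from this '
          'industrial part description. Reply as JSON: '
          '{"brand": "...", "part_number": "...", "confidence": 0.0-1.0}\n\n')

def parse_description(text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # force well-formed JSON back
        messages=[{"role": "user", "content": PROMPT + text}],
    )
    return json.loads(resp.choices[0].message.content)

print(parse_description("bearing DODGE 126220"))
# e.g. {"brand": "DODGE", "part_number": "126220", "confidence": 0.95}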

Key improvement: Learns from corrections automatically. When sales reps fix a wrong match, the system updates — no manual regex editing.
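
One way that correction loop could work, sketched with OpenAI embeddings and a toy in-memory index (the embedding model and alias scheme are assumptions; in production the Pinecone/Weaviate store plays this role):

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Toy index of catalog parts; real deployment uses the vector database.
index = {part: embed(part) for part in ("6203-2RS1", "6203-ZZ")}

def best_match(query: str) -> tuple[str, float]:
    q = embed(query)
    cosine = lambda v: float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    part = max(index, key=lambda p: cosine(index[p]))
    return part, cosine(index[part])

print(best_match("62O3RS"))  # the typo should still land nearest 6203-2RS1

# A rep's fix becomes training data: store the raw query as an alias of the
# corrected part, so the same phrasing is a near-perfect hit next time.
def record_correction(raw_query: str, correct_part: str) -> None:
    index[f"{correct_part} :: {raw_query}"] = embed(raw_query)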

The Business Case

$1,900: 2-week validation spike
$47K: annual cost of manual lookups
25x: ROI if the AI matches expectations

Current accuracy: ~85% match rate (15% require manual work)
AI target: 95% match rate (5% manual work)
Time savings: 50 min/day → 5 min/day per sales rep

The test: run the AI pipeline on 500 gold-standard test cases (real customer queries) and measure whether it beats the current system on the edge cases above (typos, word order, complex descriptions).
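
The comparison itself can be a small harness over the labeled cases. A sketch, assuming a JSONL gold file with query and expected fields (file name and matcher signatures are hypothetical):

import json

def evaluate(matcher, gold_path: str = "gold_500.jsonl") -> float:
    """Fraction of gold cases where the matcher returns the expected part."""
    cases = [json.loads(line) for line in open(gold_path, encoding="utf-8")]
    hits = sum(1 for c in cases if matcher(c["query"]) == c["expected"])
    return hits / len(cases)

# Go/no-go inputs: the AI pipeline must beat the regex baseline on the
# same 500 cases.
# baseline  = evaluate(regex_match)   # expect ~0.85 from production data
# candidate = evaluate(ai_match)      # target >= 0.95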

Technical Architecture

Current System (2019):

  • C# Excel add-in (NetOffice)
  • SQL Server with 6-level waterfall matching
  • 50+ brand-specific regex patterns
  • BlobHistory cache, PartHistory user corrections
  • 50K+ parts catalog, 18K brand relationships

Proposed AI System (2026):

  • Python + FastHTML web backend
  • OpenAI embeddings + GPT-4o-mini for parsing
  • Vector database (Pinecone/Weaviate) for fuzzy matching
  • LangGraph agents for multi-step reasoning
  • REST API + Excel add-in (keep familiar UI)
  • PostgreSQL + vector store

What we preserve: All domain knowledge (measurement conversions, brand hierarchies, user corrections, interchange groups). The waterfall matching logic stays — we just make each level smarter.
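
Concretely, "make each level smarter" might look like the sketch below: the cheap exact levels stay first, and embeddings replace only the fuzzy tail (the level count, threshold, and helpers are illustrative; best_match is from the earlier embedding sketch):

CATALOG = {"126220", "62032RS1"}   # stand-in for the 50K-part ItemMaster

def normalize(q: str) -> str:
    # Placeholder for the preserved domain rules (units, brand aliases, ...).
    return q.upper().replace("-", "").replace(" ", "")

def waterfall_match(query: str) -> tuple[str | None, float]:
    if query in CATALOG:                       # Level 1: exact hit (unchanged)
        return query, 1.0
    if (n := normalize(query)) in CATALOG:     # Level 2: normalized hit
        return n, 0.99
    part, score = best_match(query)            # Level 3+: embedding fallback
    return (part, score) if score >= 0.80 else (None, score)

print(waterfall_match("6203-2RS1"))            # ('62032RS1', 0.99)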

De-Risking Strategy

Phase 1: Validation Spike (2 weeks, $1,900)

  • Build gold-standard test dataset (500 expert-labeled cases)
  • Baseline current system accuracy on edge cases
  • Test AI on same dataset
  • Side-by-side comparison: precision, recall, latency
  • Go/no-go decision: Does AI beat current system? If yes → Phase 2. If no → stop.

Phase 2: Full Build (10 weeks, $14,000)

  • Only execute if Phase 1 proves AI superiority
  • Build production system with web UI + API
  • Shadow deployment (run both systems in parallel; see the sketch below)
  • Gradual rollout with user feedback loop
  • A/B testing before full cutover
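
Shadow deployment is the safety valve: reps keep getting the regex answer while we log every case where the AI disagrees. A sketch (regex_match stands in for the existing system's lookup; waterfall_match is from the architecture sketch above):

import json
import logging

log = logging.getLogger("patmatch.shadow")

def shadow_lookup(query: str) -> str:
    served = regex_match(query)                 # reps still see the 2019 answer
    candidate, score = waterfall_match(query)   # AI runs silently alongside
    if candidate != served:
        # Disagreements feed the A/B analysis before any cutover.
        log.info(json.dumps({"query": query, "served": served,
                             "candidate": candidate, "score": score}))
    return served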

Total risk exposure: $1,900 to validate. Only commit $14K after proof.

Decision Point

Approve $1,900 for 2-week validation spike

We'll test AI on real Kaman parts data and show you if it actually solves the edge case problem. If it does, we build it. If it doesn't, we know definitively — and we only spent $1,900 to find out.

Supporting Documentation

Analysis completed January 2026 (README plus three technical documents, 150KB+):

  • README.md - Current system architecture, 6-level waterfall, glossary of bearing terms
  • PATMATCH_MODERNIZATION.md (64KB) - AI redesign: vector search, LLM parsing, agentic matching
  • PATMATCH_EVALUATION_FRAMEWORK.md (68KB) - Testing methodology, 500-case gold standard, success criteria
  • PATMATCH_BEFORE_AFTER.md (22KB) - Transformation analysis, failure modes, expected improvements

Test data available: PM Samples for Scott.xlsx, QA Items.xlsx, 50K parts catalog (ItemMaster.tsv), historical queries (PartHistory/BlobHistory tables)