PatMatch: Fix Industrial Parts Lookup

Kaman Industrial's 2019 parts matching system costs $47K/year in manual lookups. AI could cut that 10x.

What This Is

PatMatch is a specialized parts matching system built in 2019 for Kaman Industrial. It helps sales reps match customer part descriptions ("SKF sealed bearing 6203") to catalog items (SKF 6203-2RS1).

The current system is a C# Excel add-in with 50+ hand-written regex rules. It works 85% of the time. The other 15% requires manual database lookups — and that's expensive.

Context: This is an internal modernization project, not a customer-facing product.

Data: 50K+ parts catalog, 18K brand relationships, years of user corrections stored.

Users: Kaman Industrial sales reps processing RFQs (requests for quote).

The Problem: Brittle Pattern Matching

The current regex-based system fails on variations that humans handle easily. Here are real examples from the code:

// Test case 1: Word order matters (fails)
✓ "DODGE bearing 126220" → DODGE 126220
✗ "bearing DODGE 126220" → Can't find brand

// Test case 2: Embedded brands (fails)
✓ "FLANGE BLOCK, 2-1/2 BORE DODGE 126220"
✗ "FLANGE BLOCK, 2-1/2 BORE XXXDODGEYYY 126220" → No word boundary

// Test case 3: Typos (fails completely)
✓ "6203RS" → Matches 6203-RS
✗ "62O3RS" → Typo (O vs 0) = no match

// Test case 4: Measurement edge case (inconsistent)
"4-3V33.5 QD" → Sometimes "43V335QD", want "43V3350QD"
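
The word-order and typo failures reproduce in a few lines. A minimal sketch in Python (the production rules are C#, and this pattern is illustrative, not one of the 50+ shipped rules):

import re

# Hypothetical rule in the spirit of the 2019 patterns: brand anchored at the
# start, exact spelling, digits-only part number.
rule = re.compile(r"^DODGE\b.*?(\d+)$")

for desc in ("DODGE bearing 126220", "bearing DODGE 126220"):
    m = rule.match(desc)
    print(desc, "->", m.group(1) if m else "NO MATCH")
# DODGE bearing 126220 -> 126220
# bearing DODGE 126220 -> NO MATCH

# Typos defeat exact character classes too: the letter O is not the digit 0.
suffix = re.compile(r"^(\d+)-?(RS|ZZ)$")
print(bool(suffix.match("6203RS")))   # True
print(bool(suffix.match("62O3RS")))   # False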

What this costs:

• Sales reps manually search database for failed lookups

• ~50 minutes per day per sales rep wasted on edge cases

• $47,600/year in lost productivity (measured 2019-2026)

Why It Breaks

The 2019 system uses hand-written regex patterns stored in a database. Each brand has custom rules:

-- Database table: PatternMatcher_RegEx
Brand   MatchPattern                 ReplacePattern
SKF     ^(\d+)-(JEM|RS|ZZ)$          $1$2    // Remove hyphens
NTN     ^(\d+)(LLU|VV)$              $1$2    // Normalize suffixes
DOD     ^(\d+)-(\d+)V(\d+\.\d+)$     ...     // Dodge sheaves

The problem: Every new part format requires a new regex rule. After 6 years in production, we have 50+ brand-specific patterns — and they still miss edge cases.

  • Typos break exact string matching
  • Word order must be exact ("DODGE bearing" ≠ "bearing DODGE")
  • Decimal handling is fragile ("33.5" sometimes becomes "335" instead of "3350")
  • No fuzzy matching or confidence scores
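
For context, the matching mechanism itself is plain regex substitution driven by that table. A sketch of the lookup loop in Python (the production code is C#; the loop shape and function name here are assumptions):

import re

# Rows mirroring the PatternMatcher_RegEx table above.
RULES = [
    ("SKF", r"^(\d+)-(JEM|RS|ZZ)$", r"\1\2"),   # remove hyphens
    ("NTN", r"^(\d+)(LLU|VV)$",     r"\1\2"),   # normalize suffixes
]

def normalize_part(brand: str, part: str) -> str:
    """Apply the first matching brand rule; fall through unchanged."""
    for rule_brand, match_pat, replace_pat in RULES:
        if rule_brand == brand and re.match(match_pat, part):
            return re.sub(match_pat, replace_pat, part)
    return part

print(normalize_part("SKF", "6203-RS"))   # 6203RS
print(normalize_part("SKF", "62O3RS"))    # 62O3RS unchanged: typo, no rule fires

Every new suffix or brand format means another row in that table and another deploy, which is the maintenance treadmill the bullets above describe.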

The AI Solution

Use AI to understand part descriptions semantically

• LLMs (GPT-4o-mini/Claude Haiku) extract brand + part number from unstructured text

• Vector embeddings handle typos, word order, abbreviations automatically

• Target: 95% accuracy (vs 85% today), with confidence scores

What changes: The matching engine (regex → AI embeddings)

What stays: Excel interface, database structure, user workflow

// Same examples, AI-powered parsing
✓ "DODGE bearing 126220"
✓ "bearing DODGE 126220" → Brand: DOD, Part: 126220 (95% confident)
✓ "XXXDODGEYYY 126220" → Extracts "DODGE" from context
✓ "62O3RS" → Matches 6203RS (typo corrected via vector similarity)
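
A minimal sketch of the parsing step, assuming the OpenAI Python SDK in JSON mode (the prompt wording, output schema, and confidence field are illustrative, not a settled design):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = ('Extract the manufacturer brand and part number from this '
          'industrial part description. Reply as JSON: '
          '{"brand": "...", "part_number": "...", "confidence": 0.0-1.0}\n\n')

def parse_description(text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # force well-formed JSON back
        messages=[{"role": "user", "content": PROMPT + text}],
    )
    return json.loads(resp.choices[0].message.content)

print(parse_description("bearing DODGE 126220"))
# e.g. {"brand": "DODGE", "part_number": "126220", "confidence": 0.95}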

Key improvement: Learns from corrections automatically. When sales reps fix a wrong match, the system updates — no manual regex editing.
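
One way that correction loop could work, sketched with OpenAI embeddings and a toy in-memory index (the embedding model and alias scheme are assumptions; in production the Pinecone/Weaviate store plays this role):

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Toy index of catalog parts; real deployment uses the vector database.
index = {part: embed(part) for part in ("6203-2RS1", "6203-ZZ")}

def best_match(query: str) -> tuple[str, float]:
    q = embed(query)
    cosine = lambda v: float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    part = max(index, key=lambda p: cosine(index[p]))
    return part, cosine(index[part])

print(best_match("62O3RS"))  # the typo should still land nearest 6203-2RS1

# A rep's fix becomes training data: store the raw query as an alias of the
# corrected part, so the same phrasing is a near-perfect hit next time.
def record_correction(raw_query: str, correct_part: str) -> None:
    index[f"{correct_part} :: {raw_query}"] = embed(raw_query)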

The Business Case

$1,900: 2-week validation spike
$47K: annual cost of manual lookups
25x: ROI if the AI matches expectations

Current accuracy: ~85% match rate (15% require manual work)
AI target: 95% match rate (5% manual work)
Time savings: 50 min/day → 5 min/day per sales rep

The test: run the AI pipeline on 500 gold-standard test cases (real customer queries) and measure whether it beats the current system on the edge cases above (typos, word order, complex descriptions).
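
The comparison itself can be a small harness over the labeled cases. A sketch, assuming a JSONL gold file with query and expected fields (file name and matcher signatures are hypothetical):

import json

def evaluate(matcher, gold_path: str = "gold_500.jsonl") -> float:
    """Fraction of gold cases where the matcher returns the expected part."""
    cases = [json.loads(line) for line in open(gold_path, encoding="utf-8")]
    hits = sum(1 for c in cases if matcher(c["query"]) == c["expected"])
    return hits / len(cases)

# Go/no-go inputs: the AI pipeline must beat the regex baseline on the
# same 500 cases.
# baseline  = evaluate(regex_match)   # expect ~0.85 from production data
# candidate = evaluate(ai_match)      # target >= 0.95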

Technical Architecture

Current System (2019):

  • C# Excel add-in (NetOffice)
  • SQL Server with 6-level waterfall matching
  • 50+ brand-specific regex patterns
  • BlobHistory cache, PartHistory user corrections
  • 50K+ parts catalog, 18K brand relationships

Proposed AI System (2026):

  • Python + FastHTML web backend
  • OpenAI embeddings + GPT-4o-mini for parsing
  • Vector database (Pinecone/Weaviate) for fuzzy matching
  • LangGraph agents for multi-step reasoning
  • REST API + Excel add-in (keep familiar UI)
  • PostgreSQL + vector store

What we preserve: All domain knowledge (measurement conversions, brand hierarchies, user corrections, interchange groups). The waterfall matching logic stays — we just make each level smarter.
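
Concretely, "make each level smarter" might look like the sketch below: the cheap exact levels stay first, and embeddings replace only the fuzzy tail (the level count, threshold, and helpers are illustrative; best_match is from the earlier embedding sketch):

CATALOG = {"126220", "62032RS1"}   # stand-in for the 50K-part ItemMaster

def normalize(q: str) -> str:
    # Placeholder for the preserved domain rules (units, brand aliases, ...).
    return q.upper().replace("-", "").replace(" ", "")

def waterfall_match(query: str) -> tuple[str | None, float]:
    if query in CATALOG:                       # Level 1: exact hit (unchanged)
        return query, 1.0
    if (n := normalize(query)) in CATALOG:     # Level 2: normalized hit
        return n, 0.99
    part, score = best_match(query)            # Level 3+: embedding fallback
    return (part, score) if score >= 0.80 else (None, score)

print(waterfall_match("6203-2RS1"))            # ('62032RS1', 0.99)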

De-Risking Strategy

Phase 1: Validation Spike (2 weeks, $1,900)

  • Build gold-standard test dataset (500 expert-labeled cases)
  • Baseline current system accuracy on edge cases
  • Test AI on same dataset
  • Side-by-side comparison: precision, recall, latency
  • Go/no-go decision: Does AI beat current system? If yes → Phase 2. If no → stop.

Phase 2: Full Build (10 weeks, $14,000)

  • Only execute if Phase 1 proves AI superiority
  • Build production system with web UI + API
  • Shadow deployment (run both systems in parallel; see the sketch below)
  • Gradual rollout with user feedback loop
  • A/B testing before full cutover
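
Shadow deployment is the safety valve: reps keep getting the regex answer while we log every case where the AI disagrees. A sketch (regex_match stands in for the existing system's lookup; waterfall_match is from the architecture sketch above):

import json
import logging

log = logging.getLogger("patmatch.shadow")

def shadow_lookup(query: str) -> str:
    served = regex_match(query)                 # reps still see the 2019 answer
    candidate, score = waterfall_match(query)   # AI runs silently alongside
    if candidate != served:
        # Disagreements feed the A/B analysis before any cutover.
        log.info(json.dumps({"query": query, "served": served,
                             "candidate": candidate, "score": score}))
    return served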

Total risk exposure: $1,900 to validate. Only commit $14K after proof.

Decision Point

Approve $1,900 for 2-week validation spike

We'll test AI on real Kaman parts data and show you if it actually solves the edge case problem. If it does, we build it. If it doesn't, we know definitively — and we only spent $1,900 to find out.

Supporting Documentation

Analysis completed January 2026 (README plus three technical documents, 150KB+):

  • README.md - Current system architecture, 6-level waterfall, glossary of bearing terms
  • PATMATCH_MODERNIZATION.md (64KB) - AI redesign: vector search, LLM parsing, agentic matching
  • PATMATCH_EVALUATION_FRAMEWORK.md (68KB) - Testing methodology, 500-case gold standard, success criteria
  • PATMATCH_BEFORE_AFTER.md (22KB) - Transformation analysis, failure modes, expected improvements

Test data available: PM Samples for Scott.xlsx, QA Items.xlsx, 50K parts catalog (ItemMaster.tsv), historical queries (PartHistory/BlobHistory tables)