AI Invoice Processing for Construction: Making AI Learn Your Patterns (Part 2 of 4)

Teach AI to suggest job codes based on vendor history. No model retraining required—just pattern matching from your own data.

Difficulty: Journeyman

This is Part 2 of a 4-part series on AI invoice processing for construction. Read Part 1: When One AI Isn't Enough


In Part 1, we covered why a hybrid approach—combining OCR for extraction and an LLM for interpretation—works better than either tool alone. The system can now read invoices, including handwritten job codes scrawled in the margins.

But there's still a problem. Every invoice gets treated like the first one the system has ever seen. Your lumber supplier sends you 50 invoices a month, all coded to the same handful of jobs and cost codes. The AI doesn't remember any of that. It starts from scratch every time.

This article covers how to fix that by teaching the system to recognize patterns without actually retraining the AI model.


The Problem With Starting Fresh

Most AI tools are stateless. They process what you give them, return a result, and forget everything. That works fine for one-off tasks. It doesn't work for repetitive workflows where context matters.

Construction invoicing is repetitive by nature. The same vendors show up month after month. Certain vendors always bill to certain jobs. Material suppliers hit the same cost codes repeatedly. An experienced AP clerk knows all of this. They see an invoice from the concrete supplier and already know it's probably job 24-112, cost code 03-300.

The AI doesn't know any of that unless you tell it. And telling it every single time defeats the purpose of automation.


Coding History: Memory Without Retraining

The fix is simpler than it sounds. Instead of retraining the model, you store a history of past coding decisions and feed relevant examples to the AI at processing time.

Here's how it works:

When an invoice gets processed and approved, the system saves the vendor name, the job number, cost codes, and any other coded fields. That record goes into a database—nothing fancy, just a log of what was coded and when.

The next time an invoice arrives from the same vendor, the system pulls up the last several coding records for that vendor and includes them in the prompt. The AI now has context: "Here's how invoices from this vendor were coded previously."
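The store-and-retrieve loop above needs very little code. Here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a single table and a vendor key that has already been normalized. The table, column, and function names are hypothetical, and a production system would likely use whatever database it already runs.

```python
import sqlite3
from datetime import date

# Hypothetical table: one row per approved coding decision.
SCHEMA = """
CREATE TABLE IF NOT EXISTS coding_history (
    id           INTEGER PRIMARY KEY,
    vendor_key   TEXT NOT NULL,  -- normalized vendor name
    invoice_date TEXT NOT NULL,  -- ISO 8601, so text order = date order
    job_number   TEXT NOT NULL,
    cost_code    TEXT NOT NULL,
    amount       REAL NOT NULL,
    approved_by  TEXT NOT NULL
)
"""


def open_history(path=":memory:"):
    """Open (or create) the history store and index it by vendor and date."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_vendor_date"
        " ON coding_history (vendor_key, invoice_date)"
    )
    return conn


def record_approval(conn, vendor_key, invoice_date, job_number,
                    cost_code, amount, approved_by):
    """Log what was coded once a reviewer approves an invoice."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO coding_history (vendor_key, invoice_date, job_number,"
            " cost_code, amount, approved_by) VALUES (?, ?, ?, ?, ?, ?)",
            (vendor_key, invoice_date.isoformat(), job_number,
             cost_code, amount, approved_by),
        )


def recent_history(conn, vendor_key, limit=10):
    """Return the most recent coding records for one vendor, newest first."""
    return conn.execute(
        "SELECT invoice_date, job_number, cost_code, amount FROM coding_history"
        " WHERE vendor_key = ? ORDER BY invoice_date DESC, id DESC LIMIT ?",
        (vendor_key, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = open_history()
    record_approval(conn, "drywall supply", date(2024, 5, 2),
                    "24-087", "09-250", 4210.00, "kim")  # made-up record
    print(recent_history(conn, "drywall supply"))
```

The rows returned by `recent_history` are what get formatted into the prompt. The `limit` caps how much history rides along with each request.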

The AI uses that history to make better suggestions. If the last ten invoices from your drywall supplier all went to job 24-087, cost code 09-250, the system suggests the same coding for the new invoice. The person reviewing it can accept, modify, or override.
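In the article's design, the LLM reads the history and makes the call. A plain frequency count over the same records can still serve as a cheap deterministic cross-check, or as a fallback if the LLM call fails. This is a minimal sketch. It assumes history arrives as (job, cost code) pairs, newest first, and `suggest_coding` is a hypothetical name.

```python
from collections import Counter


def suggest_coding(history):
    """Return the most frequent (job, cost_code) pair and its share of history.

    `history` is a list of (job_number, cost_code) tuples, newest first.
    Returns None when the vendor has no history yet.
    """
    if not history:
        return None
    counts = Counter(history)
    # most_common breaks ties by first occurrence, i.e. the newer pair here.
    (job, cost_code), n = counts.most_common(1)[0]
    return job, cost_code, n / len(history)
```

Ten invoices all coded to 24-087 / 09-250 return that pair with a share of 1.0. That share is one natural input to the confidence score covered in the next section.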

No model retraining. No fine-tuning. Just pattern matching based on your own historical data.


What Gets Stored

The coding history doesn't need to be complicated. At minimum, you want:

  • Vendor name (normalized so "ACME SUPPLY" and "Acme Supply Inc" match)
  • Invoice date
  • Job number
  • Cost code(s)
  • Amount
  • Who approved it

You can add more if it helps—GL accounts, tax codes, retention flags. The point is to capture whatever fields your accounting system needs so the AI can suggest complete coding, not just partial matches.
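One way to pin down the fields above is a small record type. This is a sketch: the class name, field names, and the `missing` helper are hypothetical. Amounts use `Decimal` to avoid floating-point rounding on currency, and the optional extras stand in for whatever your accounting system requires.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class CodingRecord:
    vendor_key: str               # normalized vendor name
    invoice_date: date
    job_number: str
    cost_codes: Tuple[str, ...]   # one invoice may split across cost codes
    amount: Decimal
    approved_by: str
    # Optional extras; include whichever ones your accounting system needs.
    gl_account: Optional[str] = None
    tax_code: Optional[str] = None
    retention: bool = False

    def missing(self, required_extras=()):
        """Names of required optional fields that are still empty."""
        return [name for name in required_extras
                if getattr(self, name) in (None, "")]
```

A `missing` check lets the system flag records that would only support partial coding before they enter the history.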

Vendor normalization matters. Invoices come in with inconsistent vendor names. The OCR might read "MARTIN LUMBER CO" on one invoice and "Martin Lumber" on another. If you treat those as different vendors, the history doesn't connect. A simple normalization step keeps the history useful: lowercase the name, strip punctuation, drop entity suffixes like "Inc" and "Co", and possibly add fuzzy matching to catch OCR misreads.
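The normalization step described above can be sketched with the standard library's `re` and `difflib` modules. The suffix list and the 0.85 similarity cutoff are illustrative assumptions, and `normalize_vendor` and `match_vendor` are hypothetical names.

```python
import difflib
import re

# Hypothetical list of legal-entity suffixes to drop; extend for your vendors.
SUFFIXES = {"inc", "co", "company", "corp", "corporation", "llc", "ltd"}


def normalize_vendor(name):
    """Lowercase, replace punctuation with spaces, drop trailing suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    words = cleaned.split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def match_vendor(name, known_keys, cutoff=0.85):
    """Map a raw vendor name to an existing vendor key, or return a new key."""
    key = normalize_vendor(name)
    if key in known_keys:
        return key
    # Fuzzy fallback for OCR slips such as a dropped or misread letter.
    close = difflib.get_close_matches(key, list(known_keys), n=1, cutoff=cutoff)
    return close[0] if close else key
```

With this sketch, "MARTIN LUMBER CO" and "Martin Lumber" both become `martin lumber`, and an OCR slip like "MARTN LUMBER" still matches that key. Fuzzy matching cuts both ways, though. Two genuinely different vendors with similar names can collide, so a non-exact match is worth showing to the reviewer rather than applying silently.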


Confidence Scoring

Not all suggestions are equal. If a vendor has 50 invoices in the history and they all went to the same job, that's a high-confidence suggestion. If a vendor has two invoices that went to different jobs, that's low confidence.

The system can calculate a confidence score based on:

  • How many historical records exist for this vendor
  • How consistent the past coding was
  • How recent the records are (a job from two years ago might be closed)
  • Whether the invoice amount is similar to past invoices
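The four factors above can be combined into a single score. This is a minimal sketch. It assumes history arrives as (date, job, cost code, amount) tuples for one vendor. The weights, the 10-record volume cap, and the one-year recency window are illustrative assumptions to tune against your own data, not values from the production system.

```python
from datetime import date
from statistics import median

# Hypothetical weights; they sum to 1 so the score stays in 0..1.
WEIGHTS = {"volume": 0.2, "consistency": 0.4, "recency": 0.2, "amount": 0.2}


def confidence(history, job, cost_code, invoice_date, amount):
    """Score 0..1 for suggesting (job, cost_code) on a new invoice.

    `history` holds (invoice_date, job, cost_code, amount) tuples for one vendor.
    """
    matches = [r for r in history if (r[1], r[2]) == (job, cost_code)]
    if not matches:
        return 0.0
    # 1. Volume: more records means more evidence; saturates at 10 records.
    volume = min(len(history) / 10, 1.0)
    # 2. Consistency: share of past invoices coded the same way.
    consistency = len(matches) / len(history)
    # 3. Recency: fades linearly to zero a year after the last matching invoice.
    days = max(0, (invoice_date - max(r[0] for r in matches)).days)
    recency = max(0.0, 1.0 - days / 365)
    # 4. Amount: how close the new amount is to the typical matching amount.
    typical = median(r[3] for r in matches)
    if typical > 0 and amount > 0:
        amount_fit = min(amount, typical) / max(amount, typical)
    else:
        amount_fit = 0.0
    parts = {"volume": volume, "consistency": consistency,
             "recency": recency, "amount": amount_fit}
    return sum(WEIGHTS[k] * v for k, v in parts.items())
```

Ten recent, identically coded invoices of similar size score close to 1. Two invoices split across different jobs land near the middle, which is where a "flag for review" threshold would catch them.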

High-confidence suggestions can be auto-filled. Low-confidence suggestions get flagged for human review. This is where automation actually saves time—the routine invoices flow through with minimal touch, while the exceptions get attention.


Handling Exceptions

Patterns break. A vendor you've always coded to one job starts billing to a new project. The system suggests the old coding, which is now wrong.

This is fine. The human reviewer catches it, corrects it, and approves. That correction goes into the history. After a few invoices to the new job, the system learns the new pattern.
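A plain majority over the last ten records keeps suggesting the old job until the corrections outnumber the old codings. If you run a deterministic cross-check alongside the LLM, weighting recent records more heavily lets it adapt after only a few corrections. This is a minimal sketch. It assumes history is a newest-first list of (job, cost code) pairs. The decay factor of 0.7 and the window of 10 are arbitrary illustrative values, and `weighted_suggestion` is a hypothetical name.

```python
from collections import defaultdict


def weighted_suggestion(history, decay=0.7, window=10):
    """Pick the (job, cost_code) pair with the highest recency-weighted vote.

    `history` is newest first; record i gets weight decay**i, so a handful of
    recent corrections can outvote a long run of older, now-stale codings.
    """
    votes = defaultdict(float)
    for i, pair in enumerate(history[:window]):
        votes[pair] += decay ** i
    # Ties go to the pair seen first, i.e. the most recent one.
    return max(votes, key=votes.get) if votes else None
```

Suppose there are ten older invoices on 24-105. One correction to 24-112 does not flip the suggestion. Two corrections do: they carry weights 1 + 0.7 = 1.7, versus about 1.54 for the eight older records still inside the window.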

The key is treating every correction as new history rather than a one-off fix. Every override teaches the system something, without touching the model itself. Over time, the suggestions get better because they're based on your actual coding decisions, not generic rules.


Why This Works Better Than Rules

You could try to build a rules engine instead. "If vendor = ACME SUPPLY, then job = 24-105." But rules are brittle. They break when projects change, when vendor names vary, when someone forgets to update them.

The history-based approach adapts automatically. You don't maintain rules. You just process invoices normally, and the system learns from what you do.

It also handles ambiguity better. Rules are binary—match or don't match. The AI can look at history and make a judgment call: "This vendor usually bills to job 24-105, but the invoice mentions 'Phase 2 foundation work,' and there's a newer job 24-112 that's in the foundation phase. Suggesting 24-112 with medium confidence."

That kind of reasoning is hard to encode in rules. It's natural for an LLM with the right context.
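The "right context" for a judgment call like the one above has three parts: the vendor's recent coding, the jobs that are currently open, and the invoice's own description. Here is a minimal sketch of assembling that context into a prompt. It assumes the open-job list (with a short phase description) comes from your job cost system. `build_coding_prompt` and its arguments are hypothetical, and the instruction wording is only a starting point.

```python
def build_coding_prompt(vendor, invoice_text, history, active_jobs):
    """Assemble the context an LLM needs to make a coding judgment call.

    history: list of (invoice_date, job, cost_code) tuples, newest first.
    active_jobs: dict mapping job number -> short description (e.g. phase).
    """
    lines = [f"Vendor: {vendor}", "",
             "Recent approved coding for this vendor (newest first):"]
    if history:
        lines += [f"- {d}: job {job}, cost code {cc}" for d, job, cc in history]
    else:
        lines.append("- (no history yet)")
    lines += ["", "Open jobs:"]
    lines += [f"- {job}: {desc}" for job, desc in sorted(active_jobs.items())]
    lines += [
        "", "Invoice text:", invoice_text, "",
        "Suggest a job number and cost code. Explain your reasoning and rate",
        "your confidence as high, medium, or low. If the invoice text points",
        "away from the usual coding, say so.",
    ]
    return "\n".join(lines)
```

Including the open-job list is what lets the model notice that "Phase 2 foundation work" fits a newer job better than the vendor's usual one. History alone could only repeat the past.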


What's Next

The pattern learning system handles the "what job does this go to" question. But construction accounting has more moving parts. The job number needs to exist in your system. The cost code needs to be valid for that job. The vendor needs to be set up.

Part 3 covers how to connect the AI to your accounting software—validating codes, looking up vendor records, and making sure suggestions are actually usable before anyone sees them.


This series is based on a production invoice processing system built for Sage 300 CRE. The concepts apply to any construction accounting workflow where historical patterns can inform future coding decisions.