Document processing pipeline

Image cleanup. OCR. AI.
Stapling. Format conversion.
One pipeline.

Pull recording packets from any source, clean and OCR every page, and extract instrument-level data with an engine that verifies its own work: every field scored and cited, every batch shipped with a report of exactly what to review. Most go-forward batches post the same day, as full automation or as a first pass in front of your keying team.

See the pipeline · Book a demo

app.titletools.io · batch processing

Not another AI startup

A decade of title-plant engineering.

We’ve spent more than ten years building the storage, format-conversion, and examination systems that title plants actually run on, shipping production software against real county data, recording formats, and plant operations, long before AI was the headline. This is title-industry infrastructure with modern AI inside it, not a general-purpose model wrapped in a title-industry landing page.

10+ years in production title plants
Built by title-industry domain experts
Storage · Conversion · Examination

Our extraction engine

We built the extraction. And the part that checks it.

Accuracy doesn't come from pointing a general-purpose model at a scan and hoping. Over a decade against real county data, we built our own extraction-and-verification engine: every field is read, then independently checked, then scored, so you know exactly how much to trust each instrument before it posts.

Step 1 · Read

Read every field, with a citation.

Layout-aware OCR and extraction pull doc type, parties, dates, recording info, and legal description from each instrument, then pin every value to the exact pixel region it came from.

Step 2 · Verify

Check the read against itself.

A separate verification pass re-checks each field: format and pattern rules, cross-field consistency (parties against doc type, dates against the recording date), and the known quirks of that county's forms. When the passes disagree, confidence drops and the field is flagged.
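As a rough illustration of the kinds of checks described above, here is a minimal Python sketch. The field names, the instrument-number pattern, the sample dates, and the `verify` function are all hypothetical, chosen for illustration; they are not the engine's actual rules.

```python
import re
from datetime import date

# Hypothetical pattern: instrument numbers shaped like "2026-04188".
INSTRUMENT_NO = re.compile(r"^\d{4}-\d{5}$")

def verify(fields: dict) -> list[str]:
    """Return a human-readable reason for every check that fails."""
    reasons = []
    # Format and pattern rule.
    if not INSTRUMENT_NO.match(fields.get("instrument_no", "")):
        reasons.append("failed format check: instrument_no")
    # Cross-field consistency (assumed rule): a document is not dated
    # after the day it was recorded.
    doc_date = fields.get("doc_date")
    rec_date = fields.get("recording_date")
    if doc_date and rec_date and doc_date > rec_date:
        reasons.append("cross-field mismatch: doc_date after recording_date")
    # Cross-field consistency (assumed rule): a deed names both parties.
    if fields.get("doc_type") == "Warranty Deed":
        if not fields.get("grantors") or not fields.get("grantees"):
            reasons.append("cross-field mismatch: deed missing a party")
    return reasons

# Illustrative values only.
sample = {
    "instrument_no": "2026-04193",
    "doc_type": "Satisfaction",
    "doc_date": date(2026, 3, 9),
    "recording_date": date(2026, 3, 2),
}
print(verify(sample))
```

Each returned reason is the kind of text that could sit next to a flagged field, so a reviewer sees why confidence dropped and not only that it did.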

Step 3 · Score

Score it, and surface the doubt.

Every field and classification carries a calibrated confidence score tied to its citation. High-confidence instruments clear on their own; anything below the bar you set lands in the extraction report for a human to confirm.

The engine is tuned in layers (base, then state, county, customer, and document type), so it keeps sharpening on the documents you actually process instead of starting cold on every batch.

The extraction report

Every batch tells you where to look.

When a batch finishes, it produces an extraction report: a confidence summary across the whole batch, the specific documents and fields that fell below your threshold, and plain-language recommendations for what to review before anything posts. You never have to guess whether a batch is safe to deliver.

Batch-level confidence summary: how many instruments cleared, how many need a second look.
Field-level flags with the reason attached: low OCR confidence, a failed format check, a cross-field mismatch.
Recommendations in plain language: "review four handwritten grantor names," not a wall of error codes.
Exported alongside the transmission, so QA keeps a record of exactly what was checked.
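To make the shape of such a report concrete, here is a minimal Python sketch of how a batch summary could be assembled from per-field scores. The `FieldResult` type, the `build_report` function, and the "Release" sample row are hypothetical; the real report format is not specified here.

```python
from dataclasses import dataclass

@dataclass
class FieldResult:
    document: str      # e.g. "Deed of Trust #2026-04188"
    field: str
    confidence: float  # 0.0 to 1.0
    reason: str = ""   # why confidence dropped, if it did

def build_report(results: list[FieldResult], threshold: float) -> dict:
    """Summarize a batch: which documents cleared and which fields need review."""
    flagged = [r for r in results if r.confidence < threshold]
    all_docs = {r.document for r in results}
    flagged_docs = {r.document for r in flagged}
    return {
        "documents": len(all_docs),
        "auto_cleared": len(all_docs - flagged_docs),
        "to_review": len(flagged_docs),
        "flags": [
            f"{r.document} | {r.field}: {r.reason} ({r.confidence:.0%})"
            for r in flagged
        ],
    }

results = [
    FieldResult("Deed of Trust #2026-04188", "Grantor name", 0.71, "low OCR confidence"),
    FieldResult("Deed of Trust #2026-04188", "Recording date", 0.99),
    FieldResult("Warranty Deed #2026-04221", "Legal description", 0.64, "failed closure check"),
    FieldResult("Release #2026-04200", "Grantor name", 0.97),  # hypothetical row
]
report = build_report(results, threshold=0.90)
for line in report["flags"]:
    print(line)
```

A document counts as auto-cleared only when none of its fields fall below the threshold, which is one plausible reading of "cleared"; a production report could weigh fields differently.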

app.titletools.io · extraction report

Buncombe County, NC

381 documents · go-forward batch

Ready to post

Batch confidence

94%

358 auto-cleared · 17 to review · 6 failed validation

Flagged for review

Deed of Trust #2026-04188

Grantor name: Low OCR confidence · handwritten

71%

Warranty Deed #2026-04221

Legal description: Failed closure check

64%

Satisfaction #2026-04193

Recording date: Cross-field mismatch

78%

Recommended: review 6 documents before posting. 4 handwritten grantor names and 2 legal descriptions failed verification. Everything else cleared above your 90% threshold.

Built for volume

Designed to process millions of documents.

From day one the pipeline was built for the largest loads in the title industry: full county back-files, multi-year archives, daily go-forward at scale. Work fans across a pool of parallel workers, so throughput scales with the hardware behind it, not with the size of any one batch.

Speed

Hours, not days

Pages clear in parallel, not single-file. A hundred-thousand-page load finishes in roughly the window a few thousand would; you just put more workers on it. Same-day go-forward posting instead of next-day batch turnaround.

Capacity

Millions of pages

Batches in the hundreds of thousands of pages run unattended and resume cleanly on any partial failure, so a million-page back-file is a matter of time, never a restart from zero. Add workers to add throughput.
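The resume behavior can be pictured with a small Python sketch: work fans out to a pool, successes are recorded, and a rerun touches only what is still missing. This is an illustration under assumptions, not the pipeline's implementation; `run_batch`, the in-memory `done` set, and the `flaky` worker are hypothetical, and a real system would persist its checkpoint.

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(pages, done, worker, max_workers=4):
    """Process every page not already in `done`; return the pages that failed."""
    pending = [p for p in pages if p not in done]
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {p: pool.submit(worker, p) for p in pending}
        for page, future in futures.items():
            try:
                future.result()
                done.add(page)       # checkpoint the success
            except Exception:
                failed.append(page)  # left for the next run
    return failed

calls = []

def flaky(page):
    """Stand-in for clean + OCR + extract; fails once on page "p2"."""
    calls.append(page)
    if page == "p2" and calls.count("p2") == 1:
        raise RuntimeError("transient failure")

pages = ["p1", "p2", "p3"]
done = set()
first = run_batch(pages, done, flaky)   # "p2" fails on the first run
second = run_batch(pages, done, flaky)  # the rerun only retries "p2"
print(first, second, sorted(done))
```

Because completed pages are skipped on the rerun, the cost of a partial failure is proportional to what failed, not to the size of the batch.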

Economics

↑ 10–40× pages / $

Highest leverage on routine instrument types (deeds, releases, satisfactions) that run at full automation. As a first pass in front of an existing keying team, per-seat throughput typically lands 3 to 6 times higher without adding heads.

Under the hood: a horizontally-scaled worker pipeline that pulls, cleans, OCRs, classifies, and extracts in parallel, the same engine whether you’re posting one day’s recordings or rebuilding an entire plant. Throughput and cost figures are indicative ranges, not promises; we’ll model your cost per instrument against a sample batch of your data on the demo call.

Format-to-format conversion

From any source. To any plant.

Flexible import and export adapters. We support the industry-standard transmission formats (TitleSearch, PropertySync, custom pipe-delimited), and we add new adapters all the time, county by county and plant by plant.

Import adapters

Industry-standard transmission formats: TitleSearch and other plant-vendor formats
County recording ZIP: Multi-page TIFFs + thin EXTRACT*.csv
Mixed-format PDF packet: Multi-document PDFs auto-split per instrument
Bulk S3 / object-store pull: Continuous daily ingestion
Watch folder: Drop files into a tracked directory
Scheduled batch: Pull on a cron from any HTTP / SFTP source
Direct upload: PDF, TIFF, PNG, JPG via the workspace
Your company dropbox: Drag-and-drop web uploads, emailed share links, or SFTP
Shipped drives: For back-file collections too big to upload

Export adapters

PropertySync: Direct API post or JSON payload
TitleSearch + other industry-standard transmissions: Pipe-delimited docs.txt + STAT.txt + Images/
Custom pipe-delimited: Field mapping configured per project
JSON metadata sidecar: Full field capture, per instrument
Multi-page TIFF per instrument: Plant-image-bank ready
ZIP archive download: Single bundle for manual ingest
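For a sense of what a per-project field mapping might look like in a custom pipe-delimited export, here is a minimal Python sketch. The column names, the mapping, and the sample recording date are hypothetical and do not represent any vendor's actual transmission layout.

```python
import csv
import io

# Hypothetical per-project mapping: (extracted field, output column name).
FIELD_MAPPING = [
    ("instrument_no", "INSTR"),
    ("doc_type", "DOCTYPE"),
    ("recording_date", "RECDATE"),
]

def to_pipe_delimited(instruments: list[dict], mapping) -> str:
    """Render instruments as pipe-delimited text, one row per instrument."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
    writer.writerow([column for _, column in mapping])
    for instrument in instruments:
        # Missing fields become empty columns so row width stays fixed.
        writer.writerow([instrument.get(key, "") for key, _ in mapping])
    return buffer.getvalue()

rows = [
    {"instrument_no": "2026-04221", "doc_type": "Warranty Deed",
     "recording_date": "2026-03-02"},  # illustrative values
]
print(to_pipe_delimited(rows, FIELD_MAPPING), end="")
```

Using the `csv` module instead of joining strings by hand means a value that itself contains a pipe gets quoted, so it cannot silently shift the columns after it.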

County recorders and aggregators use the same adapters to standardize ad-hoc scanner output into clean, structured transmissions for downstream consumers. Don't see your plant's format? Tell us what you need. New adapters land continuously.

PropertySync customers

Posted straight into your plant. Validated by your plant.

We co-engineered PropertySync, the platform running production title plants on billions of records, so for PropertySync customers this isn't an export you import. It's a native post that lands under your rules and passes your plant's own checks on the way in.

A direct post, not a drop file.

Instruments post straight into your plant through the PropertySync API, with no intermediate transmission to generate, hand off, and re-import. The batch you approve is the batch your plant receives.

Under your own posting rules.

The pipeline posts using your plant's configured posting rules, so data lands formatted and routed exactly the way your plant already expects it, not a generic export you have to reshape.

Checked by your live validations.

Every instrument runs against your plant's live validation rules before it posts. Anything the plant would reject shows up in the extraction report first, so you fix it here instead of chasing rejections later.

Your auto-completes, applied inline.

Your plant's auto-completes and lookups (party normalization, subdivision and legal-description codes) run as the batch posts, so new instruments match the records already in your plant.

Not on PropertySync? The pipeline still delivers to any plant through the export adapters above. PropertySync customers just get the deepest, most direct path in.

Legal descriptions & the land base

Every legal, checked against a real land base.

Indexing a document is half the job. The other half is tying its legal description to the county's actual additions, abstracts, and surveys. The pipeline builds and maintains that land base, validates every posted legal against it, and recovers the legals other pipelines skip.

A curated land base, not keystrokes.

We build and maintain the validation tables your plant searches against: every platted addition, Texas abstract and survey, and section-township-range grid, harvested from the recorded documents themselves and cleared by a human before a single value posts. Subdivision, abstract, and acreage systems are all first-class.

AutoLocate: legals recovered, not skipped.

Plenty of recorded documents carry no usable legal description, just a reference to a prior instrument. AutoLocate chases that reference, or walks the party chain already in your plant, and inherits the referenced legal with a confidence score. Uncertain resolutions hold for review in your existing batch flow, with the evidence attached.
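The reference-chasing idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `auto_locate` function, the record layout, the sample records, and especially the confidence figure, which uses a placeholder decay of 0.9 per hop and is not how AutoLocate actually scores.

```python
def auto_locate(doc_id: str, docs: dict, max_hops: int = 3):
    """Follow prior-instrument references until a legal description turns up."""
    evidence = []
    current = doc_id
    for hop in range(max_hops + 1):
        record = docs.get(current)
        if record is None or current in evidence:
            return None  # broken or circular reference: needs a human locate
        evidence.append(current)
        if record.get("legal"):
            return {
                "legal": record["legal"],
                "confidence": 0.9 ** hop,  # placeholder scoring, see lead-in
                "evidence": evidence,      # the chain a reviewer can inspect
            }
        current = record.get("refers_to")
        if current is None:
            return None
    return None  # chain longer than max_hops

# Hypothetical records: a satisfaction that only cites the deed of trust.
docs = {
    "2026-04193": {"legal": None, "refers_to": "2019-01077"},
    "2019-01077": {"legal": "Lot 4, Block 2, Example Addition", "refers_to": None},
}
result = auto_locate("2026-04193", docs)
print(result)
```

Returning the chain of instruments alongside the inherited legal is what lets an uncertain resolution be held for review with its evidence attached.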

A locate report with every batch.

Each delivered batch includes a locate categorization report: which documents matched the land base directly, which AutoLocate resolved and how, and which genuinely need a human locate. Nothing lands in your plant unclassified, and nothing disappears silently.

New plants built from scratch.

Standing up a plant where none exists? The same pipeline constructs the vocabulary and land base for a new county from its recorded documents, calibrates the result against reference data, and delivers it in your plant's format. Backplants and full county back-files run through the identical engine.

Two operating modes

Full automation or first-pass for your keyers.

Same pipeline, two ways to deploy. Plant operators run it end to end from county source to plant; keying services run it as a first pass in front of their team. Pick the one that matches your operation today; you can move between them later.

Full automation

Go-forward daily posting from county source to plant. Documents that classify and extract with confidence above your threshold flow straight to delivery; only flagged exceptions surface for human review.

Per-county confidence thresholds.
Auto-deliver into the plant via API or scheduled drop.
End-to-end intake-to-plant when you don't have or don't want a separate keying step.
Same audit trail as a keyed transmission, with overrides logged with author and timestamp.

First-pass for keyers

Run the pipeline ahead of your existing keying team. Your keyers open instruments that already have doc-type, parties, dates, and recording info filled in. They verify and correct instead of typing from scratch.

Per-seat throughput typically 3 to 6 times higher, without adding headcount.
Keyers focus on hard documents: handwritten, damaged, unusual instruments.
Routine instruments (deeds, releases, satisfactions) clear themselves.
Every keyer override is preserved alongside the AI suggestion for QA review.

Prefer to keep keying your own plant? The keyboard-driven Stapler, Index, and Examine workspaces the pipeline uses are available as standalone tools for your team.

How a batch moves

Five steps. Most of them you'll watch, not drive.

The pipeline runs in the background; you intervene where judgment matters.

Acquire from anywhere.

Drop a ZIP into the workspace, point us at an S3 bucket, or schedule a pull. Every batch shows up on one screen with the page count, document count, status, and progress visible at a glance.

ZIP, S3, SFTP, HTTP, watch folder, or direct upload.
A secure dropbox for your company: drag-and-drop in the browser, an emailed link for your county contact, or SFTP from a script. Shipped drives welcome for the big stuff.
Resume on partial failure, with no re-ingesting what you already have.

Staple: group pages into documents.

Recording packets arrive as single-page TIFFs. The Stapler workspace groups them into multi-page instruments. Auto-Stapler suggests boundaries from layout and content; an operator confirms or adjusts from the keyboard. Page-level rotate, delete, and insert without leaving home row.

Single-page TIFFs in, multi-page instruments out.
Auto-Stapler suggestions confirmed (or overridden) by the operator.
Image cleanup runs as part of intake: deskew, denoise, auto-rotate.

Index: capture the metadata fields.

For each stapled document, capture the metadata fields the target plant needs: doc type, instrument number, recording date, book/page, grantor, grantee, parcel. AI auto-fills with confidence scores; the indexer accepts or overrides. Exceptions surface here for human attention.

Custom field schema per project / county / plant.
AI suggestions inline with confidence scores, with keyboard-driven accept or override.
Field carry-forward across documents (recording date, county, etc.).

Examine: verify against the source.

Open any document and see every extracted field next to the source page region that produced it, with bounding boxes drawn over the original scan. Confidence under threshold? Click to override; both the AI suggestion and the correction are preserved in the audit trail.

Split-screen viewer with field-to-page citation linking.
Override any field; both the AI suggestion and the correction are preserved.
Every override carries author and timestamp into the audit trail.

Export in your plant's format.

When the batch is ready, emit a transmission archive: direct to your plant's API, into a watched drop folder, or as a download. Same audit trail every keyed transmission already carries.

PropertySync, industry-standard pipe-delimited, custom CSV, JSON, or ZIP.
Plant-image-bank-ready multi-page TIFFs per instrument.
Transmission statistics file (STAT.txt) generated automatically.

Trust

No black-box deliveries.

If your plant or your customer raises a question about a posted document, you can answer it.

Same audit trail as a keyed transmission.

Every classification, every extraction, every override: author, timestamp, AI confidence, original suggestion. Inspectable months later, exportable on request.

Reproducible transmissions.

A delivered transmission can be regenerated bit-for-bit from the source batch and the pipeline version. No mystery deltas if a plant raises a question.

You set the confidence bar.

Per-county, per-doc-type thresholds for what auto-clears versus what surfaces for review. Tune over time as you build trust in specific instrument types.
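One way to picture per-county, per-doc-type thresholds is a lookup that prefers the most specific setting. The sketch below is a hypothetical Python illustration; the `threshold_for` function, the fallback order, and the override values are assumptions, not the product's configuration model.

```python
DEFAULT_THRESHOLD = 0.90

def threshold_for(county, doc_type, overrides, default=DEFAULT_THRESHOLD):
    """Pick the most specific threshold configured for this county and doc type."""
    # Assumed precedence: county + doc type, then county-wide, then doc type alone.
    for key in ((county, doc_type), (county, None), (None, doc_type)):
        if key in overrides:
            return overrides[key]
    return default

# Hypothetical overrides keyed by (county, doc_type); None means "any".
overrides = {
    ("Buncombe", None): 0.92,
    ("Buncombe", "Warranty Deed"): 0.95,
}

def auto_clears(confidence, county, doc_type):
    """True when an instrument's confidence meets the configured bar."""
    return confidence >= threshold_for(county, doc_type, overrides)

print(auto_clears(0.93, "Buncombe", "Satisfaction"))   # county-wide bar applies
print(auto_clears(0.93, "Buncombe", "Warranty Deed"))  # stricter deed bar applies
```

Keeping the overrides as data makes it cheap to loosen the bar for an instrument type once it has earned trust, without touching the rest of the configuration.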

See it on your data

Book a 30-minute pipeline demo.

Send us a sample county recording packet (even one day's worth) and we'll run it through the pipeline live. You'll see stapling, indexing, examination, and the plant-ready transmission your plant would receive.

Or email hello@limelyte.com directly.
