SmartMatch OCR
New · Target Profile v1 — define once, extract anywhere

Turn raw OCR into business-ready, audited data.

SmartMatch OCR is the accuracy-assurance layer on top of Google Document AI. Define what to find with a Target Profile, then let SmartMatch locate candidates, normalize values, validate them against evidence, and route flagged fields through a human double-check, all behind a single API.

Double-check enforced · Evidence-based (bbox + context) · Immutable Target Profile versions · Full audit log
POST /v1/documents/{document_id}/validate
curl -X POST \
  https://smartmatch.gloding.com/v1/documents/doc_01HE.../validate \
  -H "Authorization: Bearer $SMARTMATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_profile_version": "tp_invoice@3"
  }'
{
  "document_id": "doc_01HE...",
  "project_id": "prj_inv_jp",
  "target_profile_version": "tp_invoice@3",
  "validation_summary": {
    "overall_status": "REVIEW_REQUIRED",
    "invalid_count": 0,
    "review_required_count": 1
  },
  "fields": [{
    "field_key": "invoice_total",
    "target_id": "t_total_amount",
    "normalized_value": "12400",
    "validation": {
      "status": "REVIEW_REQUIRED",
      "reason_codes": ["LOW_CONFIDENCE"]
    },
    "evidence": { "page_no": 1, "bbox": {...} }
  }],
  "scores": { "document_score": 86 }
}
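
A minimal sketch of how a client might triage a response like the one above. It assumes the response shape shown in the example; the helper name `fields_needing_review` is hypothetical and not part of the SmartMatch API, and the elided `bbox` is left out rather than guessed.

```python
import json

# Sample trimmed from the example response; the evidence bbox is elided in the
# source, so it is omitted here.
RESPONSE = json.loads("""
{
  "document_id": "doc_01HE...",
  "target_profile_version": "tp_invoice@3",
  "validation_summary": {
    "overall_status": "REVIEW_REQUIRED",
    "invalid_count": 0,
    "review_required_count": 1
  },
  "fields": [{
    "field_key": "invoice_total",
    "normalized_value": "12400",
    "validation": {"status": "REVIEW_REQUIRED", "reason_codes": ["LOW_CONFIDENCE"]},
    "evidence": {"page_no": 1}
  }]
}
""")


def fields_needing_review(response: dict) -> list:
    """Return (field_key, reason_codes) for every field flagged REVIEW_REQUIRED."""
    return [
        (f["field_key"], f["validation"]["reason_codes"])
        for f in response.get("fields", [])
        if f["validation"]["status"] == "REVIEW_REQUIRED"
    ]


pending = fields_needing_review(RESPONSE)
for field_key, reasons in pending:
    print(f"{field_key}: needs review ({', '.join(reasons)})")
```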
Why SmartMatch

An accuracy-assurance layer you can put in front of any Document AI.

External OCR is an engine — not the source of truth. SmartMatch adds the judgment, the evidence, and the audit trail that make automated extraction safe for production.

Target Profiles, not fixed schemas

Define what to find per project — fields, synonyms, anchors, structural rules. Immutable versions make every extraction reproducible and auditable.
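
As an illustration only, an immutable profile version could be modeled like this. The class and attribute names are hypothetical; only the `tp_invoice@3` reference format and the `invoice_total` field key come from the example on this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TargetField:
    field_key: str
    expected_type: str
    synonyms: Tuple[str, ...] = ()
    anchors: Tuple[str, ...] = ()


@dataclass(frozen=True)
class TargetProfileVersion:
    profile_id: str
    version: int
    fields: Tuple[TargetField, ...]

    @property
    def ref(self) -> str:
        # Assumed reference format, mirroring "tp_invoice@3" from the example.
        return f"{self.profile_id}@{self.version}"

    def publish_next(self, fields) -> "TargetProfileVersion":
        # A change never edits an existing version; it creates the next one.
        return TargetProfileVersion(self.profile_id, self.version + 1, tuple(fields))


total = TargetField("invoice_total", "amount", synonyms=("total", "amount due"))
v3 = TargetProfileVersion("tp_invoice", 3, (total,))
v4 = v3.publish_next([total, TargetField("invoice_date", "date")])
```

Because both dataclasses are frozen, assigning to `v3.version` raises `dataclasses.FrozenInstanceError`, so a document extracted against `tp_invoice@3` can always be re-run against the same definition.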

Locate & Match with evidence

Beyond OCR text: SmartMatch searches the page for each target, returning candidates with bounding boxes, page numbers, and surrounding context.
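
A toy sketch of anchor-based locating, under stated assumptions: OCR tokens arrive with a page number and a bounding box, and the value sits to the right of its anchor on the same line. The `Token` type and `locate` function are hypothetical, and this heuristic is far simpler than a production matcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    page_no: int
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box
    w: float
    h: float


def locate(tokens, anchors):
    """For each anchor token, return the nearest token to its right on the same line."""
    wanted = {a.lower() for a in anchors}
    candidates = []
    for anchor in tokens:
        if anchor.text.lower().rstrip(":") not in wanted:
            continue
        same_line = [
            t for t in tokens
            if t.page_no == anchor.page_no
            and t.x > anchor.x
            and abs(t.y - anchor.y) <= anchor.h / 2
        ]
        if same_line:
            value = min(same_line, key=lambda t: t.x)
            candidates.append({
                "value": value.text,
                "page_no": value.page_no,
                "bbox": (value.x, value.y, value.w, value.h),
                "context": f"{anchor.text} {value.text}",
            })
    return candidates


tokens = [
    Token("Invoice", 1, 10, 10, 50, 12),
    Token("Total:", 1, 10, 200, 40, 12),
    Token("12,400", 1, 120, 201, 45, 12),
    Token("Thank", 1, 10, 260, 40, 12),
]
found = locate(tokens, anchors=["total", "amount due"])
```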

Normalize & validate by rule

Type, format, range, cross-field consistency, forbidden patterns — applied deterministically on top of Document AI, with per-field reason codes.
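
A minimal sketch of deterministic rule checks for an amount field. `REVIEW_REQUIRED` and `LOW_CONFIDENCE` appear in the example response; the `VALID` and `INVALID` statuses, the `TYPE_MISMATCH` and `OUT_OF_RANGE` codes, and the thresholds are assumptions made for illustration.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def normalize_amount(raw: str) -> Optional[str]:
    """Strip currency symbols, spaces, and thousands separators; None if not a number."""
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    try:
        return str(Decimal(cleaned))
    except InvalidOperation:
        return None


def validate_amount(raw, confidence, min_value=Decimal("0"),
                    max_value=Decimal("10000000"), min_confidence=0.9):
    """Apply type, range, and confidence rules in a fixed order."""
    normalized = normalize_amount(raw)
    if normalized is None:
        return {"normalized_value": None, "status": "INVALID",
                "reason_codes": ["TYPE_MISMATCH"]}
    if not (min_value <= Decimal(normalized) <= max_value):
        return {"normalized_value": normalized, "status": "INVALID",
                "reason_codes": ["OUT_OF_RANGE"]}
    if confidence < min_confidence:
        return {"normalized_value": normalized, "status": "REVIEW_REQUIRED",
                "reason_codes": ["LOW_CONFIDENCE"]}
    return {"normalized_value": normalized, "status": "VALID", "reason_codes": []}


result = validate_amount("¥12,400", confidence=0.72)
```

The same input always yields the same status and reason codes, which is what makes the outcome explainable after the fact.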

Human-in-the-Loop + double-check

REVIEW_REQUIRED items flow to a first reviewer, then to a second approver, who must be a different user. A supervisor resolves any mismatch between the two.
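
The two-person rule can be sketched as follows. The `ReviewItem` class, its attributes, and the user IDs are hypothetical; the sketch only illustrates rejecting a second approval from the first reviewer and flagging a disagreement for a supervisor.

```python
from dataclasses import dataclass
from typing import Optional


class SameUserError(Exception):
    """Raised when the approver is the same user as the first reviewer."""


@dataclass
class ReviewItem:
    field_key: str
    reviewed_by: Optional[str] = None
    reviewed_value: Optional[str] = None
    approved_by: Optional[str] = None
    needs_supervisor: bool = False

    def review(self, user_id: str, value: str) -> None:
        self.reviewed_by, self.reviewed_value = user_id, value

    def approve(self, user_id: str, value: str) -> bool:
        if self.reviewed_by is None:
            raise RuntimeError("first review is missing")
        if user_id == self.reviewed_by:
            raise SameUserError("approver must differ from the first reviewer")
        if value != self.reviewed_value:
            # The two people disagree: escalate instead of approving.
            self.needs_supervisor = True
            return False
        self.approved_by = user_id
        return True


item = ReviewItem("invoice_total")
item.review("user_a", "12400")
approved = item.approve("user_b", "12400")
```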

Finalized-only external output

Intermediate results never leave the system. Only Finalized documents — tied to project_id and target_profile_version — are exposed via API.

Full, immutable audit trail

Every state transition, rule version, and human decision is append-only. Explain exactly why a value was confirmed, and by whom.
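
One common way to make an append-only log tamper-evident is to chain entries by hash. The sketch below shows that general technique; it is not a description of how SmartMatch stores its audit trail, and the `AuditLog` class and entry fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64
BODY_KEYS = ("actor", "action", "detail", "prev_hash")


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self):
        self._entries = []

    def append(self, actor: str, action: str, detail: dict) -> dict:
        """Add an entry whose hash covers its content and the previous entry's hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"actor": actor, "action": action, "detail": detail, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = GENESIS
        for e in self._entries:
            body = {k: e[k] for k in BODY_KEYS}
            if e["prev_hash"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = AuditLog()
log.append("system", "STATE_CHANGE", {"from": "Interpreted", "to": "Validated"})
log.append("user_a", "REVIEW", {"field_key": "invoice_total", "value": "12400"})
intact = log.verify()
```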

The pipeline

A deterministic state machine from upload to Finalized.

Every document moves through an explicit, auditable set of states. Invalid transitions are rejected at the API layer.

  1. Uploaded
    Upload

    PDF or images. Multi-page supported.

  2. Prechecked
    Pre-check

    Resolution, blur, completeness, locatability.

  3. AIProcessed
    External AI

    Google Document AI OCR & structure.

  4. Interpreted
    Locate / Match

    Target Profile × OCR → candidate evidence.

  5. Validated
    Validate

    Normalize, type/range check, score.

  6. Reviewed → Approved
    HITL review

    Two-person double-check when required.

  7. Finalized
    Finalize

    Only Finalized data is exposed via API.
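
The states above can be sketched as a transition table. The state names come from the list; the exact set of allowed transitions is an assumption, in particular the direct Validated → Finalized path for documents that do not require review.

```python
# Assumed transition table; only the state names are taken from the pipeline above.
ALLOWED = {
    "Uploaded": {"Prechecked"},
    "Prechecked": {"AIProcessed"},
    "AIProcessed": {"Interpreted"},
    "Interpreted": {"Validated"},
    "Validated": {"Reviewed", "Finalized"},  # review only when required (assumption)
    "Reviewed": {"Approved"},
    "Approved": {"Finalized"},
    "Finalized": set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the table."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target


state = "Uploaded"
for nxt in ["Prechecked", "AIProcessed", "Interpreted", "Validated",
            "Reviewed", "Approved", "Finalized"]:
    state = transition(state, nxt)
```

Skipping a step, for example `transition("Uploaded", "Finalized")`, raises `InvalidTransition` instead of silently moving the document forward.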

How it works

Four primitives. One predictable flow.

01

Create a Target Profile

Declare what you want to extract: field keys, expected types, synonyms, anchor terms, and match rules. Publish an immutable version.

POST /v1/projects/{project_id}/target-profiles
POST /v1/target-profiles/{profile_id}/versions
02

Upload the document

Create a Document container bound to a project and target_profile_version. Add pages or upload a PDF — SmartMatch handles page splitting.

POST /v1/documents
POST /v1/documents/{document_id}/upload-pdf
03

Run OCR, locate & validate

Kick off Document AI, then let SmartMatch locate candidates, normalize, and validate against your rules — all asynchronously.

POST /v1/documents/{document_id}/ai-jobs
POST /v1/documents/{document_id}/validate
04

Review, approve, finalize

REVIEW_REQUIRED items go to HITL. A second approver (different user) double-checks. Only Finalized data is returned to downstream systems.

POST /v1/documents/{document_id}/reviews
POST /v1/documents/{document_id}/approvals
GET  /v1/documents/{document_id}/final
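
A minimal Python counterpart to the curl example, using only the standard library. The base URL, the `/validate` and `/final` paths, and the `target_profile_version` body field come from this page; the helper names and the `doc_123` ID are placeholders, and request bodies for the other endpoints are not shown because the page does not specify them.

```python
import json
import urllib.request

BASE_URL = "https://smartmatch.gloding.com"


def build_request(method, path, api_key, body=None):
    """Prepare (but do not send) an authenticated request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def validate_request(document_id, target_profile_version, api_key):
    return build_request(
        "POST",
        f"/v1/documents/{document_id}/validate",
        api_key,
        {"target_profile_version": target_profile_version},
    )


def final_request(document_id, api_key):
    return build_request("GET", f"/v1/documents/{document_id}/final", api_key)


req = validate_request("doc_123", "tp_invoice@3", api_key="test-key")
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```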

Ready to make your OCR pipeline accountable?

Pilot SmartMatch with your own document set and Target Profile. We'll help you design the profile, tune rules, and run the first double-check workflow end-to-end.