New · Target Profile v1 — define once, extract anywhere

Turn raw OCR into business-ready, audited data.

SmartMatch OCR is the accuracy-assurance layer on top of Google Document AI. Define what to find with a Target Profile, then let SmartMatch locate candidates, normalize values, validate with evidence, and run human double-check — all behind a single API.


Double-check enforced · Evidence-based (bbox + context) · Immutable Target Profile versions · Full audit log

POST /v1/documents/{document_id}/validate

curl -X POST \
  https://smartmatch.gloding.com/v1/documents/doc_01HE.../validate \
  -H "Authorization: Bearer $SMARTMATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_profile_version": "tp_invoice@3"
  }'

{
  "document_id": "doc_01HE...",
  "project_id": "prj_inv_jp",
  "target_profile_version": "tp_invoice@3",
  "validation_summary": {
    "overall_status": "REVIEW_REQUIRED",
    "invalid_count": 0,
    "review_required_count": 1
  },
  "fields": [{
    "field_key": "invoice_total",
    "target_id": "t_total_amount",
    "normalized_value": "12400",
    "validation": {
      "status": "REVIEW_REQUIRED",
      "reason_codes": ["LOW_CONFIDENCE"]
    },
    "evidence": { "page_no": 1, "bbox": {...} }
  }],
  "scores": { "document_score": 86 }
}
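A response like the one above can be triaged on the client side. The following is a minimal Python sketch, assuming the response has exactly the shape shown; the helper name `fields_needing_review` is hypothetical, and the elided `bbox` is left as an empty object so the sample parses.

```python
import json

# Sample copied from the response above. The elided bbox is left as an
# empty object here only so that the sample is valid JSON.
SAMPLE_RESPONSE = """
{
  "document_id": "doc_01HE...",
  "project_id": "prj_inv_jp",
  "target_profile_version": "tp_invoice@3",
  "validation_summary": {
    "overall_status": "REVIEW_REQUIRED",
    "invalid_count": 0,
    "review_required_count": 1
  },
  "fields": [{
    "field_key": "invoice_total",
    "target_id": "t_total_amount",
    "normalized_value": "12400",
    "validation": {
      "status": "REVIEW_REQUIRED",
      "reason_codes": ["LOW_CONFIDENCE"]
    },
    "evidence": { "page_no": 1, "bbox": {} }
  }],
  "scores": { "document_score": 86 }
}
"""


def fields_needing_review(response: dict) -> list:
    """Return the fields whose validation status is REVIEW_REQUIRED."""
    return [
        field
        for field in response.get("fields", [])
        if field.get("validation", {}).get("status") == "REVIEW_REQUIRED"
    ]


response = json.loads(SAMPLE_RESPONSE)
pending = fields_needing_review(response)
for field in pending:
    # Print the field key, its reason codes, and the page holding the evidence.
    print(field["field_key"], field["validation"]["reason_codes"],
          "page", field["evidence"]["page_no"])
```

In this sample the number of pending fields matches `review_required_count` in the summary, which is a cheap consistency check before routing items to reviewers.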

Why SmartMatch

An accuracy-assurance layer you can put on top of any Document AI.

External OCR is an engine — not the source of truth. SmartMatch adds the judgment, the evidence, and the audit trail that make automated extraction safe for production.

Target Profiles, not fixed schemas

Define what to find per project — fields, synonyms, anchors, structural rules. Immutable versions make every extraction reproducible and auditable.
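One way to picture an immutable profile version is as a frozen record that is never edited, only superseded. The sketch below is illustrative Python, not the SmartMatch schema: the class and attribute names are hypothetical, and reading `tp_invoice@3` as "profile id @ version number" is an assumption based on the example request.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FieldTarget:
    """One thing to find in a document (hypothetical shape)."""
    field_key: str
    value_type: str
    synonyms: tuple = ()
    anchors: tuple = ()


@dataclass(frozen=True)
class TargetProfileVersion:
    """A published, read-only snapshot of a Target Profile (hypothetical shape)."""
    profile_id: str
    version: int
    targets: tuple

    @property
    def ref(self) -> str:
        # Assumed reference format, modelled on "tp_invoice@3".
        return f"{self.profile_id}@{self.version}"

    def next_version(self, targets) -> "TargetProfileVersion":
        # Changes produce a new version; the old one stays untouched.
        return TargetProfileVersion(self.profile_id, self.version + 1, tuple(targets))


total = FieldTarget("invoice_total", "amount", synonyms=("total", "amount due"))
v3 = TargetProfileVersion("tp_invoice", 3, (total,))
v4 = v3.next_version([total, FieldTarget("invoice_date", "date")])
print(v3.ref, len(v3.targets))
print(v4.ref, len(v4.targets))
```

Because every result carries the version reference it was produced with, an extraction can be re-run later against the same rules.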

Locate & Match with evidence

Beyond OCR text: SmartMatch searches the page for each target, returning candidates with bounding boxes, page numbers, and surrounding context.
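To make the idea of a candidate with evidence concrete, here is a deliberately naive Python sketch. It assumes OCR output is a flat list of tokens with a page number and bounding box, and it treats the token immediately after a matching synonym as the candidate value. The `Token` and `Candidate` types and the `locate` function are hypothetical; a real matcher would use page layout rather than token order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    page_no: int
    bbox: tuple  # (x0, y0, x1, y1); the coordinate convention is an assumption


@dataclass(frozen=True)
class Candidate:
    value: str
    page_no: int
    bbox: tuple
    context: str


def locate(tokens: list, synonyms: list, window: int = 2) -> list:
    """Return the token following each synonym hit, with its evidence."""
    wanted = {s.lower() for s in synonyms}  # single-word synonyms only
    found = []
    for i, tok in enumerate(tokens):
        if tok.text.lower() not in wanted or i + 1 >= len(tokens):
            continue
        value = tokens[i + 1]
        if value.page_no != tok.page_no:
            continue  # do not pair a label with a value on another page
        lo, hi = max(0, i - window), min(len(tokens), i + 2 + window)
        context = " ".join(t.text for t in tokens[lo:hi])
        found.append(Candidate(value.text, value.page_no, value.bbox, context))
    return found


tokens = [
    Token("Invoice", 1, (10, 10, 60, 20)),
    Token("Total", 1, (10, 200, 40, 210)),
    Token("12,400", 1, (50, 200, 90, 210)),
    Token("JPY", 1, (95, 200, 115, 210)),
]
candidates = locate(tokens, synonyms=["total", "sum"])
for c in candidates:
    print(c.value, c.page_no, c.bbox, "|", c.context)
```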

Normalize & validate by rule

Type, format, range, cross-field consistency, forbidden patterns — applied deterministically on top of Document AI, with per-field reason codes.
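As an illustration of deterministic rules with reason codes, the Python sketch below normalizes an amount and applies a format, range, and confidence check. Only `REVIEW_REQUIRED` and `LOW_CONFIDENCE` appear in the example response; `VALID`, `INVALID`, `FORMAT_MISMATCH`, `OUT_OF_RANGE`, the thresholds, and the function names are assumptions made for this example.

```python
import re
from typing import Optional


def normalize_amount(raw: str) -> Optional[str]:
    """Strip currency symbols, whitespace, and thousands separators."""
    cleaned = re.sub(r"[¥$€,\s]", "", raw)
    return cleaned if re.fullmatch(r"\d+", cleaned) else None


def validate_amount(raw: str, confidence: float, min_value: int = 0,
                    max_value: int = 10_000_000,
                    min_confidence: float = 0.9) -> dict:
    """Apply format, range, and confidence rules; same input, same output."""
    reasons = []
    value = normalize_amount(raw)
    if value is None:
        reasons.append("FORMAT_MISMATCH")   # hypothetical reason code
    elif not (min_value <= int(value) <= max_value):
        reasons.append("OUT_OF_RANGE")      # hypothetical reason code
    if confidence < min_confidence:
        reasons.append("LOW_CONFIDENCE")    # reason code from the example response

    if "FORMAT_MISMATCH" in reasons or "OUT_OF_RANGE" in reasons:
        status = "INVALID"                  # assumed status name
    elif reasons:
        status = "REVIEW_REQUIRED"
    else:
        status = "VALID"                    # assumed status name
    return {
        "normalized_value": value,
        "validation": {"status": status, "reason_codes": reasons},
    }


print(validate_amount("¥12,400", confidence=0.72))
print(validate_amount("12,4O0", confidence=0.99))  # letter O instead of zero
```

The first call reproduces the shape of the example response: a clean normalized value that still needs a human because the confidence is low.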

Human-in-the-Loop + double-check

REVIEW_REQUIRED items flow to a first reviewer, then to a second approver (a different user is enforced). A supervisor resolves any mismatch between the two.
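The different-user rule can be sketched in a few lines of Python. This is an illustration of the constraint, not the service's implementation: the record layout, the `approve` function, and the `APPROVED` / `MISMATCH` status strings are hypothetical.

```python
class DoubleCheckError(Exception):
    """Raised when the second approver is the same user as the first reviewer."""


def approve(review: dict, approver_id: str, approved_value: str) -> dict:
    """Record a second opinion on a first review."""
    if approver_id == review["reviewer_id"]:
        raise DoubleCheckError("second approver must differ from the first reviewer")
    # Agreement confirms the value; disagreement is escalated to a supervisor.
    status = "APPROVED" if approved_value == review["value"] else "MISMATCH"
    return {**review, "approver_id": approver_id,
            "approved_value": approved_value, "status": status}


first_review = {"field_key": "invoice_total", "value": "12400", "reviewer_id": "user_a"}
print(approve(first_review, "user_b", "12400")["status"])
print(approve(first_review, "user_b", "12490")["status"])
```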

Finalized-only external output

Intermediate results never leave the system. Only Finalized documents — tied to project_id and target_profile_version — are exposed via API.

Full, immutable audit trail

Every state transition, rule version, and human decision is append-only. Explain exactly why a value was confirmed, and by whom.
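One common way to make an append-only log tamper-evident is to chain entries by hash. The page does not say how SmartMatch stores its audit trail, so the Python sketch below is only an illustration of that general technique; the `AuditLog` class and its entry fields are hypothetical.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """Hash an entry body in a stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the previous one."""

    def __init__(self):
        self._entries = []

    def append(self, actor: str, action: str, detail: dict) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        body = {"actor": actor, "action": action, "detail": detail, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return dict(entry)

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = ""
        for entry in self._entries:
            body = {k: entry[k] for k in ("actor", "action", "detail", "prev_hash")}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("user_a", "REVIEWED", {"field_key": "invoice_total", "value": "12400"})
log.append("user_b", "APPROVED", {"field_key": "invoice_total", "value": "12400"})
print(log.verify())
```

Editing any earlier entry changes its hash, so `verify()` fails for that entry and the tampering is detectable.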

The pipeline

A deterministic state machine from upload to Finalized.

Every document moves through an explicit, auditable set of states. Invalid transitions are rejected at the API layer.

1. Uploaded · Upload: PDF or images. Multi-page supported.
2. Prechecked · Pre-check: Resolution, blur, completeness, locatability.
3. AIProcessed · External AI: Google Document AI OCR & structure.
4. Interpreted · Locate / Match: Target Profile × OCR → candidate evidence.
5. Validated · Validate: Normalize, type/range check, score.
6. Reviewed → Approved · HITL review: Two-person double-check when required.
7. Finalized · Finalize: Only Finalized data is exposed via API.
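The pipeline can be modelled as a small transition table. The Python sketch below uses the state names listed in the pipeline; the direct `Validated` to `Finalized` edge is an assumption drawn from "when required", and the `transition` function and `InvalidTransition` exception are hypothetical names.

```python
# Allowed next states for each state. Anything not listed is rejected.
ALLOWED = {
    "Uploaded": {"Prechecked"},
    "Prechecked": {"AIProcessed"},
    "AIProcessed": {"Interpreted"},
    "Interpreted": {"Validated"},
    # Assumption: documents that need no review may skip straight to Finalized.
    "Validated": {"Reviewed", "Finalized"},
    "Reviewed": {"Approved"},
    "Approved": {"Finalized"},
    "Finalized": set(),  # terminal state
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the table."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target


state = "Uploaded"
for nxt in ["Prechecked", "AIProcessed", "Interpreted", "Validated",
            "Reviewed", "Approved", "Finalized"]:
    state = transition(state, nxt)
print(state)
```

Keeping the table as data means the set of legal moves can be reviewed and versioned alongside the rules it protects.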

How it works

Four primitives. One predictable flow.

Create a Target Profile

Declare what you want to extract: field keys, expected types, synonyms, anchor terms, and match rules. Publish an immutable version.

POST /v1/projects/{project_id}/target-profiles
POST /v1/target-profiles/{profile_id}/versions

Upload the document

Create a Document container bound to a project and target_profile_version. Add pages or upload a PDF — SmartMatch handles page splitting.

POST /v1/documents
POST /v1/documents/{document_id}/upload-pdf

Run OCR, locate & validate

Kick off Document AI, then let SmartMatch locate candidates, normalize, and validate against your rules — all asynchronously.

POST /v1/documents/{document_id}/ai-jobs
POST /v1/documents/{document_id}/validate

Review, approve, finalize

REVIEW_REQUIRED items go to HITL. A second approver (different user) double-checks. Only Finalized data is returned to downstream systems.

POST /v1/documents/{document_id}/reviews
POST /v1/documents/{document_id}/approvals
GET  /v1/documents/{document_id}/final
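Put together, the four steps amount to an ordered list of calls. The Python sketch below only builds that list from the endpoints shown above and sends nothing; the base URL is taken from the curl example, the `call_plan` helper and the sample ids are hypothetical, and request bodies are omitted because their schemas are not shown on this page.

```python
BASE_URL = "https://smartmatch.gloding.com"  # host from the curl example

# (method, path template) in the order the four steps describe.
STEPS = [
    ("POST", "/v1/projects/{project_id}/target-profiles"),
    ("POST", "/v1/target-profiles/{profile_id}/versions"),
    ("POST", "/v1/documents"),
    ("POST", "/v1/documents/{document_id}/upload-pdf"),
    ("POST", "/v1/documents/{document_id}/ai-jobs"),
    ("POST", "/v1/documents/{document_id}/validate"),
    ("POST", "/v1/documents/{document_id}/reviews"),
    ("POST", "/v1/documents/{document_id}/approvals"),
    ("GET", "/v1/documents/{document_id}/final"),
]


def call_plan(project_id: str, profile_id: str, document_id: str) -> list:
    """Expand the path templates into concrete (method, url) pairs."""
    ids = {"project_id": project_id, "profile_id": profile_id,
           "document_id": document_id}
    return [(method, BASE_URL + path.format(**ids)) for method, path in STEPS]


# Placeholder ids for illustration only.
plan = call_plan("prj_inv_jp", "tp_invoice", "doc_example")
for method, url in plan:
    print(method, url)
```

In practice the ids come back from the earlier calls (the profile id from the first request, the document id from `POST /v1/documents`), so a real client would thread responses through rather than know them up front.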

Ready to make your OCR pipeline accountable?

Pilot SmartMatch with your own document set and Target Profile. We'll help you design the profile, tune rules, and run the first double-check workflow end-to-end.

