Turn raw OCR into business-ready, audited data.
SmartMatch OCR is the accuracy-assurance layer on top of Google Document AI. Define what to find with a Target Profile, then let SmartMatch locate candidates, normalize values, validate them with evidence, and run a human double-check, all behind a single API.
curl -X POST \
  https://smartmatch.gloding.com/v1/documents/doc_01HE.../validate \
  -H "Authorization: Bearer $SMARTMATCH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_profile_version": "tp_invoice@3"
  }'

{
  "document_id": "doc_01HE...",
  "project_id": "prj_inv_jp",
  "target_profile_version": "tp_invoice@3",
  "validation_summary": {
    "overall_status": "REVIEW_REQUIRED",
    "invalid_count": 0,
    "review_required_count": 1
  },
  "fields": [{
    "field_key": "invoice_total",
    "target_id": "t_total_amount",
    "normalized_value": "12400",
    "validation": {
      "status": "REVIEW_REQUIRED",
      "reason_codes": ["LOW_CONFIDENCE"]
    },
    "evidence": { "page_no": 1, "bbox": {...} }
  }],
  "scores": { "document_score": 86 }
}

An accuracy-assurance layer you can put in front of any Document AI.
External OCR is an engine — not the source of truth. SmartMatch adds the judgment, the evidence, and the audit trail that make automated extraction safe for production.
Target Profiles, not fixed schemas
Define what to find per project — fields, synonyms, anchors, structural rules. Immutable versions make every extraction reproducible and auditable.
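A minimal Python sketch of what immutable, versioned profiles can look like, assuming a profile is just a list of field definitions. TargetField, TargetProfileVersion and ProfileRegistry are hypothetical names, not SmartMatch's API; the only borrowed detail is the tp_invoice@3-style version identifier from the API example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetField:
    """One thing to find. Attribute names are hypothetical, not SmartMatch's schema."""
    field_key: str
    expected_type: str
    synonyms: tuple[str, ...] = ()
    anchors: tuple[str, ...] = ()


@dataclass(frozen=True)
class TargetProfileVersion:
    """A published, read-only snapshot of a profile (frozen: attributes cannot be reassigned)."""
    profile_id: str
    version: int
    fields: tuple[TargetField, ...]

    @property
    def version_id(self) -> str:
        # Same "<profile>@<n>" shape as "tp_invoice@3" in the validate example.
        return f"{self.profile_id}@{self.version}"


class ProfileRegistry:
    """Publish-only store: versions are added, never edited or replaced."""

    def __init__(self) -> None:
        self._versions: dict[str, TargetProfileVersion] = {}

    def publish(self, profile_id: str, fields: list[TargetField]) -> TargetProfileVersion:
        # The next version number is one past the highest already published for this profile.
        latest = max((v.version for v in self._versions.values()
                      if v.profile_id == profile_id), default=0)
        snapshot = TargetProfileVersion(profile_id, latest + 1, tuple(fields))
        self._versions[snapshot.version_id] = snapshot
        return snapshot

    def get(self, version_id: str) -> TargetProfileVersion:
        return self._versions[version_id]
```

Because publishing never edits an existing version, a result recorded against tp_invoice@1 can always be re-run against exactly the profile that produced it.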
Locate & Match with evidence
Beyond OCR text: SmartMatch searches the page for each target, returning candidates with bounding boxes, page numbers, and surrounding context.
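A rough sketch of anchor-based locating, assuming OCR output has been flattened into words that each carry a page number and a bounding box (x0, y0, x1, y1). The Token and Candidate types, the same-line tolerance and the "value to the right of a synonym" rule are illustrative assumptions, not SmartMatch's actual matching logic or Document AI's response format.

```python
import re
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


@dataclass(frozen=True)
class Token:
    """One OCR word. A hypothetical shape for this sketch."""
    text: str
    page_no: int
    bbox: BBox


@dataclass(frozen=True)
class Candidate:
    target_id: str
    raw_value: str
    page_no: int
    bbox: BBox
    context: str     # the anchor text that led to this candidate
    distance: float  # horizontal gap between anchor and value


# Value-like token: optional currency sign, digits with separators, optional decimals.
AMOUNT = re.compile(r"[¥$]?\d[\d,]*(\.\d+)?")


def locate(target_id: str, synonyms: list[str], tokens: list[Token],
           line_tolerance: float = 5.0) -> list[Candidate]:
    """Return value-like tokens to the right of any synonym on the same line, nearest first."""
    wanted = {s.casefold() for s in synonyms}
    found = []
    for anchor in tokens:
        if anchor.text.casefold() not in wanted:
            continue
        anchor_mid = (anchor.bbox[1] + anchor.bbox[3]) / 2
        for tok in tokens:
            tok_mid = (tok.bbox[1] + tok.bbox[3]) / 2
            same_line = tok.page_no == anchor.page_no and abs(tok_mid - anchor_mid) <= line_tolerance
            to_the_right = tok.bbox[0] > anchor.bbox[2]
            if same_line and to_the_right and AMOUNT.fullmatch(tok.text):
                gap = tok.bbox[0] - anchor.bbox[2]
                found.append(Candidate(target_id, tok.text, tok.page_no, tok.bbox, anchor.text, gap))
    return sorted(found, key=lambda c: c.distance)
```

Each candidate keeps its page, box and anchor, which is the kind of evidence a reviewer needs to check a value against the page.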
Normalize & validate by rule
Type, format, range, cross-field consistency, forbidden patterns — applied deterministically on top of Document AI, with per-field reason codes.
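A sketch of deterministic normalization and validation for one amount field. It assumes whole (JPY-style) amounts, NFKC folding for full-width characters, placeholder thresholds, and hypothetical reason codes INVALID_FORMAT, OUT_OF_RANGE and TOTAL_MISMATCH; only LOW_CONFIDENCE and the REVIEW_REQUIRED status come from the API example above, and VALID/INVALID are assumed status names.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldResult:
    field_key: str
    normalized_value: Optional[str]
    status: str = "VALID"
    reason_codes: list[str] = field(default_factory=list)

    def flag(self, status: str, code: str) -> None:
        # INVALID always wins over REVIEW_REQUIRED; reason codes accumulate.
        if self.status != "INVALID":
            self.status = status
        self.reason_codes.append(code)


def normalize_amount(raw: str) -> Optional[str]:
    """Full-width or formatted amounts -> plain digits; None if not a whole amount."""
    text = unicodedata.normalize("NFKC", raw)  # full-width digits/symbols -> standard forms
    text = re.sub(r"[¥$,\s]", "", text)        # drop currency signs, separators, spaces
    return text if re.fullmatch(r"\d+", text) else None


def validate_amount(field_key: str, raw: str, confidence: float,
                    max_value: int = 10_000_000, min_confidence: float = 0.9) -> FieldResult:
    value = normalize_amount(raw)
    result = FieldResult(field_key, value)
    if value is None:
        result.flag("INVALID", "INVALID_FORMAT")
        return result
    if int(value) > max_value:
        result.flag("INVALID", "OUT_OF_RANGE")
    if confidence < min_confidence:
        result.flag("REVIEW_REQUIRED", "LOW_CONFIDENCE")
    return result


def check_total(subtotal: FieldResult, tax: FieldResult, total: FieldResult) -> None:
    """Cross-field rule: subtotal + tax must equal total, otherwise the total is INVALID."""
    values = (subtotal.normalized_value, tax.normalized_value, total.normalized_value)
    if None in values:
        return
    if int(values[0]) + int(values[1]) != int(values[2]):
        total.flag("INVALID", "TOTAL_MISMATCH")
```

Because each rule is a plain function of its inputs, the same raw value, confidence and rule settings always yield the same status and reason codes.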
Human-in-the-Loop + double-check
REVIEW_REQUIRED items flow to a first reviewer, then to a second approver, and the system enforces that the approver is a different user. A supervisor resolves any mismatch between the two.
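One way to model the two-person rule, sketched under assumptions: ReviewItem, DoubleCheckError and the intermediate status names REVIEWED and ESCALATED are hypothetical. The items_from_response helper reads a fields array shaped like the validate response shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


class DoubleCheckError(Exception):
    """Raised when a step would break the review order or the two-person rule."""


@dataclass
class ReviewItem:
    field_key: str
    extracted_value: Optional[str]
    status: str = "REVIEW_REQUIRED"
    reviewer: Optional[str] = None
    reviewed_value: Optional[str] = None
    approver: Optional[str] = None
    resolved_by: Optional[str] = None
    final_value: Optional[str] = None

    def review(self, user: str, value: str) -> None:
        if self.status != "REVIEW_REQUIRED":
            raise DoubleCheckError(f"cannot review an item in state {self.status}")
        self.reviewer, self.reviewed_value, self.status = user, value, "REVIEWED"

    def approve(self, user: str, value: str) -> None:
        if self.status != "REVIEWED":
            raise DoubleCheckError("the first review must come before approval")
        if user == self.reviewer:
            raise DoubleCheckError("approver must be a different user from the reviewer")
        self.approver = user
        if value == self.reviewed_value:
            self.final_value, self.status = value, "APPROVED"
        else:
            self.status = "ESCALATED"  # reviewer and approver disagree: a supervisor decides

    def resolve(self, supervisor: str, value: str) -> None:
        if self.status != "ESCALATED":
            raise DoubleCheckError("only escalated items need a supervisor")
        self.resolved_by, self.final_value, self.status = supervisor, value, "APPROVED"


def items_from_response(response: dict) -> list[ReviewItem]:
    """Queue every field that the validate response marked REVIEW_REQUIRED."""
    return [
        ReviewItem(f["field_key"], f.get("normalized_value"))
        for f in response.get("fields", [])
        if f.get("validation", {}).get("status") == "REVIEW_REQUIRED"
    ]
```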
Finalized-only external output
Intermediate results never leave the system. Only Finalized documents, each tied to its project_id and target_profile_version, are exposed via the API.
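A minimal guard illustrating the finalized-only rule. The Document shape, export_final and NotFinalizedError are hypothetical; the point is that the export path checks state before releasing anything and stamps the payload with project_id and target_profile_version.

```python
from dataclasses import dataclass, field


class NotFinalizedError(Exception):
    """The document has not reached Finalized, so nothing is released."""


@dataclass
class Document:
    document_id: str
    project_id: str
    target_profile_version: str
    state: str
    fields: dict[str, str] = field(default_factory=dict)


def export_final(doc: Document) -> dict:
    """Return an external payload only for Finalized documents."""
    if doc.state != "Finalized":
        raise NotFinalizedError(f"{doc.document_id} is {doc.state}, not Finalized")
    return {
        "document_id": doc.document_id,
        "project_id": doc.project_id,
        "target_profile_version": doc.target_profile_version,
        "fields": dict(doc.fields),  # a copy, so callers cannot mutate internal state
    }
```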
Full, immutable audit trail
Every state transition, rule version, and human decision is append-only. Explain exactly why a value was confirmed, and by whom.
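How the audit trail is stored is not described here. One common way to make an append-only log tamper-evident is hash chaining, where each entry records the SHA-256 of its predecessor; the sketch below uses that technique as an assumption, not as a description of SmartMatch internals, and AuditLog and AuditEntry are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

GENESIS = "0" * 64  # prev_hash of the first entry


@dataclass(frozen=True)
class AuditEntry:
    seq: int
    actor: str
    action: str
    detail: dict
    at: str
    prev_hash: str
    hash: str


def _digest(seq: int, actor: str, action: str, detail: dict, at: str, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes to the same value.
    payload = json.dumps(
        {"seq": seq, "actor": actor, "action": action, "detail": detail, "at": at, "prev": prev_hash},
        sort_keys=True, ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only: no update or delete, and each entry commits to the one before it."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def append(self, actor: str, action: str, detail: dict) -> AuditEntry:
        prev = self._entries[-1].hash if self._entries else GENESIS
        seq = len(self._entries)
        at = datetime.now(timezone.utc).isoformat()
        entry = AuditEntry(seq, actor, action, detail, at, prev,
                           _digest(seq, actor, action, detail, at, prev))
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[AuditEntry, ...]:
        return tuple(self._entries)

    def verify(self) -> bool:
        """Recompute the chain; editing any entry without rewriting later hashes fails here."""
        prev = GENESIS
        for e in self._entries:
            expected = _digest(e.seq, e.actor, e.action, e.detail, e.at, e.prev_hash)
            if e.prev_hash != prev or e.hash != expected:
                return False
            prev = e.hash
        return True
```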
A deterministic state machine from upload to Finalized.
Every document moves through an explicit, auditable set of states. Invalid transitions are rejected at the API layer.
- Step 1: Upload (state: Uploaded)
  PDF or images. Multi-page supported.
- Step 2: Pre-check (state: Prechecked)
  Resolution, blur, completeness, locatability.
- Step 3: External AI (state: AIProcessed)
  Google Document AI OCR & structure.
- Step 4: Locate / Match (state: Interpreted)
  Target Profile × OCR → candidate evidence.
- Step 5: Validate (state: Validated)
  Normalize, type/range check, score.
- Step 6: HITL review (states: Reviewed → Approved)
  Two-person double-check when required.
- Step 7: Finalize (state: Finalized)
  Only Finalized data is exposed via API.
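The lifecycle above can be expressed as a transition table that rejects anything not listed. The state names come from the list; the Validated → Finalized shortcut for documents with no REVIEW_REQUIRED fields is an assumption read from "when required", and advance and InvalidTransition are hypothetical names.

```python
from enum import Enum


class State(str, Enum):
    UPLOADED = "Uploaded"
    PRECHECKED = "Prechecked"
    AI_PROCESSED = "AIProcessed"
    INTERPRETED = "Interpreted"
    VALIDATED = "Validated"
    REVIEWED = "Reviewed"
    APPROVED = "Approved"
    FINALIZED = "Finalized"


# Allowed next states. Validated -> Finalized is an assumed shortcut for
# documents whose validation left nothing REVIEW_REQUIRED.
ALLOWED = {
    State.UPLOADED: {State.PRECHECKED},
    State.PRECHECKED: {State.AI_PROCESSED},
    State.AI_PROCESSED: {State.INTERPRETED},
    State.INTERPRETED: {State.VALIDATED},
    State.VALIDATED: {State.REVIEWED, State.FINALIZED},
    State.REVIEWED: {State.APPROVED},
    State.APPROVED: {State.FINALIZED},
    State.FINALIZED: set(),  # terminal
}


class InvalidTransition(Exception):
    """What an API layer would turn into an error response."""


def advance(current: State, target: State, review_required_count: int = 0) -> State:
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    if current is State.VALIDATED and target is State.FINALIZED and review_required_count > 0:
        raise InvalidTransition("REVIEW_REQUIRED fields must go through HITL review first")
    return target
```

Keeping the table as data makes the allowed transitions easy to audit and to check in tests.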
Four primitives. One predictable flow.
Create a Target Profile
Declare what you want to extract: field keys, expected types, synonyms, anchor terms, and match rules. Publish an immutable version.
POST /v1/projects/{project_id}/target-profiles
POST /v1/target-profiles/{profile_id}/versions

Upload the document
Create a Document container bound to a project and a target_profile_version. Add pages or upload a PDF; SmartMatch handles page splitting.
POST /v1/documents
POST /v1/documents/{document_id}/upload-pdf

Run OCR, locate & validate
Kick off Document AI, then let SmartMatch locate candidates, normalize values, and validate them against your rules, all asynchronously.
POST /v1/documents/{document_id}/ai-jobs
POST /v1/documents/{document_id}/validate

Review, approve, finalize
REVIEW_REQUIRED items go to human-in-the-loop (HITL) review, where a second approver, who must be a different user, double-checks them. Only Finalized data is returned to downstream systems.
POST /v1/documents/{document_id}/reviews
POST /v1/documents/{document_id}/approvals
GET /v1/documents/{document_id}/final

Ready to make your OCR pipeline accountable?
Pilot SmartMatch with your own document set and Target Profile. We'll help you design the profile, tune rules, and run the first double-check workflow end-to-end.
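The four steps described earlier map onto the listed endpoints in order. This sketch only builds the requests with Python's standard urllib. The host comes from the validate example; the profile_id and document_id values are placeholders that would really come from earlier create responses; and request bodies other than the validate call's target_profile_version are not documented here, so they are left out.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://smartmatch.gloding.com"  # host taken from the validate example


def plan_calls(project_id: str, profile_id: str, document_id: str) -> list[tuple[str, str]]:
    """Documented endpoints in step order. Real IDs come from earlier responses;
    they are passed in up front here only to keep the sketch short."""
    return [
        ("POST", f"/v1/projects/{project_id}/target-profiles"),  # 1. create profile
        ("POST", f"/v1/target-profiles/{profile_id}/versions"),  #    publish a version
        ("POST", "/v1/documents"),                               # 2. create document
        ("POST", f"/v1/documents/{document_id}/upload-pdf"),
        ("POST", f"/v1/documents/{document_id}/ai-jobs"),        # 3. OCR
        ("POST", f"/v1/documents/{document_id}/validate"),       #    locate + validate
        ("POST", f"/v1/documents/{document_id}/reviews"),        # 4. HITL review
        ("POST", f"/v1/documents/{document_id}/approvals"),
        ("GET", f"/v1/documents/{document_id}/final"),           #    Finalized output
    ]


def build_request(method: str, path: str, api_key: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request) -> dict:
    """Perform the call and decode the JSON body (needs network access and a real key)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating planning from sending keeps the call order testable offline; send is only needed against a live account.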