Enterprise

dArchiva

dArchiva handles the full lifecycle of document digitization, from physical scanning through multi-engine OCR, semantic indexing, and access-controlled retrieval to legal admissibility, at a scale of millions of documents across government and enterprise archives.

Key Features

From scanner to searchable — every step of the digitization pipeline engineered for accuracy, compliance, and scale.

🔍

Multi-Engine OCR

PaddleOCR (fastest, multilingual), Tesseract 5 (open baseline), and Qwen-VL (for handwritten, degraded, or complex layouts) run as an ensemble. A voting mechanism selects the highest-confidence output for each document region. 97%+ accuracy on typed Swahili, English, and Arabic text.
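
As a rough illustration of per-region selection, the sketch below keeps the highest-confidence result for each region. It is not dArchiva's actual voting logic: the `RegionResult` type, the region identifiers, and the assumption that confidences from different engines are comparable on a 0 to 1 scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionResult:
    engine: str        # e.g. "paddleocr", "tesseract", "qwen-vl"
    region_id: str     # hypothetical identifier for a layout region
    text: str
    confidence: float  # assumed to be normalized to [0, 1] across engines


def select_per_region(results):
    """Keep, for each region, the single result with the highest confidence."""
    best = {}
    for r in results:
        current = best.get(r.region_id)
        if current is None or r.confidence > current.confidence:
            best[r.region_id] = r
    return best


# Illustrative data only: two regions, each read by two engines.
results = [
    RegionResult("paddleocr", "header", "WIZARA YA ARDHI", 0.96),
    RegionResult("tesseract", "header", "WIZARA YA AROHI", 0.81),
    RegionResult("paddleocr", "signature", "J. Mw??gi", 0.42),
    RegionResult("qwen-vl", "signature", "J. Mwangi", 0.88),
]
chosen = select_per_region(results)
for region_id, r in chosen.items():
    print(region_id, r.engine, r.text)
```

In practice, raw confidence values from different engines are rarely calibrated against each other, so a real ensemble would likely need a calibration step before comparing them.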

🔎

Hybrid Semantic Search

BM25 sparse retrieval is combined in a weighted ensemble with dense vector embeddings from sentence-transformers models fine-tuned on legal and government Swahili/English corpora. Sub-second full-text search across 4M+ documents. Filters by date, type, department, and classification.
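
One common way to build a weighted ensemble of sparse and dense scores is to normalize each score set and take a weighted sum. The sketch below assumes that approach. The min-max normalization, the `alpha` weight, and the document IDs are illustrative choices, not details of dArchiva's ranking.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1]; a constant map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25_scores, dense_scores, alpha=0.5):
    """Rank documents by alpha * sparse + (1 - alpha) * dense after normalization."""
    sparse = min_max(bm25_scores)
    dense = min_max(dense_scores)
    docs = set(sparse) | set(dense)
    # A document missing from one retriever contributes 0 for that component.
    fused = {
        doc: alpha * sparse.get(doc, 0.0) + (1 - alpha) * dense.get(doc, 0.0)
        for doc in docs
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores from the two retrievers for one query.
bm25_scores = {"deed-17": 12.0, "memo-03": 4.0, "report-88": 8.0}
dense_scores = {"deed-17": 0.62, "memo-03": 0.91, "contract-05": 0.33}
ranking = hybrid_rank(bm25_scores, dense_scores, alpha=0.6)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.2f}")
```

Raising `alpha` favors exact keyword matches, and lowering it favors semantic similarity. Reciprocal rank fusion is a common alternative when the raw scores are hard to normalize.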

🔐

Layered Access Control

RBAC (Role-Based), ABAC (Attribute-Based), and ReBAC (Relationship-Based) access control unified in a single policy engine. Documents are tagged with classification levels (PUBLIC, INTERNAL, CONFIDENTIAL, SECRET). Attribute conditions such as department, tenure, and project are enforced at query time.
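
To make the layering concrete, here is a minimal sketch of a single check that applies all three models in turn. The policy is a hypothetical one chosen for illustration: a role-derived clearance ceiling (RBAC), then a department match (ABAC), then project membership (ReBAC). The `User` and `Document` types and their fields are assumptions, not dArchiva's schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Classification levels from least to most restricted.
LEVELS = ["PUBLIC", "INTERNAL", "CONFIDENTIAL", "SECRET"]


@dataclass
class User:
    name: str
    clearance: str                               # role-derived ceiling (RBAC)
    department: str                              # attribute (ABAC)
    projects: set = field(default_factory=set)   # relationships (ReBAC)


@dataclass
class Document:
    doc_id: str
    classification: str
    department: str
    project: Optional[str] = None


def can_read(user, doc):
    """Hypothetical policy: clearance must cover the level, then attribute or relationship must match."""
    # RBAC: the role's clearance must be at least the document's classification.
    if LEVELS.index(user.clearance) < LEVELS.index(doc.classification):
        return False
    if doc.classification == "PUBLIC":
        return True
    # ABAC: same department as the document's owner.
    if user.department == doc.department:
        return True
    # ReBAC: the user is a member of the project the document belongs to.
    return doc.project is not None and doc.project in user.projects


alice = User("alice", "CONFIDENTIAL", "Lands", {"titling-2024"})
docs = [
    Document("d1", "PUBLIC", "Finance"),
    Document("d2", "CONFIDENTIAL", "Lands"),
    Document("d3", "CONFIDENTIAL", "Finance", project="titling-2024"),
    Document("d4", "CONFIDENTIAL", "Finance"),
    Document("d5", "SECRET", "Lands"),
]
# Enforcing at query time means filtering results before they are returned.
visible = [d.doc_id for d in docs if can_read(alice, d)]
print(visible)
```

Filtering inside the query path, rather than after results are displayed, keeps restricted documents out of result counts and snippets as well.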

📦

Physical Inventory Tracking

QR-coded physical boxes linked to digital manifests. Scan-in / scan-out tracking for physical files across a full location hierarchy: building → floor → room → shelf → box. Overdue-return alerts pushed to department heads automatically.
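
The location hierarchy and overdue check can be pictured with a small data model. The sketch below is illustrative only. The `Location` and `Checkout` types, the path format, and the 14-day loan period are assumptions, not product behavior.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Location:
    building: str
    floor: str
    room: str
    shelf: str
    box: str

    def path(self):
        """Render the hierarchy from building down to box as one string."""
        return "/".join([self.building, self.floor, self.room, self.shelf, self.box])


@dataclass
class Checkout:
    file_id: str
    home: Location
    borrower: str
    scanned_out: date
    scanned_in: Optional[date] = None  # None while the file is still out


def overdue(checkouts, today, loan_days=14):
    """Return checkouts still out for longer than loan_days (assumed loan period)."""
    limit = timedelta(days=loan_days)
    return [
        c for c in checkouts
        if c.scanned_in is None and today - c.scanned_out > limit
    ]


box = Location("HQ", "F2", "R204", "S3", "BOX-0117")
checkouts = [
    Checkout("file-001", box, "j.mwangi", date(2024, 3, 1)),
    Checkout("file-002", box, "a.hassan", date(2024, 3, 1), scanned_in=date(2024, 3, 5)),
    Checkout("file-003", box, "p.otieno", date(2024, 3, 20)),
]
late = overdue(checkouts, today=date(2024, 3, 25))
for c in late:
    print(f"ALERT: {c.file_id} from {c.home.path()} not returned by {c.borrower}")
```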

🏷️

Auto Classification & Tagging

ML classifier assigns a document type (contract, invoice, deed, report, or memo) with 94% accuracy. Entity extraction indexes named persons, organizations, dates, and amounts for faceted search. Custom taxonomy support for department-specific classification schemes.

⚖️

Legal Admissibility

Digitization log captures scanner serial, operator ID, timestamp, and SHA-256 hash at the point of capture. Compliant with Kenya Evidence Act Cap 80 provisions for electronic records. Export packages for court submission include a full chain-of-custody affidavit.
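
A capture-time log entry of this kind can be sketched with Python's standard `hashlib`. The field names, the `capture_record` and `verify` helpers, and the sample values are hypothetical. Only the four logged items (scanner serial, operator ID, timestamp, SHA-256 hash) come from the description above.

```python
import hashlib
from datetime import datetime, timezone


def capture_record(image_bytes, scanner_serial, operator_id, captured_at=None):
    """Build a log entry for one scanned image at the point of capture."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "scanner_serial": scanner_serial,
        "operator_id": operator_id,
        "timestamp": captured_at.isoformat(),
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }


def verify(image_bytes, record):
    """Check that the image still matches the hash recorded at capture."""
    return hashlib.sha256(image_bytes).hexdigest() == record["sha256"]


# Placeholder bytes standing in for a scanned page image.
scan = b"example scanned page bytes"
record = capture_record(
    scan,
    scanner_serial="SCN-00042",   # hypothetical serial
    operator_id="op-117",         # hypothetical operator ID
    captured_at=datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc),
)
print(record["timestamp"], record["sha256"][:16])
print(verify(scan, record))         # unchanged image
print(verify(scan + b"x", record))  # any modification changes the hash
```

A hash on its own shows that a file is unchanged since capture. Proving that the log entry itself was not altered would need an additional control, such as an append-only store or a signature, which this sketch does not cover.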

Technical Specifications

Scale

  • 4M+ documents in production
  • <500ms search response at scale
  • Batch processing: 10,000 pages/hour
  • Multi-tenant with data isolation

OCR Engines

  • PaddleOCR v4 (multilingual)
  • Tesseract 5 (open source)
  • Qwen-VL 7B (handwriting / complex)
  • Arabic + Swahili + English primary

Security

  • AES-256 encryption at rest
  • TLS 1.3 in transit
  • GDPR Article 17 erasure
  • Kenya Data Protection Act 2019

Compliance

  • Kenya National Archives Act
  • Evidence Act Cap 80 (digital admissibility)
  • ISO 15489 records management
  • NIST SP 800-53 Rev 5

Digitize your archive with dArchiva

Contact our team to discuss your requirements. We respond within 24 hours.