The Architecture Behind Qvault

The problem

Legal professionals handle some of the most sensitive information in existence — medical records, financial statements, personal identifiers. The standard approach in most software is to send this data to the cloud for processing. We think that's the wrong model for this kind of work.

Qvault takes a different approach: every byte of every document stays on your computer. No cloud processing, no uploads, no server-side analysis. The entire pipeline — from document ingestion to PII detection to redacted export — runs locally.

This article explains how that works under the hood.

Key numbers

0 bytes sent to the cloud
5 jurisdictions covered
18 IPC commands in the bridge layer
Under 100 ms typical scan time

Local-first architecture

Qvault is built on Tauri v2, which pairs a Rust backend with a React 18 frontend rendered through native webviews. Unlike Electron, there's no bundled Chromium — the app uses the operating system's own webview, which keeps the installed size around 15-20 MB.

The Rust backend handles everything security-critical: document parsing, PII detection, and storage. The frontend is purely a presentation layer. Communication between the two happens through 18 IPC commands that cover:

Document management (import, list, delete)
PII scanning and detection
Redaction review and approval
Cross-document entity tracking
Redacted document export
License management

These calls never touch the network: Tauri passes them between the webview and the Rust core as local messages on the same machine, not as network requests. The serialization overhead is measured in microseconds.

Dual-layer PII detection engine

The detection system runs two independent passes over every document. Each layer catches different types of sensitive information, and their results are merged so that overlapping detections of the same text are not reported twice.
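The article does not spell out the merge rule. One plausible reading of "overlap prevention" is sketched below, assuming detections are byte ranges over the extracted text and that the higher-confidence detection wins when two ranges overlap (ties go to the longer span). The `Detection` struct and `merge_detections` function are hypothetical names for illustration.

```rust
/// Hypothetical detection record: a byte range in the extracted text plus a score.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub start: usize, // inclusive byte offset
    pub end: usize,   // exclusive byte offset
    pub kind: String,
    pub confidence: f32,
}

/// Merge both layers' results so that no two returned spans overlap.
/// Assumed rule: higher confidence wins; on a tie, the longer span wins.
pub fn merge_detections(regex_hits: Vec<Detection>, heuristic_hits: Vec<Detection>) -> Vec<Detection> {
    let mut all: Vec<Detection> = regex_hits.into_iter().chain(heuristic_hits).collect();
    // Strongest candidates first, so they claim their text range before weaker ones.
    all.sort_by(|a, b| {
        b.confidence
            .partial_cmp(&a.confidence)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then((b.end - b.start).cmp(&(a.end - a.start)))
    });
    let mut kept: Vec<Detection> = Vec::new();
    for d in all {
        // Half-open ranges [start, end) overlap when each starts before the other ends.
        let overlaps = kept.iter().any(|k| d.start < k.end && k.start < d.end);
        if !overlaps {
            kept.push(d);
        }
    }
    // Return in document order for the review screen.
    kept.sort_by_key(|d| d.start);
    kept
}
```

Because regex detections carry a 0.95 score and heuristic ones top out at 0.92, this rule would always let a structured match (an email address, say) take precedence over a name guess covering the same characters.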

Layer 1: Pattern-based regex scanner

The first layer uses compiled regular expressions to detect structured data with known formats:

Email addresses and phone numbers
Credit card numbers (with Luhn validation)
IBAN numbers
Dates in various formats
Region-specific identifiers: US Social Security numbers, EU VAT numbers, Brazilian CPF/CNPJ, German tax IDs

The regex patterns are compiled to deterministic finite automata, which guarantees that matching time grows linearly with input size and rules out catastrophic backtracking. These detections get a 0.95 confidence score, which is high because structured patterns have very low false-positive rates.
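Luhn validation is what keeps a random 16-digit number from being flagged as a card. The checksum below is the standard Luhn (mod 10) algorithm; the function name `luhn_valid` and the 12 to 19 digit length bound are assumptions for this sketch, which would run on digit runs that a card-number pattern has already matched.

```rust
/// Standard Luhn (mod 10) check for a candidate card number.
/// Spaces and hyphens are ignored; any other non-digit rejects the input.
pub fn luhn_valid(candidate: &str) -> bool {
    let mut digits: Vec<u32> = Vec::new();
    for c in candidate.chars() {
        match c {
            ' ' | '-' => continue,
            _ => match c.to_digit(10) {
                Some(d) => digits.push(d),
                None => return false,
            },
        }
    }
    // Assumed length bound for payment card numbers in this sketch.
    if digits.len() < 12 || digits.len() > 19 {
        return false;
    }
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                // Double every second digit from the right; fold results above 9.
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

For example, the well-known test number `4111 1111 1111 1111` passes, while changing its last digit to 2 fails the checksum.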

Layer 2: Heuristic context scanner

The second layer handles unstructured data — primarily names and company entities, which don't have fixed formats. It runs six detection passes that analyze:

Company legal suffixes (LLC, GmbH, S.A., Ltd, etc.)
Name-like patterns using capitalization and word-boundary analysis
Distribution tables for common first and last names
Contextual signals from surrounding text

To minimize false positives, the heuristic layer maintains 25 stop phrases and 71 stop words — common terms that look like names but aren't (e.g., "General Court", "Supreme Court"). Confidence scores for heuristic detections range from 0.75 to 0.92 depending on the strength of the contextual signal.
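A heavily simplified sketch of how some of these signals could combine is shown below. It covers only capitalization runs, company suffixes, stop phrases and stop words, and a title-based context signal; the name distribution tables are omitted. All word lists are tiny hypothetical stand-ins for the real ones, and the specific scores (0.75, 0.85, 0.92) are illustrative values chosen inside the range quoted above, not Qvault's actual weights. `scan_heuristic` and `Candidate` are hypothetical names.

```rust
// Hypothetical, heavily trimmed word lists (the real layer uses 25 stop phrases
// and 71 stop words). Entries have no trailing periods because tokens are trimmed.
const COMPANY_SUFFIXES: &[&str] = &["LLC", "GmbH", "S.A", "Ltd", "Inc"];
const STOP_PHRASES: &[&str] = &["General Court", "Supreme Court"];
const STOP_WORDS: &[&str] = &["The", "This", "Section", "Article"];
const TITLES: &[&str] = &["Mr", "Ms", "Mrs", "Dr"];

#[derive(Debug, PartialEq)]
pub struct Candidate {
    pub text: String,
    pub kind: &'static str,
    pub confidence: f32,
}

/// A token can belong to a name if it is capitalized and not a known non-name word.
fn is_name_word(w: &str) -> bool {
    w.chars().next().map_or(false, |c| c.is_uppercase())
        && !STOP_WORDS.contains(&w)
        && !TITLES.contains(&w)
        && !COMPANY_SUFFIXES.contains(&w)
}

/// Group runs of capitalized words and score each run by its surrounding context.
pub fn scan_heuristic(text: &str) -> Vec<Candidate> {
    let words: Vec<&str> = text
        .split_whitespace()
        .map(|w| w.trim_end_matches(|c: char| matches!(c, ',' | ';' | '.')))
        .collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < words.len() {
        if !is_name_word(words[i]) {
            i += 1;
            continue;
        }
        let start = i;
        while i < words.len() && is_name_word(words[i]) {
            i += 1;
        }
        let run = &words[start..i];
        let name = run.join(" ");
        if STOP_PHRASES.contains(&name.as_str()) {
            continue; // looks like a name, but is not PII
        }
        let suffix = words.get(i).filter(|w| COMPANY_SUFFIXES.contains(*w));
        let after_title = start > 0 && TITLES.contains(&words[start - 1]);
        // Illustrative scores only: stronger context yields a higher score.
        let candidate = if let Some(sfx) = suffix {
            Candidate { text: format!("{name} {sfx}"), kind: "COMPANY", confidence: 0.92 }
        } else if after_title {
            Candidate { text: name, kind: "PERSON", confidence: 0.85 }
        } else if run.len() >= 2 {
            Candidate { text: name, kind: "PERSON", confidence: 0.75 }
        } else {
            continue; // a lone capitalized word is too ambiguous
        };
        out.push(candidate);
    }
    out
}
```

On the sentence "The agreement between John Smith and Acme Widgets GmbH was reviewed by the Supreme Court, and Dr. Jones signed." this sketch reports John Smith, Acme Widgets GmbH, and Jones, while "The" and "Supreme Court" are filtered out by the stop lists.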

Document processing pipeline

When a document enters Qvault, it goes through a fixed pipeline:

Upload: the file is read from local disk into memory
Extract text: the file is parsed into searchable text with position coordinates
Scan PII: both detection layers run against the extracted text
Review: the user approves, rejects, or edits each detection
Export: a redacted copy is generated with black-box redactions
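One way to make a fixed pipeline hard to bypass is to model its stages as an enum with a single legal successor each, so that, for instance, export cannot run on a document whose detections were never reviewed. The sketch below assumes that design; `Stage` and `advance` are hypothetical names, and the article does not say how Qvault enforces the order internally.

```rust
/// Stages of the fixed pipeline, in order (names mirror the list above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Upload,
    ExtractText,
    ScanPii,
    Review,
    Export,
}

impl Stage {
    /// Each stage has exactly one successor; Export is terminal.
    pub fn next(self) -> Option<Stage> {
        match self {
            Stage::Upload => Some(Stage::ExtractText),
            Stage::ExtractText => Some(Stage::ScanPii),
            Stage::ScanPii => Some(Stage::Review),
            Stage::Review => Some(Stage::Export),
            Stage::Export => None,
        }
    }
}

/// Move a document forward one stage, rejecting any attempt to skip a stage.
pub fn advance(current: Stage, requested: Stage) -> Result<Stage, String> {
    match current.next() {
        Some(expected) if expected == requested => Ok(requested),
        Some(expected) => Err(format!(
            "cannot move from {current:?} to {requested:?}; expected {expected:?}"
        )),
        None => Err(format!("{current:?} is the final stage")),
    }
}
```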

Text extraction

PDF parsing uses lopdf to walk the PDF object tree, while PDF.js provides coordinate-mapped text spans for precise overlay positioning. DOCX files are opened as ZIP archives, and their XML is parsed to extract text runs and paragraph structure.
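In a DOCX file, body text lives in the `word/document.xml` entry of the ZIP archive, with paragraphs in `<w:p>` elements and text runs in `<w:t>` elements. The sketch below assumes that entry has already been decompressed and uses plain string scanning to stay dependency-free; a real implementation would use proper ZIP and XML crates. `docx_paragraphs` is a hypothetical name.

```rust
/// Collect paragraph text from an already-decompressed `word/document.xml`.
/// Naive string scanning for illustration; not a general XML parser.
pub fn docx_paragraphs(document_xml: &str) -> Vec<String> {
    let mut paragraphs = Vec::new();
    let mut current = String::new();
    let mut rest = document_xml;
    while let Some(lt) = rest.find('<') {
        let Some(gt_rel) = rest[lt..].find('>') else { break };
        let tag = &rest[lt + 1..lt + gt_rel];
        let after = &rest[lt + gt_rel + 1..];
        // Match <w:t> and <w:t xml:space="preserve">, but not <w:tab/> or <w:tbl>.
        if tag == "w:t" || tag.starts_with("w:t ") {
            if let Some(end) = after.find("</w:t>") {
                current.push_str(&unescape(&after[..end]));
                rest = &after[end + "</w:t>".len()..];
                continue;
            }
        } else if tag == "/w:p" {
            // End of a paragraph: flush the accumulated runs.
            paragraphs.push(std::mem::take(&mut current));
        }
        rest = after;
    }
    paragraphs
}

/// Decode the predefined XML entities; `&amp;` goes last to avoid double-decoding.
fn unescape(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&apos;", "'")
        .replace("&amp;", "&")
}
```

Concatenating runs per paragraph matters because Word often splits a single name across several `<w:t>` runs when formatting changes mid-word.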

Storage

Everything is stored in a local SQLite database running in WAL (write-ahead logging) mode, which lets reads proceed while a write is in progress. Seven tables track documents, redactions, entities, page text, licenses, audit logs, and credits.

One particularly useful feature: the cross-document entity knowledge base. As you process more documents, Qvault builds a database of known entities (people, companies) across your document corpus. This means detection accuracy improves over time — if a name was confirmed in one document, it gets flagged with higher confidence in subsequent documents.
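The sketch below illustrates the idea of confirmation-driven confidence, using an in-memory map as a stand-in for the entities table. `EntityKnowledgeBase`, the whitespace and case normalization, the +0.05 boost per confirmation, and the 0.99 cap are all hypothetical choices for illustration; the article does not state how much confidence a prior confirmation adds.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the cross-document entities table.
#[derive(Default)]
pub struct EntityKnowledgeBase {
    // Normalized entity text -> number of times a reviewer confirmed it.
    confirmed: HashMap<String, u32>,
}

impl EntityKnowledgeBase {
    /// Collapse whitespace and ignore case so "John  Smith" matches "john smith".
    fn normalize(name: &str) -> String {
        name.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
    }

    /// Record that the user approved this entity during review.
    pub fn confirm(&mut self, name: &str) {
        *self.confirmed.entry(Self::normalize(name)).or_insert(0) += 1;
    }

    /// Raise a heuristic score for entities already confirmed in other documents.
    /// The +0.05 step and 0.99 cap are illustrative values only.
    pub fn adjust(&self, name: &str, base_confidence: f32) -> f32 {
        match self.confirmed.get(&Self::normalize(name)) {
            Some(&n) => (base_confidence + 0.05 * n as f32).min(0.99),
            None => base_confidence,
        }
    }
}
```

Capping below 1.0 keeps even well-known entities in the review queue rather than auto-approving them.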

Frontend rendering

The frontend manages a multi-layer rendering system for the document viewer:

PDF canvas rendering at 1.5x viewport scale for crisp display
Text span extraction with dual coordinate systems (PDF space and screen space)
Color-coded redaction overlays that the user can accept, reject, or manually adjust
Coordinate mapping back to PDF space for accurate export

The challenge here is keeping two coordinate systems in sync. PDF coordinates have a bottom-left origin and use points (1/72 inch) as units; screen coordinates have a top-left origin and use pixels. Every overlay position requires a transformation between these systems that accounts for zoom level, page offset, and DPI scaling.
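The transformation reduces to a scale and a y-axis flip. This sketch assumes a page whose MediaBox starts at (0, 0), no page rotation, and one PDF point per pixel at zoom 1.0 before DPI scaling; `PageView` and its methods are hypothetical names, not Qvault's code.

```rust
/// Hypothetical view state for one rendered page.
#[derive(Debug, Clone, Copy)]
pub struct PageView {
    pub page_height_pt: f64,     // page height in PDF points (1/72 inch)
    pub zoom: f64,               // e.g. 1.5 for the viewer's default scale
    pub device_pixel_ratio: f64, // display DPI scaling (2.0 on many HiDPI screens)
    pub offset_x: f64,           // page's left edge on screen, in pixels
    pub offset_y: f64,           // page's top edge on screen, in pixels
}

impl PageView {
    fn pixels_per_point(&self) -> f64 {
        self.zoom * self.device_pixel_ratio
    }

    /// PDF space (bottom-left origin, points) to screen space (top-left origin, pixels).
    pub fn pdf_to_screen(&self, x_pt: f64, y_pt: f64) -> (f64, f64) {
        let s = self.pixels_per_point();
        // Flip y by measuring down from the top of the page instead of up from the bottom.
        (self.offset_x + x_pt * s, self.offset_y + (self.page_height_pt - y_pt) * s)
    }

    /// Screen space back to PDF space, e.g. after the user drags an overlay.
    pub fn screen_to_pdf(&self, x_px: f64, y_px: f64) -> (f64, f64) {
        let s = self.pixels_per_point();
        ((x_px - self.offset_x) / s, self.page_height_pt - (y_px - self.offset_y) / s)
    }

    /// PDF rectangle (x0, y0, x1, y1) to screen (left, top, width, height).
    pub fn pdf_rect_to_screen(&self, x0: f64, y0: f64, x1: f64, y1: f64) -> (f64, f64, f64, f64) {
        // In PDF space the rectangle's top edge is the larger y value.
        let (left, top) = self.pdf_to_screen(x0.min(x1), y0.max(y1));
        let s = self.pixels_per_point();
        (left, top, (x1 - x0).abs() * s, (y1 - y0).abs() * s)
    }
}
```

On a US Letter page (792 pt tall) at zoom 1.5, the PDF point (72, 720), one inch from the left and top edges, lands at screen pixel (108, 108), and `screen_to_pdf` maps it back exactly, which is the round trip the export step depends on.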

What Qvault does not do

This is as important as what it does:

No cloud uploads of any kind
No telemetry or usage tracking
No third-party processing APIs
No external model downloads
No temporary cloud storage

The trust model is simple: trust the machine, distrust the network. By eliminating all network-based processing, we eliminate an entire category of threat vectors. There are no API keys to leak, no cloud buckets to misconfigure, no data-in-transit to intercept.

Cross-platform distribution

Qvault ships native binaries for all major platforms:

macOS: .dmg installers for both Apple Silicon and Intel
Windows: .exe and .msi installers with multilingual support
Linux: .deb packages and AppImage

Jurisdictional coverage

The PII detection engine covers five jurisdictions out of the box: global patterns (email, phone, credit cards), US-specific (SSN, state IDs), EU-specific (VAT, IBAN), Brazilian (CPF, CNPJ), and German (Steuer-ID, tax numbers). This enables firms with international practices to use a single tool across their entire document corpus.
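Several jurisdiction-specific identifiers carry their own check digits, which can filter false positives the same way Luhn does for card numbers. The article does not say whether Qvault validates them; as an illustration under that assumption, this sketch checks the two mod-11 check digits of a Brazilian CPF. `cpf_valid` is a hypothetical name.

```rust
/// Validate the two check digits of a Brazilian CPF (11 digits).
/// Dots and the hyphen of the usual "000.000.000-00" layout are ignored.
pub fn cpf_valid(candidate: &str) -> bool {
    let digits: Vec<u32> = candidate
        .chars()
        .filter(|c| *c != '.' && *c != '-')
        .map(|c| c.to_digit(10))
        .collect::<Option<Vec<u32>>>()
        .unwrap_or_default();
    if digits.len() != 11 {
        return false;
    }
    // Repeated-digit sequences (e.g. 111.111.111-11) satisfy the arithmetic
    // but are conventionally rejected.
    if digits.iter().all(|&d| d == digits[0]) {
        return false;
    }
    // Check digit over the first n digits uses weights n+1 down to 2 (n = 9, then 10).
    let check = |n: usize| -> u32 {
        let sum: u32 = digits[..n]
            .iter()
            .zip((2..=(n as u32 + 1)).rev())
            .map(|(&d, w)| d * w)
            .sum();
        let r = (sum * 10) % 11;
        if r == 10 { 0 } else { r }
    };
    check(9) == digits[9] && check(10) == digits[10]
}
```

For example, `529.982.247-25` passes both checks, while `529.982.247-26` fails the second one.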

Enterprise: Private LLM deployments

Beyond the desktop application, Qvault offers enterprise-grade private AI infrastructure for law firms and organizations handling sensitive documents at scale — with zero data exposure.

Private AI infrastructure

For enterprise clients, Qvault deploys custom large language models (Llama, Mistral, and others) directly on client servers or private cloud environments. This enables AI-powered PII detection that never sends data outside your network.

Document pipeline integration

Organizations can integrate Qvault into existing workflows using MCP (Model Context Protocol), enabling automatic scanning and redaction as documents flow through your systems.

Customization & compliance

Enterprise deployments include custom pattern libraries tailored for your jurisdiction and practice area — specifically designed for legal offices handling privileged communications and discovery processes. The solution is built to meet GDPR, LGPD, HIPAA, and bar association compliance requirements.

Data sovereignty

All enterprise processing occurs on-premises or in a private cloud deployment. No data ever leaves your infrastructure. The enterprise tier includes:

Private LLM deployment on customer infrastructure
Pipeline integration via MCP
Jurisdiction and practice-area-specific PII patterns
On-premises operation with zero external data transfer
Dedicated support, SLA agreements, and onboarding assistance

About Qvault

Qvault is built by Santacroce SL in Madrid. For more information or to discuss enterprise deployment, visit qvault.tech or contact info@santacroce.es.