The problem
Legal professionals handle some of the most sensitive information in existence — medical records, financial statements, personal identifiers. The standard approach in most software is to send this data to the cloud for processing. We think that's the wrong model for this kind of work.
Qvault takes a different approach: every byte of every document stays on your computer. No cloud processing, no uploads, no server-side analysis. The entire pipeline — from document ingestion to PII detection to redacted export — runs locally.
This article explains how that works under the hood.
Key numbers
0 bytes sent to the cloud. 5 jurisdictions covered. 18 IPC commands in the bridge layer. <100ms typical scan time.
Local-first architecture
Qvault is built on Tauri v2, which pairs a Rust backend with a web frontend rendered in the operating system's native webview; Qvault's frontend is written in React 18. Unlike Electron, Tauri does not bundle Chromium, which keeps the installed size around 15-20 MB.
The Rust backend handles everything security-critical: document parsing, PII detection, and storage. The frontend is purely a presentation layer. Communication between the two happens through 18 IPC commands that cover:
- Document management (import, list, delete)
- PII scanning and detection
- Redaction review and approval
- Cross-document entity tracking
- Redacted document export
- License management
These are local messages passed between the webview and the Rust backend on the same machine, not network requests. The serialization overhead is measured in microseconds.
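The command-bridge pattern can be sketched in a few lines. The example below is a Python illustration of the general idea (a registry of named commands, JSON in, JSON out), not Qvault's Rust code or Tauri's API; the command names, the `invoke` helper, and the sample documents are all hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical command registry; these names are illustrative, not Qvault's.
COMMANDS: Dict[str, Callable[..., Any]] = {}

def command(name: str):
    """Register a handler function under a command name."""
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register

# Stand-in for local document storage.
_documents = {1: "contract.pdf", 2: "deposition.docx"}

@command("list_documents")
def list_documents() -> list:
    return [{"id": i, "name": n} for i, n in sorted(_documents.items())]

@command("delete_document")
def delete_document(doc_id: int) -> bool:
    return _documents.pop(doc_id, None) is not None

def invoke(message: str) -> str:
    """Decode a JSON request, run the handler locally, encode the reply."""
    request = json.loads(message)
    handler = COMMANDS.get(request["cmd"])
    if handler is None:
        return json.dumps({"ok": False, "error": "unknown command"})
    result = handler(**request.get("args", {}))
    return json.dumps({"ok": True, "result": result})
```

The only cost between caller and handler here is encoding and decoding a small JSON payload; no socket or HTTP request is involved.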
Dual-layer PII detection engine
The detection system runs two independent passes over every document. Each layer catches different types of sensitive information, and their results are merged so that overlapping detections of the same text are not reported twice.
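One way to merge two result sets without overlaps is a greedy pass that keeps the strongest detection for any contested span. The sketch below is in Python for brevity; the `Detection` type and the tie-breaking rule (higher confidence first, then longer span) are assumptions, since the exact rule Qvault applies is not described here.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Detection:
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    kind: str
    confidence: float

def merge_detections(pattern_hits: List[Detection],
                     heuristic_hits: List[Detection]) -> List[Detection]:
    """Combine both layers, keeping the stronger hit when spans overlap."""
    # Rank by confidence (highest first); a longer span breaks ties.
    ranked = sorted(
        pattern_hits + heuristic_hits,
        key=lambda d: (-d.confidence, -(d.end - d.start), d.start),
    )
    kept: List[Detection] = []
    for cand in ranked:
        # Keep the candidate only if it is disjoint from everything kept so far.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda d: d.start)
```

With this rule, a 0.95 pattern match for an email address wins over a 0.80 heuristic "name" found inside the same address.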
Layer 1: Pattern-based regex scanner
The first layer uses compiled regular expressions to detect structured data with known formats:
- Email addresses and phone numbers
- Credit card numbers (with Luhn validation)
- IBAN numbers
- Dates in various formats
- Region-specific identifiers: US Social Security numbers, EU VAT numbers, Brazilian CPF/CNPJ, German tax IDs
The regex patterns are compiled to finite automata, which guarantees that matching time grows only linearly with input length and rules out catastrophic backtracking. These detections get a 0.95 confidence score, which is high because structured patterns have very low false-positive rates.
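A minimal sketch of this layer, for two of the listed types, might look like the following. It is written in Python, whose `re` module is a backtracking engine and does not carry the linear-time guarantee described above; the two patterns are simplified assumptions, and only the 0.95 score and the Luhn check come from the description.

```python
import re

# Simplified patterns for illustration; production patterns would be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")   # 13 to 19 digits

PATTERN_CONFIDENCE = 0.95  # score quoted for structured matches

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_structured(text: str):
    """Return (start, end, kind, confidence) tuples sorted by position."""
    hits = []
    for m in EMAIL_RE.finditer(text):
        hits.append((m.start(), m.end(), "email", PATTERN_CONFIDENCE))
    for m in CARD_RE.finditer(text):
        # A digit run only counts as a card number if the checksum passes.
        if luhn_valid(m.group()):
            hits.append((m.start(), m.end(), "credit_card", PATTERN_CONFIDENCE))
    return sorted(hits)
```

The checksum step is what keeps arbitrary 16-digit reference numbers from being flagged as card numbers.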
Layer 2: Heuristic context scanner
The second layer handles unstructured data, primarily names and company entities, which don't have fixed formats. It runs six detection passes, which draw on signals including:
- Company legal suffixes (LLC, GmbH, S.A., Ltd, etc.)
- Name-like patterns using capitalization and word-boundary analysis
- Distribution tables for common first and last names
- Contextual signals from surrounding text
To minimize false positives, the heuristic layer maintains 25 stop phrases and 71 stop words — common terms that look like names but aren't (e.g., "General Court", "Supreme Court"). Confidence scores for heuristic detections range from 0.75 to 0.92 depending on the strength of the contextual signal.
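The general shape of such a heuristic pass can be sketched as follows, in Python. Everything specific here is an assumption made for illustration: the tiny word lists, the candidate regex, and the mapping of signals to 0.75, 0.85, and 0.92. Only the score range and the two stop phrases are taken from the description above.

```python
import re

STOP_PHRASES = {"General Court", "Supreme Court"}   # the two examples cited
COMPANY_SUFFIXES = {"LLC", "GmbH", "Ltd"}           # hypothetical subset
COMMON_FIRST_NAMES = {"Maria", "John", "Anna"}      # stand-in for a name table

# Two or more consecutive capitalized words, e.g. "Acme Holdings LLC".
# Single capitalized words are ignored in this sketch.
CANDIDATE_RE = re.compile(r"\b[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)+\b")

def scan_heuristic(text: str):
    """Return (start, end, kind, confidence) for name-like phrases."""
    hits = []
    for m in CANDIDATE_RE.finditer(text):
        phrase = m.group()
        # Drop phrases that contain a known false positive.
        if any(stop in phrase for stop in STOP_PHRASES):
            continue
        tokens = phrase.split(" ")
        if tokens[-1] in COMPANY_SUFFIXES:
            hits.append((m.start(), m.end(), "company", 0.92))  # strong signal
        elif tokens[0] in COMMON_FIRST_NAMES:
            hits.append((m.start(), m.end(), "person", 0.85))   # name table hit
        else:
            hits.append((m.start(), m.end(), "person", 0.75))   # shape only
    return hits
```

The point of the graded scores is that a reviewer can sort or filter by confidence and spend attention on the weakest detections first.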
Document processing pipeline
When a document enters Qvault, it goes through a fixed pipeline:
- Import: the file is read from disk into memory
- Extract text: the file is parsed into searchable text with position coordinates
- Scan PII: both detection layers run against the extracted text
- Review: the user approves, rejects, or edits each detection
- Export: a redacted copy is generated with black-box redactions
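The last two steps can be illustrated at the text level. The Python sketch below is a simplified stand-in: a real export draws opaque boxes onto the document itself, while this version only blacks out characters in a string. The `Review` and `Redaction` types are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class Review(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Redaction:
    start: int                       # inclusive character offset
    end: int                         # exclusive character offset
    status: Review = Review.PENDING

def export_redacted(text: str, redactions: List[Redaction]) -> str:
    """Return a copy of text with every approved span blacked out."""
    out = list(text)
    for r in redactions:
        # Pending and rejected detections are left untouched.
        if r.status is Review.APPROVED:
            out[r.start:r.end] = "█" * (r.end - r.start)
    return "".join(out)
```

Because only approved detections are applied, nothing is removed from the exported copy without an explicit decision in the review step.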
Text extraction
PDF parsing uses lopdf, a Rust crate, to walk the PDF object tree, while PDF.js, a JavaScript library, provides coordinate-mapped text spans for precise overlay positioning. DOCX files are handled as ZIP archives, with XML parsing to extract text runs and paragraph structure.
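The DOCX half of this is easy to demonstrate with a standard library alone. The Python sketch below opens the archive, reads `word/document.xml`, and joins the `w:t` text runs inside each `w:p` paragraph; the function name is hypothetical, and the sketch ignores headers, footnotes, and nested paragraphs such as those inside text boxes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def extract_docx_paragraphs(data: bytes) -> List[str]:
    """Return one string per <w:p> paragraph, joining its <w:t> text runs."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return paragraphs
```

Keeping the paragraph boundaries matters for the heuristic layer, which relies on surrounding text as context.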
Storage
Everything is stored in a local SQLite database running in WAL (write-ahead logging) mode, which lets reads proceed while a write is in progress. Seven tables track documents, redactions, entities, page text, licenses, audit logs, and credits.
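Enabling WAL is a single pragma. The Python sketch below uses the standard `sqlite3` module to show it; the two tables and their columns are hypothetical and much smaller than a real schema. WAL requires a file-backed database, so the sketch takes a path rather than using an in-memory database.

```python
import os
import sqlite3
import tempfile

def open_store(path: str):
    """Open (or create) the database and switch it to WAL mode."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Hypothetical columns; the real schema is not described in this article.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS redactions ("
        " id INTEGER PRIMARY KEY,"
        " document_id INTEGER NOT NULL REFERENCES documents(id),"
        " start_offset INTEGER NOT NULL,"
        " end_offset INTEGER NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()
    return conn, mode

if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn, mode = open_store(db_path)
    print("journal mode:", mode)
    conn.close()
```

In WAL mode a second connection can keep reading a consistent snapshot while the first has an uncommitted write in progress, which suits a UI that lists documents while a scan is still writing results.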
One particularly useful feature: the cross-document entity knowledge base. As you process more documents, Qvault builds a database of known entities (people, companies) across your document corpus. This means detection accuracy improves over time — if a name was confirmed in one document, it gets flagged with higher confidence in subsequent documents.
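The confidence boost can be pictured with a small in-memory model, sketched here in Python. The class name, the normalization rule, and the boost and ceiling values are all assumptions; how much a confirmed entity raises later scores in Qvault is not specified above.

```python
class EntityKnowledgeBase:
    """Remembers confirmed entities and raises confidence for repeat sightings."""

    def __init__(self, boost: float = 0.05, ceiling: float = 0.99):
        self._confirmed = {}      # normalized name -> number of confirmations
        self._boost = boost       # hypothetical increment
        self._ceiling = ceiling   # hypothetical upper bound

    @staticmethod
    def _normalize(name: str) -> str:
        # Case-insensitive, whitespace-insensitive key
        return " ".join(name.lower().split())

    def confirm(self, name: str) -> None:
        """Record that a reviewer approved this entity in some document."""
        key = self._normalize(name)
        self._confirmed[key] = self._confirmed.get(key, 0) + 1

    def adjust(self, name: str, confidence: float) -> float:
        """Return a boosted score if the entity was confirmed before."""
        if self._normalize(name) in self._confirmed:
            return min(self._ceiling, confidence + self._boost)
        return confidence
```

A name that starts at the bottom of the heuristic range in its first document would then surface with a higher score the next time it appears.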
Frontend rendering
The frontend manages a multi-layer rendering system for the document viewer:
- PDF canvas rendering at 1.5x viewport scale for crisp display
- Text span extraction with dual coordinate systems (PDF space and screen space)
- Color-coded redaction overlays that the user can accept, reject, or manually adjust
- Coordinate mapping back to PDF space for accurate export
The challenge here is keeping two coordinate systems in sync. PDF coordinates are bottom-left origin with points as units. Screen coordinates are top-left origin with pixels. Every overlay position requires a transformation between these systems, accounting for zoom level, page offset, and DPI scaling.
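The transformation itself is a flip of the y axis plus a scale and an offset. The Python sketch below shows one consistent pair of forward and inverse mappings; the `PageView` type, the choice to express offsets in CSS pixels, and the order in which the device pixel ratio is applied are assumptions, and page rotation and crop boxes are ignored.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class PageView:
    page_height_pt: float   # page height in PDF points
    scale: float            # zoom factor, CSS pixels per point
    offset_x: float         # left edge of the page in the viewer, CSS pixels
    offset_y: float         # top edge of the page in the viewer, CSS pixels
    dpr: float = 1.0        # device pixel ratio

    def pdf_to_screen(self, x_pt: float, y_pt: float) -> Tuple[float, float]:
        """Bottom-left-origin points to top-left-origin device pixels."""
        x = (x_pt * self.scale + self.offset_x) * self.dpr
        # Flip y: PDF y grows upward, screen y grows downward.
        y = ((self.page_height_pt - y_pt) * self.scale + self.offset_y) * self.dpr
        return x, y

    def screen_to_pdf(self, x_px: float, y_px: float) -> Tuple[float, float]:
        """Exact inverse of pdf_to_screen, used when exporting."""
        x_pt = (x_px / self.dpr - self.offset_x) / self.scale
        y_pt = self.page_height_pt - (y_px / self.dpr - self.offset_y) / self.scale
        return x_pt, y_pt
```

For a US Letter page (792 pt tall) at 1.5x zoom, offsets of (20, 40), and a device pixel ratio of 2, the PDF point (72, 720) lands at screen pixel (256, 296), and the inverse maps it back to (72, 720).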
What Qvault does not do
This is as important as what it does:
- No cloud uploads of any kind
- No telemetry or usage tracking
- No third-party processing APIs
- No external model downloads
- No temporary cloud storage
The trust model is simple: trust the machine, distrust the network. By eliminating all network-based processing, we eliminate an entire category of threat vectors. There are no API keys to leak, no cloud buckets to misconfigure, no data-in-transit to intercept.
Cross-platform distribution
Qvault ships native binaries for all major platforms:
- macOS: .dmg installers for both Apple Silicon and Intel
- Windows: .exe and .msi installers with multilingual support
- Linux: .deb packages and AppImage
Jurisdictional coverage
The PII detection engine ships with five jurisdictional pattern sets: global patterns (email, phone, credit cards), US-specific (SSN, state IDs), EU-specific (VAT, IBAN), Brazilian (CPF, CNPJ), and German (Steuer-ID, tax numbers). Firms with international practices can therefore use a single tool across their entire document corpus.
Enterprise: Private LLM deployments
Beyond the desktop application, Qvault offers private AI infrastructure for law firms and other organizations that handle sensitive documents at scale, deployed so that no data leaves the client's environment.
Private AI infrastructure
For enterprise clients, Qvault deploys custom large language models (Llama, Mistral, and others) directly on client servers or private cloud environments. This enables AI-powered PII detection that never sends data outside your network.
Document pipeline integration
Organizations can integrate Qvault into existing workflows using MCP (Model Context Protocol), enabling automatic scanning and redaction as documents flow through your systems.
Customization & compliance
Enterprise deployments include custom pattern libraries tailored for your jurisdiction and practice area — specifically designed for legal offices handling privileged communications and discovery processes. The solution is built to meet GDPR, LGPD, HIPAA, and bar association compliance requirements.
Data sovereignty
All enterprise processing occurs on-premises or in a private cloud deployment. No data ever leaves your infrastructure. The enterprise tier includes:
- Private LLM deployment on customer infrastructure
- Pipeline integration via MCP
- Jurisdiction and practice-area-specific PII patterns
- On-premises operation with zero external data transfer
- Dedicated support, SLA agreements, and onboarding assistance
About Qvault
Qvault is built by Santacroce SL in Madrid. For more information or to discuss enterprise deployment, visit qvault.tech or contact info@santacroce.es.