System Architecture
PolicyDiff is built on rule-based processing. Every monitoring job passes through a hardened pipeline designed so that every result is auditable.
Core Pipeline
Our pipeline prioritizes reliability. If any stage fails (e.g., a fetch timeout), the job is marked as failed with a deterministic error code. We use the Myers algorithm for word-level diffs and Levenshtein similarity (with a 0.85 threshold) for stable section tracking.
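As a rough illustration of the two techniques named above, the sketch below computes a word-level edit distance with the greedy Myers algorithm and a normalized Levenshtein similarity checked against the 0.85 threshold. The function names, the choice to compare section headings, and the normalization (1 minus distance divided by the longer length) are assumptions made for this example, not PolicyDiff's actual implementation.

```python
from typing import Sequence


def myers_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the shortest insert/delete script turning a into b (greedy Myers)."""
    n, m = len(a), len(b)
    # v[k] holds the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # step down: take an element from b (insertion)
            else:
                x = v[k - 1] + 1    # step right: drop an element of a (deletion)
            y = x - k
            # Follow the "snake": consume equal words at no cost.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance over characters."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def same_section(old_heading: str, new_heading: str, threshold: float = 0.85) -> bool:
    """Treat two headings as the same section when similarity meets the threshold."""
    return similarity(old_heading, new_heading) >= threshold
```

Under these assumptions, a heading renamed from "Dispute Resolution" to "Dispute Resolutions" would still be tracked as the same section, while an unrelated heading would not. Splitting the old and new text on whitespace before calling `myers_edit_distance` gives the word-level behavior.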
Risk Engine Logic
The engine identifies intent through deterministic rules, including proximity clustering and negation tracking:
- Proximity Clustering: Scans for high-risk verb+noun pairs (e.g., "sell" within 5 words of "data") to detect data transfer intent.
- Negation Shift Detection: Explicitly flags when negation words like "not" or "never" are removed from sensitive clauses.
- Structural Erosion: Automatically flags the deletion of high-risk sections (e.g., arbitration or class-action waivers).
- Contextual Multipliers: Weights risk scores based on section importance (e.g., 2.0x for Dispute Resolution).
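Three of the rules above (proximity clustering, negation shift detection, and contextual multipliers) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the tokenizer, the reading of "within 5 words" as a token-index distance of at most 5, and all function and constant names are hypothetical. Only the example terms and the 2.0x multiplier come from the text.

```python
import re
from collections import Counter

NEGATIONS = {"not", "never"}                       # examples given in the text
SECTION_MULTIPLIERS = {"Dispute Resolution": 2.0}  # example given in the text


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def has_proximity_pair(text: str, verb: str, noun: str, window: int = 5) -> bool:
    """True if any occurrence of verb lies within `window` tokens of noun."""
    tokens = tokenize(text)
    verb_pos = [i for i, t in enumerate(tokens) if t == verb]
    noun_pos = [i for i, t in enumerate(tokens) if t == noun]
    return any(abs(v - n) <= window for v in verb_pos for n in noun_pos)


def removed_negations(old_clause: str, new_clause: str) -> list[str]:
    """Negation words that occur fewer times in the new clause than in the old."""
    old_counts = Counter(t for t in tokenize(old_clause) if t in NEGATIONS)
    new_counts = Counter(t for t in tokenize(new_clause) if t in NEGATIONS)
    # Counter subtraction keeps only positive differences.
    return sorted((old_counts - new_counts).keys())


def weighted_score(base_score: float, section: str) -> float:
    """Scale a base risk score by the section's multiplier (1.0 if unlisted)."""
    return base_score * SECTION_MULTIPLIERS.get(section, 1.0)
```

For instance, changing "We will never sell your data" to "We will sell your data" would report `never` as a removed negation, and the new clause would also trigger the "sell"/"data" proximity rule.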
Isolation Stability
The Content Isolation Layer uses a deterministic multi-stage strategy to remove global noise (headers, nav, footers).
We track Isolation Stability by generating a metadata fingerprint for every container selection. If the selected container changes between runs, a drift event is logged and exposed via metrics to prevent layout shifts from masking legitimate policy changes.
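One way to picture the fingerprint-and-drift check is shown below. It is a sketch only: the metadata fields (`tag`, `element_id`, `classes`, `depth`), the use of SHA-256 over canonical JSON, and the function names are assumptions for illustration, since the text does not specify what the fingerprint contains.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ContainerMeta:
    """Hypothetical metadata describing the container chosen by the isolation layer."""
    tag: str
    element_id: str
    classes: Tuple[str, ...]
    depth: int


def fingerprint(meta: ContainerMeta) -> str:
    """Hash a canonical form of the metadata so equal selections hash equally."""
    payload = asdict(meta)
    payload["classes"] = sorted(payload["classes"])  # class order should not matter
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(previous_fp: Optional[str], meta: ContainerMeta) -> Tuple[str, bool]:
    """Return the current fingerprint and whether it differs from the last run."""
    current = fingerprint(meta)
    drifted = previous_fp is not None and previous_fp != current
    return current, drifted
```

A caller would store the returned fingerprint after each run and, when `drifted` is true, log the drift event and increment the corresponding metric.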
Numeric Override Integrity
Pricing changes are treated as first-class events. Our engine normalizes thousand separators, ignores currency symbols, and excludes version numbers to ensure that only meaningful numeric shifts trigger overrides.
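The normalization steps described here might look roughly like the following. The regular expressions are assumptions: they treat a comma as the thousand separator and a dot as the decimal point, and they recognize versions only in the forms `v2.1`, `version 3.0`, or dotted triples such as `1.2.3`. The names `extract_amounts` and `numeric_change` are hypothetical.

```python
import re
from decimal import Decimal

# Assumed version patterns: "v2.1", "version 3.0", or dotted triples like "1.2.3".
VERSION_RE = re.compile(
    r"\b(?:v|version\s+)\d+(?:\.\d+)*\b|\b\d+(?:\.\d+){2,}\b",
    re.IGNORECASE,
)
# Numbers with optional comma thousand separators and an optional decimal part.
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_amounts(text: str) -> list[Decimal]:
    """Return numeric values in order, with versions removed and separators stripped."""
    without_versions = VERSION_RE.sub(" ", text)
    # Currency symbols are ignored implicitly: only digit runs are matched.
    return [Decimal(m.replace(",", "")) for m in NUMBER_RE.findall(without_versions)]


def numeric_change(old_text: str, new_text: str) -> bool:
    """True only when the normalized numeric values differ between versions."""
    return extract_amounts(old_text) != extract_amounts(new_text)
```

With this sketch, "$1,299 per year" and "1299 USD per year" compare as unchanged, while "$10/month" versus "$12/month" counts as a numeric change. Using `Decimal` avoids floating-point rounding when comparing prices.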
System Guards
PolicyDiff maintains operational stability through several layers of defense:
- Concurrency Reconciliation Guard: A backend worker verifies consistency between in-memory job counts and the database every 10 seconds, self-healing any drift.
- Replay Determinism Harness: Before every deployment, we replay 500+ historical snapshots. A single hash change for a known input blocks the build.
- Job Timeout Enforcement: Every monitoring task has a hard 15-second limit to prevent resource exhaustion from hanging connections.
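The three guards above can be approximated as follows. This is a hedged sketch, not the production code: it assumes jobs are `asyncio` coroutines, that replay snapshots are stored as an input paired with an expected SHA-256 hash, and that the database is the source of truth during reconciliation. The error code `E_TIMEOUT` and all function names are hypothetical.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Dict, List, Tuple

JOB_TIMEOUT_SECONDS = 15.0  # hard limit stated in the text


async def run_with_timeout(job: Awaitable, timeout: float = JOB_TIMEOUT_SECONDS):
    """Return ("ok", result), or ("failed", "E_TIMEOUT") if the limit is exceeded."""
    try:
        return "ok", await asyncio.wait_for(job, timeout)
    except asyncio.TimeoutError:
        return "failed", "E_TIMEOUT"  # hypothetical deterministic error code


def replay_mismatches(
    snapshots: Dict[str, Tuple[str, str]],
    process: Callable[[str], str],
) -> List[str]:
    """Return ids of snapshots whose output hash no longer matches the stored hash.

    `snapshots` maps a snapshot id to (input_text, expected_sha256_hex).
    """
    mismatched = []
    for snapshot_id, (text, expected) in sorted(snapshots.items()):
        actual = hashlib.sha256(process(text).encode("utf-8")).hexdigest()
        if actual != expected:
            mismatched.append(snapshot_id)
    return mismatched


def reconcile(in_memory_count: int, db_count: int) -> Tuple[int, bool]:
    """Return the corrected count and whether drift was found (database wins)."""
    return db_count, in_memory_count != db_count
```

A deployment gate built on this would block the build whenever `replay_mismatches` returns a non-empty list, and a background worker would call `reconcile` on its 10-second schedule.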