Business Requirements as Code: Rules-Guided LLM Validation
The Problem with LLM Business Logic
Your compliance team just rejected your AI pipeline. Again. The extraction accuracy looks good in testing, but in production there's no audit trail. Business rules are scattered across prompt strings. When a transformation fails, nobody can explain why it happened or which rule should have caught it.
For teams building LLM pipelines in regulated industries, this isn't a feature request. It's a blocker.
You're building an LLM pipeline to extract structured data from financial documents. The domain is complex. Hundreds of business rules govern data extraction, cleansing, and standardization. Where do you put these rules?
You could hardcode them in prompt strings scattered across your codebase, creating unmaintainable spaghetti that only you understand. Or use traditional code validation, producing rigid, brittle logic that can't handle natural language variation in your source documents. Or hope the LLM "figures it out" with generic instructions, yielding unpredictable outputs that vary between runs and make auditors nervous.
For financial data extraction where accuracy, auditability, and regulatory compliance matter, none of these options work. We needed a fourth way.
The Traditional Approaches (And Why They Fail)
Most teams building LLM pipelines with complex business logic fall into one of three traps:
1. Hardcoded Prompts
prompt = """
Extract the coupon. If it says 3mE, expand to EURIBOR.
If it says 3mS, expand to SOFR. Remove (sf) suffixes.
For DM, extract just the number...
"""Problems:
- Rules buried in code strings
- No version history of rule changes
- Non-technical stakeholders can't review or edit
- Duplicated across multiple prompts
- Impossible to audit "which rule was applied when?"
2. External Validation (Traditional Code)
```python
def validate_coupon(value):
    if value == "3mE":
        return "EURIBOR"
    elif value == "3mS":
        return "SOFR"
    # ... 200 more lines of if/else
```

Problems:
- Loses LLM's flexibility to handle variation
- Can't explain WHY a transformation happened
- Requires developer changes for every new rule
- No natural language documentation of business logic
3. Hope and Pray (Vague Prompts)
prompt = "Extract and standardize the data appropriately"Problems:
- Inconsistent results between runs
- No audit trail of decisions
- Can't demonstrate to regulators HOW data was validated
- Stakeholders have no visibility into logic
For asset-backed finance pipelines processing structured finance documents, this isn't just an engineering problem. It's a compliance and business risk problem.
Our Approach: Requirements as Structured Documentation
Here's the insight: business requirements don't have to be either code OR documentation. They can be both.
We built a system that treats business rules as structured, version-controlled documentation that gets dynamically injected into LLM context. The architecture has three components that work together to bridge the gap between human-readable requirements and machine validation.
Component 1: transformation_rules.md
A comprehensive structured markdown document containing all business requirements:
```markdown
## Reference Rate Standardization

**Expand abbreviations to canonical forms:**

**EURIBOR variants → "EURIBOR":**
- 3mE, 3ME, E3M, E
- 3M E, 3m EURIBOR, 3M EURIBOR
- EURIBOR 3M, EUR 3M

**SOFR variants → "SOFR":**
- 3mS, 3MS, S3M, S
- SOFR3M, 3M SOFR, SOFR 3M
- 3m SOFR, Term SOFR

**CRITICAL:** Only expand to correct benchmark
- `3mE` → `EURIBOR` ✓
- `3mE` → `SOFR` ✗ (WRONG - different rate)
```

Key properties:
- Plain language that business analysts and compliance officers can read
- Organized by concern (data extraction, cleansing, standardization)
- Version controlled in Git alongside code
- Serves as BOTH human documentation AND machine-readable input
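Because the format is just markdown headers and bullets, the same file is trivially machine-readable. As a rough illustration only (the production parser in rules_index.py may differ), splitting it into named sections takes a handful of lines:

```python
import re
from pathlib import Path

def parse_rule_sections(rules_path: str) -> dict[str, str]:
    """Split a markdown rules document into {section title: section body}."""
    text = Path(rules_path).read_text(encoding="utf-8")
    sections: dict[str, str] = {}
    title, body = None, []
    for line in text.splitlines():
        header = re.match(r"^##+\s+(.*)", line)  # ## and ### headers start a section
        if header:
            if title is not None:
                sections[title] = "\n".join(body).strip()
            title, body = header.group(1).strip(), []
        elif title is not None:
            body.append(line)
    if title is not None:
        sections[title] = "\n".join(body).strip()
    return sections

sections = parse_rule_sections("transformation_rules.md")
print(f"Parsed {len(sections)} rule sections")  # 42 on the production document
```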
Component 2: rules_index.py
A smart parser that indexes the rules document at startup:
```python
class RulesIndex:
    def __init__(self, rules_path: str):
        # Parse markdown sections (## and ### headers)
        self._parse_all_sections()  # 42 sections indexed

        # Map columns to relevant rule sections
        self.column_mappings = {
            "Coupon": ["Reference Rate Standardization",
                       "Coupon Standardization"],
            "DM (bp)": ["DM Parsing", "DM and Coupon Independence"],
            "Moodys": ["Rating Agency Rules", "Non-Rating Values"],
            # ... 17 canonical columns mapped
        }

        # Map value patterns to rule sections
        self.value_pattern_mappings = {
            "empty": ["Non-Value Removal", "Empty Cell Handling"],
            "reference_rate": ["Reference Rate Standardization"],
            "rating_suffix": ["Rating Cleaning"],
        }
```

The key insight: when a difference is detected, the indexer retrieves ONLY the relevant sections, typically around 150 lines instead of all 1,000+.
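The retrieval method itself isn't shown above; here is a minimal sketch of how it might combine the column mappings with the value-pattern mappings. The function names and pattern heuristics are illustrative, not the production rules_index.py API:

```python
import re

CORE_SECTIONS = ["Decision Framework"]  # always included (see the worked example below)

def detect_value_patterns(source: str, target: str) -> list[str]:
    """Very rough routing heuristics; the production patterns are richer."""
    patterns = []
    if not source.strip() or not target.strip():
        patterns.append("empty")
    if re.search(r"\b(EURIBOR|SOFR|SONIA)\b", target, re.IGNORECASE):
        patterns.append("reference_rate")
    if re.search(r"\(sf\)", source, re.IGNORECASE):
        patterns.append("rating_suffix")
    return patterns

def get_relevant_sections_for_difference(index, column, source, target):
    """Union of core sections, column-mapped sections, and pattern-mapped sections."""
    names = list(CORE_SECTIONS)
    names += index.column_mappings.get(column, [])
    for pattern in detect_value_patterns(source, target):
        names += index.value_pattern_mappings.get(pattern, [])
    seen, bodies = set(), []
    for name in names:
        if name not in seen and name in index.sections:  # index.sections: {title: body}
            seen.add(name)
            bodies.append(f"## {name}\n{index.sections[name]}")
    return "\n\n".join(bodies)

# For the "3mE" → "EURIBOR" example below, this pulls "Decision Framework",
# "Reference Rate Standardization", and "Coupon Standardization" — roughly 150
# lines instead of the full 1,000+ line document.
```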
Component 3: rules_checker (DSPy Module)
A validation module that filters differences through the rules:
```python
class RulesBasedDifferenceFilter(dspy.Module):
    def check_difference(self, difference: TableDifference):
        # Get relevant rule sections for THIS specific difference
        relevant_rules = self.rules_index.get_relevant_sections_for_difference(
            difference
        )

        # Ask LLM: "Is this difference expected per these rules?"
        result = self.rules_checker(
            location=difference.location,
            source_value=difference.source_value,
            target_value=difference.target_value,
            relevant_rules=relevant_rules  # ~150 lines of context
        )
        return result.is_expected, result.reasoning, result.rule_section
```

The output: every validation decision includes:
- Whether the difference is expected (boolean)
- Natural language reasoning
- Citation of which rule section applies
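In DSPy this maps naturally to a typed output field. A minimal sketch, assuming a pydantic model is used for the structured result (the field names follow the list above; the production definition may differ):

```python
from pydantic import BaseModel

class RulesCheckResult(BaseModel):
    """Structured verdict returned for every checked difference."""
    is_expected: bool   # does a rule authorize this transformation?
    reasoning: str      # natural language explanation of the decision
    rule_section: str   # rule section cited as the basis for the decision
```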
How Differences Are Detected
Before rules checking can happen, the system needs to know what changed during transformation. This happens through direct table comparison: the system parses both the raw extracted table and the normalized output, matches rows by index, and compares cell values. When values differ, the system classifies the transformation type. An abbreviation expanding to full form becomes a benchmark expansion. A large number scaling down becomes a unit conversion. Brackets removed becomes bracket removal. This classification feeds into the rules retrieval, helping the indexer pull the most relevant rule sections for validation.
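A minimal sketch of that comparison step, assuming both tables have already been parsed into lists of row dicts keyed by canonical column name (the dataclass mirrors the TableDifference used below; the classification labels are illustrative):

```python
from dataclasses import dataclass

@dataclass
class TableDifference:
    location: str
    source_value: str
    target_value: str
    description: str

def classify_transformation(source: str, target: str) -> str:
    """Rough transformation-type labels that help steer rule retrieval."""
    if source.strip("[] ") == target:
        return "bracket_removal"
    if source.replace(",", "").isdigit() and target.replace(".", "").isdigit():
        return "unit_conversion"
    if len(source) <= 4 and len(target) > len(source):
        return "benchmark_expansion"
    return "value_change"

def detect_differences(raw_rows, normalized_rows, row_key="Class"):
    """Match rows by index and compare cell values between raw and normalized tables."""
    differences = []
    for i, (raw, norm) in enumerate(zip(raw_rows, normalized_rows)):
        for column, target_value in norm.items():
            source_value = str(raw.get(column, "")).strip()
            target_value = str(target_value).strip()
            if source_value != target_value:
                differences.append(TableDifference(
                    location=f"Row '{norm.get(row_key, i)}', Column '{column}'",
                    source_value=source_value,
                    target_value=target_value,
                    description="Value changed",
                ))
    return differences

# classify_transformation("3mE", "EURIBOR") -> "benchmark_expansion": labels like
# this are what help the rules index pull the most relevant sections per difference.
```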
Concrete Example: How It Works in Practice
The coupon field transformation serves as our example here. It's straightforward enough to follow in detail, but the pattern applies equally to far more complex transformations like rating agency disambiguations, discount margin parsing, or combined column splits. If you're struggling with any field-level transformation logic, this architecture helps.
Consider a validation scenario from a structured finance document extraction pipeline:
Scenario: The pipeline converts "3mE" → "EURIBOR" in a coupon field.
Step 1: Difference Detection
```python
TableDifference(
    location="Row 'A-1', Column 'Coupon'",
    source_value="3mE",
    target_value="EURIBOR",
    description="Value changed"
)
```

Step 2: Smart Rule Retrieval
The rules index analyzes the difference:
- Extracts column name: "Coupon"
- Identifies patterns: "reference_rate" (abbreviation in source, full rate in target)
- Looks up relevant sections:
- "Reference Rate Standardization" (column mapping)
- "Coupon Standardization" (column mapping)
- "Decision Framework" (core section, always included)
Returns ~150 lines of relevant rules, not the entire 1,000+ line document.
Step 3: LLM Rules Checking
The difference + relevant rules are passed to the LLM:
```
Location: Row 'A-1', Column 'Coupon'
Source: "3mE"
Target: "EURIBOR"

[... 150 lines of relevant rules sections ...]
```

LLM Output (Structured):
```json
{
  "is_expected": true,
  "reasoning": "Abbreviation '3mE' expanded to canonical form 'EURIBOR' per Reference Rate Standardization rules. This is correct - 3mE is a standard abbreviation for 3-month EURIBOR that should be expanded for clarity.",
  "rule_section": "Reference Rate Standardization"
}
```

Step 4: Audit Trail
The result is logged with full context:
- Which rule was applied
- Why the transformation was correct
- When it was validated
- Which version of the rules document was used (Git commit)
For auditors and compliance: You can trace any output value back to the specific business rule that authorized its transformation, including who approved that rule and when.
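One way to capture that context is a small, serializable audit record written alongside each validation. The fields below mirror the list above; the Git helper is an assumption for illustration, not the production logging code:

```python
import json
import subprocess
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    location: str
    source_value: str
    target_value: str
    is_expected: bool
    reasoning: str
    rule_section: str
    validated_at: str
    rules_version: str  # Git commit of transformation_rules.md at validation time

def current_rules_commit(path: str = "transformation_rules.md") -> str:
    """Short hash of the last commit touching the rules document (assumes a Git checkout)."""
    return subprocess.check_output(
        ["git", "log", "-n", "1", "--format=%h", "--", path], text=True
    ).strip()

record = AuditRecord(
    location="Row 'A-1', Column 'Coupon'",
    source_value="3mE",
    target_value="EURIBOR",
    is_expected=True,
    reasoning="Expanded per Reference Rate Standardization rules.",
    rule_section="Reference Rate Standardization",
    validated_at=datetime.now(timezone.utc).isoformat(),
    rules_version=current_rules_commit(),
)
print(json.dumps(asdict(record), indent=2))
```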
Real Results: Handling Messy Real-World Data
We tested this approach on 8 real structured finance documents from our production pipeline, analyzing 134 actual transformations. The results validate the core value proposition: rules-guided LLMs can reliably normalize messy, inconsistent real-world data.
The Challenge: Real-World Data is Messy
Financial deal documents contain tables with wildly inconsistent formatting:
- "/" characters used as column separators - breaks standard markdown table parsing
- Column names vary from document to document: "PAR AMT" vs "SIZE($MM)" vs "Par Amount" vs "AMOUNT ($)" vs "Par (EUR)"
- Units differ: some tables show `256,000,000`, others show `256.00`, both meaning $256M
- Combined columns: "[Moody's/Fitch]" ratings that need splitting into separate columns
- Ambiguous abbreviations: "S + 120" (what's S? SOFR? SONIA? Something else?)
- Format variations: "SOFR + 1.29%" vs "S+129" vs "3M SOFR +129", all meaning the same thing
This isn't synthetic test data. These are actual patterns from documents in our production system, each with its own formatting conventions.
Without Rules: Inconsistency and Failure
To demonstrate the value of rules guidance, consider the transformation challenges an LLM faces without structured rules:
Structural Ambiguity:
- Column header "MOODY'S/FITCH" - is this one column or two agencies combined?
- Value "S + 120" - generic LLM has no guidance on whether S = SOFR, SONIA, or something else
- Value `256,000,000` - should this be divided by 1M? Only if the header lacks "(M)" indicator
Without rules guidance:
- ❌ Combined rating columns: the LLM might leave them combined (breaking the database schema) or split them incorrectly
- ❌ Benchmark abbreviations: "S" expands inconsistently, sometimes SOFR (correct for US), sometimes SONIA (wrong!)
- ❌ Unit conversion: Applied randomly, some values divided by 1M, others not
- ❌ Column names: Each variant handled differently, no standardization
Estimated failure rate without rules: 60-80% of documents would have at least one structural or semantic error requiring manual correction.
With Rules: 100% Success Rate
Using transformation_rules.md with smart retrieval, we analyzed 134 transformations across 8 deals:
Structural Transformations (100% correct):
- ✅ 23/23 column renames: "PAR AMT" → "Size (M)", "PAR-SUB%" → "C/E", "BNCH CPN" → "Coupon"
- ✅ 3/3 combined column splits: "[Moody's/Fitch]" → separate Moodys + Fitch columns
- ✅ 2/2 non-canonical columns dropped: "Type" and "MVOC" correctly removed
- ✅ 52/52 bracket removals: `[256.00]` → `256.00`, `[AAA]` → `AAA`
Semantic Transformations (100% correct):
- ✅ 26/26 benchmark expansions:
- "3M SOFR + 105" → "SOFR+105bp"
- "S + 131" → "SOFR+131bp" (correctly identified S = SOFR from context)
- "SOFR + 1.29%" → "SOFR+129bp" (percentage to basis points)
- ✅ 16/16 unit conversions:
 - `256,000,000` → `256.00` (correctly identified header lacked unit, divided by 1M)
 - `[ 310,000,000 ]` → `310.00` (handled brackets + conversion)
- ✅ 8/8 class name cleanups:
- "A-1 Notes" → "A-1"
- "Subordinated Notes" → "Sub"
Result: 8/8 deals (100%) produced usable, consistent, structurally-sound publication outputs.
Token Efficiency: 75% Reduction
Smart rule retrieval dramatically reduces token usage while maintaining 100% accuracy:
Context sizes per validation:
- Full rules (naive): 8,012 tokens
- Smart retrieval: 2,000 tokens
- Reduction: 75%
For typical multi-tranche deal (20 validations):
- Full rules: 160,240 tokens
- Smart retrieval: 40,000 tokens
- Savings: 120,240 tokens (75%)
Cost impact at $3 per 1M input tokens:
- Full rules: $0.48 per deal
- Smart retrieval: $0.12 per deal
- Savings: $0.36 per deal
At production scale (100 deals/month):
- Monthly savings: $36
- Annual savings: $432
The key insight: you don't need to send all 1,043 lines of rules every time. Smart indexing retrieves only the ~150 lines relevant to each specific validation, cutting costs by 75% with zero accuracy loss.
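The arithmetic behind those figures is easy to verify in a few lines:

```python
FULL_TOKENS = 8_012        # naive: entire rules document per validation
SMART_TOKENS = 2_000       # smart retrieval: ~150 relevant lines per validation
VALIDATIONS_PER_DEAL = 20  # typical multi-tranche deal
PRICE_PER_M_TOKENS = 3.00  # USD per 1M input tokens

full_cost = FULL_TOKENS * VALIDATIONS_PER_DEAL * PRICE_PER_M_TOKENS / 1_000_000    # $0.48
smart_cost = SMART_TOKENS * VALIDATIONS_PER_DEAL * PRICE_PER_M_TOKENS / 1_000_000  # $0.12
savings_per_deal = round(full_cost - smart_cost, 2)                                # $0.36

print(f"Tokens per deal: {FULL_TOKENS * VALIDATIONS_PER_DEAL:,} vs {SMART_TOKENS * VALIDATIONS_PER_DEAL:,}")
print(f"Cost per deal:   ${full_cost:.2f} vs ${smart_cost:.2f}")
print(f"Annual savings at 100 deals/month: ${savings_per_deal * 100 * 12:.0f}")
```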
Auditability: Every Decision Traceable
100% of transformations include full audit trail:
```
Transformation: "S + 131" → "SOFR+131bp"
Location: Neuberger Berman CLO 32R, Row 1, Coupon column
Rule: Reference Rate Standardization (lines 157-180)
Reasoning: "Expanded abbreviated benchmark 'S' to 'SOFR' based on US
  jurisdiction context. Standardized format to SOFR+XXXbp per coupon
  standardization rules."
Git commit: a3f2b9c (Oct 18, 2025)
```

Every output value traces back to:
- Source value from document
- Applied rule section from transformation_rules.md
- Natural language explanation of the decision
- Git commit showing which rule version was used
Comparison Summary
| Metric | Without Rules | With Rules (Ours) |
|---|---|---|
| Structural accuracy | ~60-80% have errors | 100% correct |
| Semantic consistency | Inconsistent (varies per run) | 100% consistent |
| Column standardization | Random variants | All → canonical names |
| Benchmark expansion | 0-60% correct (inconsistent) | 100% correct |
| Unit conversion | Random application | 100% correct |
| Tokens per validation | 500 (but fails) | 2,000 (works) |
| Audit trail | None | 100% w/ rule citations |
| Usable outputs | ~20-40% | 100% |
Key Finding
The rules document becomes both human documentation and machine guidance, solving the messy real-world data problem without sacrificing explainability. All 8 documents produced structurally sound outputs. All 134 transformations applied correctly. Token usage dropped 75% through smart retrieval. Every decision includes full audit trail.
For financial data pipelines where accuracy is non-negotiable, this isn't just better engineering. It's the foundation for production reliability.
Why This Matters for Financial Data Pipelines
All business logic lives in transformation_rules.md. Developers, analysts, compliance officers, and business stakeholders reference the same document. No hidden logic. The rules governing data transformations are transparent and accessible to everyone who needs them.
The document lives in Git alongside code. Every rule change has an author, timestamp, and justification. When regulators ask how you validated a specific transformation, you show them the rule section, the commit history, and the natural language reasoning. When business requirements change, subject matter experts can update rules through standard PR reviews without waiting for engineering sprints.
```
git log transformation_rules.md

commit a3f2b9...
Date: Oct 15, 2025
Author: Risk Committee
Message: Update DM parsing rules per new SFTR requirements

commit 8e4c1a...
Date: Oct 1, 2025
Author: Product Team
Message: Add guidance field disambiguation rules
```

This isn't just documentation for humans. Business analysts can edit rules without touching code:
```markdown
## Non-Call Period End - accept these labels:
- "Non-Call Period End", "Non-Call Period:", "NC Period End"
- "Non Call End" <!-- Added by BA team, 2025-10-12 -->
```

Token Efficiency Through Smart Retrieval
Context-aware rule injection sends only relevant sections per validation, not the entire 1,043-line document. Smart retrieval uses 2,000 tokens per validation versus 8,012 for the naive approach. That's a 75% reduction with zero accuracy loss. At 100 documents per month, savings reach $432 annually at typical LLM pricing. Lower costs, faster validation, better scaling.
Code: How to Implement This Pattern
Here's a simplified example of the core pattern:
```python
from pathlib import Path

import dspy

# RulesIndex is the parser/indexer from Component 2; RulesCheckResult is the
# structured (is_expected, reasoning, rule_section) output model sketched earlier.

# 1. Index the rules document at startup
rules_index = RulesIndex("transformation_rules.md")
print(f"Indexed {len(rules_index.sections)} rule sections")
# Output: Indexed 42 rule sections

# 2. Define your validation signature
class CheckDifferenceAgainstRules(dspy.Signature):
    """Determine if a detected difference is expected based on rules."""

    location: str = dspy.InputField()
    source_value: str = dspy.InputField()
    target_value: str = dspy.InputField()
    relevant_rules: str = dspy.InputField(
        description="Complete text of relevant rule sections"
    )
    check_result: RulesCheckResult = dspy.OutputField(
        description="Structured result with is_expected, reasoning, rule_section"
    )

# 3. Create the validation module
class RulesBasedValidator(dspy.Module):
    def __init__(self, rules_document_path: str):
        super().__init__()
        self.rules_index = RulesIndex(rules_document_path)
        self.rules_checker = dspy.Predict(CheckDifferenceAgainstRules)

    def validate(self, difference):
        # Get only relevant sections for this specific difference
        relevant_rules = self.rules_index.get_relevant_sections_for_difference(
            difference
        )
        # Ask LLM to check against rules
        result = self.rules_checker(
            location=difference.location,
            source_value=difference.source_value,
            target_value=difference.target_value,
            relevant_rules=relevant_rules
        )
        return result.check_result

# 4. Use it in your pipeline
validator = RulesBasedValidator("transformation_rules.md")

# detected_differences comes from the table comparison step described earlier
for difference in detected_differences:
    result = validator.validate(difference)
    if result.is_expected:
        print(f"✓ {difference.location}: {result.reasoning}")
        print(f"  Rule: {result.rule_section}")
    else:
        print(f"✗ ERROR at {difference.location}")
        print(f"  {result.reasoning}")
```

When to Use This Pattern
This approach works well when:
Complex domain with many business rules
- 100+ transformation rules
- Multiple categories of rules (extraction, cleansing, standardization)
- Rules reference each other or build on concepts
Rules change frequently
- Regulatory updates (SFTR, Basel, MiFID)
- Business process changes
- New deal types or structures
- Error corrections and refinements
Need for audit trails
- Financial services
- Healthcare
- Legal document processing
- Regulatory reporting
Multiple stakeholders need to understand/edit rules
- Business analysts define logic
- Compliance reviews rules
- Legal approves methodology
- Engineers implement
- Auditors verify
Domain requires semantic flexibility
- Natural language variation in source documents
- Context-dependent transformations
- Judgment calls that can't be purely algorithmic
This approach is NOT ideal when:
Simple, fixed rules
- <20 rules that rarely change
- Pure algorithmic transformations
- Traditional code validation is sufficient
Real-time latency critical
- Microsecond response requirements
- LLM call overhead unacceptable
- Consider hybrid: rules-guided training, then cached/compiled
Rules are purely algorithmic
- Mathematical formulas
- Exact pattern matching
- No semantic interpretation needed
No need for explanations
- Batch processing with no human review
- No audit requirements
- Speed over transparency
The Broader Insight: Requirements as Collaborative Artifacts
The deeper pattern here isn't just about LLMs. It's about treating business requirements as living, collaborative artifacts.
Traditional approach: Requirements become code, code becomes a black box.
Our approach: Requirements become structured documentation, documentation becomes machine-readable context, context produces auditable output.
The rules document becomes:
- Human documentation for stakeholders
- Machine input for LLM validation
- Compliance artifact for regulators
- Training material for new team members
- Version-controlled changelog of business logic evolution
- Collaboration surface for cross-functional teams
In financial services, where explainability, auditability, and multi-stakeholder governance are critical, this pattern turns a technical architecture decision into a competitive advantage.
Closing: Rules Interpreters, Not Rules Memorizers
Stop treating business requirements as either code OR documentation. They can be both.
Structure your requirements as clear, versioned markdown. Index them smartly for contextual retrieval. Inject them dynamically based on what you're validating. Let the LLM become a rules interpreter that can explain its reasoning, not a rules memorizer that gives you a black box.
For asset-backed finance pipelines, this isn't just better engineering. It's the foundation for regulatory confidence through audit trails, business agility through non-technical rule editing, risk management through version control, and operational transparency where every decision cites its source.
The next time you're building an LLM pipeline with complex business logic, ask yourself: can your business analysts read and edit your rules? Can your auditors trace every decision? Can your compliance team review changes in a PR?
If not, you're missing an opportunity.
Technical Details:
- Rules document: 1,043 lines across 42 sections
- Test dataset: 8 real structured finance documents (6 US, 2 European)
- Transformations analyzed: 134 across 8 documents (100% correct)
- Context per validation: ~150 lines / 2,000 tokens (75% reduction vs. naive 8,012 tokens)
- Token savings: 120,240 tokens per typical multi-tranche deal
- Cost savings: $0.36 per document at $3/1M tokens
- Framework: DSPy for structured LLM interactions
- Version control: Git for full change history and approval workflows
Domain:
Structured finance document extraction, transforming tabular data from source documents into validated, structured output for investors and risk systems.