Business Requirements as Code: Rules-Guided LLM Validation
The Problem with LLM Business Logic
Your compliance team just rejected your AI pipeline. Again. The extraction accuracy looks good in testing, but in production there's no audit trail. Business rules are scattered across prompt strings. When a transformation fails, nobody can explain why it happened or which rule should have caught it.
For teams building LLM pipelines in regulated industries, this isn't a feature request. It's a blocker.
You're building an LLM pipeline to extract structured data from financial documents. The domain is complex. Hundreds of business rules govern data extraction, cleansing, and standardization. Where do you put these rules?
You could hardcode them in prompt strings scattered across your codebase, creating unmaintainable spaghetti that only you understand. Or use traditional code validation, producing rigid, brittle logic that can't handle natural language variation in your source documents. Or hope the LLM "figures it out" with generic instructions, yielding unpredictable outputs that vary between runs and make auditors nervous.
For financial data extraction where accuracy, auditability, and regulatory compliance matter, none of these options work. We needed a fourth way.
The Traditional Approaches (And Why They Fail)
Most teams building LLM pipelines with complex business logic fall into one of three traps:
1. Hardcoded Prompts
prompt = """
Extract the coupon. If it says 3mE, expand to EURIBOR.
If it says 3mS, expand to SOFR. Remove (sf) suffixes.
For DM, extract just the number...
"""Problems:
- Rules buried in code strings
- No version history of rule changes
- Non-technical stakeholders can't review or edit
- Duplicated across multiple prompts
- Impossible to audit "which rule was applied when?"
2. External Validation (Traditional Code)
```python
def validate_coupon(value):
    if value == "3mE":
        return "EURIBOR"
    elif value == "3mS":
        return "SOFR"
    # ... 200 more lines of if/else
```

Problems:
- Loses LLM's flexibility to handle variation
- Can't explain WHY a transformation happened
- Requires developer changes for every new rule
- No natural language documentation of business logic
3. Hope and Pray (Vague Prompts)
prompt = "Extract and standardize the data appropriately"Problems:
- Inconsistent results between runs
- No audit trail of decisions
- Can't demonstrate to regulators HOW data was validated
- Stakeholders have no visibility into logic
For asset-backed finance pipelines processing structured finance documents, this isn't just an engineering problem. It's a compliance and business risk problem.
Our Approach: Requirements as Structured Documentation
Here's the insight: business requirements don't have to be either code OR documentation. They can be both.
We built a system that treats business rules as structured, version-controlled documentation that gets dynamically injected into LLM context. The architecture has three components that work together to bridge the gap between human-readable requirements and machine validation.
Component 1: transformation_rules.md
A comprehensive structured markdown document containing all business requirements:
```markdown
## Reference Rate Standardization

**Expand abbreviations to canonical forms:**

**EURIBOR variants → "EURIBOR":**
- 3mE, 3ME, E3M, E
- 3M E, 3m EURIBOR, 3M EURIBOR
- EURIBOR 3M, EUR 3M

**SOFR variants → "SOFR":**
- 3mS, 3MS, S3M, S
- SOFR3M, 3M SOFR, SOFR 3M
- 3m SOFR, Term SOFR

**CRITICAL:** Only expand to correct benchmark
- `3mE` → `EURIBOR` ✓
- `3mE` → `SOFR` ✗ (WRONG - different rate)
```

Key properties:
- Plain language that business analysts and compliance officers can read
- Organized by concern (data extraction, cleansing, standardization)
- Version controlled in Git alongside code
- Serves as BOTH human documentation AND machine-readable input
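Because the format is just markdown headers and bullets, the same file is trivially machine-readable. As a rough illustration only (the production parser in rules_index.py may differ), splitting it into named sections takes a handful of lines:

```python
import re
from pathlib import Path

def parse_rule_sections(rules_path: str) -> dict[str, str]:
    """Split a markdown rules document into {section title: section body}."""
    text = Path(rules_path).read_text(encoding="utf-8")
    sections: dict[str, str] = {}
    title, body = None, []
    for line in text.splitlines():
        header = re.match(r"^##+\s+(.*)", line)  # ## and ### headers start a section
        if header:
            if title is not None:
                sections[title] = "\n".join(body).strip()
            title, body = header.group(1).strip(), []
        elif title is not None:
            body.append(line)
    if title is not None:
        sections[title] = "\n".join(body).strip()
    return sections

sections = parse_rule_sections("transformation_rules.md")
print(f"Parsed {len(sections)} rule sections")  # 42 on the production document
```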
Component 2: rules_index.py
A smart parser that indexes the rules document at startup:
```python
class RulesIndex:
    def __init__(self, rules_path: str):
        # Parse markdown sections (## and ### headers)
        self._parse_all_sections()  # 42 sections indexed

        # Map columns to relevant rule sections
        self.column_mappings = {
            "Coupon": ["Reference Rate Standardization",
                       "Coupon Standardization"],
            "DM (bp)": ["DM Parsing", "DM and Coupon Independence"],
            "Moodys": ["Rating Agency Rules", "Non-Rating Values"],
            # ... 17 canonical columns mapped
        }

        # Map value patterns to rule sections
        self.value_pattern_mappings = {
            "empty": ["Non-Value Removal", "Empty Cell Handling"],
            "reference_rate": ["Reference Rate Standardization"],
            "rating_suffix": ["Rating Cleaning"],
        }
```

The key insight: when a difference is detected, the indexer retrieves ONLY the relevant sections, typically around 150 lines instead of all 1,000+.
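The retrieval method itself isn't shown above; here is a minimal sketch of how it might combine the column mappings with the value-pattern mappings. The function names and pattern heuristics are illustrative, not the production rules_index.py API:

```python
import re

CORE_SECTIONS = ["Decision Framework"]  # always included (see the worked example below)

def detect_value_patterns(source: str, target: str) -> list[str]:
    """Very rough routing heuristics; the production patterns are richer."""
    patterns = []
    if not source.strip() or not target.strip():
        patterns.append("empty")
    if re.search(r"\b(EURIBOR|SOFR|SONIA)\b", target, re.IGNORECASE):
        patterns.append("reference_rate")
    if re.search(r"\(sf\)", source, re.IGNORECASE):
        patterns.append("rating_suffix")
    return patterns

def get_relevant_sections_for_difference(index, column, source, target):
    """Union of core sections, column-mapped sections, and pattern-mapped sections."""
    names = list(CORE_SECTIONS)
    names += index.column_mappings.get(column, [])
    for pattern in detect_value_patterns(source, target):
        names += index.value_pattern_mappings.get(pattern, [])
    seen, bodies = set(), []
    for name in names:
        if name not in seen and name in index.sections:  # index.sections: {title: body}
            seen.add(name)
            bodies.append(f"## {name}\n{index.sections[name]}")
    return "\n\n".join(bodies)

# For the "3mE" → "EURIBOR" example below, this pulls "Decision Framework",
# "Reference Rate Standardization", and "Coupon Standardization" — roughly 150
# lines instead of the full 1,000+ line document.
```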
Component 3: rules_checker (DSPy Module)
A validation module that filters differences through the rules:
```python
class RulesBasedDifferenceFilter(dspy.Module):
    def check_difference(self, difference: TableDifference):
        # Get relevant rule sections for THIS specific difference
        relevant_rules = self.rules_index.get_relevant_sections_for_difference(
            difference
        )

        # Ask LLM: "Is this difference expected per these rules?"
        result = self.rules_checker(
            location=difference.location,
            source_value=difference.source_value,
            target_value=difference.target_value,
            relevant_rules=relevant_rules  # ~150 lines of context
        )
        return result.is_expected, result.reasoning, result.rule_section
```

The output: every validation decision includes:
- Whether the difference is expected (boolean)
- Natural language reasoning
- Citation of which rule section applies
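In DSPy this maps naturally to a typed output field. A minimal sketch, assuming a pydantic model is used for the structured result (the field names follow the list above; the production definition may differ):

```python
from pydantic import BaseModel

class RulesCheckResult(BaseModel):
    """Structured verdict returned for every checked difference."""
    is_expected: bool   # does a rule authorize this transformation?
    reasoning: str      # natural language explanation of the decision
    rule_section: str   # rule section cited as the basis for the decision
```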
How Differences Are Detected
Before rules checking can happen, the system needs to know what changed during transformation. This happens through direct table comparison: the system parses both the raw extracted table and the normalized output, matches rows by index, and compares cell values. When values differ, the system classifies the transformation type. An abbreviation expanding to full form becomes a benchmark expansion. A large number scaling down becomes a unit conversion. Brackets removed becomes bracket removal. This classification feeds into the rules retrieval, helping the indexer pull the most relevant rule sections for validation.
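A minimal sketch of that comparison step, assuming both tables have already been parsed into lists of row dicts keyed by canonical column name (the dataclass mirrors the TableDifference used below; the classification labels are illustrative):

```python
from dataclasses import dataclass

@dataclass
class TableDifference:
    location: str
    source_value: str
    target_value: str
    description: str

def classify_transformation(source: str, target: str) -> str:
    """Rough transformation-type labels that help steer rule retrieval."""
    if source.strip("[] ") == target:
        return "bracket_removal"
    if source.replace(",", "").isdigit() and target.replace(".", "").isdigit():
        return "unit_conversion"
    if len(source) <= 4 and len(target) > len(source):
        return "benchmark_expansion"
    return "value_change"

def detect_differences(raw_rows, normalized_rows, row_key="Class"):
    """Match rows by index and compare cell values between raw and normalized tables."""
    differences = []
    for i, (raw, norm) in enumerate(zip(raw_rows, normalized_rows)):
        for column, target_value in norm.items():
            source_value = str(raw.get(column, "")).strip()
            target_value = str(target_value).strip()
            if source_value != target_value:
                differences.append(TableDifference(
                    location=f"Row '{norm.get(row_key, i)}', Column '{column}'",
                    source_value=source_value,
                    target_value=target_value,
                    description="Value changed",
                ))
    return differences

# classify_transformation("3mE", "EURIBOR") -> "benchmark_expansion": labels like
# this are what help the rules index pull the most relevant sections per difference.
```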
Concrete Example: How It Works in Practice
The coupon field transformation serves as our example here. It's straightforward enough to follow in detail, but the pattern applies equally to far more complex transformations like rating agency disambiguations, discount margin parsing, or combined column splits. If you're struggling with any field-level transformation logic, this architecture helps.
Consider a validation scenario from a structured finance document extraction pipeline:
Scenario: The pipeline converts "3mE" → "EURIBOR" in a coupon field.
Step 1: Difference Detection
```python
TableDifference(
    location="Row 'A-1', Column 'Coupon'",
    source_value="3mE",
    target_value="EURIBOR",
    description="Value changed"
)
```

Step 2: Smart Rule Retrieval
The rules index analyzes the difference:
- Extracts column name: "Coupon"
- Identifies patterns: "reference_rate" (abbreviation in source, full rate in target)
- Looks up relevant sections:
- "Reference Rate Standardization" (column mapping)
- "Coupon Standardization" (column mapping)
- "Decision Framework" (core section, always included)
Returns ~150 lines of relevant rules, not the entire 1,000+ line document.
Step 3: LLM Rules Checking
The difference + relevant rules are passed to the LLM:
```
Location: Row 'A-1', Column 'Coupon'
Source: "3mE"
Target: "EURIBOR"

[... 150 lines of relevant rules sections ...]
```

LLM Output (Structured):
```json
{
  "is_expected": true,
  "reasoning": "Abbreviation '3mE' expanded to canonical form 'EURIBOR' per Reference Rate Standardization rules. This is correct - 3mE is a standard abbreviation for 3-month EURIBOR that should be expanded for clarity.",
  "rule_section": "Reference Rate Standardization"
}
```

Step 4: Audit Trail
The result is logged with full context:
- Which rule was applied
- Why the transformation was correct
- When it was validated
- Which version of the rules document was used (Git commit)
For auditors and compliance: You can trace any output value back to the specific business rule that authorized its transformation, including who approved that rule and when.
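One way to capture that context is a small, serializable audit record written alongside each validation. The fields below mirror the list above; the Git helper is an assumption for illustration, not the production logging code:

```python
import json
import subprocess
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    location: str
    source_value: str
    target_value: str
    is_expected: bool
    reasoning: str
    rule_section: str
    validated_at: str
    rules_version: str  # Git commit of transformation_rules.md at validation time

def current_rules_commit(path: str = "transformation_rules.md") -> str:
    """Short hash of the last commit touching the rules document (assumes a Git checkout)."""
    return subprocess.check_output(
        ["git", "log", "-n", "1", "--format=%h", "--", path], text=True
    ).strip()

record = AuditRecord(
    location="Row 'A-1', Column 'Coupon'",
    source_value="3mE",
    target_value="EURIBOR",
    is_expected=True,
    reasoning="Expanded per Reference Rate Standardization rules.",
    rule_section="Reference Rate Standardization",
    validated_at=datetime.now(timezone.utc).isoformat(),
    rules_version=current_rules_commit(),
)
print(json.dumps(asdict(record), indent=2))
```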
Real Results: Handling Messy Real-World Data
We tested this approach on 8 real structured finance documents from our production pipeline, analyzing 134 actual transformations. The results validate the core value proposition: rules-guided LLMs can reliably normalize messy, inconsistent real-world data.
The Challenge: Real-World Data is Messy
Financial deal documents contain tables with wildly inconsistent formatting:
- "/" characters used as column separators - breaks standard markdown table parsing
- Column names vary from document to document: "PAR AMT" vs "SIZE($MM)" vs "Par Amount" vs "AMOUNT ($)" vs "Par (EUR)"
- Units differ: some tables show `256,000,000`, others show `256.00`, both meaning $256M
- Combined columns: "[Moody's/Fitch]" ratings that need splitting into separate columns
- Ambiguous abbreviations: "S + 120" (what's S? SOFR? SONIA? Something else?)
- Format variations: "SOFR + 1.29%" vs "S+129" vs "3M SOFR +129", all meaning the same thing
This isn't synthetic test data. These are actual patterns from documents in our production system, each with its own formatting conventions.
Without Rules: Inconsistency and Failure
To demonstrate the value of rules guidance, consider the transformation challenges an LLM faces without structured rules:
Structural Ambiguity:
- Column header "MOODY'S/FITCH" - is this one column or two agencies combined?
- Value "S + 120" - generic LLM has no guidance on whether S = SOFR, SONIA, or something else
- Value `256,000,000` - should this be divided by 1M? Only if the header lacks "(M)" indicator
Without rules guidance:
- ❌ Combined rating columns: the LLM might leave them combined (breaking the database schema) or split them incorrectly
- ❌ Benchmark abbreviations: "S" expands inconsistently, sometimes SOFR (correct for US), sometimes SONIA (wrong!)
- ❌ Unit conversion: Applied randomly, some values divided by 1M, others not
- ❌ Column names: Each variant handled differently, no standardization
Estimated failure rate without rules: 60-80% of documents would have at least one structural or semantic error requiring manual correction.
With Rules: 100% Success Rate
Using transformation_rules.md with smart retrieval, we analyzed 134 transformations across 8 deals:
Structural Transformations (100% correct):
- ✅ 23/23 column renames: "PAR AMT" → "Size (M)", "PAR-SUB%" → "C/E", "BNCH CPN" → "Coupon"
- ✅ 3/3 combined column splits: "[Moody's/Fitch]" → separate Moodys + Fitch columns
- ✅ 2/2 non-canonical columns dropped: "Type" and "MVOC" correctly removed
- ✅ 52/52 bracket removals: `[256.00]` → `256.00`, `[AAA]` → `AAA`
Semantic Transformations (100% correct):
- ✅ 26/26 benchmark expansions:
- "3M SOFR + 105" → "SOFR+105bp"
- "S + 131" → "SOFR+131bp" (correctly identified S = SOFR from context)
- "SOFR + 1.29%" → "SOFR+129bp" (percentage to basis points)
- ✅ 16/16 unit conversions:
 - `256,000,000` → `256.00` (correctly identified header lacked unit, divided by 1M)
 - `[ 310,000,000 ]` → `310.00` (handled brackets + conversion)
- ✅ 8/8 class name cleanups:
- "A-1 Notes" → "A-1"
- "Subordinated Notes" → "Sub"
Result: 8/8 deals (100%) produced usable, consistent, structurally-sound publication outputs.
Token Efficiency: 75% Reduction
Smart rule retrieval dramatically reduces token usage while maintaining 100% accuracy:
Context sizes per validation:
- Full rules (naive): 8,012 tokens
- Smart retrieval: 2,000 tokens
- Reduction: 75%
For typical multi-tranche deal (20 validations):
- Full rules: 160,240 tokens
- Smart retrieval: 40,000 tokens
- Savings: 120,240 tokens (75%)
Cost impact at $3 per 1M input tokens:
- Full rules: $0.48 per deal
- Smart retrieval: $0.12 per deal
- Savings: $0.36 per deal
At production scale (100 deals/month):
- Monthly savings: $36
- Annual savings: $432
The key insight: you don't need to send all 1,043 lines of rules every time. Smart indexing retrieves only the ~150 lines relevant to each specific validation, cutting costs by 75% with zero accuracy loss.
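The arithmetic behind those figures is easy to verify in a few lines:

```python
FULL_TOKENS = 8_012        # naive: entire rules document per validation
SMART_TOKENS = 2_000       # smart retrieval: ~150 relevant lines per validation
VALIDATIONS_PER_DEAL = 20  # typical multi-tranche deal
PRICE_PER_M_TOKENS = 3.00  # USD per 1M input tokens

full_cost = FULL_TOKENS * VALIDATIONS_PER_DEAL * PRICE_PER_M_TOKENS / 1_000_000    # $0.48
smart_cost = SMART_TOKENS * VALIDATIONS_PER_DEAL * PRICE_PER_M_TOKENS / 1_000_000  # $0.12
savings_per_deal = round(full_cost - smart_cost, 2)                                # $0.36

print(f"Tokens per deal: {FULL_TOKENS * VALIDATIONS_PER_DEAL:,} vs {SMART_TOKENS * VALIDATIONS_PER_DEAL:,}")
print(f"Cost per deal:   ${full_cost:.2f} vs ${smart_cost:.2f}")
print(f"Annual savings at 100 deals/month: ${savings_per_deal * 100 * 12:.0f}")
```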
Auditability: Every Decision Traceable
100% of transformations include full audit trail:
```
Transformation: "S + 131" → "SOFR+131bp"
Location: Neuberger Berman CLO 32R, Row 1, Coupon column
Rule: Reference Rate Standardization (lines 157-180)
Reasoning: "Expanded abbreviated benchmark 'S' to 'SOFR' based on US
  jurisdiction context. Standardized format to SOFR+XXXbp per coupon
  standardization rules."
Git commit: a3f2b9c (Oct 18, 2025)
```

Every output value traces back to:
- Source value from document
- Applied rule section from transformation_rules.md
- Natural language explanation of the decision
- Git commit showing which rule version was used
Comparison Summary
| Metric | Without Rules | With Rules (Ours) |
|---|---|---|
| Structural accuracy | ~60-80% have errors | 100% correct |
| Semantic consistency | Inconsistent (varies per run) | 100% consistent |
| Column standardization | Random variants | All → canonical names |
| Benchmark expansion | 0-60% correct (inconsistent) | 100% correct |
| Unit conversion | Random application | 100% correct |
| Tokens per validation | 500 (but fails) | 2,000 (works) |
| Audit trail | None | 100% w/ rule citations |
| Usable outputs | ~20-40% | 100% |
Key Finding
The rules document becomes both human documentation and machine guidance, solving the messy real-world data problem without sacrificing explainability. All 8 documents produced structurally sound outputs. All 134 transformations applied correctly. Token usage dropped 75% through smart retrieval. Every decision includes full audit trail.
For financial data pipelines where accuracy is non-negotiable, this isn't just better engineering. It's the foundation for production reliability.
Why This Matters for Financial Data Pipelines
All business logic lives in transformation_rules.md. Developers, analysts, compliance officers, and business stakeholders reference the same document. No hidden logic. The rules governing data transformations are transparent and accessible to everyone who needs them.
The document lives in Git alongside code. Every rule change has an author, timestamp, and justification. When regulators ask how you validated a specific transformation, you show them the rule section, the commit history, and the natural language reasoning. When business requirements change, subject matter experts can update rules through standard PR reviews without waiting for engineering sprints.
```
git log transformation_rules.md

commit a3f2b9...
Date: Oct 15, 2025
Author: Risk Committee
Message: Update DM parsing rules per new SFTR requirements

commit 8e4c1a...
Date: Oct 1, 2025
Author: Product Team
Message: Add guidance field disambiguation rules
```

This isn't just documentation for humans. Business analysts can edit rules without touching code:
```markdown
## Non-Call Period End - accept these labels:
- "Non-Call Period End", "Non-Call Period:", "NC Period End"
- "Non Call End" <!-- Added by BA team, 2025-10-12 -->
```

Token Efficiency Through Smart Retrieval
Context-aware rule injection sends only relevant sections per validation, not the entire 1,043-line document. Smart retrieval uses 2,000 tokens per validation versus 8,012 for the naive approach. That's a 75% reduction with zero accuracy loss. At 100 documents per month, savings reach $432 annually at typical LLM pricing. Lower costs, faster validation, better scaling.
Code: How to Implement This Pattern
Here's a simplified example of the core pattern:
```python
from pathlib import Path

import dspy

# RulesIndex is the parser/indexer from Component 2; RulesCheckResult is the
# structured (is_expected, reasoning, rule_section) output model sketched earlier.

# 1. Index the rules document at startup
rules_index = RulesIndex("transformation_rules.md")
print(f"Indexed {len(rules_index.sections)} rule sections")
# Output: Indexed 42 rule sections

# 2. Define your validation signature
class CheckDifferenceAgainstRules(dspy.Signature):
    """Determine if a detected difference is expected based on rules."""

    location: str = dspy.InputField()
    source_value: str = dspy.InputField()
    target_value: str = dspy.InputField()
    relevant_rules: str = dspy.InputField(
        description="Complete text of relevant rule sections"
    )
    check_result: RulesCheckResult = dspy.OutputField(
        description="Structured result with is_expected, reasoning, rule_section"
    )

# 3. Create the validation module
class RulesBasedValidator(dspy.Module):
    def __init__(self, rules_document_path: str):
        super().__init__()
        self.rules_index = RulesIndex(rules_document_path)
        self.rules_checker = dspy.Predict(CheckDifferenceAgainstRules)

    def validate(self, difference):
        # Get only relevant sections for this specific difference
        relevant_rules = self.rules_index.get_relevant_sections_for_difference(
            difference
        )
        # Ask LLM to check against rules
        result = self.rules_checker(
            location=difference.location,
            source_value=difference.source_value,
            target_value=difference.target_value,
            relevant_rules=relevant_rules
        )
        return result.check_result

# 4. Use it in your pipeline
validator = RulesBasedValidator("transformation_rules.md")

# detected_differences comes from the table comparison step described earlier
for difference in detected_differences:
    result = validator.validate(difference)
    if result.is_expected:
        print(f"✓ {difference.location}: {result.reasoning}")
        print(f"  Rule: {result.rule_section}")
    else:
        print(f"✗ ERROR at {difference.location}")
        print(f"  {result.reasoning}")
```

When to Use This Pattern
This approach works well when:
Complex domain with many business rules
- 100+ transformation rules
- Multiple categories of rules (extraction, cleansing, standardization)
- Rules reference each other or build on concepts
Rules change frequently
- Regulatory updates (SFTR, Basel, MiFID)
- Business process changes
- New deal types or structures
- Error corrections and refinements
Need for audit trails
- Financial services
- Healthcare
- Legal document processing
- Regulatory reporting
Multiple stakeholders need to understand/edit rules
- Business analysts define logic
- Compliance reviews rules
- Legal approves methodology
- Engineers implement
- Auditors verify
Domain requires semantic flexibility
- Natural language variation in source documents
- Context-dependent transformations
- Judgment calls that can't be purely algorithmic
This approach is NOT ideal when:
Simple, fixed rules
- <20 rules that rarely change
- Pure algorithmic transformations
- Traditional code validation is sufficient
Real-time latency critical
- Microsecond response requirements
- LLM call overhead unacceptable
- Consider hybrid: rules-guided training, then cached/compiled
Rules are purely algorithmic
- Mathematical formulas
- Exact pattern matching
- No semantic interpretation needed
No need for explanations
- Batch processing with no human review
- No audit requirements
- Speed over transparency
The Broader Insight: Requirements as Collaborative Artifacts
The deeper pattern here isn't just about LLMs. It's about treating business requirements as living, collaborative artifacts.
Traditional approach: Requirements become code, code becomes a black box.
Our approach: Requirements become structured documentation, documentation becomes machine-readable context, context produces auditable output.
The rules document becomes:
- Human documentation for stakeholders
- Machine input for LLM validation
- Compliance artifact for regulators
- Training material for new team members
- Version-controlled changelog of business logic evolution
- Collaboration surface for cross-functional teams
In financial services, where explainability, auditability, and multi-stakeholder governance are critical, this pattern turns a technical architecture decision into a competitive advantage.
Closing: Rules Interpreters, Not Rules Memorizers
Stop treating business requirements as either code OR documentation. They can be both.
Structure your requirements as clear, versioned markdown. Index them smartly for contextual retrieval. Inject them dynamically based on what you're validating. Let the LLM become a rules interpreter that can explain its reasoning, not a rules memorizer that gives you a black box.
For asset-backed finance pipelines, this isn't just better engineering. It's the foundation for regulatory confidence through audit trails, business agility through non-technical rule editing, risk management through version control, and operational transparency where every decision cites its source.
The next time you're building an LLM pipeline with complex business logic, ask yourself: can your business analysts read and edit your rules? Can your auditors trace every decision? Can your compliance team review changes in a PR?
If not, you're missing an opportunity.
Technical Details:
- Rules document: 1,043 lines across 42 sections
- Test dataset: 8 real structured finance documents (6 US, 2 European)
- Transformations analyzed: 134 across 8 documents (100% correct)
- Context per validation: ~150 lines / 2,000 tokens (75% reduction vs. naive 8,012 tokens)
- Token savings: 120,240 tokens per typical multi-tranche deal
- Cost savings: $0.36 per document at $3/1M tokens
- Framework: DSPy for structured LLM interactions
- Version control: Git for full change history and approval workflows
Domain:
Structured finance document extraction, transforming tabular data from source documents into validated, structured output for investors and risk systems.