When Probabilistic Systems (LLMs) Pretend to Be Deterministic: A Lineage Case Study


Executive Summary – A Practical Example

Large Language Models (LLMs) and deterministic lineage engines address different classes of problems. This blog presents a concrete, reproducible demonstration—using macro‑heavy SAS code—showing where LLMs add value (contextual reasoning, explanation) and where they must not be used (deterministic lineage extraction). The conclusion is not that LLMs are “bad”, but that misusing them as engines of record is dangerous. The correct architecture is complementary: a deterministic lineage engine as the system of record, augmented by an LLM for explanation and insight.

Imagine entrusting the keys to your most critical regulatory, governance, and transformation initiatives not to a reliable system of record, but to a probabilistic engine designed for creativity. Allowing an LLM to play both judge and historian, reasoning about context while also dictating immutable lineage, is not just risky; it is dangerous. This is no academic quibble: BCBS 239 regulatory compliance, change management, integration projects, code refactoring, transpilation, and any AI adoption programme demand lineage that is deterministic, repeatable, and audit‑defensible. Crucially, no amount of prompting or reinforcement learning from human feedback can eliminate the inherent risks.

LLMs, by their nature, are susceptible to silent omissions and opaque reasoning—responses that cannot be reliably audited or interrogated. Prompt engineering and human feedback may refine outputs, but they cannot guarantee completeness, transparency, or reproducibility when it matters most.

In this blog, we demonstrate—using macro-heavy SAS code—why only a deterministic lineage engine should ever be trusted as the system of record. LLMs are powerful allies for insight, not ultimate authorities. Misusing them as engines of record is more than a technical misstep: it is an existential threat to reliability, governance, and compliance in modern data estates. The only safe architecture is complementary: deterministic lineage systems set the record, LLMs illuminate the path.

Background and Motivation

Modern SAS estates commonly rely on:

Macro indirection (&&name&i),
Runtime metadata (PROC CONTENTS),
CALL EXECUTE,
SYMPUT / SYMPUTX, and
Mid‑stream macro overrides via repeated %include.

These features provide flexibility, but they also introduce runtime variability. For governance, regulation, change impact analysis, and integration projects, lineage must be deterministic, repeatable, and audit‑defensible. This blog evaluates whether LLMs can safely replace deterministic lineage engines for that purpose.
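To make the list above concrete, here is a minimal Python sketch that flags where those constructs appear in a piece of SAS source. It is only a pattern-matching illustration: the pattern table, the function name, the sample program and the file name overrides.sas are all assumptions made for this example, and a real lineage engine would parse and model the code rather than search it with regular expressions.

```python
import re

# Hypothetical pattern table: one regex per dynamic SAS construct listed above.
# Illustrative only; pattern matching is not lineage extraction.
DYNAMIC_PATTERNS = {
    "macro_indirection": re.compile(r"&&\w+"),
    "proc_contents": re.compile(r"\bproc\s+contents\b", re.IGNORECASE),
    "call_execute": re.compile(r"\bcall\s+execute\b", re.IGNORECASE),
    "symput": re.compile(r"\bcall\s+symputx?\b", re.IGNORECASE),
    "include": re.compile(r"%include\b", re.IGNORECASE),
}


def find_dynamic_constructs(sas_source: str) -> dict[str, list[int]]:
    """Return {label: [1-based line numbers]} for each construct found."""
    hits: dict[str, list[int]] = {}
    for lineno, line in enumerate(sas_source.splitlines(), start=1):
        for label, pattern in DYNAMIC_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(lineno)
    return hits


# Hypothetical sample program, written only to exercise the patterns.
sample = """\
proc contents data=sashelp.cars out=_cars_meta noprint; run;
data _null_;
  set _cars_meta;
  call symputx(cats('col', _n_), name);
run;
%let picked = &&col&i;
%include "overrides.sas";
"""

print(find_dynamic_constructs(sample))
```

Every line flagged this way is a point where the value that matters only exists at run time, which is exactly where static certainty ends.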

The Contract: What Deterministic Lineage Must Guarantee

A deterministic lineage system must:

Be correct for all possible executions of the code,
Produce byte‑for‑byte identical outputs for identical inputs,
Never guess runtime values,
Return explicit null when something cannot be determined statically,
Be machine‑consumable and audit‑defensible.

Anything that violates these guarantees is non‑compliant, regardless of how plausible it sounds.
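Two of these guarantees, byte-identical output and explicit null, are easy to illustrate. The Python sketch below assumes a hypothetical record layout (the keys inputs, predicate, text and boolean_signature are chosen for this example and are not any product's actual schema) and uses canonical JSON so that the same content always serialises to the same bytes.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialise with sorted keys and fixed separators so bytes are stable."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")


def fingerprint(record: dict) -> str:
    """SHA-256 of the canonical bytes; equal content gives equal digests."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Hypothetical record layout. None marks "cannot be determined statically"
# and serialises to an explicit JSON null instead of being dropped or guessed.
record = {
    "inputs": sorted(["SASHELP.CLASS", "SASHELP.CARS"]),
    "predicate": {
        "text": 'MAKE_UP = "&TARGET_MAKE"',
        "boolean_signature": None,
    },
}

# Same content, different key insertion order: the bytes must still match.
same_record = {
    "predicate": {
        "boolean_signature": None,
        "text": 'MAKE_UP = "&TARGET_MAKE"',
    },
    "inputs": ["SASHELP.CARS", "SASHELP.CLASS"],
}

assert canonical_bytes(record) == canonical_bytes(same_record)
print(fingerprint(record))
```

Storing the fingerprint alongside each run gives auditors a cheap way to check that identical inputs really did produce identical lineage.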

Side‑by‑Side Evaluation

Each table below compares, dimension by dimension, what the contract expects, what the deterministic lineage engine produced, and what the LLM produced.

Tables

Dimension	Expected (Contract)	Deterministic Lineage Engine	LLM Output
Input tables	All sources named without assumption	✅ SASHELP.CARS, SASHELP.CLASS	✅ Same names listed
Intermediate tables	Captured if created	✅ _CARS_SRC, _CARS_META, _MEASURE_STATS	✅ Same names listed
Output tables	Do not assume execution	✅ CARS_DYN, CARS_DYN2, CLASS_KEEP, CLASS_AGE	❌ Assumes all outputs always exist
Conditional execution	Must be respected	✅ Represented as conditional steps	❌ Implicitly assumed
Cross‑run stability	Required	✅ Stable	❌ Not guaranteed

Verdict: LLMs can recognise table names; engines must model execution semantics.

Columns

Dimension	Expected (Contract)	Deterministic Lineage Engine	LLM Output
Output columns	Explicit per execution	✅ Resolved columns per run	❌ Narrative (e.g., “COL1..COLn”)
Metadata‑driven selection	Must not be guessed	✅ Treated as dynamic	❌ Implied stability
Phase overrides	Must be reflected	✅ Phase‑specific resolution	⚠️ Mentioned only in prose
Machine‑usable	Required	✅ Yes	❌ No

Verdict: Describing how columns are built ≠ extracting deterministic lineage.

Expressions

Dimension	Expected (Contract)	Deterministic Lineage Engine	LLM Output
Derived expressions	Explicitly identified	✅ UPCASE(MAKE), AVG(HEIGHT), …	✅ Mentioned
Expression typing	Required	✅ DIRECT / AGG / SQL_EXPR	❌ Not classified
Dependency tracing	Required	✅ Column‑level graph	❌ Narrative only
Runtime sensitivity	Must be respected	✅ Honoured	❌ Emitted as static

Verdict: LLMs explain expressions; engines model and trace them.
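As a rough illustration of what "model and trace" means, the following Python sketch types each expression and records its source columns in a small column-level graph. The classification rules, the list of aggregate functions and the target names other than MAKE_UP are assumptions made for this example; a real engine would work from a parsed syntax tree, and this regex version would mishandle string literals and nested queries.

```python
import re

AGGREGATES = {"AVG", "SUM", "MIN", "MAX", "COUNT"}  # assumed, not exhaustive
IDENT = re.compile(r"[A-Za-z_]\w*")


def classify(expression: str) -> str:
    """Type an expression as DIRECT, AGG or SQL_EXPR (simplified rules)."""
    expr = expression.strip()
    if IDENT.fullmatch(expr):
        return "DIRECT"  # a bare column reference
    call = re.match(r"([A-Za-z_]\w*)\s*\(", expr)
    if call and call.group(1).upper() in AGGREGATES:
        return "AGG"  # outermost call is an aggregate function
    return "SQL_EXPR"  # anything else


def source_columns(expression: str) -> list[str]:
    """Identifiers that are not function names, sorted for stable output."""
    cols = set()
    for match in IDENT.finditer(expression):
        rest = expression[match.end():].lstrip()
        if not rest.startswith("("):
            cols.add(match.group(0).upper())
    return sorted(cols)


def build_graph(derivations: dict[str, str]) -> dict[str, dict]:
    """Map each target column to its expression type and source columns."""
    return {
        target: {"type": classify(expr), "sources": source_columns(expr)}
        for target, expr in sorted(derivations.items())
    }


# Target names other than MAKE_UP are hypothetical.
graph = build_graph({
    "MAKE_UP": "UPCASE(MAKE)",
    "AVG_HEIGHT": "AVG(HEIGHT)",
    "NAME": "NAME",
})
for target, node in graph.items():
    print(target, node["type"], node["sources"])
```

The point of the structure is that a downstream tool can ask "which outputs depend on HEIGHT?" and get an answer from data, not from a paragraph of prose.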

Predicates (Critical)

Dimension	Expected (Contract)	Deterministic Lineage Engine	LLM Output
Predicate capture	Best‑effort	✅ predicate_conditions	✅ Textual
Boolean signature	Deterministic only	✅ null when dynamic	❌ Concrete WHERE clauses
Dynamic macros	Must collapse	✅ Yes	❌ Ignored
Cross‑run correctness	Required	✅ Guaranteed	❌ Broken
Audit defensibility	Required	✅ Yes	❌ No

Verdict: Guessing predicates is a category error and actively dangerous.

Concrete Failure vs Success Examples

Example: Dynamic WHERE Predicate

Code Reality: MAKE_UP = "&TARGET_MAKE" where TARGET_MAKE comes from data.
Engine Output: Predicate captured structurally; boolean_signature = null.
LLM Output: Concrete predicate (e.g., MAKE_UP = 'AUDI').

Result: Engine passes (stable and honest). LLM fails (false for other valid runs).
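The engine behaviour described in this example can be sketched in a few lines of Python. The function names and the macro-reference regex are assumptions for illustration, and the static predicate AGE >= 13 is a made-up contrast case; the field names predicate_conditions and boolean_signature follow the tables above.

```python
import re
from typing import Optional

# Assumed pattern for an unresolved macro reference such as &NAME or &&NAME&I.
MACRO_REF = re.compile(r"&+[A-Za-z_]\w*")


def boolean_signature(predicate_text: str) -> Optional[str]:
    """Whitespace-normalised signature, or None if a macro reference remains."""
    if MACRO_REF.search(predicate_text):
        return None  # value is only known at run time, so do not guess
    return " ".join(predicate_text.split())


def capture_predicate(predicate_text: str) -> dict:
    """Keep the predicate text structurally; sign it only when it is static."""
    return {
        "predicate_conditions": [predicate_text],
        "boolean_signature": boolean_signature(predicate_text),
    }


dynamic = capture_predicate('MAKE_UP = "&TARGET_MAKE"')
static = capture_predicate("AGE  >=  13")  # hypothetical static predicate

print(dynamic["boolean_signature"])  # None
print(static["boolean_signature"])   # AGE >= 13
```

Substituting 'AUDI' for &TARGET_MAKE would make the record look more complete, but it would be true for one run and false for every other valid run.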

Example: Metadata‑Driven Column Selection

Code Reality: Columns derived from PROC CONTENTS and limited dynamically.
Engine Output: Emits actual resolved columns per execution.
LLM Output: “COL1..COLn (max 5)”.

Result: Engine produces usable lineage. LLM produces narrative.
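One way to read the engine behaviour here is as two separate views: a static view that declines to name columns, and a per-run view populated from the metadata actually observed. The Python sketch below follows that reading, which is an interpretation and not a description of any specific engine. The function names, the ordering of the metadata and the limit of 5 (borrowed from the LLM output quoted above) are illustrative assumptions.

```python
from typing import Optional


def static_columns() -> Optional[list[str]]:
    """Static view: the list depends on runtime metadata, so return no guess."""
    return None


def resolved_columns(metadata_names: list[str], limit: int) -> list[str]:
    """Per-run view: the names observed in this run, capped at limit."""
    return list(metadata_names[:limit])


# Hypothetical metadata for one run. The names are columns of SASHELP.CARS,
# but their order here and the cap of 5 are assumptions for illustration.
run_metadata = ["MAKE", "MODEL", "TYPE", "ORIGIN", "DRIVETRAIN", "MSRP", "INVOICE"]

print(static_columns())
print(resolved_columns(run_metadata, limit=5))
```

A placeholder such as "COL1..COLn" fits neither view: it is not an honest null, and it is not a list a downstream tool can consume.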

Why Reinforcement Learning Does Not Fix This

The limitation is architectural, not a lack of training:

LLMs do not execute macro processors.
LLMs do not maintain symbol tables or execution phases.
LLMs optimise for likelihood, not proof.

No amount of reinforcement learning turns a probabilistic generator into a deterministic system of record.

Recommended Division of Labour

What LLMs Are Good At

Explaining intent and design patterns,
Summarising complex logic,
Reasoning about why transformations exist,
Producing documentation and narratives.

What Deterministic Lineage Engines Must Do

Extract tables, columns, expressions, and predicates deterministically,
Produce stable, machine‑consumable lineage graphs,
Support regulation, change management, and integration,
Act as the system of record.

Principle: LLMs should reason about lineage; lineage engines must produce lineage.
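A minimal Python sketch of this division of labour follows. The engine and the explainer are stubs standing in for real components, and every name in the block is hypothetical. The design point is simply that the record comes from the engine alone, and the LLM receives a read-only view of it and contributes only narrative.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Mapping


@dataclass(frozen=True)
class LineageAnswer:
    record: Mapping[str, object]  # system of record, from the engine only
    explanation: str              # advisory narrative, from the LLM only


def answer(source: str,
           engine: Callable[[str], dict],
           explainer: Callable[[Mapping[str, object]], str]) -> LineageAnswer:
    # Read-only (shallow) view: the explainer can read the record, not rewrite it.
    record = MappingProxyType(engine(source))
    return LineageAnswer(record=record, explanation=explainer(record))


# Stubs standing in for a real engine and a real LLM call (both hypothetical).
def stub_engine(source: str) -> dict:
    return {"inputs": ("SASHELP.CARS",), "boolean_signature": None}


def stub_explainer(record: Mapping[str, object]) -> str:
    return f"Reads from {', '.join(record['inputs'])}; the filter is set at run time."


result = answer("/* SAS source here */", stub_engine, stub_explainer)
print(result.record["boolean_signature"], "|", result.explanation)
```

Because the explanation travels next to the record and never inside it, an incorrect narrative can be discarded without touching the lineage that audits rely on.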

Conclusion

This analysis shows that the question is not “Can an LLM do lineage?” but rather “What happens when we ask a probabilistic system to pretend to be deterministic?” The answer is fabricated certainty and governance risk. A probabilistic system wrapped in deterministic language does not become safe—it merely becomes harder to detect when it is wrong.

The correct outcome is not rejecting LLMs, but strictly containing their role within a complementary architecture. LLMs are invaluable for contextual reasoning, but lineage is a system of record—and systems of record must be deterministic.