Classical computing rests on determinism: same input, same output, every time. Search engines began blurring this principle. LLMs have abandoned it entirely. What are the implications for developers?
Since the origins of computing, determinism has been a founding axiom: given identical input, a program always produces the same output. This is what makes software testable, debuggable, and predictable. A bug is reproducible, a regression is detectable, a test passes or fails in a binary way. This implicit contract is being broken.
A gradual erosion: from determinism to ambiguity
The shift didn't start with LLMs. Search engines were the first to introduce opacity into their results: PageRank, then personalisation algorithms, vary results based on time, location, and browsing history. Two users typing the same query get different answers. The output is no longer universal; it is contextual. Recommendation systems went further: their outputs depend on a model trained on billions of data points, whose behaviour is difficult to predict even for its designers.
LLMs: non-determinism as a feature
Large language models have made non-determinism a feature. The temperature parameter controls generation randomness: at 0, the model is quasi-deterministic (always choosing the most probable token); at 1 or above, it explores less-probable paths. Two identical calls at temperature=0.7 will typically produce different responses. This is not a bug; it is a design choice to make responses less repetitive and more creative. But for a developer integrating an LLM into a system, it is a paradigm shift.
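To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled sampling over a toy set of token scores. The logits, helper function, and seeds are illustrative, not any particular model's API:

```python
import math
import random

def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Sample a token index from raw scores (logits), scaled by temperature.

    At temperature ~0 this degenerates to argmax: quasi-deterministic.
    At higher temperatures, less-probable tokens get a real chance.
    """
    if temperature <= 1e-6:
        # Greedy decoding: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits (max subtracted for stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate tokens

# Temperature 0: every call picks the same token, whatever the seed.
greedy = {sample_token(logits, 0.0, random.Random(i)) for i in range(10)}

# Temperature 1: repeated calls spread across several tokens.
rng = random.Random(42)
varied = {sample_token(logits, 1.0, rng) for _ in range(100)}
```

At temperature 0 the set `greedy` collapses to a single index, while `varied` contains several: the same mechanism that makes production LLM calls non-repeatable.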
The art of blurring the process
What makes LLMs particularly tricky to work with is the absence of error signals. A deterministic program that fails throws an exception, returns an error code, or produces manifestly incorrect output. An LLM that "gets it wrong" produces a convincing, well-written, grammatically correct response that is factually false. No stack trace. No warning. The reasoning process is opaque by design: you get an answer, not an audited explanation of how it was produced.
Implications for developers
Integrating an LLM into a software pipeline requires rethinking several practices:
- Classical unit tests are no longer sufficient: you cannot assert an exact value on an LLM output. Test properties (the response parses as valid JSON, it mentions the expected product name) rather than exact values.
- Temperature=0 for critical uses: classification, structured data extraction, code generation; minimise randomness.
- Systematic output validation: parse and validate the return format, never trust structure without verifying it.
- Human-in-the-loop: for anything with real consequences (sending emails, modifying data, business decisions), keep a human in the validation loop.
- Traceability and logging: store inputs and outputs to analyse aberrant behaviour after the fact.
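The first and third practices above can be sketched together: parse the model's output, validate its structure, and assert properties rather than exact strings. The function name, required keys, and example response below are hypothetical:

```python
import json

def validate_llm_output(raw: str, required_keys: set[str]) -> dict:
    """Parse a response expected to be a JSON object and check its structure.

    Raises ValueError instead of trusting the format blindly.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data

# Property-style test: we assert on structure, not on an exact string,
# because two runs of the same prompt may word things differently.
fake_llm_response = '{"product": "Acme Widget", "sentiment": "positive"}'
parsed = validate_llm_output(fake_llm_response, {"product", "sentiment"})
```

The same validator doubles as a guard at runtime: a response that fails it is retried or escalated, never passed downstream as-is.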
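For the traceability point, a minimal sketch of append-only logging of each call as one JSON line. The file name, model name, and record fields are assumptions, not a standard:

```python
import json
import time
import uuid
from pathlib import Path

LOG_PATH = Path("llm_calls.jsonl")  # hypothetical audit log, one JSON object per line

def log_llm_call(prompt: str, response: str, model: str) -> str:
    """Append one record per LLM call so aberrant outputs can be audited later."""
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "model": model,
        "prompt": prompt,
        "response": response,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record["id"]

# Illustrative call with dummy values.
call_id = log_llm_call("Summarise invoice #123", "The invoice totals ...", "example-model")
```

Because outputs are non-deterministic, this log is often the only way to reproduce and diagnose a bad response after the fact.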
Generative AI is a powerful tool, but it demands a shift in posture: from engineering of total control to engineering of managed probability. This is not a regression; it is a new skill to develop.