LLMs Overused Overpriced Overhyped | Generated by AI

2026.03

Question: What are unpopular or contrarian opinions about calling LLM APIs? For example, the idea that LLMs are slow and expensive compared to regular code execution, and that you should only use them when truly necessary.

Answer:

This is a genuinely important engineering perspective that doesn’t get discussed enough. Here’s a comprehensive breakdown of the unpopular (but sensible) truths about calling LLM APIs:


1. If Code Can Do It, Use Code — Not an LLM

This is perhaps the most underappreciated principle. With LLMs being the new hotness, some developers reach for them in places where they don't make sense. For tasks like sending an email or a notification, you don't need to ask an agent; if you already have an API for that, call it directly.

The rule of thumb: if the problem is deterministic and rule-based, write a function. LLMs should be reserved for tasks that genuinely require understanding natural language, ambiguity, or creative reasoning.


2. LLMs Are Fundamentally Non-Deterministic — and That’s a Real Engineering Problem

LLMs are fundamentally non-deterministic: the same input can produce different responses. With reasoning models and AI agents, errors compound when earlier mistakes feed into later steps.

Computer programs are incredibly good at being deterministic — producing the exact same result every time. Unlike humans, LLMs don’t get bored or tired or impatient. But like humans — and unlike computer programs — they do not produce the exact same results every time they are used.

This matters enormously for production software. Program code is deterministic for a reason: if your code controls something with correctness requirements, you need a testable system whose behavior can be verified.
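The contrast can be shown with a toy stand-in for sampling. The `sampled_answer` function below is not a real LLM, just an illustration: like an LLM drawing the next token from a probability distribution, it can return different completions for the same prompt, while the deterministic function never varies.

```python
import random

# A deterministic program: same input always yields the same output.
def deterministic_sum(xs: list[int]) -> int:
    return sum(xs)

# A toy stand-in for LLM sampling: the "same prompt" may yield different
# completions, because the answer is drawn from a distribution.
def sampled_answer(prompt: str, rng: random.Random) -> str:
    candidates = ["42", "forty-two", "It is 42."]
    return rng.choice(candidates)

assert deterministic_sum([1, 2, 3]) == deterministic_sum([1, 2, 3])  # always holds

# Two runs of the "same call" with different sampling states may disagree:
print(sampled_answer("what is 6*7?", random.Random(1)))
print(sampled_answer("what is 6*7?", random.Random(2)))
```

Even with temperature pinned to zero, real APIs generally do not guarantee bit-identical outputs across runs.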


3. Reliability Is Expensive to Engineer Around an LLM

The moment you care about reliability, your architecture stops being “call an LLM” and starts becoming a pipeline. Input is cleaned and normalized. A generation step produces a candidate answer. Another step evaluates that answer. A routing layer decides whether to retry with a modified prompt, a different model, or a corrective pass. We are not converting probability into certainty — we are reducing uncertainty through redundancy and validation. That reduction costs computation.

In short: making an LLM-based feature truly reliable is far more engineering work than people initially assume.
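The pipeline shape described above can be sketched in a few lines. `call_llm` here is a hypothetical stand-in for whatever client you actually use; the structure (normalize, generate, validate, retry with a corrective prompt) is the point, not the specific checks.

```python
def normalize(text: str) -> str:
    """Input cleaning step: collapse whitespace."""
    return " ".join(text.split())

def validate(answer: str) -> bool:
    """Evaluation step. Real checks are domain-specific; this is a placeholder."""
    return bool(answer.strip()) and len(answer) < 2000

def reliable_answer(prompt: str, call_llm, max_retries: int = 2) -> str:
    """Routing layer: retry with a modified prompt until validation passes."""
    prompt = normalize(prompt)
    for _ in range(max_retries + 1):
        candidate = call_llm(prompt)
        if validate(candidate):
            return candidate
        # Corrective pass: amend the prompt and try again.
        prompt = prompt + " Answer concisely and do not return an empty reply."
    raise RuntimeError("no valid answer after retries")
```

Every retry and every validation pass is extra latency and extra tokens: the redundancy that buys reliability is exactly what drives up cost.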


4. LLMs Are Expensive — Costs Compound Fast

Token usage is the core cost driver: LLMs charge per token, input and output alike. Long prompts, unnecessary context, and verbose responses add up. Using a powerful model for every query, when a smaller model or plain code would suffice, inflates costs further, and long generations bill more output tokens on top of their latency.

Frequent API calls add up quickly, and at sustained volume the long-term bill can exceed the cost of self-hosting. Data must also be uploaded to a third-party server, posing data-leakage and compliance risks.
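A back-of-the-envelope cost model makes the compounding visible. The per-million-token prices below are illustrative assumptions, not any vendor's real pricing.

```python
def monthly_cost(calls_per_day: int, input_tokens: int, output_tokens: int,
                 in_price_per_m: float = 3.00, out_price_per_m: float = 15.00,
                 days: int = 30) -> float:
    """Estimated monthly USD cost, with prices quoted per million tokens."""
    per_call = (input_tokens * in_price_per_m
                + output_tokens * out_price_per_m) / 1_000_000
    return calls_per_day * days * per_call

# 50k calls/day with a 2,000-token prompt and a 500-token answer:
print(f"${monthly_cost(50_000, 2_000, 500):,.2f}/month")  # -> $20,250.00/month
```

Halving the prompt length halves the input side of that bill, which is why context trimming is usually the first optimization.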


5. Tool Calling / Agentic Loops Multiply the Problem

Your AI agent just made twelve API calls to answer a question that needed two. Each unnecessary tool call burned tokens, added latency, and pushed costs higher. Poor tool-calling behavior inflates both cost and latency through inefficient execution paths and unnecessary processing. Overly detailed tool schemas consume input tokens on every request, even when those tools are not called — if your tool definitions run 500 tokens each and you have ten tools, that’s 5,000 tokens of overhead before the user even asks a question.
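The schema overhead in that example is simple arithmetic, but it is worth writing down because it is paid on every single request, called tool or not:

```python
def schema_overhead_tokens(num_tools: int, tokens_per_schema: int,
                           requests: int) -> int:
    """Input tokens spent on tool definitions alone, across all requests."""
    return num_tools * tokens_per_schema * requests

# 10 tools at 500 tokens each, across 100,000 requests:
print(schema_overhead_tokens(10, 500, 100_000))  # -> 500000000 (500M tokens)
```

Trimming each schema to the minimum fields the model actually needs, or exposing only the tools relevant to the current request, cuts this overhead directly.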


6. LLMs Are Slow Compared to Regular Code Execution

Even optimized LLM APIs introduce network round-trip latency on top of inference time. Five sequential API calls with 200ms of round-trip overhead each add a full second of wait time before any inference has even happened. Serverless functions and containerized tools often carry cold-start penalties that add hundreds of milliseconds before actual execution begins.

For tasks like string parsing, regex, classification with known labels, sorting, or data transformation — plain code is orders of magnitude faster.
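To put a number on the local side of that comparison, here is a quick timing of a regex parse over 100,000 log lines. Exact timings vary by machine, but this kind of workload typically completes in tens of milliseconds, less than a single LLM round trip.

```python
import re
import time

# Parse 100,000 synthetic log lines into (timestamp, level, message) tuples.
LINE = re.compile(r"^(\S+) (\w+) (.*)$")
lines = ["2026-03-01T12:00:00Z ERROR disk full"] * 100_000

start = time.perf_counter()
parsed = [LINE.match(line).groups() for line in lines]
elapsed = time.perf_counter() - start

print(f"parsed {len(parsed)} lines in {elapsed * 1000:.1f} ms")
```

A single network round trip to an LLM API would often cost more wall-clock time than this entire batch.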


7. LLM Output Is Not Guaranteed to Be Correct

LLMs don’t really “understand” what they produce. Their currency is the probabilistic relationships between words. They can’t always distinguish between fact and fiction — some output might appear very believable but turn out to be a confidently worded untruth.

LLMs optimize for plausibility, not correctness. This means code or logic generated by an LLM may look right but behave incorrectly — sometimes catastrophically (one benchmark found a 20,000x SQLite slowdown in LLM-written code).
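Because plausible-looking output can still be wrong, a minimal defensive habit is to validate the model's reply before acting on it. Below is a sketch for a hypothetical classification task whose reply is expected as JSON with a `label` field; the allowed labels are an assumption for illustration.

```python
import json

def parse_llm_reply(raw: str) -> dict:
    """Parse and validate a model reply before trusting it downstream."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data.get("label"), str):
        raise ValueError("missing or non-string 'label'")
    if data["label"] not in {"spam", "ham"}:
        raise ValueError(f"unknown label: {data['label']}")
    return data

print(parse_llm_reply('{"label": "spam"}'))  # passes validation
```

This catches malformed or out-of-vocabulary output early, but note that it only verifies structure: a syntactically valid, confidently wrong label sails straight through.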


8. Data Privacy Is a Hidden Risk

The fine print of API usage terms for these platforms often includes the fact that they can use your data to train their systems. For customers with strict privacy requirements — such as those requiring SOC 2 compliance — sending data to external systems is a non-starter.


9. Vendor Lock-In and Availability Risk

Nobody can turn off your locally-hosted model because they need to update their hardware or because they’ve decided to discontinue support. With cloud APIs, you have no control over deprecations, pricing changes, or service outages.


When Should You Actually Use an LLM API?

Use it when the task genuinely requires:

- Understanding or generating natural language
- Handling ambiguous, unstructured input
- Creative or open-ended reasoning

Skip it when:

- The problem is deterministic and rule-based
- Plain code already covers it: string parsing, regex, classification with known labels, sorting, data transformation
- Strict correctness, latency, or cost requirements apply


Summary Table

| Concern | Reality |
| --- | --- |
| Speed | LLMs are 10-1000x slower than native code |
| Cost | Billed per token; scales poorly at volume |
| Correctness | Probabilistic, not guaranteed |
| Determinism | Non-deterministic by design |
| Testability | Hard to unit test reliably |
| Privacy | Data sent to third-party servers |
| Reliability | Requires extensive engineering scaffolding |

The bottom line: LLMs are powerful but expensive hammers. Not every problem is a nail.



