LLMs: Overused, Overpriced, Overhyped | Generated by AI
Question: What are unpopular or contrarian opinions about calling LLM APIs? For example, the idea that LLMs are slow and expensive compared to regular code execution, and that you should only use them when truly necessary.
Answer:
This is a genuinely important engineering perspective that doesn’t get discussed enough. Here’s a comprehensive breakdown of the unpopular (but sensible) truths about calling LLM APIs:
1. If Code Can Do It, Use Code — Not an LLM
This is perhaps the most underappreciated principle. With LLMs being the new hotness, some developers use them where they don’t make sense. For tasks like sending an email or a notification, you don’t need to ask an agent; just call the API you already have.
The rule of thumb: if the problem is deterministic and rule-based, write a function. LLMs should be reserved for tasks that genuinely require understanding natural language, ambiguity, or creative reasoning.
2. LLMs Are Fundamentally Non-Deterministic — and That’s a Real Engineering Problem
LLMs are fundamentally non-deterministic, which means the same input can produce different responses. With reasoning models and AI agents, errors compound: mistakes in earlier steps feed into later ones.
Computer programs are incredibly good at being deterministic — producing the exact same result every time. Unlike humans, LLMs don’t get bored or tired or impatient. But like humans — and unlike computer programs — they do not produce the exact same results every time they are used.
This matters enormously for production software. Program code is deterministic for a reason. If your code is supposed to control something with correctness requirements, you need a testable system with proven correctness.
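To make the determinism bar concrete, here is a sketch (the function is illustrative) of the property ordinary code satisfies trivially and a sampled LLM response generally does not:

```python
def normalize_name(raw: str) -> str:
    """A pure function: same input, same output, every single time."""
    return " ".join(raw.strip().lower().split())

# A thousand runs collapse to exactly one distinct result -- the
# repeatability that an LLM call across invocations cannot promise.
results = {normalize_name("  Ada   LOVELACE ") for _ in range(1000)}
assert len(results) == 1 and results == {"ada lovelace"}
```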
3. Reliability Is Expensive to Engineer Around an LLM
The moment you care about reliability, your architecture stops being “call an LLM” and starts becoming a pipeline. Input is cleaned and normalized. A generation step produces a candidate answer. Another step evaluates that answer. A routing layer decides whether to retry with a modified prompt, a different model, or a corrective pass. We are not converting probability into certainty — we are reducing uncertainty through redundancy and validation. That reduction costs computation.
In short: making an LLM-based feature truly reliable is far more engineering work than people initially assume.
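A minimal sketch of that generate → validate → retry pipeline, assuming a hypothetical `call_llm` callable (prompt in, string out) that you would replace with a real API client:

```python
import json

def reliable_answer(question: str, call_llm, max_attempts: int = 3) -> dict:
    """Generate -> validate -> retry loop around an injected LLM client.

    We are not converting probability into certainty -- only reducing
    uncertainty through validation and corrective retries.
    """
    prompt = f'Answer strictly as JSON {{"answer": ...}}. Question: {question}'
    for _ in range(max_attempts):
        candidate = call_llm(prompt)
        try:
            data = json.loads(candidate)
            if isinstance(data, dict) and "answer" in data:
                return data  # passed validation -- less uncertain, not certain
        except json.JSONDecodeError:
            pass
        # Corrective pass: retry with a modified prompt.
        prompt += "\nYour last reply was invalid. Return ONLY the JSON object."
    raise RuntimeError(f"no valid answer after {max_attempts} attempts")
```

Injecting `call_llm` as a parameter also makes the loop unit-testable with a fake client, which is exactly the scaffolding plain code never needed.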
4. LLMs Are Expensive — Costs Compound Fast
Token usage is the core cost driver: LLMs charge by tokens (input + output). Long prompts, unnecessary context, and verbose responses add up. Using a powerful model for every query when a simpler model or plain code would suffice inflates costs further, and high-latency calls tie up compute while they wait.
Frequent API calls compound quickly, and at sustained volume, long-term API spend can exceed the cost of self-hosting. Data must also be uploaded to a third-party server, raising data-leakage and compliance concerns.
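A back-of-the-envelope cost model makes the compounding visible. The prices and volumes below are purely illustrative (real per-million-token rates vary by provider and model):

```python
def monthly_api_cost(requests_per_day: int,
                     input_tokens: int, output_tokens: int,
                     usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Rough monthly bill for a steady workload (30-day month)."""
    cost_per_mtok_mix = (input_tokens * usd_per_mtok_in +
                         output_tokens * usd_per_mtok_out)
    return cost_per_mtok_mix * requests_per_day * 30 / 1_000_000

# Illustrative: 100k requests/day, 2,000 input + 500 output tokens each,
# at a hypothetical $3 / $15 per million input / output tokens.
print(monthly_api_cost(100_000, 2_000, 500, 3.0, 15.0))  # 40500.0
```

Forty thousand dollars a month for a workload that, if it were rule-based, a single small server could handle for a fraction of that.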
5. Tool Calling / Agentic Loops Multiply the Problem
Your AI agent just made twelve API calls to answer a question that needed two. Each unnecessary tool call burned tokens, added latency, and pushed costs higher. Poor tool-calling behavior inflates both cost and latency through inefficient execution paths and unnecessary processing. Overly detailed tool schemas consume input tokens on every request, even when those tools are not called — if your tool definitions run 500 tokens each and you have ten tools, that’s 5,000 tokens of overhead before the user even asks a question.
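The schema-overhead arithmetic from the example above, written out (the $3-per-million-input-tokens price is a hypothetical round number):

```python
def schema_overhead_tokens(num_tools: int, tokens_per_schema: int) -> int:
    """Input tokens spent on tool definitions alone, per request."""
    return num_tools * tokens_per_schema

# Ten tools at 500 tokens each: 5,000 tokens of overhead per request,
# paid before the user even asks a question.
per_request = schema_overhead_tokens(10, 500)

# At a hypothetical $3 per million input tokens, over a million requests:
total_usd = per_request * 1_000_000 * 3 / 1_000_000
print(per_request, total_usd)  # 5000 15000.0
```

Trimming schemas, or only attaching the tools a request can plausibly need, attacks this overhead directly.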
6. LLMs Are Slow Compared to Regular Code Execution
Even optimized LLM APIs introduce network round-trip latency and inference time. Five sequential calls with 200ms of network round-trip each add a full second of wait time before any inference even happens. Serverless functions and containerized tools often have cold-start penalties that add hundreds of milliseconds more before actual execution begins.
For tasks like string parsing, regex, classification with known labels, sorting, or data transformation — plain code is orders of magnitude faster.
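A quick timing sketch of the gap. Ten thousand regex extractions typically finish in single-digit milliseconds on ordinary hardware, which is less than the network round-trip of one hypothetical 200ms API call:

```python
import re
import time

lines = ["Contact alice@example.com about the invoice"] * 10_000
email = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

start = time.perf_counter()
found = [email.search(line).group(0) for line in lines]
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{len(found)} extractions in {elapsed_ms:.1f} ms")
```

Exact timings depend on the machine, but the orders-of-magnitude gap does not.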
7. LLM Output Is Not Guaranteed to Be Correct
LLMs don’t really “understand” what they produce. Their currency is the probabilistic relationships between words. They can’t always distinguish between fact and fiction — some output might appear very believable but turn out to be a confidently worded untruth.
LLMs optimize for plausibility, not correctness. This means code or logic generated by an LLM may look right but behave incorrectly — sometimes catastrophically (one benchmark found a 20,000x SQLite slowdown in LLM-written code).
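One practical consequence: never trust free-text model output directly. A sketch of a validation gate for a fixed-label classification task (the label set and function name are invented for illustration):

```python
ALLOWED_LABELS = {"billing", "shipping", "technical", "other"}

def parse_label(raw: str) -> str:
    """Coerce model output into a known label set, or fail loudly.

    Plausible-looking output is not trusted until it passes this gate.
    """
    label = raw.strip().lower().rstrip(".!")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"model returned an out-of-set label: {raw!r}")
    return label

print(parse_label(" Billing. "))  # billing
```

Failing loudly on out-of-set output lets the retry or fallback layer take over, instead of a confidently worded untruth flowing silently downstream.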
8. Data Privacy Is a Hidden Risk
The fine print of API usage terms for these platforms often includes the fact that they can use your data to train their systems. For customers with strict privacy requirements — such as those requiring SOC 2 compliance — sending data to external systems is a non-starter.
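Where external APIs must be used anyway, one common mitigation is redacting obvious PII before any text leaves your infrastructure. A sketch only (the two patterns are illustrative, not a complete PII scrubber):

```python
import re

# Hypothetical pre-flight redaction pass applied before every API call.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Reach bob@corp.io, SSN 123-45-6789"))
```

Redaction reduces exposure; it does not remove the compliance question of sending data off-premises at all.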
9. Vendor Lock-In and Availability Risk
Nobody can turn off your locally-hosted model because they need to update their hardware or because they’ve decided to discontinue support. With cloud APIs, you have no control over deprecations, pricing changes, or service outages.
When Should You Actually Use an LLM API?
Use it when the task genuinely requires:
- Understanding natural language or ambiguity
- Open-ended text generation, summarization, or reasoning
- Tasks where no deterministic rule system would generalize well
- Tasks where occasional errors are acceptable and reviewable by humans
Skip it when:
- A regex, SQL query, or if-statement can do it
- You need guaranteed consistent, testable output
- Latency is critical (sub-100ms requirements)
- You’re processing high-volume, cost-sensitive pipelines
- The data is private/sensitive and can’t leave your infrastructure
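The checklist above can be sketched as a routing gate. The field names are illustrative, not a standard API; tune the thresholds to your own system:

```python
from dataclasses import dataclass

@dataclass
class Task:
    has_deterministic_solution: bool  # a regex / SQL query / if-statement works
    needs_language_understanding: bool
    latency_budget_ms: int
    data_may_leave_infra: bool
    errors_human_reviewable: bool

def should_use_llm(t: Task) -> bool:
    if t.has_deterministic_solution:
        return False  # code wins: faster, cheaper, testable
    if not t.data_may_leave_infra:
        return False  # privacy constraint rules out external APIs
    if t.latency_budget_ms < 100:
        return False  # network round-trip + inference won't fit
    return t.needs_language_understanding and t.errors_human_reviewable
```

Even a crude gate like this, run before every proposed LLM integration, filters out most of the cases where plain code was the right answer all along.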
Summary Table
| Concern | Reality |
|---|---|
| Speed | Often 10–1000x slower than native code for simple tasks |
| Cost | Billed per token; scales poorly at volume |
| Correctness | Probabilistic, not guaranteed |
| Determinism | Non-deterministic by design |
| Testability | Hard to unit test reliably |
| Privacy | Data sent to third-party servers |
| Reliability | Requires extensive engineering scaffolding |
The bottom line: LLMs are powerful but expensive hammers. Not every problem is a nail.
References:
- How Poor Tool Calling Behavior Increases LLM Cost and Latency
- LLMs Are Not Deterministic — DEV Community
- Deterministic Programming with LLMs
- Local AI vs APIs: A Founder’s Real-World Guide
- Reliability for Unreliable LLMs — Stack Overflow Blog
- 6 Pitfalls to Avoid with LLMs — The Enterprisers Project
- How to Save 90% on LLM API Costs