Cheapest LLM API (at public list prices)
“Cheapest” depends on the shape of your traffic. Chatbots that mostly answer in a few hundred tokens should optimize for low output $/1M. Retrieval-heavy apps that paste big contexts into every turn may care more about input $/1M or effective caching. Free-tier rows that list $0 for both input and output can be real, but vendors still enforce quotas, rate limits, and fair-use rules that never show up on an invoice.
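A quick sketch of why traffic shape decides which column matters. The prices and token counts below are hypothetical placeholders, not real vendor figures; the point is only the arithmetic.

```python
def request_cost(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one request at list $/1M-token prices."""
    return (in_tokens / 1e6) * in_price_per_m + (out_tokens / 1e6) * out_price_per_m

# Chatbot turn: small prompt, a few hundred output tokens.
# Output price dominates the bill.
chat = request_cost(300, 400, in_price_per_m=0.50, out_price_per_m=1.50)

# Retrieval turn: a big pasted context, short answer.
# Input price dominates instead.
rag = request_cost(12_000, 250, in_price_per_m=0.50, out_price_per_m=1.50)

print(f"chat turn: ${chat:.6f}  rag turn: ${rag:.6f}")
```

At these made-up prices the retrieval turn costs roughly eight times the chat turn, and about 94% of it is input tokens, which is why the same model can be “cheapest” for one app and mid-pack for another.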
Workflow that actually saves money
1) In the pricing table, sort by Output price ↑ and set modality to Text.
2) Shortlist three candidates and run the same hundred production prompts through each; measure latency and failure rate, not vibes.
3) Multiply measured tokens by list $/1M from the table; if you batch or cache, re-check whether your vendor bills those lanes separately.
4) Sleep on “too cheap” models; sometimes the cheapest row is an old snapshot id that routers map differently.
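Step 3 above is simple enough to script. A minimal sketch, assuming you already have measured average token counts per request; the model names and $/1M prices are placeholders, not vendor quotes:

```python
def monthly_cost(requests, avg_in_tokens, avg_out_tokens,
                 in_price_per_m, out_price_per_m):
    """Forecast monthly spend from measured tokens and list $/1M prices."""
    tokens_in = requests * avg_in_tokens
    tokens_out = requests * avg_out_tokens
    return (tokens_in / 1e6) * in_price_per_m + (tokens_out / 1e6) * out_price_per_m

# (input $/1M, output $/1M) -- hypothetical shortlist, replace with table values
candidates = {
    "model-a": (0.25, 1.25),
    "model-b": (0.50, 1.50),
    "model-c": (1.00, 3.00),
}

# Measured workload: 1M requests/month, 800 in / 350 out tokens on average.
for name, (p_in, p_out) in sorted(candidates.items()):
    cost = monthly_cost(1_000_000, 800, 350, p_in, p_out)
    print(f"{name}: ${cost:,.2f}/month")
```

If a vendor bills batch or cached traffic on a separate lane, run the forecast once per lane with that lane's prices rather than averaging them into one number.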
Free tiers vs always-on APIs
List-price-zero SKUs are useful for experiments and low-volume prototypes, but production systems should plan for paid lanes with explicit rate limits. Treat zero-price rows as “start here,” not “scale forever,” unless your contract says otherwise.
When cheapest is wrong
Medical, financial, or safety-critical flows may need stronger models regardless of list price—saving cents per request while risking brand or liability is a bad trade. Use the cheap tier for volume; keep a flagship on deck for escalation paths.
Automation angle
If your goal is predictable monthly cost instead of optimizing every token, CloudyBot plans bundle AI work with hard caps—useful when you want scheduling and delivery without running your own token meter. The pricing table still helps you sanity-check what raw APIs would have cost for the same workload shape.
Related
OpenAI vs Anthropic · GPT-4o vs Claude Sonnet. All CloudyBot figures are aggregated catalog estimates; confirm them on each vendor's console before publishing a forecast externally.