The End of an Era

Short-notice retirement of GPT-4o-mini
Sadly, GPT-4o-mini has reached its end of life, after almost two years in which we relied on it for every task. We needed to evaluate a new model or set of models.
Which model should come next?
We thoroughly analyzed a number of models, including Mistral, Gemma, the GPT-4.1 series and the GPT-5 series, and found the best balance of accuracy, price and speed in GPT-5-mini and GPT-5-nano. The only downside was that answers and tool calls were considerably slower than with the 4o and 4.1 series, because of the reasoning these models perform. Setting reasoning to “none” produced nonsense responses. Not an option at all!
Upgrading models is not "Plug-and-Play"
We quickly noticed that simply swapping GPT-4o-mini for GPT-5-mini was not going to be a simple task. The new models interpreted prompts very differently, so the prompts had to be re-engineered. Even the context had to be reworked: during our tests, GPT-5-nano started to hallucinate because it was confused by our property names and chunk IDs.
This was at the beginning of 2026, and we decided to rebuild our entire LLM interface layer to be as model-agnostic as possible. We now define prompts per model and per task, and we assign a model to each task. This lets us use the cheaper GPT-5-nano for simpler classification tasks while using more sophisticated models for text generation. This configuration can even be overridden per tenant, for instance when a customer prefers Mistral models over the GPT series.
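To make the idea of "models per task, prompts per model and task, overridable per tenant" concrete, here is a minimal Python sketch. It is a hypothetical illustration, not the actual implementation behind this post: the class names (`TaskConfig`, `LLMRegistry`), the task names, and the placeholder model id `"mistral-model"` are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskConfig:
    """Default model for one task, plus a prompt per model that may run it (hypothetical)."""
    model: str
    prompts: dict[str, str]  # model id -> prompt template tuned for that model


@dataclass
class LLMRegistry:
    defaults: dict[str, TaskConfig]
    # tenant id -> task name -> model id that replaces the task's default model
    tenant_overrides: dict[str, dict[str, str]] = field(default_factory=dict)

    def resolve(self, task: str, tenant: str | None = None) -> tuple[str, str]:
        """Return (model, prompt) for a task, honoring any tenant override."""
        config = self.defaults[task]
        model = self.tenant_overrides.get(tenant, {}).get(task, config.model)
        if model not in config.prompts:
            # Prompts are model-specific, so refuse to run a model without a tuned prompt.
            raise KeyError(f"no prompt defined for task {task!r} on model {model!r}")
        return model, config.prompts[model]


# Usage: a cheap model for classification, a stronger one for generation,
# and one tenant that prefers a (placeholder) Mistral model for classification.
registry = LLMRegistry(
    defaults={
        "classify": TaskConfig(
            model="gpt-5-nano",
            prompts={
                "gpt-5-nano": "Classify the query: {query}",
                "mistral-model": "Label this query: {query}",
            },
        ),
        "answer": TaskConfig(
            model="gpt-5-mini",
            prompts={"gpt-5-mini": "Answer using the sources: {sources}"},
        ),
    },
    tenant_overrides={"acme": {"classify": "mistral-model"}},
)

print(registry.resolve("classify"))          # ('gpt-5-nano', 'Classify the query: {query}')
print(registry.resolve("classify", "acme"))  # ('mistral-model', 'Label this query: {query}')
```

Keying prompts by model inside each task means a tenant override can only point at a model that already has a tuned prompt for that task, which reflects the observation above that prompts do not transfer cleanly between models.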
We also built evaluation tooling for every LLM task, which lets us easily test prompt and model changes against hundreds of reference cases.
The research quality of our search agent improved a lot, but we had to accept that search and chat were slower than before. It was still OK, but not “wow”.
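As a rough illustration of such evaluation tooling, the sketch below runs one task configuration over a set of reference cases and reports pass rate and failures. It is an assumed design, not the tooling described in this post: `ReferenceCase`, `EvalReport`, `evaluate` and the offline `fake_classifier` stand-in are hypothetical, and exact-match scoring only suits classification-style tasks (generation tasks would need a different scorer).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ReferenceCase:
    input: str
    expected: str


@dataclass
class EvalReport:
    passed: int
    total: int
    failures: list[tuple[ReferenceCase, str]]  # (case, actual output)

    @property
    def accuracy(self) -> float:
        return self.passed / self.total if self.total else 0.0


def evaluate(run_task: Callable[[str], str], cases: list[ReferenceCase]) -> EvalReport:
    """Run one prompt/model configuration over every case, scoring by case-insensitive exact match."""
    failures = []
    for case in cases:
        output = run_task(case.input)
        if output.strip().lower() != case.expected.strip().lower():
            failures.append((case, output))
    return EvalReport(passed=len(cases) - len(failures), total=len(cases), failures=failures)


# Stand-in for a real model call so the sketch runs offline.
def fake_classifier(query: str) -> str:
    return "billing" if "invoice" in query.lower() else "other"


cases = [
    ReferenceCase("Where is my invoice?", "billing"),
    ReferenceCase("Reset my password", "account"),
    ReferenceCase("Invoice total looks wrong", "billing"),
]
report = evaluate(fake_classifier, cases)
print(f"{report.passed}/{report.total} passed ({report.accuracy:.0%})")  # prints "2/3 passed (67%)"
```

To compare a prompt or model change, one would call `evaluate` once per candidate configuration on the same cases and compare the reports and their failure lists.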
A silent breakthrough
A few weeks ago, OpenAI quietly released GPT-5.4-mini and GPT-5.4-nano. Thanks to our earlier refactoring and new tooling, switching over was seamless: a few minor prompt adjustments for better formatting and language adherence, and that was all.
The performance gains were immediate:
- Massive Speed Increase: Tool selection and result streaming are significantly faster than with both GPT-5-nano and GPT-5-mini.
- Concise Style: The output is precise and lacks the "bloat" found in earlier reasoning models. It returns to the punchy, direct style we loved in GPT-4o-mini, but with far more mature intelligence.
- Deep Research: Because it is so much faster, we can now allow the agent to perform more tool calls for deeper research without sacrificing response time.
Caveat: Price
Token prices, however, are considerably higher than for the GPT-5 series. That’s why we’ve decided to keep using GPT-5-nano and GPT-5-mini for background data processing while deploying the 5.4 series as the default for user-facing workloads.
Our investment in a task-based model architecture has already paid off, allowing us to use the best model for each job.
Thank you OpenAI for bridging this gap!

