OpenClaw Review 2026: Testing GPT, Gemini, Opus, MiniMax

For the last 5 days I’ve been actively using OpenClaw in my workflow. I spent over $100 on Opus credits, several dozen dollars on Gemini credits, and my entire weekly OpenAI subscription limit. Was it worth it? What did I get?

A huge amount of time went into experiments and configuration - getting the agent to build tooling for itself, prompting. It’s definitely not a simple thing right now. Let’s be honest: the idea is great, but the implementation is just disgusting at the moment.

Not all models work as a personal assistant

First of all, not every model works well with OpenClaw as a personal assistant. Many simply don’t fit: it’s impossible to talk to them or get any work out of them. I tried a bunch of models.

GPT-5.2: robot without identity

I started with GPT-5.2. The problem with this model is that it has no identity and doesn’t work collaboratively with you. It’s a droid, a robot that executes orders, and if something doesn’t work out, it simply refuses to do any work at all. It’s hard to describe in words; you need to experience it yourself.

For example, I once had this interaction with GPT. I told it: “Listen, make yourself an integration so you can see my tasks in Ticktick”. It said: “I can’t make this integration, I don’t have permissions. Why don’t you paste me a list of your tasks from Ticktick? Reply with 1 if you want to paste the tasks and with 2 if you don’t”.

For a personal assistant this is a dealbreaker. Imagine your secretary tells you: “No, I won’t do this because I don’t know how”. I would fire such a secretary on the spot. But GPT-5.2, despite any prompting and despite a ton of Markdown files, whatever I tried, just considers this the correct behavior and keeps following it. It’s something baked into the model that I can’t change at all. For GPT-5.2, what matters is not the end result but how to respond eloquently enough right now to get the user to shut up.

Next, GPT-5.2’s biggest problem is the complete absence of proactivity and understanding of human intent. For example, if you set the task “every morning check my board in Ticktick and complete one task”, it’s hilarious to watch GPT-5.2: it wakes up, gets a document in Heartbeat with this text written in it, does a grep once on its home directory and goes: “Oh, I don’t see Ticktick here. Alright, I’m going back to sleep”.

This is the definition of an outsourced contractor for $2/hour. So using GPT-5.2 as a personal assistant is simply unrealistic. If you somehow managed to prompt it to behave adequately, let me know - I didn’t succeed after 12 hours of working on its prompting.

Gemini 3 Pro: perfect character, but doesn’t call tools

Next I tried Gemini 3 Pro. Gemini is the perfect assistant model: it has a great sense of humor, communicates well, has a great personality, and moreover, it very easily adopts any character and follows instructions.

If you tell GPT-5.2 “behave like a zoomer”, all it will do is add the phrase “yo, sigma” (cringe) somewhere in a sentence, and then it’s back to being a robot. But Gemini actually follows instructions and actually has its own personality, which makes it really cool, fun, and entertaining to talk to. I sat up until three in the morning chatting with friends in a chat with Gemini, and was genuinely cracking up at how it trolls the group. Could barely force myself to go to sleep. It’s an excellent advisor with a strong sense of justice and empathy too.

But Gemini has a huge problem that again makes it unusable as an assistant: it doesn’t know how to call tools, and it writes code poorly. It was really painful for me to realize that Gemini also fails as an assistant, because it has such a great personality and is so smart. Gemini perfectly understands my intentions, but then, instead of calling tools, it only describes which tools it is supposedly about to call. This is its constant problem, and because of it, on any task it just stops before calling the first tool.

For example, Heartbeat in OpenClaw launches a background agent to do work. Gemini wakes up, like: “Oh, I see tasks on the board, such and such, need to check them out”. And that’s it, nothing else. Then it just goes silent. And the OpenClaw wrapper doesn’t ping it further, so the agent just stops and does nothing.

This can be fixed in the harness, but OpenClaw doesn’t do it right now. As soon as the agent responds with anything, the response gets sent to me in Telegram as a message, wakes me up in the middle of the night, pisses me off, and then the agent cycle ends. When we were building the solution for Respawn on stream, we accounted for this, but I won’t commit it to OpenClaw, because there are 4000 open pull requests there and they’ll never merge my new features. The OpenClaw author himself wrote that they only merge fixes, and only from trusted people. So I was very disappointed and had to abandon Gemini. On top of that, Gemini models are practically unsupported in OpenClaw: caching doesn’t work there, and neither do embeddings or thinking.
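The harness-side fix can be sketched roughly like this. It is a minimal illustration under my own assumptions, not OpenClaw’s or Respawn’s actual code: `ModelReply`, `run_background_turn`, `call_model`, `run_tool`, and the nudge text are all hypothetical names. The idea is that when a background agent answers with plain text and no tool call, the loop nudges it to act instead of forwarding the text to the user and ending the cycle.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration only; this is not OpenClaw's API.
@dataclass
class ModelReply:
    text: str
    tool_calls: list[str] = field(default_factory=list)
    done: bool = False  # the model explicitly signals that the task is finished

NUDGE = "You described a plan but called no tools. Call the tool now, or say you are done."

def run_background_turn(
    call_model: Callable[[list[str]], ModelReply],
    run_tool: Callable[[str], str],
    task: str,
    max_steps: int = 10,
    max_nudges: int = 2,
) -> list[str]:
    """Drive the agent until it finishes, nudging it when it only talks."""
    history = [task]
    nudges = 0
    for _ in range(max_steps):
        reply = call_model(history)
        history.append(reply.text)
        if reply.done:
            break
        if reply.tool_calls:
            nudges = 0  # the model is acting again, reset the counter
            for call in reply.tool_calls:
                history.append(run_tool(call))
        elif nudges < max_nudges:
            # Plain text with no tool call: push the model instead of stopping.
            nudges += 1
            history.append(NUDGE)
        else:
            break  # give up quietly rather than looping forever
    return history
```

With a loop like this, only the final history (or a summary of it) would need to reach Telegram, so a half-finished “need to check them out” message would not wake anyone up at night.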

MiniMax 2.5: cheap, but without personality

I also tried MiniMax, the new version 2.5. It writes code beautifully and is very smart, but it has the same flaw as GPT: it’s dry and has no personality. It behaves strangely and is a bit dumber than GPT. Plus you need to buy a MiniMax subscription.

But it’s very cheap. If you’re not using OpenAI, but want something similar to Codex, your next choice is a MiniMax subscription to use it with OpenClaw. The model is cheap as dirt, and it would be a shame not to use it. But not as an assistant.

Opus 4.6: perfect model, but $100 per day

Finally, I gave up and started using Opus. Opus is the recommended solution for OpenClaw, and I noticed that practically all the code in the codebase is tailored specifically for Opus and other Anthropic models. I couldn’t even properly switch models: I spent 8 hours just trying to switch away from Anthropic, because something overrides every model choice with Anthropic, and Anthropic models take priority over any others. This is a bug in OpenClaw itself; I’ll talk about bugs later.

Opus 4.6 is the perfect model: adopts character well, follows instructions well, behaves like an adequate person, is entertaining, and most importantly - proactive and understanding. Opus has roughly the same personality as Gemini, but with Opus I can say: “bro, check my emails and create tasks from them”, give no details at all, and everything will be done perfectly.

It will go look for Ticktick tooling, and if it doesn’t find any, it will build it itself, request the key from me, and continue working without any questions, whining, or stopping. It will check my email, sort through all my messages, and figure out how to work with the Gmail CLI script, even if it doesn’t understand something there or there are no instructions. If it doesn’t have the necessary permissions, it will try to find another way: it will look for keys somewhere in its folder, find the keys I hid in a directory excluded from git, use them, and even improve its own tooling to find and use these keys automatically next time. Then it will give me a full report, create tasks in Ticktick, fill out all the fields, figure out how to set tags, and do all of this from a one-line prompt.

With Opus I got a huge amount of work done - we cleaned out all of Ticktick and my email from A to Z. I had completely abandoned Ticktick earlier, but Opus figured everything out, triaged all the tasks and emails that were there, suggested new ones, and even closed most of the tasks itself: “Hey, listen, I can do these 12 tasks myself, you shouldn’t spend your time on this at all, let me do them for you”. It also set up tooling to spawn a sub-agent in the background on cron, so that it would complete tasks for me every 3 hours. I didn’t even ask for this, it did it on its own. Now that’s a real secretary.

But I also couldn’t use Opus as a personal assistant for one simple reason: I spent $100 in literally one day. Anthropic’s API pricing is insane, and they prohibit using a Claude subscription in OpenClaw and ban accounts for it. So that’s it, my search for a personal assistant stalled here too, and I don’t know what to do about it.

But I was even ready to pay $100 - Opus was that useful to me. Then I realized I could switch to Sonnet. Sonnet is, of course, worse and dumber, but still usable, especially if you run it for sub-agents and background tasks and keep Opus only in the main chat. That way you can save money overall, and you can even risk buying several subscriptions on multiple accounts and see whether they ban you or not.
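The split described above can be sketched as a tiny routing rule. This is only an illustration: the role names and model labels are placeholders of mine, not OpenClaw configuration keys or real API model identifiers.

```python
# Hypothetical routing table: the expensive model only for the main chat.
# Role names and model labels are placeholders, not real identifiers.
MODEL_BY_ROLE = {
    "main_chat": "opus",
    "sub_agent": "sonnet",
    "background": "sonnet",
}

def pick_model(role: str, default: str = "sonnet") -> str:
    """Return the model label for an agent role, defaulting to the cheaper one."""
    return MODEL_BY_ROLE.get(role, default)
```

Defaulting unknown roles to the cheaper model is a deliberate choice here: a forgotten role costs some quality rather than money.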

But one other issue still drove me up the wall and made me delete OpenClaw.

The number of bugs in OpenClaw

This issue is the number of bugs in OpenClaw. I spent three days just trying to get it to work. OpenClaw currently has 4000 issues on GitHub and 3000 pull requests. These pull requests aren’t being closed on time, and the authors don’t even have time to read them, because new ones open every 2 minutes.

It’s unreal: one person can’t handle this. You need a team of 20 engineers who will only do code review of pull requests and filter spam. So OpenClaw is currently in such a state that it’s practically impossible to use.

There are dozens of convoluted features, and the command-line interface has hundreds and hundreds of commands that you need to check yourself or delegate to the agent, and they constantly break something. Any command is like a minefield: you call it, and something else breaks. Either the command didn’t work, or something from a previous command call blocks it, and now you need to roll back the configuration.

There are constant incomprehensible warnings, screaming about permissions, security audits, you name it. I don’t need any of this, because I’m running the assistant locally and among a circle of people I trust. But instead, my assistant gets huge screams about prompt injections every time it calls a browser search tool. And yet every third skill on ClawHub has malware, ads, or prompt injection.

None of this can be configured. You need to go patch the source code, but every patch causes conflicts, because the repository can get 250 commits in a day, and releases also happen every day, adding more and more useless features: some incomprehensible messengers, some incomprehensible pipelines, even more tools.

OpenClaw ships with over 25 tools in the default configuration, and this confuses the model like crazy. Almost none of these tools work, and there’s no filtering per model. For example, there’s an apply patch tool that is only suitable for GPT models, because no other models are trained on its patch structure. You can’t use it with Gemini models, because the model won’t cope. Yet apply patch is there by default, and if you don’t change this configuration, Gemini won’t be able to work.
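The kind of per-model tool filtering that is missing could look roughly like this. It is a sketch under my own assumptions: the tool names, family names, and the `tools_for` helper are hypothetical and do not reflect how OpenClaw actually stores its configuration.

```python
# Hypothetical allow-list: which model families may be offered which tools.
# Tool and family names are illustrative, not OpenClaw's real configuration.
TOOL_FAMILIES = {
    "apply_patch": {"gpt"},  # per the text, only GPT models handle this patch format
    "bash": {"gpt", "gemini", "claude"},
    "web_search": {"gpt", "gemini", "claude"},
}

def tools_for(model_family: str, enabled: list[str]) -> list[str]:
    """Keep only the enabled tools that the given model family is listed for."""
    return [
        name
        for name in enabled
        if model_family in TOOL_FAMILIES.get(name, set())
    ]
```

Tools that are not in the table are dropped for every family, which keeps the list shown to the model short by default.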

And instead of apply patch, the model is given some incomprehensible half-baked tools, create and edit, which again aren’t needed, because there are bash commands. The desktop application creates duplicate services, doesn’t close properly, lags, constantly crashes, and handles permission resolution incorrectly. Background workers don’t send results. API key errors or rate limits completely block your workflow, and you need to dig through the source code to find the secret key file where you have to change timestamps manually.

If you hit rate limits even once, you find that model fallback doesn’t work properly. Messages in Telegram don’t stream and arrive cut up into chunks; sometimes they just stop coming. The model gets confused between channels and messengers. It reads memory poorly, and the system prompt does a poor job of getting it to maintain its identity. It refuses to do tasks because they’re blocked by some stupid security policies. The memory doesn’t surface results at all.
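For reference, the behavior I would expect from model fallback on a rate limit is roughly the following. This is a minimal sketch with hypothetical names (`RateLimitError`, `complete_with_fallback`, and the `call` callback are mine); it does not describe how OpenClaw implements fallback.

```python
from typing import Callable, Optional

class RateLimitError(Exception):
    """Hypothetical error a provider client raises when it is rate-limited."""

def complete_with_fallback(
    models: list[str],
    call: Callable[[str, str], str],
    prompt: str,
) -> tuple[str, str]:
    """Try each model in order and move to the next one on a rate limit."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call(model, prompt)
        except RateLimitError as err:
            last_error = err  # remember why this model was skipped
    raise RuntimeError("all models are rate-limited") from last_error
```

The function returns the model that actually answered along with the text, so the caller can tell when a fallback happened.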

There are constant errors about something not being allowed in the config, and the only way out is to figure out how this config works and manually change the JSON. The documentation doesn’t correspond to the current state at all and is half a month behind. Considering that 200 commits are pushed per day, it falls further and further behind: half the functionality isn’t described there at all, and the other half hasn’t worked as described for a long time.

It became literally impossible to use. Yesterday I spent the whole day - 6 hours - debugging the blocked model switching, the problem described above where Anthropic models for some reason always take priority over all others. I just snapped because my whole day was ruined, and I deleted OpenClaw.

My verdict

Although I had an extremely negative experience and spent over $150, I understood in what direction I want to develop my next product and what I want my assistant to look like. I now have a pet project: my programming agent, which works with my Codex wrapper. I haven’t open-sourced it yet because it’s not finished and features I’d like to see are missing. I also had to put it on pause because I’m currently fully occupied with finding clients for my freelance career, but it will become the foundation of a new product.

But this experience gave me an understanding of what I want my personal secretary and assistant to be like, and how I can make it without all these issues, and give it 10 times more capabilities. I really want to create this assistant as a product, and for now the only thing stopping me is that I’m currently fully immersed in consulting endeavors.

This project is actually a logical continuation of my work on Respawn. In Respawn I already have Agentic Chat, but I still don’t know where to place it, because simple agent chatbots are already a thing of the past, and Respawn can be much more useful to people as a full-fledged assistant. Most likely, I’ll use most of the code from the new assistant right there.

But I understand where the future of AI is heading, and I think Peter Steinberger is right - 80% of apps will disappear because a horde of our personal secretaries will work for us, and I now want to create a product that will be the next step in the industry’s development, after OpenClaw.

