It's a gem!
Google drops Gemini 3. The benchmarks are crying.
Was this email forwarded to you? Sign up here.

Many expected Google was brewing up a new model release, but few anticipated how hard Sundar & Co were cooking.
Here's what you need to know about Gemini 3, and my verdict after putting it through its paces.
In this email:
Gemini 3 crushes the benchmarks
My test of Gemini 3 for writing (yes, it beats GPT-5)
The proverbial cat is out of the bag: Google launched Gemini 3 this week, and it took the number one spot across benchmarks, by margins that few saw coming.
Gemini 3 is Google's most advanced model yet. It can hold a huge amount of context, and it has some new superpowers that let it activate only a subset (rather than all) of its parameters for each task; this sparse mixture-of-experts approach delivers frontier-level reasoning at mid-tier pricing.
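If you're curious what "activating only a subset of parameters" looks like in practice, here's a minimal sketch of sparse mixture-of-experts routing. This is purely illustrative; the function names, shapes and top-k gating scheme are my own simplification, not anything Google has published about Gemini 3's internals:

```python
import math

def moe_forward(x, expert_fns, gate_scores, top_k=2):
    """Sketch of sparse mixture-of-experts routing: score every expert,
    run only the top_k best-scoring ones, mix outputs by softmax weight."""
    # Rank experts by gating score and keep the top_k indices.
    ranked = sorted(range(len(expert_fns)), key=lambda i: gate_scores[i])
    top = ranked[-top_k:]
    # Softmax over the selected experts' scores (max-subtracted for stability).
    m = max(gate_scores[i] for i in top)
    w = [math.exp(gate_scores[i] - m) for i in top]
    total = sum(w)
    weights = [wi / total for wi in w]
    # Only the chosen expert functions actually run; the rest stay idle,
    # which is where the compute (and cost) saving comes from.
    return sum(wi * expert_fns[i](x) for i, wi in zip(top, weights))
```

The point of the sketch: with, say, 3 experts and top_k=2, one expert never executes at all, so each token pays for only a fraction of the model's total parameters.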
A standout feature of Gemini 3 is that it's better at figuring out the intent and context behind what you ask it. It gets you closer to the desired result with less prompting. As Google puts it, AI has evolved from simply reading text and images to reading the room.
"AI has evolved from simply reading text and images to reading the room"
IN PARTNERSHIP WITH INTERCOM
Startups get Intercom 90% off and Fin AI agent free for 1 year
Join Intercom's Startup Program to receive a 90% discount, plus Fin free for 1 year.
Get a direct line to your customers with the only complete AI-first customer service solution.
It's like having a full-time human support agent free for an entire year.
The benchmarks table below tells a clear story: Gemini 3 doesn't just edge out the state of the art; it jumps ahead on the hardest tests.
Some notable examples:
Massive improvement on Humanity's Last Exam; Gemini 3 scores 37.5%, 10 points above GPT-5.1 and 24 above Claude.
Hefty gains on math benchmarks; scores 23.4% on MathArena Apex, where Claude and GPT-5.1 score less than 2% (in other words, a 10x leap).
Strong at understanding user interfaces in apps; scores 73% on ScreenSpot-Pro, more than double the previous high score.
Probably the best coding model so far, with a 2,439 Elo on LiveCodeBench. It blows Claude out of the water there and marginally beats GPT-5.1. However, Claude holds a slight lead on SWE-Bench and GPT-5.1 is in the same ballpark, so it's not a clean sweep on coding.
31% on ARC-AGI-2 (reasoning puzzles). No other model had passed 20% until now.
Real-world business skills (my personal favourite): on Vending Bench 2, a benchmark that lets different AIs run a vending machine business, Gemini 3 made over $5,000 in a year of simulated time. Claude earned around $4,000, and GPT-5.1 only $1,500. This isn't abstract reasoning; it's practical decision-making under realistic constraints.
Overall, the new Gemini is dramatically stronger than Claude 4.5 on pretty much every benchmark. Versus GPT-5.1, the performance gap is smaller but still significant.
Gemini 3 is rolling out everywhere Google can put it, including AI Mode in Google Search. It's also accessible through the Gemini app, the API, AI Studio, Vertex and the brand new Antigravity IDE (more about Antigravity further down).
Why it matters: Google has been patiently playing the long game in AI. They've recently had the viral launch of Nano Banana, plus a series of serious yet quiet improvements to AI in Search, Chrome, Workspace apps, Maps, even hardware. And now, their crown jewel: Gemini 3.
Google is building unmatched control across the value chain of hardware, models, dev tools, distribution and end-user apps.
My test of Gemini 3 for writing
I took Gemini 3 for a test run on the same writing task I gave GPT-5 when it came out.
This simple exercise gives me a rough read on the model's image understanding, math, reasoning and vibes.

The text is an 89-word story made up only of sentences with 7 or fewer words, plus a single sentence fragment.
I explain to the model that the exercise is to write a text of 100-150 words, with a maximum sentence length of 7 words and no sentence fragments allowed.
Then I ask it to check my writing against the exercise's constraints.
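For the curious, most of this check is simple enough to script yourself. Here's a minimal sketch (my own throwaway code, not part of the test I gave the models); note that reliably detecting sentence fragments needs a grammar-aware parser, so this only covers the word counts:

```python
import re

def check_exercise(text, min_words=100, max_words=150, max_sentence_len=7):
    # Split on sentence-ending punctuation; crude, but fine for a sketch.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    total = sum(len(s.split()) for s in sentences)
    too_long = [s for s in sentences if len(s.split()) > max_sentence_len]
    return {
        "total_words": total,
        "word_count_ok": min_words <= total <= max_words,
        "long_sentences": too_long,  # sentences over the per-sentence limit
    }
```

The interesting part of the exercise isn't whether a script can count words; it's whether the model can, without tripping over its own arithmetic.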
Gemini 2.5 Fast (for comparison)
If you open the Gemini app right now, you'll see two options: Fast and Thinking.
The "Fast" mode, which is easy to assume is Gemini 3, is actually Gemini 2.5 Flash under the hood.
I used it as a baseline to see how much better Gemini 3 performs.

Gemini 2.5 Flash (aka Gemini Fast in the app)
As you can see, Gemini Fast got most things wrong in this exercise, similar to what I saw with GPT-5 Instant.
These "fast" models are generally quite bad at math and they hallucinate, yet they serve everything up to you neatly in a table and present it as truth. Beware.
Fortunately, my experience with Gemini 3 was far better…
Gemini 3 Thinking gets it right

The "Thinking" mode, on the other hand, does the new model justice. I'd say the output is better than what I got when I tried the same with GPT-5 Thinking.
Gemini 3 correctly calculated the total and per-sentence word counts, and flagged the only sentence fragment in the original text. It also gave me a single tip to improve punctuation (not a mistake, but a meaningful style improvement), which I appreciated.
Overall, it nailed the exercise, and I'm now thinking about how to make it harder for the next model release (feel free to reply to this email with your suggestions!).
Google also launched Antigravityβtheir answer to Cursor. I'm testing it on a real project right now. My early take: Cursor has real competition now.
Stay tuned for a full breakdown coming soon.
IN PARTNERSHIP WITH RUBRIK
Here's what changed: 82% of cyberattacks now target cloud environments. If you're running AI workloads there (training data, model storage, agent deployments), you just gave attackers more entry points.
On December 10, join IT and security leaders who've actually dealt with this. Learn recovery strategies for bouncing back in hours, not weeks, when things go wrong.

THAT'S ALL FOR THIS WEEK
Friends,
Yes, Gemini 3 is new and exciting. That doesn't mean you need to drop everything and use it today.
There'll be another game-changing release next week, and the week after that, and…
Use and test it if it makes sense for you.
Otherwise, it's perfectly fine to just filter it out.

Was this email forwarded to you? Sign up here.
Want to get in front of 21,000+ AI builders and enthusiasts? Work with me.
This newsletter is written & shipped by Dario Chincha.
Affiliate disclosure: To cover the cost of my email software and the time I spend writing this newsletter, I sometimes link to products and other newsletters. Please assume these are affiliate links. If you choose to subscribe to a newsletter or buy a product through any of my links, then THANK YOU: it makes it possible for me to continue doing this.




