I want to share a small thinking tool I've been playing with. I'm definitely not an established researcher—just someone trying to get better at evaluating ideas honestly. I've found this little habit helpful for myself, so I figured I'd write it up. Maybe it resonates with you, maybe not. Either way, here it is.
I think most of us, myself included, have a soft spot for new stuff. There's something about a shiny new idea that just feels better than the thing we already have. In research, I notice this a lot. The community has method A and method B, and someone proposes method C. You read the paper and think, "oh, this is interesting, this is fresh." But sometimes I catch myself realizing that it might not actually be better—it just feels that way because it's new.
I think novelty and progress can be different things, but at least for me, my brain has a hard time telling them apart sometimes.
So here's the little trick. It's honestly pretty simple, maybe too simple, but it works for me.
Say the community already has work A and B, and you (or someone) propose C. Instead of evaluating C in the natural order—given A and B, how good is C—try flipping it. Imagine a world where we already had A and C, and someone proposes B instead. Then see which proposal actually feels more compelling.
Formally, compare C | A,B versus B | A,C.
That's basically it. The idea is that by swapping the order, you can maybe peel away some of the novelty bias. Instead of comparing "the new thing" against "the established things," you're trying to put them on more equal footing and see what you actually think.
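If it helps to see the flip written down, here is a minimal sketch in Python. Everything in it (the Idea class, the abc_flip helper, the appeal_given callback) is hypothetical scaffolding I made up for illustration, not a scoring formula I'm proposing; the actual judgment still comes from you, the code just forces you to ask the same question twice with the conditioning swapped.

```python
# A tiny sketch of the ABC flip. All names here are made up for illustration;
# the "judgment" lives entirely in the callback you supply.

from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Idea:
    name: str
    summary: str


def abc_flip(
    a: Idea,
    b: Idea,
    c: Idea,
    appeal_given: Callable[[Idea, FrozenSet[Idea]], str],
) -> None:
    """Pose the same question twice with the conditioning swapped."""
    # Natural order: how compelling is the new thing C, given that A and B exist?
    forward = appeal_given(c, frozenset({a, b}))
    # Flipped order: how compelling would B look, if A and C already existed?
    flipped = appeal_given(b, frozenset({a, c}))
    print(f"C | A,B : {forward}")
    print(f"B | A,C : {flipped}")


if __name__ == "__main__":
    a = Idea("A", "latent-space representations")
    b = Idea("B", "chain-of-thought in plain text")
    c = Idea("C", "latent reasoning")

    # The callback is where the human judgment goes; here it just restates the question.
    def my_take(proposal: Idea, existing: FrozenSet[Idea]) -> str:
        have = ", ".join(sorted(i.name for i in existing))
        return f"how strong does '{proposal.summary}' feel, given we already have {have}?"

    abc_flip(a, b, c, my_take)
```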
Let me try walking through an example that's been on my mind. Just to be clear, this is totally my personal take—you might see it differently, and I'd be curious to hear why.
A—Machines naturally reason in latent space. This isn't really new. In some sense, a lot of machine learning has always been about building better representation spaces, finding features that make prediction easy. The "latent space" has kind of always been there, quietly doing its thing.
B—Chain-of-Thought reasoning. Someone proposes that instead of reasoning purely in hidden states, we have models reason over explicit textual tokens. Step by step, in natural language. A pretty cool idea that changed how we think about inference.
C—Latent reasoning (the recent wave), along with interpretability work trying to understand what these latent tokens actually encode.
In the natural order (A,B → propose C): latent reasoning feels exciting. We're going back to continuous representations, moving beyond the constraints of text. And it genuinely is different from text-based CoT, no question.
But I keep thinking about what we might be trading away. One of CoT's really nice properties is that humans can read it, write it, supervise it, and scale it; both humans and machines can generate reasoning traces pretty cheaply. Latent tokens, by contrast, are hard for humans to write or verify. And it's not obvious to me what we're getting in return.
Now try flipping it. Imagine we live in a world with A and C—machines have always had latent reasoning, and people have been working on its interpretability. Now someone walks in and proposes B: "Hey, what if we just make the model reason in plain text?"
I think that's a pretty interesting pitch.
If you put the two framings side by side, C | A,B (latent reasoning, given we already have CoT) seems to ask us to trade some pretty concrete advantages for something less transparent. Meanwhile B | A,C (CoT, given we already have latent reasoning) feels like a really practical, scalable idea that anyone can inspect and build on.
I have my own sense of which proposal feels stronger, and you probably have yours. The point isn't really that one is definitively right—it's more that flipping the order helps me see past the initial reaction and actually think about the tradeoffs more clearly.
Here's another one I've been thinking about. The more I work with this stuff, the more I find myself appreciating it.
The Transformer architecture. I think it's easy to take it for granted—it's been around since 2017, it's everywhere, it kind of fades into the background of modern ML. But when I actually stop and think about what it gives us, it's pretty remarkable.
A lot of architectures are finicky. You spend a lot of time tuning hyperparameters, finding the right configuration, trying to make things competitive. The Transformer just kind of... stands there. (My friend @zhouxiang and I once did a funny cosplay of it—long story.) You train it, and it gives you good performance. It's really quite robust.
And the nice properties keep adding up. Transformers with CoT are theoretically Turing-complete: they can, in principle, compute anything. They maintain exact KV caches, which means fast inference and good recall over long contexts. They seem like a natural fit for post-training, for agentic RL, and for a lot of the things people want to build on top of them.
So I tried the ABC test here too. For a new architecture that wants to replace the Transformer: imagine you already had that new architecture (C), and someone came along and proposed the Transformer (B). I think the pitch would sound something like this.
"Hey, I have this architecture that just works out of the box. It scales predictably. It has an exact memory mechanism. It supports all sorts of post-training. You barely need to tune anything. And paired with textual reasoning, it's theoretically universal."
I don't know about you, but to me that sounds like a really strong proposal. I think in a lot of cases, the Transformer-as-proposal might actually be the more compelling pitch.
This doesn't mean we should stop exploring new architectures—definitely not. But I think it helps to appreciate the baseline we're comparing against. The Transformer has earned its spot, and I think acknowledging that is just being honest with ourselves.
The ABC thing isn't a rigorous framework or anything. It's really just a mental habit—a way to pause before I get too excited about something new and check whether I actually think it's better, or whether it just feels better because it's different.
I find it keeps me a bit more honest with myself, especially when I'm evaluating my own ideas. (My own ideas always feel the most novel to me, which is probably exactly when I need this the most.)
If you have thoughts on this, or examples where this kind of thinking breaks down, or cases where flipping the order actually reveals the opposite of what you'd expect—I'd genuinely love to hear about them. This is very much a work in progress, like pretty much everything else in my research life.
Just a personal reflection, not trying to claim any authority here. If the ABC framing is useful to you, feel free to borrow it. If you think it's wrong, I'd honestly love to know why.