Blogs & Thoughts

Astronauts’ favorite key on the keyboard.

Hanchen's ICLR 2026 Reflections on Self-Improving AI

I had the privilege of attending ICLR 2026 thanks to generous support from UCB Sky Lab. I helped present Agentic Context Engineering at the main conference and gave two oral presentations at the Lifelong Agent and MemAgent workshops. As a result, much of my experience centered on agent memory and continual learning. I am writing this reflection to summarize my takeaways from conversations with other researchers. I hope this blog strikes a chord with fellow readers, and not with any companies’ legal teams over leaked information. ...

April 30, 2026 · Hanchen Li
Token Economy

Token Revenue Is Growing. But That Doesn't Mean We Aren't in a Bubble.

TL;DR: AI token revenue is growing at historic rates: Cursor hit $2B ARR in 24 months, and Anthropic crossed $14B ARR in 14 months. The “Anti-Bubble School” argues this proves the token economy is real and sustainable. But much of this revenue may be driven by a temporary exploration phase, with companies overspending to “not miss the AI revolution” without validated ROI for their projects. The pattern resembles the dark fiber phenomenon: in the late 1990s, we built vast broadband infrastructure before anyone had Netflix or YouTube. The technology was right, the timeline was wrong, and the investors went bankrupt. Tokens may be this generation’s broadband: genuinely needed eventually, but with current demand inflated by exploration budgets. There is real usage, but that usage might not be effective or sustainable. The question isn’t whether AI is transformative. The question is how much of today’s token demand is long-lasting, and how much is exploration money that vanishes once the economy tightens. The Amazing Revenue Numbers: Jensen Huang, at GTC 2026, declared: ...

March 21, 2026 · Hanchen Li and Claude Code
413K Agent Traces Analyzed

We Analyzed 413K AI Agent Runs. Here's What Separates the Ones That Succeed.

TL;DR — Four Takeaways from 413K Agent Traces

1. Test early, test often. The single strongest predictor of agent success is the fraction of early bash commands dedicated to testing. This is TDD for AI agents, and it works.
2. Agents, like humans, need to concentrate. Agents that scatter edits across 3+ files early are far more likely to fail, a dose-response effect validated across all 3 dataset splits. The Single Responsibility Principle holds for agents.
3. Agents that repeat commands are stuck. Identical bash commands in the early phase predict failure: a genuine behavioral signal, not a task-difficulty confound.
4. Many human SWE “best practices” don’t transfer. View-before-edit, grep-before-edit, incremental TDD cycles: these intuitive principles are confounded or reversed for AI agents. Agents are not junior developers.

Every day, thousands of AI agents attempt to solve real software engineering tasks. They read code, run tests, edit files, and submit patches. Each attempt leaves behind a detailed trace: a complete record of every tool call, every bash command, every file read and edit. ...
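The early-phase signals above (fraction of early bash commands that run tests, and repeated identical commands) can be sketched as simple functions over a trace's command list. The trace format, the test markers, and the 10-command "early" window below are illustrative assumptions, not the study's actual feature definitions.

```python
# Hypothetical sketch of two early-phase trace signals.
# Assumed toy trace format: an ordered list of bash command strings.
TEST_MARKERS = ("pytest", "npm test", "go test", "make test")

def early_test_fraction(commands, early_n=10):
    """Fraction of the first `early_n` bash commands that run tests."""
    early = commands[:early_n]
    if not early:
        return 0.0
    hits = sum(any(m in cmd for m in TEST_MARKERS) for cmd in early)
    return hits / len(early)

def repeated_command_count(commands, early_n=10):
    """Number of identical commands repeated within the early phase."""
    early = commands[:early_n]
    return len(early) - len(set(early))

trace = ["ls", "pytest tests/", "vim app.py", "pytest tests/", "pytest tests/"]
print(early_test_fraction(trace))    # 3 of 5 early commands run tests -> 0.6
print(repeated_command_count(trace)) # "pytest tests/" appears 2 extra times -> 2
```

Per the takeaways, a higher `early_test_fraction` would correlate with success, while a higher `repeated_command_count` would predict failure.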

March 8, 2026 · Hanchen Li, Qiuyang Mang, Alvin Cheung, Joey Gonzalez, and Collaborators
Agent Efficiency

CES and Groq "Acqui-hire" Reflection: Nvidia's Plan to Build Real-Time Agents?

In this blog post, we discuss Nvidia’s recent announcements at CES and its strategic partnership with Groq, focusing on its strategy to enhance LLM agent inference. We explore three main aspects: the importance of KV cache hits, the role of SRAM in improving decoding speed, and a proposed hardware-software architecture that could speed up agent inference to real time. Prerequisite: LLM inference basics. Suggested reading: LLM Inference; KV Cache Offloading with LMCache; LLM Agent with KV Cache ...

February 16, 2026 · Hanchen Li and Collaborators
Agent Efficiency

Why Agents Are Efficiency Nightmares and How to Fix Them

Agents are EXPENSIVE. Claude Code costs about 1 USD to handle a single issue in a mid-sized public repository when using the API. At the same time, a subscription license costs 20 USD per month. Why do we still get to use them? Maybe the price war, maybe the crazy debts some upstream companies are carrying, maybe the implicit labeling you are doing for the providers. But regardless of the reason behind the pricing, we want to use these powerful agents more. Current models and applications are already capable of handling many complex tasks with minimal human intervention. However, the efficiency of these agents has only recently drawn our attention. In this blog post, we will discuss why agents are efficiency nightmares, how we can make them (somewhat) better with KV cache management tools like LMCache, and where we could go next to improve agents. ...
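One way to see why agent loops are expensive, and why KV cache reuse (the role tools like LMCache play) helps: an agent resends its growing context on every turn, so without a prefix cache the whole context is prefilled again each time. A minimal sketch, with made-up token counts:

```python
# Toy model of prefill cost over an agent's multi-turn loop.
# turn_lens = new tokens appended each turn (prompt, then tool outputs).
def prefill_tokens(turn_lens, cache=True):
    """Total tokens prefilled across all turns for a growing context."""
    total, ctx = 0, 0
    for new in turn_lens:
        ctx += new
        total += new if cache else ctx  # cache hit: only the new delta
    return total

turns = [4000, 500, 500, 500, 500]  # system+repo prompt, then 4 tool turns
print(prefill_tokens(turns, cache=False))  # recompute full context: 25000
print(prefill_tokens(turns, cache=True))   # only new tokens: 6000
```

Under these illustrative numbers, caching the shared prefix cuts prefill work by more than 4x, and the gap widens as the loop runs longer.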

January 7, 2025 · Hanchen Li, Xiaokun Chen, Jingzhuo Hu, and Collaborators