Yesterday in AI

When your AI has a brokerage account, nobody is ready for what comes next.

Mike Robinson



Yesterday in AI | Friday, May 29, 2026


Anthropic dropped Opus 4.8 yesterday, with benchmarks that beat GPT-5.5, and a research preview that lets it run hundreds of parallel sub-agents at once. AI agents now have wallets and can trade your stocks. IBM tested every major AI model on real enterprise IT work, and none of them cracked 50%. OpenAI put $250 million on the table for workers being displaced by the technology it's building. And two chip companies you've probably never thought about just hit $1 trillion each. Plus: what happens to academic research when 400 papers can be written in 12 hours.


Feedback? Email mike@yesterdayinai.news or connect on LinkedIn, X, or Bluesky. If you like the show, please take a minute to rate and review it so others can find it!

SPEAKER_00

Hi folks, this is Yesterday in AI, your daily digest of everything happening in the world of AI in 10 minutes or less. I'm Mike Robinson. It's Friday, May 29th, and we're watching AI go from research project to full economic force. It's in your brokerage account, and apparently your academic journals. Let's get into it.
Let's start with a model drop from yesterday. Anthropic released Claude Opus 4.8 on Thursday. It scored 69.2% on SWE-bench Pro, a standard coding ability benchmark, beating both GPT-5.5 and Gemini 3.1 Pro on that test, and it's reportedly four times less likely than Opus 4.7 to let code flaws slide past without flagging them. They also shipped a research preview where Claude can plan a task and spin up hundreds of parallel subagents. Think of them as smaller specialized AIs, all working simultaneously in a single session. Anthropic is calling it dynamic workflows. They're shipping new models faster than anyone expected. Opus 4.6 to 4.7 was six weeks. 4.7 to 4.8 is another six weeks. The pace of releases means if you made a platform decision three months ago, you might already be behind. For context, Opus 4.7 just led the ITBench Enterprise benchmark we'll talk about in a moment. The new one's already out.
Speaking of that benchmark, IBM and Artificial Analysis released something called ITBench last week, and the results are a little humbling for the whole industry. The benchmark tests AI agents on real site reliability engineering work. That's the job of keeping cloud systems running when things break. We're talking diagnosing live incidents in Kubernetes, the software that manages cloud infrastructure at most big companies, reading error logs, tracing what broke and why. The kind of messy, multi-step debugging that IT teams deal with every day. And here's the number. Claude Opus 4.7 leads the leaderboard at 47%. GPT-5.5 sits at 46%. Gemini 3.1 Pro Preview scored 30%. Nobody cracked 50%.
So the models we're told can replace developers can't reliably fix half the incidents a junior cloud operations engineer would handle on day one. The models are impressive and getting better fast, but good enough for demos and good enough for production are still two different things. Keep that distinction in mind when you're making AI strategy decisions, because your business stakeholders are definitely not making it.
One more wrinkle: Gemma 4 31B, a large open-weight model, meaning its code is publicly available and anyone can run it, scored 37% at 14 cents per task. Claude Opus 4.7 cost $5.38 per task on the same benchmark. The performance gap doesn't justify that price gap. That'll matter more as enterprise buyers start actually measuring outcomes.
Since we're measuring outcomes, here's a story that started in finance but is really about something bigger. Robinhood announced agentic trading this week. Users can connect an AI agent to a dedicated Robinhood account, set a spending budget, and let the agent trade stocks on their behalf. Gold card users also get an agentic virtual card so agents can make purchases within user-defined limits. AI agents have been asking for access to your calendar and your email. Now they're asking for your brokerage account. This is a legitimate milestone. Financial transactions are among the highest-stakes actions an AI can take on your behalf. Real money, real consequences, and no undo button. Robinhood is betting users will trust this. And some will. The guardrails matter enormously here. The dedicated account structure and user-set budget cap are exactly the right design patterns. You want the agent operating in a controlled, limited environment, not handed the keys to your full portfolio. The broader trend: agents are moving from information tasks to action tasks. Reading your email is one thing, spending your money is another.
The governance question of who's responsible when an agent makes a bad trade, that hasn't been answered yet. Expect regulators to start asking.
Now we move to an interesting data point on the money side of AI anxiety. The OpenAI Foundation, the nonprofit entity that still owns 26% of OpenAI's for-profit arm, committed $250 million this week to help workers navigate AI-driven disruption. The money targets three areas: tracking AI's economic impact with real data, retraining workers facing near-term job loss, and building longer-term economic security through ideas like shifting tax burdens from labor to capital and exploring sovereign wealth funds, government-managed investment accounts that distribute national wealth directly to citizens. It's a real commitment from a meaningful source. $250 million is also, to put it plainly, a down payment on a bill that could run into the trillions if the displacement projections from the New York City Comptroller, the Fed, and others turn out to be right. OpenAI says it'll announce its first initiatives later this year. I want to be fair here. A company that's contributing to disruption funding research and retraining to address that disruption is better than one that isn't. But there's something a little uncomfortable about a situation where the entity generating the disruption is also the one setting the terms of the relief. Government and civil society need to be running this agenda, not just benefiting from corporate philanthropy. Doesn't mean the money isn't useful. It is. But don't confuse it for a solution.
You know how when AI gets all the headlines, the picks and shovels business quietly becomes worth a trillion dollars? That happened twice this week. SK Hynix and Micron, both memory chip makers, both major Nvidia suppliers, crossed the $1 trillion valuation mark. SK Hynix shares jumped 10% in a single day, with the stock more than tripling this year.
Micron shares jumped nearly 20% after UBS tripled its price target. These are the companies making the memory chips that go inside the systems that run the AI models. AI data centers need enormous amounts of high-speed memory, and there's a real shortage of it. We've talked before about how the memory chip supply crunch may stretch to 2030. That shortage is what's pushing these stocks into the trillion-dollar club alongside Nvidia, Amazon, Apple, Microsoft, Google, and Meta. Samsung crossed $1 trillion earlier this month, so we now have three chip companies that make AI infrastructure worth a trillion dollars each. The entire infrastructure layer of AI is becoming one of the most valuable sectors in the global economy. Quietly, while everyone's arguing about whether the models can pass the bar exam. Maybe I'm being naive here, but I'm left wondering why we're busy rewarding the companies driving up our compute costs.
Here's a story that should concern anyone who reads academic research or trusts published financial data. Researchers at Penn State and the University of Rochester built a pipeline using Claude and published the results in the Journal of Economic Literature. They used it to generate nearly 400 complete publication-ready academic finance papers in about 12 hours. They started with 95 market data patterns, the kind of signals analysts use to argue a stock will go up or down, fed them to the model, and got back full manuscripts: abstracts, introductions, hypothesis development, results, and citations. The papers look like human-written research. The researchers flagged the obvious risks themselves. If you can produce 400 plausible papers in half a day, peer review as a quality control mechanism starts to break down fast. There's also a more subtle problem called HARKing: hypothesizing after results are known, where you fit the narrative to the data after the fact.
An AI that's good at generating compelling justifications for any signal will make that problem much worse. The pipeline is real, the code is on GitHub, and anyone can run it. Journals are going to need tools to verify where research actually comes from, and a lot faster than most of them are moving. And if you're making investment decisions based on academic finance research, it's worth knowing that the supply of that research just became essentially infinite and much cheaper.
And finally, Meta made a move that affects basically every human with a smartphone. The company launched paid subscriptions across its apps. Instagram Plus is $3.99 a month, Facebook Plus is $3.99, WhatsApp Plus is $2.99. There are also Meta1 AI plans in testing at $7.99 and $19.99 a month, plus creator and business tiers up to $49.99 a month. If you're thinking, "I've been using these apps for free for 15 years," that's exactly what Meta is counting on. They've already hit saturation on users globally. There aren't many new people left to sign up, so growth has to come from existing users paying more. The subscriptions give you customization options, reaction tools, story analytics, and custom icons. Mild stuff, but enough for some people to pay. The AI plans are the more interesting part strategically. Meta is building toward a bundled subscription model, one price for all of its apps plus AI assistant features. That's the long game, and it puts Meta squarely in competition with OpenAI, Google, and Anthropic for your monthly AI subscription dollar.
Just a couple more items. If you have any feedback about this show, you can email mike@yesterdayinai.news, or you can find me on LinkedIn, X, or Bluesky. And if you like this podcast and want to see it continue, please be sure to rate and review it so others can find it. Thanks. That's all for this edition of Yesterday in AI. Stay curious, and I'll see you tomorrow.