Watch today's digest as a video summary (generated by NotebookLM)
S&P Global rejected proposals to fast-track megacap IPOs (Initial Public Offerings - the process of a private company selling shares to the public for the first time) into the S&P 500. The decision keeps all three contested entry requirements in place.
SpaceX begins trading on Nasdaq June 12 at an expected $1.75-2 trillion valuation but reported a $4.94 billion net loss in 2025 on $18.67 billion in revenue. It cannot join the S&P 500 until at least mid-2027.
This was the most-discussed story on Hacker News today with 1,320 points and 454 comments.
- 12-month seasoning period - a company must trade publicly for at least a year
- Four consecutive quarters of positive GAAP earnings - the standard accounting measure of profit, not the adjusted numbers companies prefer to report
- Minimum 10% public float - at least a tenth of shares must be available to ordinary investors
- Bloomberg Intelligence estimates delayed forced buying: $14 billion for SpaceX, $8 billion for OpenAI, $4.6 billion for Anthropic
- Rival indexes disagree: Nasdaq now allows new listings into the Nasdaq-100 after just 15 trading days; FTSE Russell shortened its window to as few as 5 days
- The profitability test hits AI hardest - companies spending billions on GPU (Graphics Processing Unit - the specialized chips that train AI) clusters and training runs are structurally unprofitable during their growth phase
> Previously: June 1 - Simon Willison documented the flaw: hackers sent messages like "link my new email address" and the chatbot complied without verification.
Today: Meta filed a formal data breach notice with Maine's attorney general, confirming exact numbers and a timeline for the first time.
This is one of the first large-scale security breaches directly caused by an AI chatbot's design rather than a traditional software vulnerability. The chatbot was given the power to change account credentials without proper authentication - a problem that gets more dangerous as companies rush to deploy AI agents with real-world permissions.
- 20,225 people notified that their accounts were compromised
- The hacking campaign ran from April 17 through early June before Meta discovered and shut it down
- The chatbot did not verify that the email address requesting a password reset matched the account's registered email - it simply processed the request
- Attackers gained complete account control - contact information, dates of birth, posts, direct messages, and activity logs were all exposed
- Meta has disabled the AI chatbot entirely and removed the vulnerable code path
Simon Willison, creator of Datasette and one of the most influential voices in AI tooling, released micropython-wasm - an alpha-stage Python package that runs untrusted Python code inside a WebAssembly (a technology originally built to safely run code in web browsers) sandbox.
The project addresses a growing gap in AI agent safety: as coding agents gain terminal access and execute arbitrary code, the tools for containing that execution have lagged behind.
- The entire sandbox is 362 kilobytes - a MicroPython interpreter compiled to WebAssembly with a custom 78-line C host module
- CPU and memory are hard-limited using wasmtime's "fuel" mechanism (20 million units default) and native memory caps
- Code maintains state between runs - variables and functions persist, unlike one-shot execution
- Willison used GPT-5.5 Pro to research the approach and Codex Desktop to build the prototype, then challenged GPT-5.5 to escape the sandbox - it failed
- The immediate use case is sandboxing plugins for Datasette, the LLM (Large Language Model) CLI tool, and sqlite-utils, where plugin code currently runs with full privileges
Sakana AI, the Tokyo-based AI company, launched its Recursive Self-Improvement (RSI) Lab, consolidating several existing projects under one roof.
- Projects include The AI Scientist (AI that designs and runs experiments), Darwin Godel Machine (evolutionary approaches to self-improvement), and ShinkaEvolve
- Sample efficiency is the key design constraint - making self-improving systems work under limited compute budgets, not unlimited scaling
- RSI is moving from theoretical framing to formal organizational research - Sakana is the first company to build a dedicated lab around the concept
- This follows Anthropic's disclosure (covered June 4) that 80% of its merged production code is now written by Claude, with engineers shipping 8x more code per quarter
The pattern across all three data points is the same: AI generates enormous value on paper but struggles to demonstrate it in the financial metrics that gatekeepers - whether index committees, benchmarks, or CFOs - actually use.
- S&P 500 rejected fast-track entry for SpaceX, OpenAI, and Anthropic, collectively delaying $27 billion in passive fund flows
- 90% of LLM-profile combinations fail to beat a simple equal-weight portfolio in the new PortBench benchmark, undermining the case for AI-driven finance
- Enterprise AI cost overruns continue: a company spent $500 million on Claude in one month, Uber blew through its 2026 budget by April
The gap between what agents can do and what we can safely let them do is narrowing - but incidents like Meta's breach show the cost of deploying agents before the safety infrastructure exists.
- Simon Willison's micropython-wasm provides a 362KB sandbox for running AI-generated Python safely, with hard CPU and memory limits
- OpenAI's Lockdown Mode now blocks outbound network requests to prevent data exfiltration after prompt injection, available across all account tiers
- Meta's AI chatbot breach (20,225 accounts compromised) demonstrates what happens when AI agents get write access to user accounts without proper authentication
- Princeton's ICML 2026 paper concluded frontier models (GPT 5.5, Gemini 3.1 Pro, Claude Opus 4.7) are "not meaningfully more reliable than previous models"
The shift predates both AI and COVID-19 - the structural decline traces to around 2000 - but AI is accelerating it by automating exactly the kind of tasks that entry-level workers traditionally learned on.
- Recent college graduates face 5.6% unemployment vs. 4.2% for all workers - the widest gap on record
- The New York Federal Reserve attributes 64% of the rise to remote work policies reducing mentorship-dependent entry-level positions
- Stanford researchers identify AI exposure as an additional factor, particularly affecting computer science graduates
- 41% of employed recent graduates are underemployed in positions not requiring degrees
This follows the Ladybird browser banning external pull requests yesterday because AI-generated code made contributor trust impossible.
- "Ask HN: Why is the HN crowd so anti-AI?" drew 341 points and 588 comments - one of the most-engaged meta-discussions on Hacker News this year
- HN moderator dang argues the site is not anti-AI but divided, with both sides perceiving bias against their position
- The strongest nuanced take: LLMs work well "in the small" but produce codebases "riddled with poor design choices" at scale
- The emerging consensus: individuals can embrace AI personally while resisting reckless organizational deployment
Open-source voice models from Microsoft covering text-to-speech, speech recognition, and real-time streaming.
Try it: GitHub
- VibeVoice-ASR processes up to 60 minutes of audio in a single pass with speaker identification and timestamps
- VibeVoice-TTS synthesizes up to 90 minutes of conversational multi-speaker speech
- VibeVoice-Streaming delivers 300ms latency for real-time applications
- 50+ languages supported for speech recognition, MIT licensed
Two Minute Papers explored using AI agents as narrative game masters that drive storylines in interactive games, moving beyond AI as opponent or assistant toward AI as storytelling engine.
- Still early-stage exploration - the creators want community input on whether agents should compete, assist, or narrate
- Connects to the multi-agent gaming trend seen in the Thousand Token Wood hackathon project (below)
Meta's Instagram breach is surprising not because accounts were hacked, but because the attack required zero technical skill. The AI chatbot was the vulnerability itself - it simply did what it was asked without checking who was asking. This pattern will repeat as companies deploy AI agents with write permissions.
OpenAI's new Lockdown Mode limits outbound network requests to block data exfiltration. The telling detail: it is optional, trades functionality for security, and targets users with "elevated risk profiles." The feature's existence is an implicit admission that standard ChatGPT is not robustly protected against determined attackers using prompt injection.
For the first time in recorded history, recent college graduates have higher unemployment than the average American worker. The reversal started in February 2019 - before AI hiring fears and before the pandemic - but AI is accelerating the trend, particularly for computer science graduates.
Simon Willison used GPT-5.5 Pro to research the WebAssembly sandboxing approach, then used Codex Desktop to build the prototype. He then challenged GPT-5.5 to escape the sandbox it helped design - and it couldn't. AI building security infrastructure designed to contain AI is becoming a recurring pattern.
Sakana AI's dedicated RSI Lab in Tokyo consolidates The AI Scientist, Darwin Godel Machine, and ShinkaEvolve under one roof. Combined with Anthropic's disclosure that 80% of its code is Claude-written, the question is no longer whether AI can improve itself - it's how fast the feedback loop tightens. If this accelerates, the timeline for AI systems that can meaningfully redesign their own training processes shrinks from decades to years.
Agents' Last Exam (ALE) tags each of its 1,000+ tasks with an occupational code from the Bureau of Labor Statistics taxonomy. The hardest tier has a 2.6% full pass rate. SWE-Marathon tests whether agents stay coherent over billion-token budgets. These are the benchmarks that will matter when companies decide which roles to automate. If pass rates climb quickly, expect workforce planning to shift.
Willison's micropython-wasm is alpha-stage, but the architecture - MicroPython compiled to WASM with wasmtime fuel limits - is sound enough that it could become the foundation for how AI agents execute code safely. The fact that it installs via pip and requires no special infrastructure removes the main barrier to adoption. If this catches on, expect cloud AI providers to adopt similar sandboxing for their coding agents.
Stanford researchers specifically identify AI exposure as a factor in rising computer science graduate unemployment. The New York Fed's 64% attribution to remote work is the larger driver, but both forces reduce demand for the same thing: entry-level workers doing tasks that can be automated or eliminated when there's no physical office to learn in. If you're advising students on career paths, this data should inform the conversation.
📜 License: MIT · 👤 By: Individual (Jesse Vincent)
🎯 Time to value: 15 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| MIT licensed, works with every major AI coding tool | Opinionated workflow may clash with existing team practices |
| Test-driven RED-GREEN-REFACTOR cycles enforce quality | Shell-heavy (66%) may feel unfamiliar to web developers |
| 220K stars signal massive community validation | Learning the methodology takes longer than learning the tools |
📜 License: MIT · 👤 By: Individual
🎯 Time to value: 10 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| Zero API fees - leverages free open-source tools | Cookie-based auth may break with platform changes |
| Works with Claude Code, Cursor, OpenClaw, Windsurf | Scraping-based approach sits in a legal grey area |
| "Doctor" diagnostic shows each channel's health | Platform-specific failures require manual debugging |
📜 License: MIT · 👤 By: Company (CopilotKit)
🎯 Time to value: 30 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| Multi-platform: web, mobile, Slack, Teams | Opinionated architecture may not fit all app designs |
| AG-UI Protocol integrates with LangChain, AWS, Google | Learning curve for full generative UI features |
| Human-in-the-loop built in for approval workflows | TypeScript-heavy codebase (79%) limits non-JS teams |
📜 License: MIT · 👤 By: Community
🎯 Time to value: 10 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| 96.6% R@5 with zero cloud dependencies | Palace metaphor may confuse users expecting folder structures |
| MCP server with 29 tools for agent integration | Verbatim storage grows faster than summarization approaches |
| Auto-save hooks for Claude Code sessions | Knowledge graph setup requires upfront configuration |
📜 License: MIT · 👤 By: Community
🎯 Time to value: 15 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| 18+ provider support including local models | Self-hosting requires Docker and configuration effort |
| Multi-speaker podcast generation with custom profiles | No mobile app - browser-only interface |
| REST API enables integration into existing workflows | Smaller community than commercial alternatives |
📜 License: Apache 2.0 · 👤 By: Company (Baidu)
🎯 Time to value: 5 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| 80+ languages, models from 6MB to enterprise scale | PaddlePaddle framework dependency (not PyTorch) |
| Apache 2.0 for commercial use | Documentation primarily in Chinese, though improving |
| 81K stars with active Baidu backing | Accuracy on handwritten text lags specialist tools |
📜 License: MIT · 👤 By: Company (Microsoft)
🎯 Time to value: 20 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| MIT license from Microsoft - rare for frontier voice models | Python-only, no native mobile SDKs |
| 60-minute single-pass processing for long audio | Large model sizes require GPU for real-time inference |
| Multi-speaker dialogue generation in TTS | Streaming model (300ms) trades quality for latency |
👤 By: NVIDIA · 🎯 Task: Image Segmentation
📐 Size: 3B
| ✓ Pros | ✗ Cons |
|---|---|
| Text-guided segmentation with no predefined categories | Non-commercial license limits business use |
| 3B parameter size is practical for deployment | Accuracy drops on heavily occluded or tiny objects |
| Strong zero-shot performance across domains | Requires GPU for real-time inference |

👤 By: Google · 🎯 Task: Any-to-Any
📐 Size: 12B
| ✓ Pros | ✗ Cons |
|---|---|
| Multimodal: text, image, and audio in one model | Gemma license is more restrictive than Apache 2.0 |
| QAT checkpoints enable approximately 1GB deployment | 12B still too large for most smartphones without quantization |
| Immediate Ollama and vLLM integration | Audio capabilities less mature than text and vision |

👤 By: Community · 🎯 Task: Text Generation
📐 Size: 35B (3B active)
| ✓ Pros | ✗ Cons |
|---|---|
| Apache 2.0 with 2.77M downloads proving reliability | "Uncensored" means no safety guardrails whatsoever |
| 3B active params = fast inference on consumer hardware | Community mod with no corporate support or updates |
| Broad knowledge from full 35B parameter set | May produce harmful content without usage safeguards |
👤 By: Sapient Intelligence · 🎯 Task: Text Generation
📐 Size: 1B
| ✓ Pros | ✗ Cons |
|---|---|
| Demonstrates iterative recurrence as alternative to scale | Pre-alignment only - needs fine-tuning for assistant use |
| Apache 2.0, openly trainable and deployable | English-only with weak code performance |
| PrefixLM supports bidirectional prompt attention | Only 40B training tokens - limited factual coverage |

👤 By: Ideogram AI · 🎯 Task: Text-to-Image
📐 Size: 9.3B
| ✓ Pros | ✗ Cons |
|---|---|
| JSON prompts for precise layout control | Research license limits commercial use without API |
| nf4 fits on single 24GB consumer GPU | Gated - requires HuggingFace login |
| Best-in-class text rendering in generated images | Full model requires significant VRAM |

| Provider | Model | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| Anthropic | Claude Opus 4.8 | $5.00 | $25.00 | 1M |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| OpenAI | GPT-5.5 | $5.00 | $30.00 | 1.05M |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | 1.05M |
| OpenAI | GPT-5.4 Nano | $0.20 | $1.25 | 128K |
| Gemini 3.5 Flash | $1.50 | $9.00 | 1M | |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 | 200K | |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M |
Key finding: 90% of model-profile combinations fail to outperform a basic equal-weight allocation across six asset classes over a decade.
Why practitioners should care: If you are building or evaluating AI for financial applications, this benchmark reveals that strong question-answering performance does not predict allocation ability. The "CEPS" metric (which measures how reasoning errors compound across pipeline stages) is particularly useful for identifying where your system breaks down.