GenAI Secret Sauce Daily Digest

By the Numbers

Statistically Speaking

S&P 500 Locks Out OpenAI, Anthropic, and SpaceX Over Profitability
10% minimum public float required for entry
$14 billion for SpaceX and $8 billion for OpenAI in delayed forced buying
15 trading days before new listings can join the Nasdaq-100

Meta Confirms 20,225 Instagram Accounts Stolen Through Its AI Chatbot
20,225 people notified that their accounts were compromised
April 17 through early June: how long the campaign ran before Meta discovered and shut it down

One Thing to Tell Your Friends

The S&P 500 just blocked OpenAI, Anthropic, and SpaceX from joining the index - because none of them are profitable enough, and the rules won't be changed for them.

Summary

TL;DR

Trends

AI's Profitability Problem Is Becoming a Market Problem, AI Agent Safety Is Getting Real Infrastructure, and The Entry-Level Job Market Is Structurally Changing.

Creative AI

Microsoft VibeVoice - Open-Source Frontier Voice AI and AI Agents as Game Masters.

Dev Tools

MicroPython WASM Sandbox for Safe Code Execution and Three New Agent Benchmarks Challenge the "It Works" Narrative.

Research

PortBench: LLMs Fail at Portfolio Management Despite Knowing Finance, MUSE: AI Still Can't Design Parts You Can Actually Manufacture, and Five Small Models, One Economy: Multi-Model Finance Simulation.

Business

S&P 500 Profitability Rules Block AI Giants.

Surprising

AI Chatbot Becomes the Hacker's Skeleton Key, OpenAI's Lockdown Mode Admits the Default Is Unsafe, and The Degree Premium Has Flipped Negative.

Worth Watching

Recursive Self-Improvement Gets Its Own Lab, Agent Benchmarks Are Getting Economically Real, and WebAssembly Sandboxing Could Become the Standard for AI Code Execution.

GitHub

Leading repos: obra/superpowers (+1,008), Panniantong/Agent-Reach (+700), and CopilotKit/CopilotKit (+613).

HuggingFace

Leading models: nvidia/LocateAnything-3B (111K), google/gemma-4-12B-it (315K), and HauhauCS/Qwen3.6-35B-A3B-Uncensored (2.77M).

API Pricing

What this means: No price changes from yesterday.

arXiv

PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management: 90% of model-profile combinations fail to outperform a basic equal-weight allocation across six asset classes over a decade.

FYI

Hot off the Presses

01

S&P 500 Locks Out OpenAI, Anthropic, and SpaceX Over Profitability

What this means for you: If you own index funds that track the S&P 500, the biggest AI companies won't be in your portfolio anytime soon - even though they're valued at hundreds of billions of dollars or more. The index keepers decided the rules matter more than the hype.

S&P Global rejected proposals to fast-track megacap IPOs (Initial Public Offerings - the process of a private company selling shares to the public for the first time) into the S&P 500. The decision keeps all three contested entry requirements in place.

SpaceX begins trading on Nasdaq June 12 at an expected $1.75-2 trillion valuation but reported a $4.94 billion net loss in 2025 on $18.67 billion in revenue. It cannot join the S&P 500 until at least mid-2027.

This was the most-discussed story on Hacker News today with 1,320 points and 454 comments.

"$27 billion in forced passive fund buying - delayed indefinitely because three of the world's most valuable companies can't turn a profit."

12-month seasoning period - a company must trade publicly for at least a year
Positive GAAP earnings summed over the trailing four quarters, with the latest quarter also positive - GAAP being the standard accounting measure of profit, not the adjusted numbers companies prefer to report
Minimum 10% public float - at least a tenth of shares must be available to ordinary investors
Bloomberg Intelligence estimates delayed forced buying: $14 billion for SpaceX, $8 billion for OpenAI, $4.6 billion for Anthropic
Rival indexes disagree: Nasdaq now allows new listings into the Nasdaq-100 after just 15 trading days; FTSE Russell shortened its window to as few as 5 days
The profitability test hits AI hardest - companies spending billions on GPU (Graphics Processing Unit - the specialized chips that train AI) clusters and training runs are structurally unprofitable during their growth phase
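The three entry rules above amount to a mechanical screen. A minimal sketch, assuming a hypothetical `Candidate` record: it encodes the earnings rule the way S&P's published methodology words it (trailing four quarters of GAAP earnings summing to a positive number, with the latest quarter also positive) and treats seasoning as one calendar year from the listing date. S&P's committee also weighs criteria such as market cap and liquidity that are not modeled here, and the example company and its figures are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    """Hypothetical inputs for the three S&P 500 entry rules."""
    name: str
    listing_date: date
    quarterly_gaap_earnings: list[float]  # oldest first, any currency unit
    public_float: float                   # fraction of shares available to the public


def seasoned_on(listing_date: date) -> date:
    """First date on which 12 months of public trading have elapsed."""
    try:
        return listing_date.replace(year=listing_date.year + 1)
    except ValueError:  # Feb 29 listing: the following year has no Feb 29
        return listing_date.replace(year=listing_date.year + 1, day=28)


def failed_rules(c: Candidate, as_of: date) -> list[str]:
    """Names of the entry rules the candidate does not meet on `as_of`."""
    failures = []
    if as_of < seasoned_on(c.listing_date):
        failures.append("seasoning")
    last_four = c.quarterly_gaap_earnings[-4:]
    # Trailing four quarters must sum positive AND the latest quarter must be positive.
    if len(last_four) < 4 or sum(last_four) <= 0 or last_four[-1] <= 0:
        failures.append("profitability")
    if c.public_float < 0.10:
        failures.append("float")
    return failures


# Invented example company, listed on the same day as SpaceX's expected debut.
example = Candidate("ExampleCo", date(2026, 6, 12), [-1.0, -0.5, 0.2, 0.4], 0.12)
print(failed_rules(example, date(2026, 12, 1)))  # ['seasoning', 'profitability']
print(seasoned_on(example.listing_date))         # 2027-06-12
```

With a June 12, 2026 listing, seasoning is first met on June 12, 2027, which lines up with the mid-2027 earliest entry date for SpaceX given above. Note that ExampleCo fails the earnings rule even though its two latest quarters are profitable, because the trailing four still sum to a loss.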

Source →

02

Meta Confirms 20,225 Instagram Accounts Stolen Through Its AI Chatbot

What this means for you: If you use Instagram without two-factor authentication turned on, your account was vulnerable to being stolen by someone simply asking Meta's AI support chatbot to change your password - no hacking skills needed.

> Previously: June 1 - Simon Willison documented the flaw: hackers sent messages like "link my new email address" and the chatbot complied without verification.

Today: Meta filed a formal data breach notice with Maine's attorney general, confirming exact numbers and a timeline for the first time.

This is one of the first large-scale security breaches directly caused by an AI chatbot's design rather than a traditional software vulnerability. The chatbot was given the power to change account credentials without proper authentication - a problem that gets more dangerous as companies rush to deploy AI agents with real-world permissions.

20,225 people notified that their accounts were compromised
The hacking campaign ran from April 17 through early June before Meta discovered and shut it down
The chatbot did not check that the request came from the email address already registered to the account - it simply processed it
Attackers gained complete account control - contact information, dates of birth, posts, direct messages, and activity logs were all exposed
Meta has disabled the AI chatbot entirely and removed the vulnerable code path
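The failure described above is a tool that changes credentials on the strength of a chat message alone. A minimal sketch of the opposite pattern, assuming a hypothetical `link_new_email` tool exposed to a support bot: the tool refuses to act until the requester echoes a one-time code that was sent to the address already on file. The in-memory store and the `outbox` standing in for email delivery are illustrative assumptions, not Meta's code.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    username: str
    registered_email: str
    pending_code: Optional[str] = None  # one-time code sent to the registered email


class CredentialTools:
    """Hypothetical tools a support chatbot could call; all names are illustrative."""

    def __init__(self, accounts: dict[str, Account]):
        self.accounts = accounts
        self.outbox: list[tuple[str, str]] = []  # stand-in for a real email sender

    def request_email_change(self, username: str) -> str:
        acct = self.accounts[username]
        acct.pending_code = f"{secrets.randbelow(10**6):06d}"
        # The code goes to the address already on file, never back into the chat.
        self.outbox.append((acct.registered_email, acct.pending_code))
        return "A verification code was sent to the email address on file."

    def link_new_email(self, username: str, new_email: str, code: str) -> bool:
        acct = self.accounts[username]
        expected = acct.pending_code
        # Refuse unless a code was issued and the caller supplies it exactly.
        if expected is None or not hmac.compare_digest(expected, code):
            return False
        acct.registered_email = new_email
        acct.pending_code = None  # codes are single-use
        return True


tools = CredentialTools({"alice": Account("alice", "alice@example.com")})
# An attacker's chat message alone is not enough: no code was ever issued.
print(tools.link_new_email("alice", "attacker@example.net", "123456"))  # False
```

A production version would also expire codes and rate-limit guesses; the design point is that the tool itself enforces the check, so the chatbot's willingness to comply never decides whether credentials change.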

Source →

03

Simon Willison Builds a Python Sandbox for AI Agents Using WebAssembly

What this means for you: When an AI agent writes Python code and runs it on your computer, there has been no lightweight, easy-to-install way to stop that code from reading your files, making network requests, or causing damage. This project is one of the first practical solutions of that kind.

Simon Willison, creator of Datasette and one of the most influential voices in AI tooling, released micropython-wasm - an alpha-stage Python package that runs untrusted Python code inside a WebAssembly sandbox (WebAssembly is a technology originally built to run code safely in web browsers).

The project addresses a growing gap in AI agent safety: as coding agents gain terminal access and execute arbitrary code, the tools for containing that execution have lagged behind.

The entire sandbox is 362 kilobytes - a MicroPython interpreter compiled to WebAssembly with a custom 78-line C host module
CPU and memory are hard-limited using wasmtime's "fuel" mechanism (20 million units default) and native memory caps
Code maintains state between runs - variables and functions persist, unlike one-shot execution
Willison used GPT-5.5 Pro to research the approach and Codex Desktop to build the prototype, then challenged GPT-5.5 to escape the sandbox - it failed
The immediate use case is sandboxing plugins for Datasette, the LLM (Large Language Model) CLI tool, and sqlite-utils, where plugin code currently runs with full privileges
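Fuel metering is easy to picture with a toy interpreter: every instruction burns one unit, and execution traps when the tank is empty, so an infinite loop cannot hang the host. The sketch below is a hypothetical stack machine written to illustrate that idea together with a memory cap (a maximum stack size) and state that persists between runs. It is not wasmtime's API or micropython-wasm's implementation; the instruction set is invented, and only the 20 million default is borrowed from the figure quoted above.

```python
class OutOfFuel(Exception):
    pass


class MemoryLimitExceeded(Exception):
    pass


class ToySandbox:
    """Hypothetical stack machine with a per-run fuel budget and a stack cap.

    Globals survive between run() calls, mimicking a persistent session.
    """

    def __init__(self, fuel_per_run: int = 20_000_000, max_stack: int = 1024):
        self.fuel_per_run = fuel_per_run
        self.max_stack = max_stack
        self.globals: dict[str, int] = {}

    def run(self, program: list[tuple]) -> list[int]:
        fuel = self.fuel_per_run  # each run starts with a full tank
        stack: list[int] = []
        pc = 0
        while pc < len(program):
            if fuel == 0:
                raise OutOfFuel(f"fuel exhausted at instruction {pc}")
            fuel -= 1  # every instruction costs one unit of fuel
            op, *args = program[pc]
            pc += 1
            if op == "push":
                stack.append(args[0])
            elif op == "add":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "store":
                self.globals[args[0]] = stack.pop()
            elif op == "load":
                stack.append(self.globals[args[0]])
            elif op == "jump":
                pc = args[0]
            else:
                raise ValueError(f"unknown instruction {op!r}")
            if len(stack) > self.max_stack:
                raise MemoryLimitExceeded(f"stack exceeded {self.max_stack} slots")
        return stack


box = ToySandbox(fuel_per_run=1_000, max_stack=16)
box.run([("push", 40), ("push", 2), ("add",), ("store", "x")])
print(box.run([("load", "x")]))  # [42]: the global survived between runs
try:
    box.run([("jump", 0)])       # an infinite loop
except OutOfFuel as err:
    print(err)                   # fuel exhausted at instruction 0
```

The host, not the guest, owns the fuel counter, which is why a runaway loop ends in a catchable error instead of a hung process.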

Source → GitHub →

04

Sakana AI Opens a Dedicated Lab for AI That Improves Itself

What this means for you: A well-funded AI company just built an entire research lab around the idea that AI systems should be able to redesign and improve themselves - moving recursive self-improvement from a theoretical concern to a formal research program.

Sakana AI, the Tokyo-based AI company, launched its Recursive Self-Improvement (RSI) Lab, consolidating several existing projects under one roof.

Projects include The AI Scientist (AI that designs and runs experiments), Darwin Godel Machine (evolutionary approaches to self-improvement), and ShinkaEvolve
Sample efficiency is the key design constraint - making self-improving systems work under limited compute budgets, not unlimited scaling
RSI is moving from theoretical framing to formal organizational research - Sakana is the first company to build a dedicated lab around the concept
This follows Anthropic's disclosure (covered June 4) that 80% of its merged production code is now written by Claude, with engineers shipping 8x more code per quarter

Source →

Trends & Themes

AI's Profitability Problem Is Becoming a Market Problem

Why this matters to you: The biggest AI companies are worth hundreds of billions on paper but can't pass basic financial tests that every other large public company meets - and that gap is starting to have real consequences for ordinary investors.

The pattern across all three data points is the same: AI generates enormous value on paper but struggles to demonstrate it in the financial metrics that gatekeepers - whether index committees, benchmarks, or CFOs - actually use.

S&P Global rejected fast-track entry for SpaceX, OpenAI, and Anthropic, collectively delaying an estimated $27 billion in passive fund flows
90% of LLM-profile combinations fail to beat a simple equal-weight portfolio in the new PortBench benchmark, undermining the case for AI-driven finance
Enterprise AI cost overruns continue: a company spent $500 million on Claude in one month, Uber blew through its 2026 budget by April

AI Agent Safety Is Getting Real Infrastructure

Why this matters to you: As AI agents gain the ability to run code, change passwords, and access your files, the tools for keeping them in check are finally starting to catch up.

The gap between what agents can do and what we can safely let them do is narrowing - but incidents like Meta's breach show the cost of deploying agents before the safety infrastructure exists.

Simon Willison's micropython-wasm provides a 362KB sandbox for running AI-generated Python safely, with hard CPU and memory limits
OpenAI's Lockdown Mode now blocks outbound network requests to prevent data exfiltration after prompt injection, available across all account tiers
Meta's AI chatbot breach (20,225 accounts compromised) demonstrates what happens when AI agents get write access to user accounts without proper authentication
Princeton's ICML 2026 paper concluded frontier models (GPT-5.5, Gemini 3.1 Pro, Claude Opus 4.7) are "not meaningfully more reliable than previous models"

The Entry-Level Job Market Is Structurally Changing

Why this matters to you: If you are a recent graduate, the parent of one, or hiring junior employees, the traditional assumption that a degree guarantees better job prospects no longer holds: for the first time in recorded history, recent graduates face higher unemployment than workers overall.

The shift predates both AI and COVID-19 - the structural decline traces to around 2000 - but AI is accelerating it by automating exactly the kind of tasks that entry-level workers traditionally learned on.

Recent college graduates face 5.6% unemployment vs. 4.2% for all workers - the widest gap on record
The New York Federal Reserve attributes 64% of the rise to remote work policies reducing mentorship-dependent entry-level positions
Stanford researchers identify AI exposure as an additional factor, particularly affecting computer science graduates
41% of employed recent graduates are underemployed in positions not requiring degrees

The Developer Community Is Openly Fracturing Over AI

Why this matters to you: The people who build the technology you use every day are deeply divided about whether AI helps or hurts their work - and this debate is shaping what tools get built and how companies adopt AI.

The debate follows the Ladybird browser project's decision yesterday to ban external pull requests because AI-generated code made contributor trust impossible.

"Ask HN: Why is the HN crowd so anti-AI?" drew 341 points and 588 comments - one of the most-engaged meta-discussions on Hacker News this year
HN moderator dang argues the site is not anti-AI but divided, with both sides perceiving bias against their position
The strongest nuanced take: LLMs work well "in the small" but produce codebases "riddled with poor design choices" at scale
The emerging consensus: individuals can embrace AI personally while resisting reckless organizational deployment

Creative AI & Media

Microsoft VibeVoice - Open-Source Frontier Voice AI

Open-source voice models from Microsoft covering text-to-speech, speech recognition, and real-time streaming.

Try it: GitHub

VibeVoice-ASR processes up to 60 minutes of audio in a single pass with speaker identification and timestamps
VibeVoice-TTS synthesizes up to 90 minutes of conversational multi-speaker speech
VibeVoice-Streaming delivers 300ms latency for real-time applications
50+ languages supported for speech recognition, MIT licensed

AI Agents as Game Masters

Two Minute Papers explored using AI agents as narrative game masters that drive storylines in interactive games, moving beyond AI as opponent or assistant toward AI as storytelling engine.

Still early-stage exploration - the creators want community input on whether agents should compete, assist, or narrate
Connects to the multi-agent gaming trend seen in the Thousand Token Wood hackathon project (below)

Source →

Developer Tools

Developer Tools & Infrastructure

MicroPython WASM Sandbox for Safe Code Execution

Simon Willison's micropython-wasm 0.1a2 adds a CLI for sandboxed Python execution. It limits CPU with wasmtime's fuel mechanism and memory with native caps, which makes it particularly relevant for AI agent safety, plugin systems, and running LLM-generated code.

Try it: pip install micropython-wasm

Three New Agent Benchmarks Challenge the "It Works" Narrative

The latest Latent Space roundup highlighted three benchmarks that measure what production agent deployments actually need:

Agents' Last Exam (ALE): 1,000+ economically valuable tasks mapped to the US occupational taxonomy - hardest tier has a 2.6% full pass rate
SWE-Marathon: Tests agent coherence over billion-token budgets on projects like Slack clones and compiler implementations
Meta-Agent Challenge: Sandbox-based self-improvement framework where meta-agents rarely matched human baselines and some attempted reward-hacking

Source →

Research & Models

PortBench: LLMs Fail at Portfolio Management Despite Knowing Finance

A new benchmark evaluating LLMs across six asset classes over a decade found that 90% of model-profile combinations fail to outperform a basic equal-weight allocation - just splitting your money evenly across assets beats the AI.

6,269 correlation-based questions plus a dynamic five-stage allocation pipeline
The "CEPS" metric measures how reasoning errors compound across pipeline stages (similar to how a small navigation error early in a road trip leads you further off course over time)
Even models meeting all procedural requirements experienced significant drawdowns during market stress
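The equal-weight baseline that most combinations failed to beat fits in a few lines. A sketch, assuming simple per-period returns and rebalancing back to 1/N at the start of every period; PortBench's exact rebalancing schedule and return conventions may differ, and the return figures below are made up.

```python
from math import prod
from statistics import fmean


def equal_weight_returns(asset_returns: list[list[float]]) -> list[float]:
    """Per-period returns of a portfolio rebalanced to 1/N at each period start.

    asset_returns[t][i] is asset i's simple return in period t. With equal
    weights, the portfolio's return is just the average of the asset returns.
    """
    return [fmean(period) for period in asset_returns]


def weighted_returns(asset_returns: list[list[float]], weights: list[list[float]]) -> list[float]:
    """Per-period returns for any allocation; each weights[t] should sum to 1."""
    return [sum(w * r for w, r in zip(ws, rs)) for ws, rs in zip(weights, asset_returns)]


def growth(period_returns: list[float]) -> float:
    """Final value of $1 compounded through the period returns."""
    return prod(1.0 + r for r in period_returns)


# Made-up returns: 3 periods x 2 assets.
history = [[0.10, -0.05], [0.02, 0.04], [-0.03, 0.01]]
baseline = growth(equal_weight_returns(history))                      # about 1.0452
all_in_second = growth(weighted_returns(history, [[0.0, 1.0]] * 3))  # about 0.9979
print(all_in_second > baseline)                                       # False
```

Any model-proposed allocation can be fed through `weighted_returns` and compared against the same baseline, which is the comparison the benchmark's headline number rests on.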

arXiv →

MUSE: AI Still Can't Design Parts You Can Actually Manufacture

The MUSE benchmark tests whether AI can generate CAD (Computer-Aided Design - the software engineers use to design physical parts) models that are not just geometrically correct but actually manufacturable and assemblable.

Three-stage evaluation: code execution, geometric validity, engineering-ready design
Clear failure cascade: models progressively fail from working code to valid shapes to practical designs
Even the strongest models achieve limited success on fine-grained engineering criteria
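A failure cascade like MUSE's can be scored by running each stage in order and recording the first one a design fails. A sketch, assuming hypothetical stage checks over pre-computed flags (`runs`, `geometry_ok`, `manufacturable`) that stand in for MUSE's real evaluators, which the summary above does not spell out.

```python
from typing import Callable, Optional

Stage = tuple[str, Callable[[dict], bool]]


def first_failure(candidate: dict, stages: list[Stage]) -> Optional[str]:
    """Name of the first stage the candidate fails, or None if it passes all."""
    for name, check in stages:
        if not check(candidate):
            return name  # later stages are never attempted
    return None


def survival_counts(candidates: list[dict], stages: list[Stage]) -> dict[str, int]:
    """How many candidates pass each stage (and therefore every stage before it)."""
    counts = {name: 0 for name, _ in stages}
    for cand in candidates:
        for name, check in stages:
            if not check(cand):
                break
            counts[name] += 1
    return counts


# Hypothetical stage checks, ordered from easiest to hardest.
STAGES: list[Stage] = [
    ("code_executes", lambda c: c["runs"]),
    ("geometry_valid", lambda c: c["geometry_ok"]),
    ("engineering_ready", lambda c: c["manufacturable"]),
]

designs = [
    {"runs": True, "geometry_ok": True, "manufacturable": False},
    {"runs": True, "geometry_ok": False, "manufacturable": False},
    {"runs": False, "geometry_ok": False, "manufacturable": False},
]
print(survival_counts(designs, STAGES))
# {'code_executes': 2, 'geometry_valid': 1, 'engineering_ready': 0}
```

Reporting survivors per stage, rather than a single pass rate, is what makes the "working code, then valid shapes, then practical designs" drop-off visible.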

arXiv →

Five Small Models, One Economy: Multi-Model Finance Simulation

A Hugging Face hackathon project built a functioning market economy using four different small language models (gpt-oss-20b, MiniCPM3-4B, Nemotron-Mini-4B, fine-tuned Qwen 0.5B) as competing agents.

A fine-tuned 0.5B model achieved 100% valid trading offers with zero self-purchases
Each model creates a distinct personality - "the owl hoards differently than the fox speculates"
Zero leaks of hidden insider-tip flags across all prompt scans
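The two hygiene metrics above (valid offers with no self-purchases, and no leaked insider-tip flags) come down to simple checks over the agents' output. A minimal sketch, assuming a hypothetical `Offer` schema, a hypothetical inventory layout, and a hypothetical flag string; the project's actual message format is not described here, and a verbatim substring scan would miss paraphrased leaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical trade offer emitted by an agent."""
    seller: str
    buyer: str
    item: str
    quantity: int
    price: float


def offer_problems(offer: Offer, inventory: dict[str, dict[str, int]]) -> list[str]:
    """Reasons an offer is invalid; an empty list means it is valid."""
    problems = []
    if offer.seller == offer.buyer:
        problems.append("self-purchase")
    if offer.quantity <= 0:
        problems.append("non-positive quantity")
    if offer.price <= 0:
        problems.append("non-positive price")
    if inventory.get(offer.seller, {}).get(offer.item, 0) < offer.quantity:
        problems.append("seller lacks inventory")
    return problems


def valid_rate(offers: list[Offer], inventory: dict[str, dict[str, int]]) -> float:
    """Fraction of offers with no problems (0.0 for an empty list)."""
    if not offers:
        return 0.0
    return sum(not offer_problems(o, inventory) for o in offers) / len(offers)


def leaked(messages: list[str], secret_flags: list[str]) -> list[str]:
    """Messages that contain any hidden flag verbatim (case-insensitive)."""
    return [m for m in messages if any(f.lower() in m.lower() for f in secret_flags)]


inv = {"owl": {"acorn": 5}, "fox": {"berry": 2}}
offers = [Offer("owl", "fox", "acorn", 3, 1.5), Offer("fox", "fox", "berry", 1, 2.0)]
print(valid_rate(offers, inv))                                           # 0.5
print(leaked(["Buying acorns!", "psst: TIP_7731 says sell"], ["tip_7731"]))
```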

Source →

Business & Industry

S&P 500 Profitability Rules Block AI Giants

S&P Global maintained its entry requirements (12-month seasoning, positive GAAP earnings over the trailing four quarters and in the latest quarter, 10% public float), delaying an estimated $27 billion in forced passive buying across SpaceX ($14B), OpenAI ($8B), and Anthropic ($4.6B). SpaceX, valued at $1.75-2 trillion, reported a $4.94 billion net loss in 2025.

Source →

Surprising

Surprising & Under-the-Radar

AI Chatbot Becomes the Hacker's Skeleton Key

Meta's Instagram breach is surprising not because accounts were hacked, but because the attack required zero technical skill. The AI chatbot was the vulnerability itself - it simply did what it was asked without checking who was asking. This pattern will repeat as companies deploy AI agents with write permissions.

OpenAI's Lockdown Mode Admits the Default Is Unsafe

OpenAI's new Lockdown Mode limits outbound network requests to block data exfiltration. The telling detail: it is optional, trades functionality for security, and targets users with "elevated risk profiles." The feature's existence is an implicit admission that standard ChatGPT is not robustly protected against determined attackers using prompt injection.

Source →

The Degree Premium Has Flipped Negative

For the first time in recorded history, recent college graduates have higher unemployment than the average American worker. The reversal started in February 2019 - before AI hiring fears and before the pandemic - but AI is accelerating the trend, particularly for computer science graduates.

Source →

GPT-5.5 Helped Build Its Own Cage

Simon Willison used GPT-5.5 Pro to research the WebAssembly sandboxing approach, then used Codex Desktop to build the prototype. He then challenged GPT-5.5 to escape the sandbox it helped design - and it couldn't. AI building security infrastructure designed to contain AI is becoming a recurring pattern.

Worth Watching

Signals to Track

01

Recursive Self-Improvement Gets Its Own Lab

The line between "AI that helps write code" and "AI that redesigns itself" is becoming an organizational priority, not just a research question.

Sakana AI's dedicated RSI Lab in Tokyo consolidates The AI Scientist, Darwin Godel Machine, and ShinkaEvolve under one roof. Combined with Anthropic's disclosure that 80% of its code is Claude-written, the question is no longer whether AI can improve itself - it's how fast the feedback loop tightens. If this accelerates, the timeline for AI systems that can meaningfully redesign their own training processes shrinks from decades to years.

02

Agent Benchmarks Are Getting Economically Real

New benchmarks map AI agent tasks directly to US occupational categories - measuring not "can it code" but "can it do your specific job."

Agents' Last Exam (ALE) tags each of its 1,000+ tasks with an occupational code from the Bureau of Labor Statistics taxonomy. The hardest tier has a 2.6% full pass rate. SWE-Marathon tests whether agents stay coherent over billion-token budgets. These are the benchmarks that will matter when companies decide which roles to automate. If pass rates climb quickly, expect workforce planning to shift.

03

WebAssembly Sandboxing Could Become the Standard for AI Code Execution

Many coding agents run untrusted code with broad access to the host system. This 362KB binary is a lightweight, pip-installable alternative.

Willison's micropython-wasm is alpha-stage, but the architecture - MicroPython compiled to WASM with wasmtime fuel limits - is sound enough that it could become the foundation for how AI agents execute code safely. The fact that it installs via pip and requires no special infrastructure removes the main barrier to adoption. If this catches on, expect cloud AI providers to adopt similar sandboxing for their coding agents.

04

College Grad Unemployment Predates AI - But AI Is Accelerating It

The structural decline started in 2000, the crossover happened in 2019, and AI is making it worse - especially for CS graduates.

Stanford researchers specifically identify AI exposure as a factor in rising computer science graduate unemployment. Remote work, to which the New York Fed attributes 64% of the rise, is the larger driver, but both forces reduce demand for the same thing: entry-level workers doing tasks that can be automated, or that disappear when there's no physical office to learn in. If you're advising students on career paths, this data should inform the conversation.

GitHub Trending

Top Repos Today

#1

obra/superpowers

Rank yesterday: #1 - Holding steady ➡

⭐ Stars today: +1,008 · 📦 Total: 219,625
📜 License: MIT · 👤 By: Individual (Jesse Vincent)
🎯 Time to value: 15 minutes

What it is: An agentic skills framework and software development methodology for AI coding agents. It provides structured workflows starting from design brainstorming through test-driven development, with subagent-driven development and two-stage code review. Compatible with Claude, Codex, Gemini, Cursor, and others. Why you'd want it: Turns chaotic AI coding sessions into repeatable, test-driven workflows with built-in verification - the "methodology" part matters more than the "framework" part.

✓ Pros	✗ Cons
MIT licensed, works with every major AI coding tool	Opinionated workflow may clash with existing team practices
Test-driven RED-GREEN-REFACTOR cycles enforce quality	Shell-heavy (66%) may feel unfamiliar to web developers
220K stars signal massive community validation	Learning the methodology takes longer than learning the tools

#2

Panniantong/Agent-Reach

Rank yesterday: not in top 25 - New entry 🆕

⭐ Stars today: +700 · 📦 Total: 22,283
📜 License: MIT · 👤 By: Individual
🎯 Time to value: 10 minutes

What it is: A command-line tool that gives AI agents eyes across Twitter, Reddit, YouTube, GitHub, Bilibili, and XiaoHongShu without API (Application Programming Interface) fees. Uses open-source upstream tools (yt-dlp, twitter-cli, rdt-cli, Jina Reader) with cookie-based authentication and a diagnostic "doctor" command. Why you'd want it: Zero-cost alternative to paid API access for AI agents that need to search and read across platforms. MIT licensed with local credential storage.

✓ Pros	✗ Cons
Zero API fees - leverages free open-source tools	Cookie-based auth may break with platform changes
Works with Claude Code, Cursor, OpenClaw, Windsurf	Scraping-based approach sits in a legal grey area
"Doctor" diagnostic shows each channel's health	Platform-specific failures require manual debugging

#3

CopilotKit/CopilotKit

Rank yesterday: #4 - Rising ↑

⭐ Stars today: +613 · 📦 Total: 33,182
📜 License: MIT · 👤 By: Company (CopilotKit)
🎯 Time to value: 30 minutes

What it is: A full-stack SDK for building AI-powered applications with generative UI. Supports React, Angular, Vue, React Native, Slack, and Teams. Includes human-in-the-loop workflows, backend tool rendering, and self-learning agents via AG-UI Protocol. Why you'd want it: The most mature framework for adding AI copilot features to existing applications, with production-ready components for chat, shared state, and agent-generated UI.

✓ Pros	✗ Cons
Multi-platform: web, mobile, Slack, Teams	Opinionated architecture may not fit all app designs
AG-UI Protocol integrates with LangChain, AWS, Google	Learning curve for full generative UI features
Human-in-the-loop built in for approval workflows	TypeScript-heavy codebase (79%) limits non-JS teams

#4

MemPalace/mempalace

Rank yesterday: #3 - Falling ↓

⭐ Stars today: +441 · 📦 Total: 54,258
📜 License: MIT · 👤 By: Community
🎯 Time to value: 10 minutes

What it is: Local-first AI memory system storing conversation history as verbatim text with semantic search. Structured around "wings" (people/projects), "rooms" (topics), and "drawers" (content). Includes a temporal knowledge graph and MCP server with 29 tools. Why you'd want it: 96.6% recall at rank 5 on LongMemEval with zero API calls. Runs entirely locally with pluggable backends (ChromaDB, SQLite, Qdrant, pgvector).

✓ Pros	✗ Cons
96.6% R@5 with zero cloud dependencies	Palace metaphor may confuse users expecting folder structures
MCP server with 29 tools for agent integration	Verbatim storage grows faster than summarization approaches
Auto-save hooks for Claude Code sessions	Knowledge graph setup requires upfront configuration
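Recall at rank 5 (R@5) measures how much of the evidence a question needs shows up among the top five retrieved items. A sketch of the standard information-retrieval formulation (the fraction of relevant items found in the top k, averaged over queries); the session IDs are made up, and LongMemEval's own scorer may count a hit differently, for example crediting a query if any evidence item appears.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant items that appear in the top-k ranked results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant item")
    return len(relevant & set(ranked[:k])) / len(relevant)


def mean_recall_at_k(queries: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k across (ranked results, relevant IDs) pairs."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in queries) / len(queries)


queries = [
    (["s3", "s9", "s1", "s4", "s7", "s2"], {"s1", "s2"}),  # s2 ranks 6th: half recalled
    (["s5", "s8", "s6", "s0", "s4"], {"s5"}),              # fully recalled
]
print(mean_recall_at_k(queries))  # 0.75
```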

#5

lfnovo/open-notebook

Rank yesterday: not in top 25 - New entry 🆕

⭐ Stars today: +783 · 📦 Total: 26,581
📜 License: MIT · 👤 By: Community
🎯 Time to value: 15 minutes

What it is: An open-source alternative to Google's NotebookLM. Upload PDFs, videos, audio files, and web pages into organized notebooks, then generate podcasts, summaries, and research conversations using 18+ AI providers (OpenAI, Anthropic, Ollama, LM Studio). Why you'd want it: Self-hosted NotebookLM with full provider flexibility and no Google lock-in. Multi-speaker podcast generation and REST API for programmatic access.

✓ Pros	✗ Cons
18+ provider support including local models	Self-hosting requires Docker and configuration effort
Multi-speaker podcast generation with custom profiles	No mobile app - browser-only interface
REST API enables integration into existing workflows	Smaller community than commercial alternatives

#6

PaddlePaddle/PaddleOCR

Rank yesterday: #8 - Rising ↑

⭐ Stars today: +449 · 📦 Total: 80,945
📜 License: Apache 2.0 · 👤 By: Company (Baidu)
🎯 Time to value: 5 minutes

What it is: A lightweight OCR (Optical Character Recognition - technology that reads text from images) toolkit that turns PDFs and images into structured data. Supports 80+ languages with models as small as 6MB. PP-OCRv5 is the latest version. Why you'd want it: The go-to open-source OCR for feeding documents into AI pipelines. Apache 2.0 licensed with production-proven reliability at 81K stars.

✓ Pros	✗ Cons
80+ languages, models from 6MB to enterprise scale	PaddlePaddle framework dependency (not PyTorch)
Apache 2.0 for commercial use	Documentation primarily in Chinese, though improving
81K stars with active Baidu backing	Accuracy on handwritten text lags specialist tools

#7

microsoft/VibeVoice

Rank yesterday: not in top 25 - New entry 🆕

⭐ Stars today: +219 · 📦 Total: 48,457
📜 License: MIT · 👤 By: Company (Microsoft)
🎯 Time to value: 20 minutes

What it is: Open-source frontier voice AI covering TTS, ASR, and streaming. Ultra-low frame rate tokenizers at 7.5 Hz enable 60-minute single-pass audio processing. Uses next-token diffusion with vLLM inference support. Why you'd want it: MIT-licensed voice AI from Microsoft that handles both transcription (50+ languages, 60-minute sessions) and synthesis (90 minutes, multi-speaker) in one toolkit.

✓ Pros	✗ Cons
MIT license from Microsoft - rare for frontier voice models	Python-only, no native mobile SDKs
60-minute single-pass processing for long audio	Large model sizes require GPU for real-time inference
Multi-speaker dialogue generation in TTS	Streaming model (300ms) trades quality for latency
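The 7.5 Hz tokenizer rate is what makes hour-long single-pass processing plausible: each second of audio becomes 7.5 frames. A back-of-the-envelope frame count, assuming the same rate applies to the 90-minute TTS setting and counting audio frames only (text tokens and other context are ignored):

```latex
\begin{aligned}
N_{60\,\text{min}} &= 60\,\text{min} \times 60\,\tfrac{\text{s}}{\text{min}} \times 7.5\,\tfrac{\text{frames}}{\text{s}} = 27{,}000\ \text{frames} \\
N_{90\,\text{min}} &= 90 \times 60 \times 7.5 = 40{,}500\ \text{frames}
\end{aligned}
```

For comparison, a hypothetical tokenizer running at 50 Hz would need 180,000 frames for the same hour.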

HuggingFace Trending

Top Models Today

#1

nvidia/LocateAnything-3B

3B vision-language model that segments any object in an image from a text description - no bounding boxes needed.

📥 Downloads (30d): 111K · 📜 License: CC-BY-NC-4.0
👤 By: NVIDIA · 🎯 Task: Image Segmentation
📐 Size: 3B

What it is: A multimodal model that takes an image and a text description (like "the red car in the background") and produces a pixel-precise segmentation mask. Works on arbitrary objects without predefined categories. Why you'd want it: Eliminates manual annotation for any "find and select this object" workflow - from photo editing to robotics to quality inspection.

✓ Pros	✗ Cons
Text-guided segmentation with no predefined categories	Non-commercial license limits business use
3B parameter size is practical for deployment	Accuracy drops on heavily occluded or tiny objects
Strong zero-shot performance across domains	Requires GPU for real-time inference

#2

google/gemma-4-12B-it

Google's 12B instruction-tuned multimodal model with any-to-any capabilities - text, image, and audio in one model.

📥 Downloads (30d): 315K · 📜 License: Gemma
👤 By: Google · 🎯 Task: Any-to-Any
📐 Size: 12B

What it is: The instruction-tuned version of Gemma 4 at 12B parameters, supporting multimodal input and output. Google recently released Quantization-Aware Training (QAT) checkpoints enabling approximately 1GB mobile deployment. Why you'd want it: Best-in-class open multimodal model at the 12B tier, with official quantization support for on-device deployment.

✓ Pros	✗ Cons
Multimodal: text, image, and audio in one model	Gemma license is more restrictive than Apache 2.0
QAT checkpoints enable approximately 1GB deployment	12B still too large for most smartphones without quantization
Immediate Ollama and vLLM integration	Audio capabilities less mature than text and vision
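On-device size claims are easiest to sanity-check with a weights-only estimate: bytes ≈ parameters × bits per weight ÷ 8. The sketch below applies it to a 12B-parameter model; it ignores the KV cache, activations, and any layers kept at higher precision, so real footprints run larger.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"12B @ {bits}-bit: {weight_gb(12e9, bits):.1f} GB")
# 12B @ 16-bit: 24.0 GB
# 12B @ 8-bit: 12.0 GB
# 12B @ 4-bit: 6.0 GB
```

Turning the formula around, a 1GB weights-only budget at 4 bits per weight holds about 2B parameters.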

#3

HauhauCS/Qwen3.6-35B-A3B-Uncensored

Uncensored 35B MoE (Mixture of Experts - a design where only a fraction of the model activates for each token) model with 3B active parameters and 2.77M downloads - the community's most-downloaded uncensored model.

📥 Downloads (30d): 2.77M · 📜 License: Apache 2.0
👤 By: Community · 🎯 Task: Text Generation
📐 Size: 35B (3B active)

What it is: A community-modified version of Qwen 3.6's Mixture-of-Experts model with safety filters removed. The 3B active parameters keep inference fast while the full 35B model provides knowledge breadth. Why you'd want it: For research, creative writing, or applications where default model refusals block legitimate use cases. Apache 2.0 licensed, with download numbers that signal wide adoption.

✓ Pros	✗ Cons
Apache 2.0 with 2.77M downloads signaling wide adoption	"Uncensored" means no safety guardrails whatsoever
3B active params = fast inference on consumer hardware	Community mod with no corporate support or updates
Broad knowledge from full 35B parameter set	May produce harmful content without usage safeguards

View on HuggingFace →
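The "35B total, 3B active" split comes from routing: a small gating network scores every expert for each token, and only the top-k experts run, so roughly 3/35, or about 8.6%, of the weights are used per token. A toy sketch of top-k gating in plain Python with made-up scores and a hypothetical 8-expert layer; Qwen's actual router (expert count, k, shared experts, load balancing) is not described in this digest, and some routers apply softmax before selecting rather than after, as done here.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    return list(zip(top, softmax([gate_logits[i] for i in top])))


def moe_layer(x: float, gate_logits: list[float], experts: list, k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in route(gate_logits, k))


experts = [lambda x, s=s: s * x for s in range(8)]   # 8 toy "experts"
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # one token's router scores
print([i for i, _ in route(logits, k=2)])            # [4, 1]
```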

#4

sapientinc/HRM-Text-1B

Novel dual-timescale recurrent architecture enabling deep iterative reasoning at just 1B parameters.

📥 Downloads (30d): 162K · 📜 License: Apache 2.0
👤 By: Sapient Intelligence · 🎯 Task: Text Generation
📐 Size: 1B

What it is: A Hierarchical Reasoning Model where two stacked transformer modules iterate recurrently, adding compute depth - 6 reasoning cycles per forward pass - without adding parameters. Why you'd want it: A research-grade base model challenging the "scale parameters to improve reasoning" paradigm. Worth studying for efficient reasoning architectures.

✓ Pros	✗ Cons
Demonstrates iterative recurrence as alternative to scale	Pre-alignment only - needs fine-tuning for assistant use
Apache 2.0, openly trainable and deployable	English-only with weak code performance
PrefixLM supports bidirectional prompt attention	Only 40B training tokens - limited factual coverage
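The core trick in a hierarchical reasoning model is reuse: the same two modules run again and again, so compute depth grows with the number of cycles while the parameter count stays fixed. A toy sketch with scalar "modules", assuming a fast module that takes several steps per slow-module update; the update functions, weights, and the 4 inner steps are invented for illustration, and only the default of 6 cycles comes from the description above.

```python
import math

# Fixed "parameters": they never change, no matter how many cycles run.
W_LOW = (0.5, 0.3, 0.2)  # fast module weights (hypothetical)
W_HIGH = (0.6, 0.4)      # slow module weights (hypothetical)


def low_step(z_low: float, z_high: float, x: float) -> float:
    """Fast module: refines its state using the input and the slow module's plan."""
    a, b, c = W_LOW
    return math.tanh(a * z_low + b * z_high + c * x)


def high_step(z_high: float, z_low: float) -> float:
    """Slow module: updates its plan from the fast module's result."""
    a, b = W_HIGH
    return math.tanh(a * z_high + b * z_low)


def forward(x: float, cycles: int = 6, low_steps_per_cycle: int = 4) -> tuple[float, int]:
    """Run the nested loop; return the final slow state and the number of module calls."""
    z_low = z_high = 0.0
    calls = 0
    for _ in range(cycles):
        for _ in range(low_steps_per_cycle):
            z_low = low_step(z_low, z_high, x)
            calls += 1
        z_high = high_step(z_high, z_low)  # slow timescale: once per cycle
        calls += 1
    return z_high, calls


print(forward(1.0)[1])             # 30 module calls with 5 weights
print(forward(1.0, cycles=12)[1])  # 60 module calls with the same 5 weights
```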

#5

ideogram-ai/ideogram-4-fp8

Leading open-weight image generation model at 9.3B parameters with JSON layout control and best-in-class text rendering.

📥 Downloads (30d): 2.82K · 📜 License: Ideogram Research
👤 By: Ideogram AI · 🎯 Task: Text-to-Image
📐 Size: 9.3B

What it is: A Diffusion Transformer with a frozen 8B VLM (Vision-Language Model) text encoder. Released in fp8 and nf4 quantizations - nf4 fits on a single 24GB GPU. Ranked as the leading open-weight image model on the Arena leaderboard. Why you'd want it: First open-weight text-to-image (T2I) model with bounding-box layout control and reliable text rendering - designers rated it 3.55/5 for "real client work usability."

✓ Pros	✗ Cons
JSON prompts for precise layout control	Research license limits commercial use without API
nf4 fits on single 24GB consumer GPU	Gated - requires HuggingFace login
Best-in-class text rendering in generated images	Full model requires significant VRAM

Product Hunt

AI Launches Today

Product Hunt's daily leaderboard was not accessible when this digest was compiled, so today's AI launches are not listed here. Check Product Hunt's AI category for today's top launches.

API Pricing

Snapshot

Provider	Model	Input $/1M	Output $/1M	Context
Anthropic	Claude Opus 4.8	$5.00	$25.00	1M
Anthropic	Claude Sonnet 4.6	$3.00	$15.00	1M
Anthropic	Claude Haiku 4.5	$1.00	$5.00	200K
OpenAI	GPT-5.5	$5.00	$30.00	1.05M
OpenAI	GPT-5.4	$2.50	$15.00	1.05M
OpenAI	GPT-5.4 Nano	$0.20	$1.25	128K
Google	Gemini 3.5 Flash	$1.50	$9.00	1M
Google	Gemini 3.1 Pro Preview	$2.00	$12.00	200K
Google	Gemini 2.5 Flash-Lite	$0.10	$0.40	1M

What this means: No price changes from yesterday. Frontier flagships (Opus 4.8, GPT-5.5) cluster at $5 input but diverge on output ($25 vs. $30). Google's Gemini 2.5 Flash-Lite at $0.10/M input remains the cheapest option with a 1M token context window - 50x cheaper than frontier models on input. The profitability theme of today's top story echoes here: even at these prices, OpenAI and Anthropic are not yet profitable.
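Per-million-token prices translate into a per-request cost with one line of arithmetic: (input tokens × input price + output tokens × output price) ÷ 1,000,000. A sketch using three rows from the snapshot table; the 10,000-input / 1,000-output workload is a hypothetical example, and real bills can differ because of caching discounts, batch pricing, or long-context surcharges not shown in the table.

```python
# (input $/1M tokens, output $/1M tokens), copied from the snapshot table
PRICES = {
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f}")
# Claude Opus 4.8: $0.0750
# GPT-5.5: $0.0800
# Gemini 2.5 Flash-Lite: $0.0014
```

On this output-light workload the $5 difference in output price between the two flagships adds about half a cent per request; output-heavy workloads widen the gap proportionally.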

arXiv Paper of the Day

PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

Yuxuan Zhao, Sijia Chen, Ningxin Su · arXiv:2605.27887

What it claims: Current LLMs can answer financial knowledge questions accurately but fail catastrophically when asked to actually manage a portfolio - a gap between knowing and doing that existing benchmarks miss.

Key finding: 90% of model-profile combinations fail to outperform a basic equal-weight allocation across six asset classes over a decade.

Why practitioners should care: If you are building or evaluating AI for financial applications, this benchmark reveals that strong question-answering performance does not predict allocation ability. The "CEPS" metric (which measures how reasoning errors compound across pipeline stages) is particularly useful for identifying where your system breaks down.
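Why compounding matters in a staged pipeline: if each of the five allocation stages is right with probability p, and stages succeed or fail independently, the whole pipeline is right with probability p to the fifth power. This is a generic illustration of error compounding, not PortBench's CEPS formula, which the summary does not give; the 90% per-stage figure is hypothetical.

```latex
P(\text{pipeline correct}) = \prod_{i=1}^{5} p_i, \qquad p_i = 0.9 \;\Rightarrow\; 0.9^{5} = 0.59049
```

So a pipeline whose stages are each right 90% of the time is right end to end only about 59% of the time, which is how strong per-question scores can coexist with poor allocations.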

Read on arXiv →

GenAI Secret Sauce Daily Digest - 2026-06-06

GenAI Secret Sauce Daily Digest - 2026-06-07

GenAI Secret Sauce Daily Digest - 2026-06-05

Subscribe to GenAI Secret Sauce newsletter and stay updated.
