GenAI Secret Sauce Daily Digest

By the Numbers

Statistically Speaking

18 - new diagnoses from cases that had stumped specialists (Top Story: AI Diagnoses 18 Children Whose Rare Diseases Stumped Doctors)

50 years - since the last new whole-body imaging modality, now using AI to reconstruct images from ultrasound reflections (AI Moves Into the Exam Room)

o3 - the OpenAI model that diagnosed 18 children with rare genetic diseases (AI Moves Into the Exam Room)

GPT-5.5 Instant - now matches frontier models on professional health evaluations (AI Moves Into the Exam Room)

462x - parameter reduction achieved by Ghost Attractor Networks (The Efficiency Revolution)

1.9% - of supervised tokens used by CODEBLOCK while maintaining full-token SFT quality (The Efficiency Revolution)

One Thing to Tell Your Friends

Midjourney - the company known for AI art - just unveiled a whole-body medical scanner that uses 358,000 ultrasonic elements instead of radiation, and could make full-body health scans as routine as stepping on a bathroom scale.

Summary

TL;DR

Trends

AI Moves Into the Exam Room, The Efficiency Revolution: Doing More With Less, and AI Safety Research Finds Fundamental Gaps.

Creative AI

NVIDIA Cosmos 3, VidCRAFT3, and Reliable Neural-Codec TTS.

Dev Tools

AMP: Pooling Compute Across Clouds and Microsoft FastContext.

Research

A Neural Network That Matches Billion-Parameter Models With 462x Fewer Parameters, Training Code Models With 98% Less Supervision, and A Non-Transformer Architecture That Beats Transformers.

Business

Noam Shazeer Leaves Google for OpenAI, OpenAI Launches Enterprise Spend Controls, and Craig Newmark Has Given Away Half a Billion Dollars.

Education

The Quiet Reinvention of Assessment, Beyond LoRA: Alternatives to the Most Popular Fine-Tuning Method, and MosaicLeaks: Can Your Research Agent Keep a Secret?

Surprising

AI Safety Interventions Fail 96% of the Time, A Satellite Is Running Google's AI Model in Orbit, and LLMs Can't Do Defeasible Reasoning.

Worth Watching

Decoupled Search Grounding Cuts Agent Search Costs by 91%, Ghost Attractor Networks Challenge the Parameter Arms Race, and GPU Telemetry Can Detect Unauthorized AI Training.

GitHub

Leading repos: google-research/timesfm (+858), obra/superpowers (+1,435), and zai-org/GLM-5 (+286).

HuggingFace

Leading models: zai-org/GLM-5.2 (4,307), MiniMaxAI/MiniMax-M3 (56,162), and moonshotai/Kimi-K2.7-Code (229,156).

Product Hunt

Top launches: VELA, CADAM, and Tine.

API Pricing

What this means: Note the Fable 5 pricing irony - Anthropic's most expensive model at $10/$50 per million tokens is currently inaccessible to most users due to the ongoing export controls.

arXiv

Decoupling Search from Reasoning: A Vendor — 91% cost reduction on search operations while maintaining 86.1% accuracy (vs 87.7% native), and 98%+ cost reduction in production e-commerce with 68% latency improvement.

FYI

Hot off the Presses

01

The Co-Inventor of the Transformer Just Joined OpenAI

What this means for you: The person who designed the core technology behind ChatGPT, Claude, and Gemini has switched teams - expect OpenAI's next models to reflect his architectural innovations.

Noam Shazeer, co-author of the landmark 2017 "Attention Is All You Need" paper that created the Transformer architecture powering virtually every modern AI system, announced he is joining OpenAI as Lead for AI Architecture Research. He leaves Google, where he co-led the Gemini project after returning in 2024 through a licensing deal with his startup Character.AI valued at $2.7 billion.

Shazeer co-invented multi-head attention and introduced multi-query attention - techniques that underpin most major language models
He was considered Google's most important AI researcher alongside Jeff Dean
The move follows a pattern of top Google AI talent leaving for competitors, including several senior Gemini researchers in the past year

Source →

02

Midjourney Built a Whole-Body Medical Scanner

What this means for you: Full-body health scans could become as routine and affordable as a bathroom scale - no radiation, no hospital visit, no insurance approval needed.

Midjourney, best known for its AI image generator, has unveiled a full-body ultrasonic CT scanner - the first new whole-body imaging modality in 50 years. The device uses 358,000 ultrasonic elements arranged in 40 ring-mounted systems, generating approximately 40GB of raw data per slice at 17GB/s.

No radiation and no magnets, unlike X-ray CT or MRI - making repeat scans safe
AI reconstructs images from ultrasound reflections using the same generative AI expertise the company built for art
The goal is consumer-grade pricing - putting whole-body scanning in doctor's offices and eventually homes

Source →

03

AI Diagnoses 18 Children Whose Rare Diseases Stumped Doctors for Years

What this means for you: If your child has an undiagnosed condition, AI-assisted genetic analysis could find answers that years of traditional testing missed.

Boston Children's Hospital used OpenAI's o3 model to diagnose 18 children whose rare genetic diseases had eluded explanation for years, with the research published today in NEJM AI (the New England Journal of Medicine's AI journal). The team analyzed genomes of 376 children with undiagnosed rare diseases.

"18 children diagnosed by AI after years of medical mystery"

18 new diagnoses from cases that had stumped specialists - some children had been undiagnosed for their entire lives
The AI combined clinicians' notes with genomic data to spot patterns humans missed
Published in NEJM AI on June 18 - one of the most prestigious medical AI publications to date

Source →

04

NVIDIA Open-Sources Cosmos 3: AI That Sees, Hears, and Acts

What this means for you: Developers can now freely use the best open-source model for generating images and video - and it can also understand audio, text, and even control robots.

NVIDIA released Cosmos 3, an omnimodal world model that processes and generates across five modalities: language, images, video, audio, and action sequences. It immediately became the top-rated open-source text-to-image and image-to-video model according to Artificial Analysis benchmarks.

Built by 294+ researchers using a mixture-of-transformers architecture
State-of-the-art open-source image and video generation - first model to top both categories simultaneously
Designed for "physical AI" - robotics and autonomous systems that need to understand the real world

Source →

05

Fable/Mythos Export Controls Enter Week Two

What this means for you: If you use Claude outside the US, the ongoing government-ordered shutdown of Anthropic's most powerful models could push your organization toward Chinese alternatives with no comparable restrictions.

Previously: June 14-17 - The White House shut down Anthropic's Claude Fable 5 and Mythos 5 via export controls, citing a reported "jailbreak" from Amazon.

Today: A Wired investigation reveals the specific trigger: SK Telecom (South Korea's largest telecom carrier and Anthropic investor) had access to Mythos, and the White House cited SK Telecom's historical business ties to China-adjacent entities. Zvi Mowshowitz reports we are now on day seven of the pause, with roughly even odds it ends by July 1. The stated "jailbreak" justification appears increasingly pretextual.

Source - Wired | Source - Zvi

Trends & Themes

AI Moves Into the Exam Room

Why this matters to you: AI is no longer just writing emails - it is diagnosing diseases, reading body scans, and catching conditions that human doctors miss.

The healthcare AI story has shifted from "can AI help doctors?" to "AI is finding things doctors cannot." Three separate healthcare breakthroughs in a single day suggest this is accelerating faster than most people realize.

Midjourney's ultrasonic scanner represents the first new whole-body imaging modality in 50 years, using AI to reconstruct images from ultrasound reflections
OpenAI's o3 diagnosed 18 children with rare genetic diseases, in research published in NEJM AI today
GPT-5.5 Instant now matches frontier models on professional health evaluations including HealthBench, with improved urgent-symptom recognition

The Efficiency Revolution: Doing More With Less

Why this matters to you: The best AI tools could soon run on your phone or laptop instead of needing expensive cloud servers - making them faster, cheaper, and more private.

A consistent pattern: researchers are finding ways to dramatically shrink the compute needed for AI without losing quality. This directly translates to cheaper APIs, faster responses, and AI that runs on edge devices.

Ghost Attractor Networks achieved 462x parameter reduction - matching a 1.07-billion-parameter model with only 2.3 million parameters and 32x lower latency
CODEBLOCK trains code models with only 1.9% of supervised tokens while maintaining full-token SFT quality
DiffusionGemma generates 1,100+ tokens per second with only 3.8 billion active parameters - a 15-20x speedup over standard generation
KANELE achieved 2,700x speedup for KAN (Kolmogorov-Arnold Network) inference on FPGAs (specialized chips)
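
As a quick sanity check on the ratios above, the sketch below recomputes them from the rounded figures quoted in this digest. The variable names are illustrative, and because the inputs are rounded, the results only approximate the reported numbers (for instance, 1.07 billion divided by 2.3 million comes out near 465x rather than exactly 462x).

```python
# Recompute the efficiency ratios from the rounded figures quoted above.
# Inputs are the digest's rounded numbers, so results are approximate.

ghost_baseline_params = 1.07e9   # 1.07-billion-parameter baseline
ghost_params = 2.3e6             # Ghost Attractor Network size
param_reduction = ghost_baseline_params / ghost_params

supervised_fraction = 0.019      # CODEBLOCK trains on 1.9% of supervised tokens
token_reduction = 1 / supervised_fraction
supervision_saved = 1 - supervised_fraction

diffusion_tps = 1100             # DiffusionGemma tokens per second (lower bound)
# A 15-20x speedup implies this baseline range for standard generation.
baseline_tps_low = diffusion_tps / 20
baseline_tps_high = diffusion_tps / 15

print(f"Parameter reduction: ~{param_reduction:.0f}x")
print(f"Token reduction: ~{token_reduction:.1f}x ({supervision_saved:.1%} less supervision)")
print(f"Implied baseline: {baseline_tps_low:.0f}-{baseline_tps_high:.0f} tokens/sec")
```

The small gap between the computed and reported parameter ratio most likely reflects rounding in the published parameter counts.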

AI Safety Research Finds Fundamental Gaps

Why this matters to you: The tools researchers thought would keep AI safe may not work as well as believed - which matters when AI is making medical diagnoses and writing code.

These findings matter because they undermine assumptions the field was building on. If the main technique for steering AI behavior (SAE interventions) has a 96% failure rate, the safety community needs new approaches.

SAE interventions are unreliable - suppressing harmful features in AI models fails because the AI recovers the suppressed behavior ~95.8% of the time through alternative neural pathways
MosaicLeaks shows research agents leak private data through the "mosaic effect" - individually harmless queries collectively reveal sensitive information
SFT overtraining causes entropy collapse - fine-tuning language models too aggressively destroys their ability to generate diverse responses

Open-Source Models Hit New Highs

Why this matters to you: The best free, downloadable AI models are now competitive with paid services from OpenAI, Google, and Anthropic - meaning organizations can run their own AI without ongoing subscription costs.

The gap between "free to download" and "expensive subscription" models continues to narrow. For many tasks, the open-source option is now the better choice.

GLM-5.2 (753B parameters, MIT license) leads open-weight benchmarks with 99.2 on AIME 2026 math and a 1M-token context window
DeepSeek-V4-Pro has nearly 3 million downloads in 30 days under MIT license
Cosmos 3 tops open-source image and video generation across Artificial Analysis benchmarks
North-Mini-Code runs competitive coding agent tasks with only 3B active parameters under Apache 2.0

Creative AI & Media

NVIDIA Cosmos 3 - Omnimodal Generation

What it does: Generates images, video, and audio from text prompts, and can also process all five modalities as input
Best-in-class open-source for both text-to-image and image-to-video generation
294+ researchers contributed to this unified architecture

Source →

VidCRAFT3 - Unified Camera, Object, and Lighting Control

What it does: Lets you control camera movement, object motion, and lighting independently when generating video from images
Solves a real problem: Other tools create mismatched shadows when you change the camera angle
Research paper with code - not yet a consumer product

Source →

Reliable Neural-Codec TTS

What it does: Fixes the random failures in AI voice generation - silence, premature stops, and hallucinated words
Uses ASR self-verification to catch and correct errors before they reach the listener
Practical impact: Makes AI voiceover reliable enough for production use

Source →

Developer Tools

Developer Tools & Infrastructure

AMP: Pooling Compute Across Clouds

What it does: AMP acts like a power grid for AI compute - pooling GPU capacity across multiple clouds and chip types so you can train and serve models without being locked to one provider
Founded by Anjney Midha, former a16z investor and early backer of Anthropic and Mistral
Targets 1.2 gigawatts of capacity - enough to power a small city's worth of GPUs
Source

Microsoft FastContext - Exploration Subagent

What it does: A specialized 4B model that handles file exploration for coding agents, reducing the main agent's token usage by up to 60%
Novel architecture: Instead of one model doing everything, FastContext handles the READ/GREP/GLOB operations that account for 56% of tool-use turns
MIT licensed, drop-in compatible with existing agent pipelines
Source

Research & Models

A Neural Network That Matches Billion-Parameter Models With 462x Fewer Parameters

Ghost Attractor Networks use dynamical systems (basin attractors) instead of standard neural network layers for sequential generation
2.3 million parameters match a 1.07-billion-parameter Diffusion Transformer with 32x lower latency
Practical implication: If this approach generalizes, it could make high-quality generation feasible on mobile phones
Source

Training Code Models With 98% Less Supervision

CODEBLOCK partitions code into syntactically coherent blocks and selectively trains only on the most useful ones
1.9% of supervised tokens achieves full-token SFT quality - a 50x reduction in training signal
Why it matters: Dramatically reduces the cost and data needed to fine-tune code models
Source

A Non-Transformer Architecture That Beats Transformers

Frustrated Synchronization Networks (FSN) model token interactions as phase dynamics on a torus, inspired by physics
Outperforms Transformers on enwik8 (text compression benchmark) with a fundamentally different compute mechanism
Early-stage but significant - any architecture that beats Transformers at any task challenges the field's core assumption
Source

First AI Vision Model Runs Autonomously on a Satellite

NAVI-Orbital deployed Gemma 3 on a low-Earth orbit spacecraft on April 16, 2026
Classifies imagery and generates descriptions of Earth observations in plain English, replacing command-line interfaces
First demonstration of a vision-language model running entirely onboard a spacecraft
Source

Detecting Secret AI Training From GPU Power Meters

98.2% accuracy distinguishing ML training workloads from other GPU tasks using only power/utilization telemetry
Privacy-preserving: Doesn't access model weights, training data, or hyperparameters
Governance implication: Could enable detecting unauthorized AI training at data centers without inspecting what's being trained
Source

Business & Industry

Noam Shazeer Leaves Google for OpenAI

The Transformer co-inventor joins OpenAI as Lead for AI Architecture Research
Leaves Google, where he co-led Gemini after returning through Google's $2.7B licensing deal with Character.AI
Biggest AI talent move in years - signals OpenAI's investment in fundamental architecture research

Source →

OpenAI Launches Enterprise Spend Controls

New analytics dashboards for monitoring AI usage across teams and departments
Updated spend controls give organizations visibility into ChatGPT Enterprise costs
Targets the "shadow AI" problem - companies discovering large unexpected bills

Source →

Craig Newmark Has Given Away Half a Billion Dollars

Craigslist founder donated ~$487 million to charity, including significant cybersecurity funding
Signed the Giving Pledge with wife Eileen Whelpley in 2024
Relevant to AI: Major funding for AI safety and cybersecurity organizations

Source →

Education

GenAI in Education

The Quiet Reinvention of Assessment

What this means for you: AI is making superior testing methods affordable for the first time - oral exams, portfolio reviews, and real-time feedback that were previously too expensive to scale.

AI-powered oral assessments cost ~40 cents each - reviving medieval examination traditions at modern scale
Four key shifts: oral assessment revival, portfolio-based evaluation, real-time feedback loops, and competency-based progression
The economic barrier to good assessment has collapsed - the methods that education researchers always said were better are now affordable

Source →

Beyond LoRA: Alternatives to the Most Popular Fine-Tuning Method

LoRA captures 98.4% of PEFT mentions on HuggingFace Hub, but several alternatives outperform it
Lily delivers 54.9% vs LoRA's 53.2% on math reasoning accuracy
BEFT uses less memory (24.3GB vs LoRA's 25.5GB) for comparable performance
Practical guide for anyone fine-tuning models for education or other applications

Source →

MosaicLeaks: Can Your Research Agent Keep a Secret?

New benchmark for measuring privacy leakage in AI research agents that mix private documents with web search
The "mosaic effect" - individually harmless queries collectively reveal sensitive data
1,001 multi-step scenarios across enterprise contexts including educational institutions
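
To make the mosaic effect concrete, here is a toy sketch that is not taken from the MosaicLeaks benchmark: the records, field names, and queries are all invented for illustration. Each query term narrows a small directory only slightly, but an observer who sees all of them together can pin down a single person.

```python
# Toy illustration of the "mosaic effect". All records, field names, and
# query terms below are hypothetical; this is not the MosaicLeaks benchmark.

RECORDS = [
    {"name": "A", "dept": "oncology",   "city": "Boston",  "role": "nurse"},
    {"name": "B", "dept": "oncology",   "city": "Boston",  "role": "surgeon"},
    {"name": "C", "dept": "oncology",   "city": "Chicago", "role": "surgeon"},
    {"name": "D", "dept": "cardiology", "city": "Boston",  "role": "surgeon"},
    {"name": "E", "dept": "cardiology", "city": "Chicago", "role": "nurse"},
]

def matches(records, constraints):
    """Return the records consistent with every (field, value) constraint."""
    return [r for r in records
            if all(r[field] == value for field, value in constraints)]

# Each web query an agent issues leaks one attribute of the private subject.
leaked = [("dept", "oncology"), ("city", "Boston"), ("role", "surgeon")]

# Viewed one at a time, every query still leaves several candidates.
for constraint in leaked:
    print(constraint, "->", len(matches(RECORDS, [constraint])), "candidates")

# Combined, the same queries identify exactly one person.
combined = matches(RECORDS, leaked)
print("combined ->", [r["name"] for r in combined])
```

No single query here looks sensitive on its own, which is why per-query filtering would miss this kind of leak.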

Source →

Surprising

Surprising & Under-the-Radar

AI Safety Interventions Fail 96% of the Time

Researchers found that clamping harmful features in AI models using Sparse Autoencoders (SAEs) appears to work initially, but the AI recovers the suppressed behavior ~95.8% of the time through alternative neural pathways. This undermines a major proposed approach to AI safety.

Source →

A Satellite Is Running Google's AI Model in Orbit

NAVI-Orbital successfully ran Gemma 3 on a spacecraft in low-Earth orbit on April 16 - classifying images and generating descriptions autonomously, with zero ground-station involvement. First confirmed in-orbit VLM demonstration.

Source →

LLMs Can't Do Defeasible Reasoning

When tested on reasoning where new evidence can overturn previous conclusions (the way legal arguments or medical diagnoses work), LLMs scored only 23.5% while symbolic solvers hit 100%. A fundamental limitation, not a training data gap.
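
For readers unfamiliar with the term, the sketch below shows defeasible reasoning in miniature using the classic birds-and-penguins example. It is an illustrative toy, not the benchmark or solver from the paper, and the rule and fact names are invented.

```python
# Minimal sketch of defeasible (non-monotonic) reasoning. The rule set is
# the textbook "birds fly unless they are penguins" example, not the
# benchmark from the paper.

def can_fly(facts):
    """Apply a default rule that more specific evidence can defeat."""
    if "penguin" in facts:   # exception defeats the default
        return False
    if "bird" in facts:      # default: birds normally fly
        return True
    return None              # no applicable rule, so no conclusion

evidence = {"bird"}
before = can_fly(evidence)   # default conclusion from the first fact

evidence.add("penguin")      # new evidence arrives
after = can_fly(evidence)    # the earlier conclusion is withdrawn

print(before, after)
```

The point is that adding a fact removes a conclusion, which is the behavior the benchmark probes.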

Source →

Diffusion Models Beat Autoregressive Models at Theorem Proving

Diffusion-Proof showed that diffusion models outperform autoregressive approaches for formal mathematical proof generation - surprising because theorem proving was considered a strength of sequential reasoning.

Source →

Best AI Agents Pass Only 59% of Pharmacology Tasks

TxBench-PP tested AI agents on pharmaceutical pipeline tasks and found the best scored only 59.3%. Drug discovery AI is further from useful than the hype suggests.

Source →

Worth Watching

Signals to Track

01

Decoupled Search Grounding Cuts Agent Search Costs by 91%

Separating search from reasoning in AI agents could make web-connected AI dramatically cheaper overnight.

A new MCP-compatible architecture separates search operations from the AI reasoning model, enabling independent optimization of each. In production e-commerce testing, it achieved 98%+ cost reduction with 68% latency improvement on warm cache, while maintaining 86.1% accuracy vs 87.7% for native search. Any team running agents with web search should evaluate this approach.
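
The item does not describe the architecture's interfaces, so the following is only a rough sketch of the general idea: search sits behind its own cached service, and the reasoning step receives retrieved snippets rather than calling a search tool itself. Every name here (`SearchService`, `fake_backend`, `answer`) is hypothetical, and the reasoning model is stubbed out.

```python
# Rough sketch of decoupling search from reasoning. All names are
# hypothetical; the real system's interfaces are not described in the item.
from typing import Callable, Dict, List

class SearchService:
    """Standalone search layer with a simple in-memory cache."""

    def __init__(self, backend: Callable[[str], List[str]]):
        self.backend = backend
        self.cache: Dict[str, List[str]] = {}
        self.backend_calls = 0

    def search(self, query: str) -> List[str]:
        key = query.strip().lower()
        if key not in self.cache:        # cold cache: pay for a backend call
            self.backend_calls += 1
            self.cache[key] = self.backend(key)
        return self.cache[key]           # warm cache: no backend cost

def fake_backend(query: str) -> List[str]:
    # Stand-in for any search provider.
    return [f"snippet about {query}"]

def answer(question: str, search: SearchService) -> str:
    # Stand-in for the reasoning model: it only sees retrieved snippets.
    snippets = search.search(question)
    return f"Answer based on {len(snippets)} snippet(s)"

service = SearchService(fake_backend)
answer("Best running shoes", service)
answer("best running shoes ", service)   # normalized query hits the cache
print(service.backend_calls)
```

Because the search layer is independent of the model, it can be cached, swapped, or tuned without touching the reasoning step.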

Source →

02

Ghost Attractor Networks Challenge the Parameter Arms Race

What if the best AI models don't need billions of parameters at all?

By modeling sequence generation as basin-attractor dynamics instead of standard neural network layers, researchers matched a 1.07B-parameter model with just 2.3M parameters. If this 462x compression ratio transfers to language models, it would upend the assumption that bigger models are better models. Still early-stage, but the physics-inspired approach is fundamentally different from anything else in the field.

Source →

03

GPU Telemetry Can Detect Unauthorized AI Training

Governments could monitor AI training at data centers without seeing what's being trained.

A detection system using only GPU power, utilization, and memory telemetry achieved 98.2% accuracy in identifying ML training workloads - without accessing any model weights, training data, or code. This could enable AI governance through hardware-level monitoring that doesn't require companies to disclose proprietary information.

Source →

04

OpenAnt Finds Vulnerabilities by Thinking Like an Attacker

An LLM-powered security tool that reduces the scope of vulnerability analysis by 97%.

OpenAnt decomposes codebases into analysis units, filters by reachability from external entry points (cutting 97% of the code), then uses adversarial verification to simulate attacker behavior. Combines static analysis with LLM reasoning in a way that could make automated security auditing practical for projects that can't afford dedicated security teams.
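
The reachability filter is the easiest step to picture. The sketch below runs a breadth-first search over a call graph from the external entry points and keeps only the functions that can be reached. The call graph and function names are made up, and OpenAnt's actual analysis units and graph construction are not described in this item.

```python
# Sketch of filtering code by reachability from external entry points.
# The call graph and names are invented for illustration.
from collections import deque

CALL_GRAPH = {
    "http_handler": ["parse_request", "render"],
    "parse_request": ["decode_json"],
    "decode_json": [],
    "render": [],
    "cli_main": ["load_config"],       # internal tool, not externally exposed
    "load_config": [],
    "legacy_export": ["decode_json"],  # never called from an entry point
}

def reachable(graph, entry_points):
    """Return every function reachable from the given entry points (BFS)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

in_scope = reachable(CALL_GRAPH, ["http_handler"])
out_of_scope = set(CALL_GRAPH) - in_scope
print(sorted(in_scope))
print(sorted(out_of_scope))
```

Only the in-scope functions would then be passed to the more expensive LLM-based verification stage.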

Source →

05

Sumi: The First Open Diffusion Language Model

A new type of language model that can edit any part of its output at any time - not just write left to right.

Sumi is a 7B-parameter uniform diffusion language model trained from scratch on 1.5 trillion tokens. Unlike standard language models that generate text one token at a time from left to right, diffusion models can update any token at any step. This could enable entirely new interaction patterns like parallel text editing and flexible infilling.

Source →

GitHub Trending

Top Repos Today

#1

google-research/timesfm

Rank yesterday: #5 - Rising ^

⭐ Stars today: +858 · 📦 Total: 23,100
📜 License: Apache-2.0 · 👤 By: research lab
🎯 Time to value: 10 minutes

What it is: Google Research's pretrained time-series foundation model. Uses a decoder-only architecture with 200M parameters, supports context lengths up to 16K tokens, and offers continuous quantile predictions. Integrated with BigQuery ML, Google Sheets, and Vertex AI. Why you'd want it: Drop-in time-series forecasting without training your own model - works for demand planning, anomaly detection, energy forecasting, and financial modeling out of the box.

✓ Pros	✗ Cons
Production integrations with BigQuery, Sheets, and Vertex	Specialized to time-series only
200M params is efficient enough to self-host with LoRA fine-tuning	Community smaller than general LLM ecosystems
16K context handles long historical series without chunking	Google could restrict access if it conflicts with paid Vertex

#4

obra/superpowers

Rank yesterday: #4 - Holding steady ->

⭐ Stars today: +1,435 · 📦 Total: 232,370
📜 License: MIT · 👤 By: company
🎯 Time to value: 15 minutes

What it is: Agentic skills framework that provides structured engineering workflows for AI coding agents. Covers brainstorming, implementation plans, subagent-driven development with two-stage code review, and enforced test-driven development. Works across 11+ coding platforms. Why you'd want it: Turn any AI coding agent into a disciplined engineering team with review gates and TDD - works with Claude Code, Cursor, Copilot, Gemini, and more.

✓ Pros	✗ Cons
Platform-agnostic across 11+ coding agents	Heavy methodology overhead for quick scripts
Enforces TDD, code review, and git worktrees	Shell-based architecture can be brittle across OS
232K stars with active community	Opinionated workflow may conflict with team processes

#5

zai-org/GLM-5

Rank yesterday: Not ranked - New entry

⭐ Stars today: +286 · 📦 Total: 4,099
📜 License: Apache-2.0 · 👤 By: research lab
🎯 Time to value: 30 minutes

What it is: Z.ai's large language model series featuring GLM-5.2 with a 1M-token context window, IndexShare architecture reducing per-token compute by 2.9x at long contexts, and state-of-the-art coding performance (81.0 on Terminal-Bench 2.1). Why you'd want it: A 1M-context coding LLM that can reason over entire codebases at once, with efficient inference that doesn't melt your GPU budget at long contexts.

✓ Pros	✗ Cons
1M-token context with IndexShare keeping costs manageable	Newer lab with smaller ecosystem
Top coding benchmarks (Terminal-Bench 81.0)	Documentation still catching up
Flexible effort levels trade latency for quality	Self-hosting large context still needs serious GPUs

#6

DeusData/codebase-memory-mcp

Rank yesterday: #1 - Falling v

⭐ Stars today: +2,308 · 📦 Total: 6,974
📜 License: MIT · 👤 By: company
🎯 Time to value: 5 minutes

What it is: High-performance MCP server that indexes codebases into persistent knowledge graphs. Analyzes source code across 158 languages with hybrid LSP semantic type resolution. Single static binary, zero external dependencies. Why you'd want it: Give your AI coding agent a photographic memory of your entire codebase - indexes the Linux kernel in 3 minutes and answers structural queries in under a millisecond.

✓ Pros	✗ Cons
158-language support with semantic understanding	Written in C, contributing requires systems skill
Zero-dependency static binary, trivial deployment	Knowledge graph may miss dynamic/runtime relationships
Auto-configures with 11 coding agents	Relatively new project with evolving API

#7

yifanfeng97/Hyper-Extract

Rank yesterday: Not ranked - New entry

⭐ Stars today: +124 · 📦 Total: 1,738
📜 License: Apache-2.0 · 👤 By: individual
🎯 Time to value: 15 minutes

What it is: LLM-powered framework that transforms unstructured text into structured knowledge. Extracts information into lists, Pydantic models, knowledge graphs, and hypergraphs. Includes 80+ YAML templates for finance, legal, medical, and general domains. Why you'd want it: One-command pipeline to turn messy documents into queryable knowledge graphs - replaces weeks of custom NER/relation-extraction plumbing.

✓ Pros	✗ Cons
80+ domain templates for immediate use	Solo maintainer, bus-factor risk
Supports hypergraphs and spatio-temporal structures	Quality depends on underlying LLM
Works with OpenAI, Alibaba Cloud, and local models	Research-grade, not battle-tested at scale

#8

alibaba/zvec

Rank yesterday: Not ranked - New entry

⭐ Stars today: +344 · 📦 Total: 11,198
📜 License: Apache-2.0 · 👤 By: company
🎯 Time to value: 10 minutes

What it is: Lightweight in-process vector database by Alibaba. Supports dense and sparse vector embeddings, native full-text search, hybrid retrieval, DiskANN indexing for billion-scale datasets, and write-ahead logging. SDKs for Python, Node.js, Go, Rust, and Dart. Why you'd want it: Embed a production-grade vector DB directly in your app process - no separate server, no network hops, billion-scale similarity search in milliseconds.

✓ Pros	✗ Cons
Eliminates network latency of running a separate DB	In-process model needs app-level sharding to scale out
Battle-tested at Alibaba scale with DiskANN	Alibaba roadmap may prioritize internal needs
Hybrid retrieval covers Retrieval-Augmented Generation (RAG) without a second search engine	C++ core harder to debug than Python alternatives

#10

Kilo-Org/kilocode

Rank yesterday: Not ranked - New entry

⭐ Stars today: +1,339 · 📦 Total: 22,076
📜 License: MIT · 👤 By: company
🎯 Time to value: 5 minutes

What it is: All-in-one agentic coding platform as VS Code extension, JetBrains plugin, CLI, and cloud agent. Supports 500+ models with provider pricing at zero markup and mid-task model switching. Why you'd want it: A single open-source coding agent that works everywhere with 500+ models and no vendor lock-in on the AI provider.

✓ Pros	✗ Cons
500+ models at zero markup, pick best per task	Breadth may sacrifice depth
Multi-platform: VS Code, JetBrains, CLI, cloud	Fast-moving API may break plugins
--auto flag enables unattended CI/CD generation	Commercial entity could change licensing

#16

Lightricks/LTX-2

Rank yesterday: Not ranked - New entry

⭐ Stars today: +47 · 📦 Total: 7,479
📜 License: LTX-2 Community · 👤 By: company
🎯 Time to value: 30 minutes

What it is: First DiT-based audio-video foundation model with synchronized audio and video generation. Supports text-to-video, image-to-video, lip dubbing, keyframe interpolation, and HDR pipelines. Includes LoRA fine-tuning and FP8 quantization. Why you'd want it: Generate synchronized audio and video from text in one model - no more stitching separate pipelines together for content creation.

✓ Pros	✗ Cons
First model to generate synchronized audio AND video	Custom license, not truly open source
Rich pipeline variety including lip dubbing and HDR	Compute-intensive for high quality
LoRA fine-tuning and FP8 for customization	Lightricks controls weights and terms

HuggingFace Trending

Top Models Today

#1

zai-org/GLM-5.2

Z.ai's 753B flagship overtakes the leaderboard with near-perfect math scores and a 1M-token context window under MIT license.

📥 Downloads (30d): 4,307 · 📜 License: MIT
👤 By: Z.ai (Zhipu AI) · 🎯 Task: text-generation
📐 Size: 753B

What it is: GLM-5.2 is Z.ai's largest open-weight language model, a 753B-parameter Mixture-of-Experts architecture using IndexShare to cut per-token compute by 2.9x at long contexts. Why you'd want it: Frontier-class reasoning and coding in an MIT-licensed open model - 99.2 on AIME 2026 and 62.1 on SWE-bench Pro.

✓ Pros	✗ Cons
MIT license, no regional restrictions	753B parameters needs substantial GPU infrastructure
1M-token context with stable performance	Brand-new with limited community tooling
Near-perfect math (AIME 2026: 99.2)	Low-precision quality unverified by third parties

#2

MiniMaxAI/MiniMax-M3

Native multimodal Mixture of Experts (MoE) with only 23B active parameters out of 428B total - efficient frontier-quality vision-language.

📥 Downloads (30d): 56,162 · 📜 License: minimax-community
👤 By: MiniMax · 🎯 Task: image-text-to-text
📐 Size: 428B/23B active

What it is: Natively multimodal MoE model trained on text, images, and video from the start. Achieves 9x prefill and 15x decode speedups over its predecessor M2. Why you'd want it: Strong multimodal reasoning with an unusually efficient 23B active parameter count - serve frontier vision-language on fewer GPUs.

✓ Pros	✗ Cons
Native multimodal fusion from initial training	Custom license, not OSI-approved
Only 23B active despite 428B total	Video capabilities not extensively benchmarked
Three reasoning modes for flexible tradeoffs	Limited fine-tuning and tooling support

#3

moonshotai/Kimi-K2.7-Code

Trillion-parameter coding specialist with 32B active params and 30% fewer thinking tokens than its predecessor.

📥 Downloads (30d): 229,156 · 📜 License: Modified MIT
👤 By: Moonshot AI · 🎯 Task: image-text-to-text
📐 Size: 1T/32B active

What it is: A 1T-parameter MoE model (384 experts, 8 active per token) specialized for agentic software engineering with Multi-head Latent Attention and a 400M MoonViT vision encoder. Why you'd want it: Purpose-built for coding agent workflows - 62.0 on Kimi Code Bench v2 with 30% fewer thinking tokens for cheaper inference.

✓ Pros	✗ Cons
Best-in-class coding agent benchmarks	1T total params needs significant infrastructure
30% token reduction vs K2.6	Modified MIT adds restrictions
Native INT4 quantization support	General conversation quality less proven
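
The "384 experts, 8 active per token" figure describes sparse routing: a gate scores every expert and only the top few run for each token. The sketch below shows generic top-k gating in plain Python with random scores. It is not Kimi's router, and the scoring and normalization details are assumptions.

```python
# Generic top-k expert routing for one token. Expert counts come from the
# model card above; the gate scores and softmax normalization are assumptions.
import math
import random

NUM_EXPERTS = 384
TOP_K = 8

def route(gate_scores, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    max_score = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - max_score) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route(scores)

print(len(weights), "of", NUM_EXPERTS, "experts active")
print(f"fraction of experts used: {TOP_K / NUM_EXPERTS:.1%}")
```

Only about 2% of experts fire per token, while the card's 32B-of-1T active figure is closer to 3%. One plausible reason is that attention and other shared layers run for every token, though the card does not break this down.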

#4

google/diffusiongemma-26B-A4B-it

Groundbreaking diffusion-based decoding at 1,100+ tokens/sec with only 3.8B active parameters.

📥 Downloads (30d): 527,080 · 📜 License: Apache-2.0
👤 By: Google DeepMind · 🎯 Task: image-text-to-text
📐 Size: 26B/3.8B active

What it is: Replaces autoregressive token generation with discrete diffusion sampling, generating 15-20 tokens per forward pass using bidirectional attention over generation canvases. Why you'd want it: If inference speed is your bottleneck, DiffusionGemma's parallel decoding is a paradigm shift - 1,100+ tok/s on consumer GPUs under Apache 2.0.

✓ Pros	✗ Cons
1,100+ tokens/sec via diffusion decoding	Performance gaps on some reasoning benchmarks
Only 3.8B active, runs on consumer GPUs	New paradigm with less tooling support
Apache 2.0 with 35+ languages	Fixed 256-token canvas may limit patterns

#5

deepseek-ai/DeepSeek-V4-Pro

Nearly 3M downloads in 30 days - MIT-licensed frontier model with 1M-token context.

📥 Downloads (30d): 2,948,726 · 📜 License: MIT
👤 By: DeepSeek · 🎯 Task: text-generation
📐 Size: 1.6T/49B active

What it is: A 1.6T-parameter MoE with Compressed Sparse Attention and Heavily Compressed Attention for efficient 1M-token context processing. Why you'd want it: Frontier performance (Codeforces 3206, LiveCodeBench 93.5%) you can self-host and modify freely under MIT license.

✓ Pros	✗ Cons
MIT license on frontier-class model	1.6T demands serious multi-node infrastructure
Exceptional coding benchmarks	Mixed precision may cause edge cases
Three reasoning modes for flexibility	Data provenance has faced scrutiny

#6

nvidia/LocateAnything-3B

NVIDIA's compact 3B model for precise visual grounding with 2.5x throughput via Parallel Box Decoding.

📥 Downloads (30d): 183,093 · 📜 License: NVIDIA (non-commercial)
👤 By: NVIDIA · 🎯 Task: visual-grounding
📐 Size: 3B

What it is: Vision-language model combining MoonViT and Qwen2.5-3B for referring expressions, dense detection, and point localization. Trained on 12M images with 785M+ bounding box annotations. Why you'd want it: Build apps that find and locate objects in images - from GUI automation to robotics - in a single compact model that runs on a consumer GPU.

✓ Pros	✗ Cons
2.5x throughput via Parallel Box Decoding	Non-commercial license only
GUI grounding, robotics, Optical Character Recognition (OCR) in one model	Text-only output, no image generation
Runs on RTX 4090	Requires NVIDIA Ampere or newer

#7

CohereLabs/North-Mini-Code-1.0

Apache-2.0 coding agent with only 3B active parameters rivaling models 10x its size.

📥 Downloads (30d): 15,285 · 📜 License: Apache-2.0
👤 By: Cohere Labs · 🎯 Task: text-generation
📐 Size: 30B/3B active

What it is: Decoder-only MoE (128 experts, 8 active) fine-tuned for agentic coding with 256K input / 64K output context, using cascaded SFT then RL with verifiable rewards. Why you'd want it: 67.6% on SWE-Bench Verified at only 3B active parameters - one of the most deployable purpose-built coding agents available.

✓ Pros	✗ Cons
Apache 2.0 with only 3B active params	SWE-Bench Pro gap (40.2%) on harder tasks
256K input / 64K output context	Specialized, not general-purpose
Competitive with much larger models	Limited adoption so far (15K downloads)
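
The "128 experts, 8 active" figure refers to sparse routing: for each token, a router scores every expert and only the top few actually run. Below is a minimal sketch of generic top-k gating in plain Python. It is not Cohere's confirmed router, and the random logits stand in for what a learned router layer would produce.

```python
import math
import random


def route_top_k(router_logits: list, k: int) -> list:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Returns (expert_index, weight) pairs, highest weight first.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = router_logits[top[0]]  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


NUM_EXPERTS, ACTIVE_EXPERTS = 128, 8  # figures from the listing above

rng = random.Random(0)  # stand-in for a learned router's output
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_top_k(logits, ACTIVE_EXPERTS)

for expert, weight in selected:
    print(f"expert {expert:3d} weight {weight:.3f}")
```

The token's output is then the weighted sum of those 8 experts' outputs, and the other 120 experts do no work for that token.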

#8

microsoft/FastContext-1.0-4B-SFT

Microsoft's novel subagent that handles repo exploration, cutting main-agent token costs by up to 60%.

📥 Downloads (30d): 957 · 📜 License: MIT
👤 By: Microsoft · 🎯 Task: text-generation
📐 Size: 4B

What it is: Specialized model designed as a repository-exploration subagent. Handles the READ/GLOB/GREP navigation that accounts for 56% of tool-use turns in coding workflows, returning compact file paths and line ranges. Why you'd want it: Offload repo exploration to a cheap 4B specialist for up to 60% token savings and 5.5% accuracy improvement on your main coding agent.

✓ Pros	✗ Cons
Up to 60% token reduction, direct cost savings	Brand-new with very low adoption (957 downloads)
MIT license, 4B params trivial to deploy	Only useful as a subagent, not standalone
Novel architecture applicable beyond coding	Depends on main agent consuming its format
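
To illustrate why returning "compact file paths and line ranges" saves tokens, here is a hypothetical sketch of the kind of result an exploration step could hand back to the main agent. The function name and the `{path, start, end}` shape are our assumptions, not FastContext's actual output format.

```python
import re
import tempfile
from pathlib import Path


def grep_ranges(root: Path, glob: str, pattern: str, context: int = 1) -> list:
    """GLOB + GREP over a repo, returning locations instead of file contents."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.glob(glob)):
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append({
                    "path": path.relative_to(root).as_posix(),
                    "start": max(1, number - context),
                    "end": min(len(lines), number + context),
                })
    return hits


def demo() -> list:
    """Build a two-file throwaway repo and look up one function definition."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.py").write_text(
            "import os\n\ndef load_config():\n    return {}\n", encoding="utf-8")
        (root / "notes.txt").write_text(
            "load_config is defined in app.py\n", encoding="utf-8")
        return grep_ranges(root, "**/*.py", r"def load_config")


print(demo())
```

The main agent receives a short pointer such as `app.py` lines 2-4 and reads only that range, rather than paying for whole files in its context window.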

Product Hunt

AI Launches Today

VELA

Open-source policy-driven execution guard for AI-generated code

🔥 Upvotes: TBD · 👤 By: VELA team
💰 Pricing: Free (MIT) · 🏷 Category: Developer Tools / Security

VELA uses Firecracker micro-VMs with ~150ms cold start to sandbox AI-generated code with hardware-level isolation. HMAC capability tokens provide fine-grained, time-bound access control for filesystem, network, and environment variables. Verdict: A serious open-source contender for anyone running untrusted AI code in production - the Firecracker foundation is battle-tested at AWS Lambda scale.
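
For readers unfamiliar with HMAC capability tokens, the sketch below shows the general idea using only Python's standard library: the issuer signs a list of allowed capabilities plus an expiry, and the guard rejects anything tampered with, expired, or out of scope. The token layout, capability strings, and function names are hypothetical and are not VELA's actual format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret-not-for-production"  # assumption: key shared by issuer and guard


def issue_token(capabilities: list, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Sign a capability list with an expiry; returns 'payload.tag' in base64."""
    now = time.time() if now is None else now
    payload = json.dumps({"caps": sorted(capabilities), "exp": now + ttl_seconds},
                         sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(tag).decode())


def allows(token: str, capability: str, now: Optional[float] = None) -> bool:
    """True only if the signature is valid, the token is unexpired, and it grants capability."""
    now = time.time() if now is None else now
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # malformed token or invalid base64
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        return False
    claims = json.loads(payload)
    return now < claims["exp"] and capability in claims["caps"]


token = issue_token(["fs:read:/workspace"], ttl_seconds=60, now=1000.0)
print(allows(token, "fs:read:/workspace", now=1010.0))  # granted and unexpired
print(allows(token, "net:connect", now=1010.0))         # capability not granted
print(allows(token, "fs:read:/workspace", now=1061.0))  # past expiry
```

Because the signature covers both the capability list and the expiry, sandboxed code cannot widen its own permissions or extend its own lifetime without the key.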

CADAM

AI TinkerCAD - Text-to-CAD for makers

🔥 Upvotes: TBD · 👤 By: CADAM team
💰 Pricing: Free (open source) · 🏷 Category: Creative AI

Generates parametric 3D models from natural language prompts and image references. Supports iterative refinement, dimension adjustments, and exports for 3D printing. Verdict: Fills a genuine gap - most text-to-3D tools generate meshes, not parametric CAD models. Makers and hardware engineers should try this.

Tine

AI desktop cursor that lives in the Mac notch

🔥 Upvotes: TBD · 👤 By: Tine team
💰 Pricing: Free · 🏷 Category: AI Products

Mac-native AI assistant that sees your active apps, selections, and recent actions, suggesting drafts or autofilling directly in any application. Lives in the notch for always-on access. Verdict: Interesting UX innovation - moving AI from a chat window to an always-present screen companion. The "lives in the notch" design is clever.

Genie Mentions

Multiplayer AI that knows your friends

🔥 Upvotes: TBD · 👤 By: Genie team
💰 Pricing: Free · 🏷 Category: Social AI

Unlike single-player AI assistants, Genie Mentions is built multiplayer-first, incorporating friends' preferences, life updates, and aspirations into personalized suggestions. Verdict: Novel concept but depends entirely on friend adoption - the classic social network cold-start problem.

LayerProof Bristol

Agentic report generation from raw data

🔥 Upvotes: TBD · 👤 By: LayerProof
💰 Pricing: Freemium · 🏷 Category: Business AI

Transforms raw materials into interactive, client-ready reports via conversational AI. Upload files, shape content through prompts, publish live web pages with a shareable URL. Verdict: Well-scoped product for consulting and agency workflows - the "conversational editing to publishable report" loop is practical.

API Pricing

Snapshot

Provider	Model	Input $/1M tokens	Output $/1M tokens	Context
Anthropic	Claude Fable 5	$10.00	$50.00	1M
Anthropic	Claude Opus 4.8	$5.00	$25.00	1M
Anthropic	Claude Sonnet 4.6	$3.00	$15.00	1M
Anthropic	Claude Haiku 4.5	$1.00	$5.00	200K
OpenAI	GPT-5.5 Pro	$5.00	$30.00	1M
OpenAI	GPT-4.1	$2.00	$8.00	1M
OpenAI	o3	$2.00	$8.00	200K
OpenAI	o4-mini	$1.10	$4.40	200K
Google	Gemini 3.1 Pro Preview	$2.00	$12.00	1M
Google	Gemini 3.5 Flash	$1.50	$9.00	1M
Google	Gemini 2.5 Pro	$1.25	$10.00	1M
Google	Gemini 2.5 Flash	$0.30	$2.50	1M
Groq	Llama 3.3 70B	$0.59	$0.79	128K
Groq	Llama 4 Scout	$0.11	$0.34	128K
Groq	Qwen3 32B	$0.29	$0.59	131K

What this means: The Fable 5 pricing carries an irony - Anthropic's most expensive model, at $10/$50 per million tokens, is currently inaccessible to most users due to the ongoing export controls. Meanwhile, Groq's open-source offerings undercut the flagship proprietary models by roughly 10-50x on input costs (Llama 4 Scout at $0.11 versus $1.00-$5.00), though with smaller context windows and less capable models. The sweet spot for cost-conscious teams remains Google's Gemini 2.5 Flash at $0.30/$2.50 with a full 1M context window.
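
To turn the table into per-request numbers, here is a small sketch that prices a single call. The rates are copied from the snapshot above. The 20,000-input / 1,000-output workload is a made-up example, and the calculation ignores caching discounts, batch pricing, and tiered long-context rates.

```python
# (input $/1M tokens, output $/1M tokens), copied from the snapshot table
PRICES = {
    "Claude Fable 5": (10.00, 50.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Llama 4 Scout (Groq)": (0.11, 0.34),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list price."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


INPUT_TOKENS, OUTPUT_TOKENS = 20_000, 1_000  # hypothetical workload

for model in PRICES:
    cost = request_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model:22s} ${cost:.5f} per request")
```

At this input-heavy mix, Fable 5 comes to $0.25 per request against under a cent for Gemini 2.5 Flash, so the input rate dominates the bill.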

arXiv Paper of the Day

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents

Emmanuel Aboah Boateng, Kyle MacDonald, Amardeep Kumar, Siddharth Kodwani, Sudeep Das - arXiv:2606.18947

What it claims: Real-time search grounding works best as an optimizable interface boundary, not a fixed model feature. Decoupled Search Grounding (DSG) separates search from reasoning through an MCP-compatible gateway, enabling independent control over routing, caching, and retrieval depth.

Key finding: 91% cost reduction on search operations while maintaining 86.1% accuracy (vs 87.7% native), and 98%+ cost reduction in production e-commerce with 68% latency improvement.

Why practitioners should care: Any team deploying LLM agents with real-time web search can dramatically cut costs by decoupling search from the model. The MCP-compatible architecture eliminates vendor lock-in and solves search-induced verbosity issues.
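
The core idea, search as a separate boundary that can cache and tune retrieval depth independently of the model, can be sketched in a few lines. This is our own illustration and not the paper's implementation: the class, the fake backend, and the normalization rule are all hypothetical, and no MCP wiring is shown.

```python
from typing import Callable, Optional

# A backend takes (query, depth) and returns that many result snippets.
SearchBackend = Callable[[str, int], list]


class SearchGateway:
    """Sits between the agent and any search vendor; owns caching and depth."""

    def __init__(self, backend: SearchBackend, default_depth: int = 3):
        self.backend = backend
        self.default_depth = default_depth
        self.cache = {}
        self.backend_calls = 0  # billable calls actually made

    def search(self, query: str, depth: Optional[int] = None) -> list:
        depth = self.default_depth if depth is None else depth
        # Normalize case and whitespace so near-duplicate queries share a cache entry.
        key = (" ".join(query.lower().split()), depth)
        if key not in self.cache:
            self.backend_calls += 1
            self.cache[key] = self.backend(key[0], depth)
        return self.cache[key]


def fake_backend(query: str, depth: int) -> list:
    """Stand-in for a real search vendor."""
    return [f"result {i} for {query!r}" for i in range(1, depth + 1)]


gateway = SearchGateway(fake_backend)
first = gateway.search("GPU  prices")
second = gateway.search("gpu prices")  # served from cache
print(gateway.backend_calls, len(first), first == second)
```

Swapping vendors means passing a different backend function. The agent's side of the interface does not change, which is the lock-in argument in miniature.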


GenAI Secret Sauce Daily Digest - 2026-06-18

GenAI Secret Sauce Daily Digest - 2026-06-19

GenAI Secret Sauce Daily Digest - 2026-06-17

Subscribe to GenAI Secret Sauce newsletter and stay updated.

