GenAI Secret Sauce Daily Digest - 2026-06-18

The Co-Inventor of the Transformer Just Joined OpenAI · Midjourney Built a Whole-Body Medical Scanner · AI Diagnoses 18 Children Whose Rare Diseases Stumped Doctors for Years


Statistically Speaking
  • 18 new diagnoses from cases that had stumped specialists (Top Story: AI Diagnoses 18 Children Whose Rare Diseases Stumped Doctors)
  • 50 years since the last new whole-body imaging modality; the new scanner uses AI to reconstruct images from ultrasound reflections (AI Moves Into the Exam Room)
  • OpenAI's o3 diagnosed 18 children with rare genetic diseases (AI Moves Into the Exam Room)
  • GPT-5.5 Instant now matches frontier models on professional health evaluations (AI Moves Into the Exam Room)
  • 462x parameter reduction (The Efficiency Revolution)
  • 1.9% of supervised tokens while maintaining full-token SFT quality (The Efficiency Revolution)
One Thing to Tell Your Friends
Midjourney - the company known for AI art - just unveiled a whole-body medical scanner that uses 358,000 ultrasonic elements instead of radiation, and could make full-body health scans as routine as stepping on a bathroom scale.
TL;DR
Trends
AI Moves Into the Exam Room, The Efficiency Revolution: Doing More With Less, and AI Safety Research Finds Fundamental Gaps.
Dev Tools
AMP: Pooling Compute Across Clouds and Microsoft FastContext.
Research
A Neural Network That Matches Billion-Parameter Models With 462x Fewer Parameters, Training Code Models With 98% Less Supervision, and A Non-Transformer Architecture That Beats Transformers.
GitHub
Leading repos: google-research/timesfm (+858), obra/superpowers (+1,435), and zai-org/GLM-5 (+286).
HuggingFace
Leading models: zai-org/GLM-5.2 (4,307), MiniMaxAI/MiniMax-M3 (56,162), and moonshotai/Kimi-K2.7-Code (229,156).
Product Hunt
Top launches: VELA, CADAM, and Tine.
API Pricing
What this means: Note the Fable 5 pricing irony - Anthropic's most expensive model, at $10/$50 per million tokens, is currently inaccessible to most users because of the ongoing export controls.
arXiv
Decoupling Search from Reasoning: A Vendor — 91% cost reduction on search operations while maintaining 86.1% accuracy (vs 87.7% native), and 98%+ cost reduction in production e-commerce with 68% latency improvement.
Hot off the Presses
01
The Co-Inventor of the Transformer Just Joined OpenAI
What this means for you: The person who designed the core technology behind ChatGPT, Claude, and Gemini has switched teams - expect OpenAI's next models to reflect his architectural innovations.

Noam Shazeer, co-author of the landmark 2017 "Attention Is All You Need" paper that created the Transformer architecture powering virtually every modern AI system, announced he is joining OpenAI as Lead for AI Architecture Research. He leaves Google, where he co-led the Gemini project after Google reacquired his startup Character.AI in a deal valued at $2.7 billion.

  • Shazeer co-invented multi-head attention and later introduced multi-query attention - techniques that underpin, or are used to speed up, most major language models
  • He was considered Google's most important AI researcher alongside Jeff Dean
  • The move follows a pattern of top Google AI talent leaving for competitors, including several senior Gemini researchers in the past year
02
Midjourney Built a Whole-Body Medical Scanner
What this means for you: Full-body health scans could become as routine and affordable as a bathroom scale - no radiation, no hospital visit, no insurance approval needed.

Midjourney, best known for its AI image generator, has unveiled a full-body ultrasonic CT scanner - the first new whole-body imaging modality in 50 years. The device uses 358,000 ultrasonic elements arranged in 40 ring-mounted systems, generating approximately 40GB of raw data per slice at 17GB/s.

  • No radiation and no magnets - unlike X-ray CT or MRI, making repeat scans safe
  • AI reconstructs images from ultrasound reflections using the same generative AI expertise the company built for art
  • The goal is consumer-grade pricing - putting whole-body scanning in doctor's offices and eventually homes
03
AI Diagnoses 18 Children Whose Rare Diseases Stumped Doctors for Years
What this means for you: If your child has an undiagnosed condition, AI-assisted genetic analysis could find answers that years of traditional testing missed.

Boston Children's Hospital used OpenAI's o3 model to diagnose 18 children whose rare genetic diseases had eluded explanation for years, with the research published today in NEJM AI (the New England Journal of Medicine's AI journal). The team analyzed genomes of 376 children with undiagnosed rare diseases.

"18 children diagnosed by AI after years of medical mystery"
  • 18 new diagnoses from cases that had stumped specialists - some children had been undiagnosed for their entire lives
  • The AI combined clinicians' notes with genomic data to spot patterns humans missed
  • Published in NEJM AI on June 18 - one of the most prestigious medical AI publications to date
04
NVIDIA Open-Sources Cosmos 3: AI That Sees, Hears, and Acts
What this means for you: Developers can now freely use the best open-source model for generating images and video - and it can also understand audio, text, and even control robots.

NVIDIA released Cosmos 3, an omnimodal world model that processes and generates across five modalities: language, images, video, audio, and action sequences. It immediately became the top-rated open-source text-to-image and image-to-video model according to Artificial Analysis benchmarks.

  • Built by 294+ researchers using a mixture-of-transformers architecture
  • State-of-the-art open-source image and video generation - first model to top both categories simultaneously
  • Designed for "physical AI" - robotics and autonomous systems that need to understand the real world
05
Fable/Mythos Export Controls Enter Week Two
What this means for you: If you use Claude outside the US, the ongoing government shutdown of Anthropic's most powerful models could push your organization toward Chinese alternatives with no comparable restrictions.

> Previously: June 14-17 - The White House shut down Anthropic's Claude Fable 5 and Mythos 5 via export controls, citing a reported "jailbreak" from Amazon.

Today: A Wired investigation reveals the specific trigger: SK Telecom (South Korea's largest telecom carrier and Anthropic investor) had access to Mythos, and the White House cited SK Telecom's historical business ties to China-adjacent entities. Zvi Mowshowitz reports we are now on day seven of the pause, with roughly even odds it ends by July 1. The stated "jailbreak" justification appears increasingly pretextual.

Source - Wired | Source - Zvi

Trends & Themes
AI Moves Into the Exam Room
Why this matters to you: AI is no longer just writing emails - it is diagnosing diseases, reading body scans, and catching conditions that human doctors miss.

The healthcare AI story has shifted from "can AI help doctors?" to "AI is finding things doctors cannot." Three separate healthcare breakthroughs in a single day suggest this is accelerating faster than most people realize.

  • Midjourney's ultrasonic scanner represents the first new whole-body imaging modality in 50 years, using AI to reconstruct images from ultrasound reflections
  • OpenAI's o3 diagnosed 18 children with rare genetic diseases, in research published in NEJM AI today
  • GPT-5.5 Instant now matches frontier models on professional health evaluations including HealthBench, with improved urgent-symptom recognition
The Efficiency Revolution: Doing More With Less
Why this matters to you: The best AI tools could soon run on your phone or laptop instead of needing expensive cloud servers - making them faster, cheaper, and more private.

A consistent pattern: researchers are finding ways to dramatically shrink the compute needed for AI without losing quality. This directly translates to cheaper APIs, faster responses, and AI that runs on edge devices.

  • Ghost Attractor Networks achieved 462x parameter reduction - matching a 1.07-billion-parameter model with only 2.3 million parameters and 32x lower latency
  • CODEBLOCK trains code models with only 1.9% of supervised tokens while maintaining full-token SFT quality
  • DiffusionGemma generates 1,100+ tokens per second with only 3.8 billion active parameters - a 15-20x speedup over standard generation
  • KANELE achieved 2,700x speedup for KAN (Kolmogorov-Arnold Network) inference on FPGAs (specialized chips)
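
The headline ratios above can be checked with quick arithmetic. The sketch below uses only the rounded figures quoted in this digest, so small differences from the published numbers are to be expected; the variable names are illustrative.

```python
# Sanity-check the efficiency ratios quoted above, using the digest's rounded figures.

# Ghost Attractor Networks: 1.07B-parameter baseline vs 2.3M parameters.
baseline_params = 1.07e9
ghost_params = 2.3e6
param_reduction = baseline_params / ghost_params

# CODEBLOCK: trains on 1.9% of supervised tokens.
token_fraction = 0.019
signal_reduction = 1 / token_fraction   # how many times less training signal
supervision_saved = 1 - token_fraction  # share of supervision removed

print(f"Parameter reduction: {param_reduction:.0f}x")
print(f"Training-signal reduction: {signal_reduction:.1f}x")
print(f"Supervision saved: {supervision_saved:.1%}")
```

With these rounded inputs the parameter ratio comes out near 465x rather than the quoted 462x, presumably because the underlying parameter counts are not round numbers. The token figures are consistent with both the "98% less supervision" and the roughly 50x claims.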
AI Safety Research Finds Fundamental Gaps
Why this matters to you: The tools researchers thought would keep AI safe may not work as well as believed - which matters when AI is making medical diagnoses and writing code.

These findings matter because they undermine assumptions the field was building on. If the main technique for steering AI behavior (SAE interventions) has a 96% failure rate, the safety community needs new approaches.

  • SAE interventions are unreliable - suppressing harmful features in AI models fails because the AI recovers the suppressed behavior ~95.8% of the time through alternative neural pathways
  • MosaicLeaks shows research agents leak private data through the "mosaic effect" - individually harmless queries collectively reveal sensitive information
  • SFT overtraining causes entropy collapse - fine-tuning language models too aggressively destroys their ability to generate diverse responses
Open-Source Models Hit New Highs
Why this matters to you: The best free, downloadable AI models are now competitive with paid services from OpenAI, Google, and Anthropic - meaning organizations can run their own AI without ongoing subscription costs.

The gap between "free to download" and "expensive subscription" models continues to narrow. For many tasks, the open-source option is now the better choice.

  • GLM-5.2 (753B parameters, MIT license) leads open-weight benchmarks with 99.2 on AIME 2026 math and a 1M-token context window
  • DeepSeek-V4-Pro has nearly 3 million downloads in 30 days under MIT license
  • Cosmos 3 tops open-source image and video generation across Artificial Analysis benchmarks
  • North-Mini-Code runs competitive coding agent tasks with only 3B active parameters under Apache 2.0
Creative AI & Media
NVIDIA Cosmos 3 - Omnimodal Generation
  • What it does: Generates images, video, and audio from text prompts, and can also process all five modalities as input
  • Best-in-class open-source for both text-to-image and image-to-video generation
  • 294+ researchers contributed to this unified architecture
VidCRAFT3 - Unified Camera, Object, and Lighting Control
  • What it does: Lets you control camera movement, object motion, and lighting independently when generating video from images
  • Solves a real problem: Other tools create mismatched shadows when you change the camera angle
  • Research paper with code - not yet a consumer product
Reliable Neural-Codec TTS
  • What it does: Fixes the random failures in AI voice generation - silence, premature stops, and hallucinated words
  • Uses ASR self-verification to catch and correct errors before they reach the listener
  • Practical impact: Makes AI voiceover reliable enough for production use
Developer Tools & Infrastructure
AMP: Pooling Compute Across Clouds
  • What it does: AMP acts like a power grid for AI compute - pooling GPU capacity across multiple clouds and chip types so you can train and serve models without being locked to one provider
  • Founded by Anjney Midha, former a16z investor and early backer of Anthropic and Mistral
  • Targets 1.2 gigawatts of capacity - roughly a small city's worth of power, devoted to GPUs
  • Source
Microsoft FastContext - Exploration Subagent
  • What it does: A specialized 4B model that handles file exploration for coding agents, reducing the main agent's token usage by up to 60%
  • Novel architecture: Instead of one model doing everything, FastContext handles the READ/GREP/GLOB operations that account for 56% of tool-use turns
  • MIT licensed, drop-in compatible with existing agent pipelines
  • Source
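
The split described above, where exploration calls go to a small model and everything else stays with the main agent, can be sketched as a simple router. This is a minimal illustration, not FastContext's actual interface: `EXPLORATION_TOOLS`, `route_tool_call`, `split_turns`, and the agent labels are all hypothetical names.

```python
# Hypothetical sketch: route exploration tool calls to a small subagent.
# None of these names come from FastContext itself.

EXPLORATION_TOOLS = {"READ", "GREP", "GLOB"}


def route_tool_call(tool_name: str) -> str:
    """Return which agent should handle a given tool call."""
    if tool_name.upper() in EXPLORATION_TOOLS:
        return "exploration_subagent"
    return "main_agent"


def split_turns(tool_calls: list[str]) -> dict[str, int]:
    """Count how many tool-use turns each agent would handle."""
    counts = {"exploration_subagent": 0, "main_agent": 0}
    for name in tool_calls:
        counts[route_tool_call(name)] += 1
    return counts


# Invented trace of tool calls from one coding session.
calls = ["READ", "GREP", "EDIT", "GLOB", "BASH", "READ"]
print(split_turns(calls))
```

In this invented trace four of six turns go to the subagent. The real savings depend on how much context each exploration turn would otherwise have added to the main agent's prompt.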
Research & Models
A Neural Network That Matches Billion-Parameter Models With 462x Fewer Parameters
  • Ghost Attractor Networks use dynamical systems (basin attractors) instead of standard neural network layers for sequential generation
  • 2.3 million parameters match a 1.07-billion-parameter Diffusion Transformer with 32x lower latency
  • Practical implication: If this approach generalizes, it could make high-quality generation feasible on mobile phones
  • Source
Training Code Models With 98% Less Supervision
  • CODEBLOCK partitions code into syntactically coherent blocks and selectively trains only on the most useful ones
  • 1.9% of supervised tokens achieves full-token SFT quality - a roughly 50x reduction in training signal
  • Why it matters: Dramatically reduces the cost and data needed to fine-tune code models
  • Source
A Non-Transformer Architecture That Beats Transformers
  • Frustrated Synchronization Networks (FSN) model token interactions as phase dynamics on a torus, inspired by physics
  • Outperforms Transformers on enwik8 (text compression benchmark) with a fundamentally different compute mechanism
  • Early-stage but significant - any architecture that beats Transformers at any task challenges the field's core assumption
  • Source
First AI Vision Model Runs Autonomously on a Satellite
  • NAVI-Orbital deployed Gemma 3 on a low-Earth orbit spacecraft on April 16, 2026
  • Classifies imagery and generates descriptions of Earth observations in plain English, replacing command-line interfaces
  • First demonstration of a vision-language model running entirely onboard a spacecraft
  • Source
Detecting Secret AI Training From GPU Power Meters
  • 98.2% accuracy distinguishing ML training workloads from other GPU tasks using only power/utilization telemetry
  • Privacy-preserving: Doesn't access model weights, training data, or hyperparameters
  • Governance implication: Could enable detecting unauthorized AI training at data centers without inspecting what's being trained
  • Source
Business & Industry
Noam Shazeer Leaves Google for OpenAI
  • The Transformer co-inventor joins OpenAI as Lead for AI Architecture Research
  • Leaves Google, where he co-led Gemini following Google's $2.7B reacquisition of Character.AI
  • Biggest AI talent move in years - signals OpenAI's investment in fundamental architecture research
OpenAI Launches Enterprise Spend Controls
  • New analytics dashboards for monitoring AI usage across teams and departments
  • Updated spend controls give organizations visibility into ChatGPT Enterprise costs
  • Targets the "shadow AI" problem - companies discovering large unexpected bills
Craig Newmark Has Given Away Half a Billion Dollars
  • Craigslist founder donated ~$487 million to charity, including significant cybersecurity funding
  • Signed the Giving Pledge with wife Eileen Whelpley in 2024
  • Relevant to AI: Major funding for AI safety and cybersecurity organizations
GenAI in Education
The Quiet Reinvention of Assessment
What this means for you: AI is making superior testing methods affordable for the first time - oral exams, portfolio reviews, and real-time feedback that were previously too expensive to scale.
  • AI-powered oral assessments cost ~40 cents each - reviving medieval examination traditions at modern scale
  • Four key shifts: oral assessment revival, portfolio-based evaluation, real-time feedback loops, and competency-based progression
  • The economic barrier to good assessment has collapsed - the methods that education researchers always said were better are now affordable
Beyond LoRA: Alternatives to the Most Popular Fine-Tuning Method
  • LoRA captures 98.4% of PEFT mentions on HuggingFace Hub, but several alternatives outperform it
  • Lily delivers 54.9% vs LoRA's 53.2% on math reasoning accuracy
  • BEFT uses less memory (24.3GB vs LoRA's 25.5GB) for comparable performance
  • Practical guide for anyone fine-tuning models for education or other applications
MosaicLeaks: Can Your Research Agent Keep a Secret?
  • New benchmark for measuring privacy leakage in AI research agents that mix private documents with web search
  • The "mosaic effect" - individually harmless queries collectively reveal sensitive data
  • 1,001 multi-step scenarios across enterprise contexts including educational institutions
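
A toy example may make the mosaic effect concrete. In the sketch below, the queries, the attribute labels, and the definition of what counts as sensitive are all invented for illustration; the benchmark's own scenarios and scoring are not reproduced here.

```python
# Toy illustration of the "mosaic effect": no single query is sensitive,
# but the attributes revealed across queries combine into something that is.
# All queries and labels below are invented.

SENSITIVE_COMBINATION = {"employer", "project_codename", "launch_date"}

queries = [
    ("careers page for a mid-size robotics firm", {"employer"}),
    ("what does the codename 'Heron' refer to", {"project_codename"}),
    ("trade show schedule for next spring", {"launch_date"}),
]


def leaks_alone(revealed: set[str]) -> bool:
    """A single query leaks only if it covers the whole sensitive combination."""
    return SENSITIVE_COMBINATION <= revealed


def leaks_together(history: list[tuple[str, set[str]]]) -> bool:
    """The history leaks if the union of revealed attributes covers the combination."""
    combined = set().union(*(attrs for _, attrs in history))
    return SENSITIVE_COMBINATION <= combined


print([leaks_alone(attrs) for _, attrs in queries])
print(leaks_together(queries))
```

Each query passes a per-query check, yet the session as a whole does not, which is why per-request filtering alone cannot catch this kind of leak.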
Surprising & Under-the-Radar
AI Safety Interventions Fail 96% of the Time

Researchers found that clamping harmful features in AI models using Sparse Autoencoders (SAEs) appears to work initially, but the AI recovers the suppressed behavior ~95.8% of the time through alternative neural pathways. This undermines a major proposed approach to AI safety.

A Satellite Is Running Google's AI Model in Orbit

NAVI-Orbital successfully ran Gemma 3 on a spacecraft in low-Earth orbit on April 16 - classifying images and generating descriptions autonomously, with zero ground-station involvement. First confirmed in-orbit VLM demonstration.

LLMs Can't Do Defeasible Reasoning

When tested on reasoning where new evidence can overturn previous conclusions (the way legal arguments or medical diagnoses work), LLMs scored only 23.5% while symbolic solvers hit 100%. A fundamental limitation, not a training data gap.

Diffusion Models Beat Autoregressive Models at Theorem Proving

Diffusion-Proof showed that diffusion models outperform autoregressive approaches for formal mathematical proof generation - surprising because theorem proving was considered a strength of sequential reasoning.

Best AI Agents Pass Only 59% of Pharmacology Tasks

TxBench-PP tested AI agents on pharmaceutical pipeline tasks and found the best scored only 59.3%. Drug discovery AI is further from useful than the hype suggests.

Signals to Track
Worth Watching
01
Decoupled Search Grounding Cuts Agent Search Costs by 91%
Separating search from reasoning in AI agents could make web-connected AI dramatically cheaper overnight.

A new MCP-compatible architecture separates search operations from the AI reasoning model, enabling independent optimization of each. In production e-commerce testing, it achieved 98%+ cost reduction with 68% latency improvement on warm cache, while maintaining 86.1% accuracy vs 87.7% for native search. Any team running agents with web search should evaluate this approach.
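
One way to picture the decoupling is a search layer that sits behind its own interface and caches results, so the reasoning model never talks to a search vendor directly. The sketch below is an assumption-laden illustration, not the architecture from the paper: `CachedSearch`, `fake_backend`, and `answer` are hypothetical names, and the "reasoning model" is a stub.

```python
from typing import Callable


class CachedSearch:
    """Search layer kept separate from the reasoning model, with a simple cache."""

    def __init__(self, backend: Callable[[str], list[str]]):
        self.backend = backend
        self.cache: dict[str, list[str]] = {}
        self.backend_calls = 0

    def search(self, query: str) -> list[str]:
        key = query.strip().lower()
        if key not in self.cache:  # cold cache: pay for one backend call
            self.backend_calls += 1
            self.cache[key] = self.backend(key)
        return self.cache[key]  # warm cache: no backend cost


def fake_backend(query: str) -> list[str]:
    # Stand-in for any search vendor; swapping it does not touch the reasoning code.
    return [f"result for {query}"]


def answer(question: str, search: CachedSearch) -> str:
    # Stand-in for the reasoning model: it only ever sees search results.
    hits = search.search(question)
    return f"Based on {len(hits)} source(s): {hits[0]}"


search = CachedSearch(fake_backend)
answer("Best trail shoes", search)
answer("best trail shoes ", search)  # normalizes to the same key, served from cache
print(search.backend_calls)
```

Because the backend is just a callable, it can be replaced or tuned independently of the model. The second call above never reaches it, which is the kind of saving a warm cache provides.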

02
Ghost Attractor Networks Challenge the Parameter Arms Race
What if the best AI models don't need billions of parameters at all?

By modeling sequence generation as basin-attractor dynamics instead of standard neural network layers, researchers matched a 1.07B-parameter model with just 2.3M parameters. If this 462x compression ratio transfers to language models, it would upend the assumption that bigger models are better models. Still early-stage, but the physics-inspired approach is fundamentally different from anything else in the field.

03
GPU Telemetry Can Detect Unauthorized AI Training
Governments could monitor AI training at data centers without seeing what's being trained.

A detection system using only GPU power, utilization, and memory telemetry achieved 98.2% accuracy in identifying ML training workloads - without accessing any model weights, training data, or code. This could enable AI governance through hardware-level monitoring that doesn't require companies to disclose proprietary information.

04
OpenAnt Finds Vulnerabilities by Thinking Like an Attacker
An LLM-powered security tool that reduces the scope of vulnerability analysis by 97%.

OpenAnt decomposes codebases into analysis units, filters by reachability from external entry points (cutting 97% of the code), then uses adversarial verification to simulate attacker behavior. Combines static analysis with LLM reasoning in a way that could make automated security auditing practical for projects that can't afford dedicated security teams.
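
The reachability filter described above can be sketched as a breadth-first walk over a call graph, starting from externally exposed entry points. This is a minimal illustration under stated assumptions: the graph, the function names, and `reachable_units` are invented, and OpenAnt's real analysis units and filtering rules may differ.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
call_graph = {
    "http_handler": ["parse_request", "render"],
    "parse_request": ["decode_body"],
    "decode_body": [],
    "render": [],
    "nightly_job": ["rebuild_index"],
    "rebuild_index": [],
    "unused_helper": [],
}
entry_points = ["http_handler"]  # code an external attacker can reach directly


def reachable_units(graph: dict[str, list[str]], entries: list[str]) -> set[str]:
    """Breadth-first search from the entry points over call edges."""
    seen = set(entries)
    queue = deque(entries)
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


in_scope = reachable_units(call_graph, entry_points)
print(sorted(in_scope))
print(f"Filtered out {1 - len(in_scope) / len(call_graph):.0%} of units")
```

Only the units left in scope would then go on to the more expensive LLM-based verification step. In a large codebase, most units are unreachable from any external entry point, which is how a filter like this could cut the scope as sharply as reported.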

05
Sumi: The First Open Diffusion Language Model
A new type of language model that can edit any part of its output at any time - not just write left to right.

Sumi is a 7B-parameter uniform diffusion language model trained from scratch on 1.5 trillion tokens. Unlike standard language models that generate text one word at a time from left to right, diffusion models can update any token at any step. This could enable entirely new interaction patterns like parallel text editing and flexible infilling.

Top Repos Today
Rank yesterday: #5 - Rising ^
Stars today: +858  ·  📦 Total: 23,100
📜 License: Apache-2.0  ·  👤 By: research lab
🎯 Time to value: 10 minutes
What it is: Google Research's pretrained time-series foundation model. Uses a decoder-only architecture with 200M parameters, supports context lengths up to 16K tokens, and offers continuous quantile predictions. Integrated with BigQuery ML, Google Sheets, and Vertex AI.
Why you'd want it: Drop-in time-series forecasting without training your own model - works for demand planning, anomaly detection, energy forecasting, and financial modeling out of the box.
✓ Pros
  • Production integrations with BigQuery, Sheets, and Vertex
  • 200M params is efficient enough to self-host with LoRA fine-tuning
  • 16K context handles long historical series without chunking
✗ Cons
  • Specialized to time-series only
  • Community smaller than general LLM ecosystems
  • Google could restrict access if it conflicts with paid Vertex
GitHub - google-research/timesfm: TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
Rank yesterday: #4 - Holding steady ->
Stars today: +1,435  ·  📦 Total: 232,370
📜 License: MIT  ·  👤 By: company
🎯 Time to value: 15 minutes
What it is: Agentic skills framework that provides structured engineering workflows for AI coding agents. Covers brainstorming, implementation plans, subagent-driven development with two-stage code review, and enforced test-driven development. Works across 11+ coding platforms.
Why you'd want it: Turn any AI coding agent into a disciplined engineering team with review gates and TDD - works with Claude Code, Cursor, Copilot, Gemini, and more.
✓ Pros
  • Platform-agnostic across 11+ coding agents
  • Enforces TDD, code review, and git worktrees
  • 232K stars with active community
✗ Cons
  • Heavy methodology overhead for quick scripts
  • Shell-based architecture can be brittle across OS
  • Opinionated workflow may conflict with team processes
GitHub - obra/superpowers: An agentic skills framework & software development methodology that works.
Rank yesterday: Not ranked - New entry
Stars today: +286  ·  📦 Total: 4,099
📜 License: Apache-2.0  ·  👤 By: research lab
🎯 Time to value: 30 minutes
What it is: Z.ai's large language model series featuring GLM-5.2 with a 1M-token context window, IndexShare architecture reducing per-token compute by 2.9x at long contexts, and state-of-the-art coding performance (81.0 on Terminal-Bench 2.1).
Why you'd want it: A 1M-context coding LLM that can reason over entire codebases at once, with efficient inference that doesn't melt your GPU budget at long contexts.
✓ Pros
  • 1M-token context with IndexShare keeping costs manageable
  • Top coding benchmarks (Terminal-Bench 81.0)
  • Flexible effort levels trade latency for quality
✗ Cons
  • Newer lab with smaller ecosystem
  • Documentation still catching up
  • Self-hosting large context still needs serious GPUs
GitHub - zai-org/GLM-5: GLM-5: From Vibe Coding to Agentic Engineering
Rank yesterday: #1 - Falling v
Stars today: +2,308  ·  📦 Total: 6,974
📜 License: MIT  ·  👤 By: company
🎯 Time to value: 5 minutes
What it is: High-performance MCP server that indexes codebases into persistent knowledge graphs. Analyzes source code across 158 languages with hybrid LSP semantic type resolution. Single static binary, zero external dependencies.
Why you'd want it: Give your AI coding agent a photographic memory of your entire codebase - indexes the Linux kernel in 3 minutes and answers structural queries in under a millisecond.
✓ Pros
  • 158-language support with semantic understanding
  • Zero-dependency static binary, trivial deployment
  • Auto-configures with 11 coding agents
✗ Cons
  • Written in C, contributing requires systems skill
  • Knowledge graph may miss dynamic/runtime relationships
  • Relatively new project with evolving API
GitHub - DeusData/codebase-memory-mcp: High-performance code intelligence MCP server. Indexes codebases into a persistent knowledge graph — average repo in milliseconds. 158 languages, sub-ms queries, 99% fewer tokens. Single static binary, zero dependencies.
Rank yesterday: Not ranked - New entry
Stars today: +124  ·  📦 Total: 1,738
📜 License: Apache-2.0  ·  👤 By: individual
🎯 Time to value: 15 minutes
What it is: LLM-powered framework that transforms unstructured text into structured knowledge. Extracts information into lists, Pydantic models, knowledge graphs, and hypergraphs. Includes 80+ YAML templates for finance, legal, medical, and general domains.
Why you'd want it: One-command pipeline to turn messy documents into queryable knowledge graphs - replaces weeks of custom NER/relation-extraction plumbing.
✓ Pros
  • 80+ domain templates for immediate use
  • Supports hypergraphs and spatio-temporal structures
  • Works with OpenAI, Alibaba Cloud, and local models
✗ Cons
  • Solo maintainer, bus-factor risk
  • Quality depends on underlying LLM
  • Research-grade, not battle-tested at scale
GitHub - yifanfeng97/Hyper-Extract: Transform unstructured text into structured knowledge with LLMs. Graphs, hypergraphs, and spatio-temporal extractions — with one command.
Rank yesterday: Not ranked - New entry
Stars today: +344  ·  📦 Total: 11,198
📜 License: Apache-2.0  ·  👤 By: company
🎯 Time to value: 10 minutes
What it is: Lightweight in-process vector database by Alibaba. Supports dense and sparse vector embeddings, native full-text search, hybrid retrieval, DiskANN indexing for billion-scale datasets, and write-ahead logging. SDKs for Python, Node.js, Go, Rust, and Dart.
Why you'd want it: Embed a production-grade vector DB directly in your app process - no separate server, no network hops, billion-scale similarity search in milliseconds.
✓ Pros
  • Eliminates network latency of running a separate DB
  • Battle-tested at Alibaba scale with DiskANN
  • Hybrid retrieval covers Retrieval-Augmented Generation (RAG) without a second search engine
✗ Cons
  • In-process model needs app-level sharding to scale out
  • Alibaba roadmap may prioritize internal needs
  • C++ core harder to debug than Python alternatives
GitHub - alibaba/zvec: A lightweight, lightning-fast, in-process vector database
Rank yesterday: Not ranked - New entry
Stars today: +1,339  ·  📦 Total: 22,076
📜 License: MIT  ·  👤 By: company
🎯 Time to value: 5 minutes
What it is: All-in-one agentic coding platform as VS Code extension, JetBrains plugin, CLI, and cloud agent. Supports 500+ models with provider pricing at zero markup and mid-task model switching.
Why you'd want it: A single open-source coding agent that works everywhere with 500+ models and no vendor lock-in on the AI provider.
✓ Pros
  • 500+ models at zero markup, pick best per task
  • Multi-platform: VS Code, JetBrains, CLI, cloud
  • --auto flag enables unattended CI/CD generation
✗ Cons
  • Breadth may sacrifice depth
  • Fast-moving API may break plugins
  • Commercial entity could change licensing
GitHub - Kilo-Org/kilocode: Kilo is the all-in-one agentic engineering platform. Build, ship, and iterate faster with the most popular open source coding agent.
Rank yesterday: Not ranked - New entry
Stars today: +47  ·  📦 Total: 7,479
📜 License: LTX-2 Community  ·  👤 By: company
🎯 Time to value: 30 minutes
What it is: First DiT-based audio-video foundation model with synchronized audio and video generation. Supports text-to-video, image-to-video, lip dubbing, keyframe interpolation, and HDR pipelines. Includes LoRA fine-tuning and FP8 quantization.
Why you'd want it: Generate synchronized audio and video from text in one model - no more stitching separate pipelines together for content creation.
✓ Pros
  • First model to generate synchronized audio AND video
  • Rich pipeline variety including lip dubbing and HDR
  • LoRA fine-tuning and FP8 for customization
✗ Cons
  • Custom license, not truly open source
  • Compute-intensive for high quality
  • Lightricks controls weights and terms
GitHub - Lightricks/LTX-2: Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Top Models Today
Z.ai's 753B flagship overtakes the leaderboard with near-perfect math scores and a 1M-token context window under MIT license.
📥 Downloads (30d): 4,307  ·  📜 License: MIT
👤 By: Z.ai (Zhipu AI)  ·  🎯 Task: text-generation
📐 Size: 753B
What it is: GLM-5.2 is Z.ai's largest open-weight language model, a 753B-parameter Mixture-of-Experts architecture using IndexShare to cut per-token compute by 2.9x at long contexts.
Why you'd want it: Frontier-class reasoning and coding in an MIT-licensed open model - 99.2 on AIME 2026 and 62.1 on SWE-bench Pro.
✓ Pros
  • MIT license, no regional restrictions
  • 1M-token context with stable performance
  • Near-perfect math (AIME 2026: 99.2)
✗ Cons
  • 753B parameters needs substantial GPU infrastructure
  • Brand-new with limited community tooling
  • Low-precision quality unverified by third parties
zai-org/GLM-5.2 · Hugging Face
Native multimodal Mixture of Experts (MoE) with only 23B active parameters out of 428B total - efficient frontier-quality vision-language.
📥 Downloads (30d): 56,162  ·  📜 License: minimax-community
👤 By: MiniMax  ·  🎯 Task: image-text-to-text
📐 Size: 428B/23B active
What it is: Natively multimodal MoE model trained on text, images, and video from the start. Achieves 9x prefill and 15x decode speedups over its predecessor M2.
Why you'd want it: Strong multimodal reasoning with an unusually efficient 23B active parameter count - serve frontier vision-language on fewer GPUs.
✓ Pros
  • Native multimodal fusion from initial training
  • Only 23B active despite 428B total
  • Three reasoning modes for flexible tradeoffs
✗ Cons
  • Custom license, not OSI-approved
  • Video capabilities not extensively benchmarked
  • Limited fine-tuning and tooling support
MiniMaxAI/MiniMax-M3 · Hugging Face
Trillion-parameter coding specialist with 32B active params and 30% fewer thinking tokens than its predecessor.
📥 Downloads (30d): 229,156  ·  📜 License: Modified MIT
👤 By: Moonshot AI  ·  🎯 Task: image-text-to-text
📐 Size: 1T/32B active
What it is: A 1T-parameter MoE model (384 experts, 8 active per token) specialized for agentic software engineering with Multi-head Latent Attention and a 400M MoonViT vision encoder.
Why you'd want it: Purpose-built for coding agent workflows - 62.0 on Kimi Code Bench v2 with 30% fewer thinking tokens for cheaper inference.
✓ Pros
  • Best-in-class coding agent benchmarks
  • 30% token reduction vs K2.6
  • Native INT4 quantization support
✗ Cons
  • 1T total params needs significant infrastructure
  • Modified MIT adds restrictions
  • General conversation quality less proven
moonshotai/Kimi-K2.7-Code · Hugging Face
Groundbreaking diffusion-based decoding at 1,100+ tokens/sec with only 3.8B active parameters.
📥 Downloads (30d): 527,080  ·  📜 License: Apache-2.0
👤 By: Google DeepMind  ·  🎯 Task: image-text-to-text
📐 Size: 26B/3.8B active
What it is: Replaces autoregressive token generation with discrete diffusion sampling, generating 15-20 tokens per forward pass using bidirectional attention over generation canvases. Why you'd want it: If inference speed is your bottleneck, DiffusionGemma's parallel decoding is a paradigm shift - 1,100+ tok/s on consumer GPUs under Apache 2.0.
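A rough way to see where the speedup comes from is to count forward passes. The sketch below uses only the figures quoted in this listing (15-20 tokens per pass, a 256-token canvas) and assumes a one-token-per-pass autoregressive baseline. It says nothing about wall-clock time, since a pass over a whole canvas need not cost the same as a single autoregressive step.

```python
import math

def forward_passes(num_tokens, tokens_per_pass):
    """Number of model forward passes needed to emit num_tokens."""
    return math.ceil(num_tokens / tokens_per_pass)

CANVAS_TOKENS = 256  # canvas size quoted in the listing

# Baseline assumption: an autoregressive decoder emits one token per pass.
autoregressive = forward_passes(CANVAS_TOKENS, 1)
for tokens_per_pass in (15, 20):
    parallel = forward_passes(CANVAS_TOKENS, tokens_per_pass)
    print(f"{tokens_per_pass} tokens/pass: {parallel} passes "
          f"vs {autoregressive} autoregressive")
```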
✓ Pros | ✗ Cons
1,100+ tokens/sec via diffusion decoding | Performance gaps on some reasoning benchmarks
Only 3.8B active, runs on consumer GPUs | New paradigm with less tooling support
Apache 2.0 with 35+ languages | Fixed 256-token canvas may limit patterns
google/diffusiongemma-26B-A4B-it · Hugging Face
Nearly 3M downloads in 30 days - MIT-licensed frontier model with 1M-token context.
📥 Downloads (30d): 2,948,726  ·  📜 License: MIT
👤 By: DeepSeek  ·  🎯 Task: text-generation
📐 Size: 1.6T/49B active
What it is: A 1.6T-parameter MoE with Compressed Sparse Attention and Heavily Compressed Attention for efficient 1M-token context processing. Why you'd want it: Frontier performance (Codeforces 3206, LiveCodeBench 93.5%) you can self-host and modify freely under MIT license.
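For a sense of scale, the arithmetic below turns the total/active sizes quoted in this digest into approximate weight storage and active fraction. The bytes-per-parameter values are assumptions, since the digest does not say which precision each checkpoint ships in, and the estimate ignores KV cache and activations.

```python
# (total_params, active_params) in billions, as quoted in this digest.
MODELS = {
    "MiniMax-M3": (428, 23),
    "Kimi-K2.7-Code": (1000, 32),
    "diffusiongemma-26B-A4B-it": (26, 3.8),
    "DeepSeek-V4-Pro": (1600, 49),
    "North-Mini-Code-1.0": (30, 3),
}

# Assumed storage widths; not taken from any model card.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(params_billion, precision):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]

def active_fraction(total_billion, active_billion):
    """Share of parameters used per token."""
    return active_billion / total_billion

for name, (total, active) in MODELS.items():
    sizes = ", ".join(f"{p}: {weight_gb(total, p):,.0f} GB" for p in BYTES_PER_PARAM)
    print(f"{name}: {active_fraction(total, active):.1%} active per token; weights ~ {sizes}")
```

An MoE typically still has to keep every expert's weights available, so hardware sizing follows the total parameter count even though per-token compute follows the active count.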
✓ Pros | ✗ Cons
MIT license on frontier-class model | 1.6T demands serious multi-node infrastructure
Exceptional coding benchmarks | Mixed precision may cause edge cases
Three reasoning modes for flexibility | Data provenance has faced scrutiny
deepseek-ai/DeepSeek-V4-Pro · Hugging Face
NVIDIA's compact 3B model for precise visual grounding with 2.5x throughput via Parallel Box Decoding.
📥 Downloads (30d): 183,093  ·  📜 License: NVIDIA (non-commercial)
👤 By: NVIDIA  ·  🎯 Task: visual-grounding
📐 Size: 3B
What it is: Vision-language model combining MoonViT and Qwen2.5-3B for referring expressions, dense detection, and point localization. Trained on 12M images with 785M+ bounding box annotations. Why you'd want it: Build apps that find and locate objects in images - from GUI automation to robotics - in a single compact model that runs on a consumer GPU.
✓ Pros | ✗ Cons
2.5x throughput via Parallel Box Decoding | Non-commercial license only
GUI grounding, robotics, Optical Character Recognition (OCR) in one model | Text-only output, no image generation
Runs on RTX 4090 | Requires NVIDIA Ampere or newer
nvidia/LocateAnything-3B · Hugging Face
Apache-2.0 coding agent with only 3B active parameters rivaling models 10x its size.
📥 Downloads (30d): 15,285  ·  📜 License: Apache-2.0
👤 By: Cohere Labs  ·  🎯 Task: text-generation
📐 Size: 30B/3B active
What it is: Decoder-only MoE (128 experts, 8 active) fine-tuned for agentic coding with 256K input / 64K output context, using cascaded SFT then RL with verifiable rewards. Why you'd want it: 67.6% on SWE-Bench Verified at only 3B active parameters - one of the most deployable purpose-built coding agents available.
✓ Pros | ✗ Cons
Apache 2.0 with only 3B active params | SWE-Bench Pro gap (40.2%) on harder tasks
256K input / 64K output context | Specialized, not general-purpose
Competitive with much larger models | Limited adoption so far (15K downloads)
CohereLabs/North-Mini-Code-1.0 · Hugging Face
Microsoft's novel subagent that handles repo exploration, cutting main-agent token costs by 60%.
📥 Downloads (30d): 957  ·  📜 License: MIT
👤 By: Microsoft  ·  🎯 Task: text-generation
📐 Size: 4B
What it is: Specialized model designed as a repository-exploration subagent. Handles the READ/GLOB/GREP navigation that accounts for 56% of tool-use turns in coding workflows, returning compact file paths and line ranges. Why you'd want it: Offload repo exploration to a cheap 4B specialist for up to 60% token savings and 5.5% accuracy improvement on your main coding agent.
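The "compact file paths and line ranges" idea can be illustrated without any model at all. The sketch below is a hypothetical stand-in, not FastContext's interface or output format: `explore`, its parameters, and the returned dictionary shape are all invented here to show what a GLOB-then-GREP step might hand back to a main agent in place of full file contents.

```python
import re
import tempfile
from pathlib import Path

def explore(repo_root, glob_pattern, regex, context=2):
    """Hypothetical helper: GLOB for files, GREP for matches, return compact locations."""
    pattern = re.compile(regex)
    hits = []
    for path in sorted(Path(repo_root).rglob(glob_pattern)):
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append({
                    "path": path.relative_to(repo_root).as_posix(),
                    # Line range around the match, clamped to the file.
                    "start": max(1, number - context),
                    "end": min(len(lines), number + context),
                })
    return hits

# Tiny throwaway repo so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    src = Path(root, "src")
    src.mkdir()
    Path(src, "auth.py").write_text(
        "import os\n\ndef login(user):\n    return check(user)\n")
    Path(src, "notes.txt").write_text("def not_python\n")
    results = explore(root, "*.py", r"^def \w+")

print(results)
```

The main agent then reads only the returned ranges, which is where token savings of the kind the listing describes would come from.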
✓ Pros | ✗ Cons
60% token reduction, direct cost savings | Brand-new with very low adoption (957 downloads)
MIT license, 4B params trivial to deploy | Only useful as a subagent, not standalone
Novel architecture applicable beyond coding | Depends on main agent consuming its format
microsoft/FastContext-1.0-4B-SFT · Hugging Face
AI Launches Today
Open-source policy-driven execution guard for AI-generated code
🔥 Upvotes: TBD  ·  👤 By: VELA team
💰 Pricing: Free (MIT)  ·  🏷 Category: Developer Tools / Security
VELA uses Firecracker micro-VMs with ~150ms cold start to sandbox AI-generated code with hardware-level isolation. HMAC capability tokens provide fine-grained, time-bound access control for filesystem, network, and environment variables. Verdict: A serious open-source contender for anyone running untrusted AI code in production - the Firecracker foundation is battle-tested at AWS Lambda scale.
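As a rough illustration of what a time-bound HMAC capability token can look like, here is a minimal sketch using only Python's standard library. It is not VELA's token format: the payload layout, the capability strings such as `fs:read:/workspace`, and the function names are all assumptions made for this example.

```python
import base64
import hashlib
import hmac
import json
import time

# Placeholder key; a real deployment would load this from a secret store.
SECRET = b"demo-signing-key"

def issue_token(capabilities, ttl_seconds, now=None):
    """Sign a capability set plus an expiry timestamp."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"caps": sorted(capabilities), "exp": issued + ttl_seconds},
        sort_keys=True,
    ).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature

def check_token(token, capability, now=None):
    """True only if the signature is valid, the token is unexpired, and it grants capability."""
    encoded, _, signature = token.partition(".")
    try:
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return current < claims["exp"] and capability in claims["caps"]

# Hypothetical capability names, with fixed timestamps so the run is repeatable.
token = issue_token({"fs:read:/workspace", "net:none"}, ttl_seconds=60, now=1_000.0)
print(check_token(token, "fs:read:/workspace", now=1_030.0))   # within TTL, granted
print(check_token(token, "fs:write:/workspace", now=1_030.0))  # never granted
print(check_token(token, "fs:read:/workspace", now=1_061.0))   # past expiry
```

Because the expiry is inside the signed payload, a sandboxed process cannot extend its own access without the signing key.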
VELA: Securely execute AI-generated & untrusted code | Product Hunt
Autonomous AI agents are writing and executing code, but running it on your host server is a massive security risk. Vela (powered by the Aegis runtime) solves this. It’s a policy-driven execution guard that uses Firecracker micro-VMs and HMAC capability tokens to safely run untrusted code. Get structured results, fine-grained filesystem/network restrictions, and a full JSONL audit trail. Open-source, MIT licensed, and built for LangChain/LlamaIndex.
AI TinkerCAD - Text-to-CAD for makers
🔥 Upvotes: TBD  ·  👤 By: CADAM team
💰 Pricing: Free (open source)  ·  🏷 Category: Creative AI
Generates parametric 3D models from natural language prompts and image references. Supports iterative refinement, dimension adjustments, and exports for 3D printing. Verdict: Fills a genuine gap - most text-to-3D tools generate meshes, not parametric CAD models. Makers and hardware engineers should try this.
CADAM: AI Tinkercad | Product Hunt
CADAM is an open source Text to CAD platform. Think of it like AI TinkerCAD. It generates parametric 3D models from natural language, with support for both text prompts and image references.
AI desktop cursor that lives in the Mac notch
🔥 Upvotes: TBD  ·  👤 By: Tine team
💰 Pricing: Free  ·  🏷 Category: AI Products
Mac-native AI assistant that sees your active apps, selections, and recent actions, suggesting drafts or autofilling directly in any application. Lives in the notch for always-on access. Verdict: Interesting UX innovation - moving AI from a chat window to an always-present screen companion. The "lives in the notch" design is clever.
Tine: An AI desktop cursor that does the work for you | Product Hunt
Tine is a second cursor for your Mac that lives in the notch. Unlike chatbots boxed in a window, it sees your actual screen (the active app, your selection, your last move), so there’s nothing to re-paste. Say the word and it drives the cursor across every app: posts to Slack, writes the note, runs the research, fills the form. It works through your real apps, logs every step, runs on-device, and hands control back the moment you touch the mouse.
Agentic report generation from raw data
🔥 Upvotes: TBD  ·  👤 By: LayerProof
💰 Pricing: Freemium  ·  🏷 Category: Business AI
Transforms raw materials into interactive, client-ready reports via conversational AI. Upload files, shape content through prompts, publish live web pages with a shareable URL. Verdict: Well-scoped product for consulting and agency workflows - the "conversational editing to publishable report" loop is practical.
LayerProof: Agentic reports your clients want to read | Product Hunt
Bristol turns your materials into interactive, agentic reports. Just drop files or data, shape the report by chatting, publish a live web page in one click. Built for agencies, freelancers, and consultants who want their reports actually read.
Snapshot
Provider | Model | Input $/1M | Output $/1M | Context
Anthropic | Claude Fable 5 | $10.00 | $50.00 | 1M
Anthropic | Claude Opus 4.8 | $5.00 | $25.00 | 1M
Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M
Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200K
OpenAI | GPT-5.5 Pro | $5.00 | $30.00 | 1M
OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M
OpenAI | o3 | $2.00 | $8.00 | 200K
OpenAI | o4-mini | $1.10 | $4.40 | 200K
Google | Gemini 3.1 Pro Preview | $2.00 | $12.00 | 1M
Google | Gemini 3.5 Flash | $1.50 | $9.00 | 1M
Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M
Google | Gemini 2.5 Flash | $0.30 | $2.50 | 1M
Groq | Llama 3.3 70B | $0.59 | $0.79 | 128K
Groq | Llama 4 Scout | $0.11 | $0.34 | 128K
Groq | Qwen3 32B | $0.29 | $0.59 | 131K
What this means: Note the Fable 5 pricing irony - Anthropic's most expensive model at $10/$50 per million tokens is currently inaccessible to most users due to the ongoing export controls. Meanwhile, the open-weight models hosted by Groq continue to undercut flagship proprietary models by roughly 10-50x on input costs, though with smaller context windows and less capable models. The sweet spot for cost-conscious teams remains Google's Gemini 2.5 Flash at $0.30/$2.50 with a full 1M context window.
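To compare these prices on a concrete bill, the sketch below costs out one workload across a few rows of the Snapshot table. The prices are copied from the table; the workload size (200M input tokens and 20M output tokens per month) is a made-up example, so substitute your own mix.

```python
# (input $/1M tokens, output $/1M tokens), copied from the Snapshot table.
PRICES = {
    "Claude Fable 5": (10.00, 50.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Llama 3.3 70B (Groq)": (0.59, 0.79),
    "Llama 4 Scout (Groq)": (0.11, 0.34),
}

def job_cost(model, input_tokens, output_tokens):
    """Dollar cost of a workload at the listed per-million-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical monthly workload (assumption, not from the digest).
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 20_000_000

for model in PRICES:
    print(f"{model}: ${job_cost(model, INPUT_TOKENS, OUTPUT_TOKENS):,.2f}")
```

Output-heavy workloads shift the comparison, because output tokens cost several times more than input tokens on most rows of the table.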

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents
Emmanuel Aboah Boateng, Kyle MacDonald, Amardeep Kumar, Siddharth Kodwani, Sudeep Das - arXiv:2606.18947
What it claims: Real-time search grounding works best as an optimizable interface boundary, not a fixed model feature. Decoupled Search Grounding (DSG) separates search from reasoning through an MCP-compatible gateway, enabling independent control over routing, caching, and retrieval depth.

Key finding: 91% cost reduction on search operations while maintaining 86.1% accuracy (vs 87.7% native), and 98%+ cost reduction in production e-commerce with 68% latency improvement.

Why practitioners should care: Any team deploying LLM agents with real-time web search can dramatically cut costs by decoupling search from the model. The MCP-compatible architecture eliminates vendor lock-in and solves search-induced verbosity issues.
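The general shape of a decoupled search layer can be sketched in a few lines. This is not the paper's DSG implementation and it does not speak MCP: the `SearchGateway` class, its toy routing rule, and the fake backends are all hypothetical. It only illustrates how routing, caching, and retrieval depth can be controlled outside the reasoning model.

```python
import time

class SearchGateway:
    """Hypothetical gateway that owns routing and caching, independent of the model."""

    def __init__(self, backends, ttl_seconds=300.0, clock=time.monotonic):
        self.backends = backends  # name -> callable(query, depth) -> list of strings
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._cache = {}
        self.backend_calls = 0

    def route(self, query):
        # Toy routing rule: price questions go to a catalog backend if one exists.
        if "price" in query.lower() and "catalog" in self.backends:
            return "catalog"
        return "web"

    def search(self, query, depth=3):
        key = (query.strip().lower(), depth)
        cached = self._cache.get(key)
        if cached is not None and self.clock() - cached[0] < self.ttl_seconds:
            return cached[1]  # fresh cache hit, no backend cost
        self.backend_calls += 1
        results = self.backends[self.route(query)](query, depth)
        self._cache[key] = (self.clock(), results)
        return results

def fake_web(query, depth):
    return [f"web result {i} for {query!r}" for i in range(depth)]

def fake_catalog(query, depth):
    return [f"catalog item {i} for {query!r}" for i in range(depth)]

gateway = SearchGateway({"web": fake_web, "catalog": fake_catalog})
gateway.search("GPU price today", depth=2)
gateway.search("gpu price today ", depth=2)  # same normalized key, served from cache
print(gateway.backend_calls)
```

Swapping a backend or changing the cache policy touches only the gateway, which is the kind of vendor independence the paper argues for.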
