> Previously: Kimi K2.6 appeared in yesterday's digest. Today's signal is the community verdict after widespread testing - 999 upvotes on r/LocalLLaMA calling it "a legit Opus 4.7 replacement."
Moonshot AI, a Beijing-based startup, released Kimi K2.6 - a Mixture-of-Experts (MoE, a design where only a fraction of the model activates per query) model with 1 trillion total parameters but just 32 billion active at any time.
The post titled "Kimi K2.6 is a legit Opus 4.7 replacement" drew 999 upvotes, with users reporting comparable performance on coding and creative tasks at a fraction of the cost. A separate post (237 upvotes) from a self-described "Opus 4.7 Max subscriber" announced they were switching to Kimi K2.6 for daily use.
- 384 experts, with 8 routed plus 1 shared active per token, making it efficient despite its massive total size
- Benchmarks rival the best closed models: SWE-Bench Pro 58.6, BrowseComp 83.2, Math Vision 93.2
- 68.6% win+tie rate against Gemini 3.1 Pro (Google's best model) in frontend design tasks
- Supports 4,000+ tool calls and 12+ hour continuous runs with 300 parallel sub-agents via "Claw Groups"
- Available immediately on vLLM, OpenRouter, Cloudflare Workers AI, and MLX with INT4 quantization (a compression technique that shrinks the model to fit on consumer hardware)
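To make the expert counts above concrete, here is a minimal sketch of top-k expert selection for a single token. The router here simply picks the highest random scores; the function name `route` and the scoring are illustrative assumptions, not Moonshot's actual gating code. Only the counts (384 experts, 8 routed, 1 shared, 1T total, 32B active) come from the figures above.

```python
import random

NUM_EXPERTS = 384                    # routed experts (from the digest)
TOP_K = 8                            # routed experts selected per token
SHARED_EXPERTS = 1                   # always-on shared expert
TOTAL_PARAMS = 1_000_000_000_000     # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000       # 32 billion active parameters

def route(scores, k=TOP_K):
    """Hypothetical router: return indices of the k highest-scoring experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Stand-in router scores for one token (a real router computes these
# from the token's hidden state).
rng = random.Random(0)
scores = [rng.random() for _ in range(NUM_EXPERTS)]

chosen = route(scores)
experts_per_token = len(chosen) + SHARED_EXPERTS
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"experts used for this token: {experts_per_token} of {NUM_EXPERTS + SHARED_EXPERTS}")
print(f"share of parameters active: {active_fraction:.1%}")
```

The point of the sketch is the ratio: only about 3% of the weights do work for any given token, which is why compute cost tracks the 32B figure while memory requirements still track the 1T figure.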
Amazon committed up to $25 billion in fresh funding to Anthropic (the company that makes Claude), structured as an initial $5 billion infusion followed by up to $20 billion tied to commercial milestones.
The deal represents a mutual lock-in: Amazon gets a guaranteed hyperscale customer for its chips, and Anthropic gets the computing power to train increasingly large models without building its own data centers.
- This adds to Amazon's previous $8 billion investment, bringing the total relationship to $33 billion
- Anthropic valued at $380 billion - roughly the market cap of Netflix
- Anthropic will spend over $100 billion on Amazon Web Services (AWS) over the next decade, securing up to 5 gigawatts (GW) of computing capacity
- Nearly 1 GW of Trainium2 and Trainium3 capacity (Amazon's custom AI chips) coming online by year-end
OpenAI launched ChatGPT Images 2.0 on April 21, with Sam Altman describing the upgrade as "equivalent to jumping from GPT-3 to GPT-5."
Simon Willison tested the model against Gemini's Nano Banana 2 using a "Where's Waldo"-style prompt. High-quality mode (approximately $0.40 per image) produced successful complex illustrations. However, he discovered a notable limitation: the models cannot reliably identify objects in their own generated images, fabricating details when asked.
- First image model with built-in reasoning - it thinks through composition and content before generating
- Generates up to 8 consistent images from a single prompt - useful for storyboards and design variations
- 2K resolution through the Application Programming Interface (API) with aspect ratios from ultra-wide (3:1) to ultra-tall (1:3)
- Dramatically improved text rendering in non-Latin scripts including Japanese, Korean, Hindi, and Bengali
- Web search integration means the model can reference current information while generating
Meta is installing tracking software called Model Capability Initiative (MCI) on US-based employees' work computers to capture mouse movements, clicks, keystrokes, and periodic screen snapshots.
The initiative is part of Meta's race against OpenAI and Anthropic to build AI agents (software that can perform tasks on a computer without human guidance). The disclosure comes as multiple companies are pursuing "computer use" capabilities - Anthropic and OpenAI both launched similar features in the past month.
- Data is used to train AI models that can navigate software interfaces and perform white-collar tasks autonomously
- The tool runs on work-related apps and websites - not personal browsing
- Framed as employee-driven model improvement for tasks like navigating dropdown menus and using keyboard shortcuts
- Meta says safeguards protect sensitive content and data won't be used beyond model training
Bloomberg reports that Anthropic's Mythos - a model deemed too dangerous for public release due to its unprecedented ability to discover and exploit security vulnerabilities - is being accessed by unauthorized users.
The situation highlights the tension between restricting dangerous AI capabilities and ensuring the right organizations have access for defense. Senator Nagel has called for access to be granted "on a level playing field."
- Anthropic provided Mythos to 40+ organizations for testing after deciding against public release
- CISA (Cybersecurity and Infrastructure Security Agency), the nation's top cyber defense agency, does not have access despite being responsible for protecting critical infrastructure
- The NSA is reportedly using Mythos despite a Pentagon blacklist of the model
- The model's existence was originally leaked through an unsecured public data store containing nearly 3,000 unpublished Anthropic assets
The pattern: frontier capability reaches open-source within weeks, then gets compressed to run on consumer hardware within days. The value proposition of $20-200/month AI subscriptions increasingly rests on convenience and integration, not raw capability.
- Kimi K2.6 matches frontier models on coding, math, and browsing benchmarks despite being fully open-source
- PrismML's Ternary Bonsai fits an 8B model in 1.75 GB - running at 82 tokens per second on an M4 Pro laptop and 27 tokens per second on iPhone 17
- Unsloth published Kimi K2.6 GGUF (a compression format for running models locally) within hours of release, with 66 upvotes celebrating immediate accessibility
- Gemma 4's hidden E4B variant found inside Android reportedly outperforms the publicly released version
Companies are racing to build AI that can use computers like humans do. To train those models, they need data about how humans actually use computers. The privacy implications of this data collection are only beginning to be examined.
- Meta's MCI tool captures mouse movements, keystrokes, and screenshots from employee work computers for AI training
- Claude Desktop silently registered browser automation hooks across seven Chromium-based browsers without user consent, enabling access to browser login state
- Anthropic restructured pricing to block third-party agent frameworks from subscription plans, pushing users toward pay-as-you-go billing that generates more usage data
- OpenAI's Codex now runs on macOS with computer use capabilities, adding another layer of system-level access
The trend across the industry is away from all-you-can-eat subscriptions and toward metered, usage-based billing. This benefits casual users who pay less but penalizes power users who relied on flat-rate plans for heavy agent workloads.
- Claude Pro no longer lists Claude Code as included (760 upvotes on r/ClaudeAI) and third-party agents like OpenClaw are blocked from using subscription limits
- Opus 4.7's new tokenizer uses up to 35% more tokens for the same text, effectively raising costs without changing the sticker price
- Enterprise Claude subscriptions shifted from a $200/user flat fee to $20/seat plus usage-based charges
- OpenAI dropped ChatGPT Business from $25 to $20 per seat while shifting Codex to token-based pricing
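The tokenizer point is easy to check with arithmetic. The sketch below assumes the worst-case 35% inflation, the $5.00 per million input tokens listed for Opus 4.7 in the pricing table later in this digest, and a hypothetical workload of 200 million input tokens per month; the function names are illustrative.

```python
def effective_price(list_price_per_m, token_inflation):
    """Price per 1M tokens of the *old* tokenizer's count, after inflation."""
    return list_price_per_m * (1 + token_inflation)

def monthly_cost(old_tokens, list_price_per_m, token_inflation):
    """Cost of a workload measured in old-tokenizer tokens."""
    billed_tokens = old_tokens * (1 + token_inflation)
    return billed_tokens / 1_000_000 * list_price_per_m

LIST_PRICE = 5.00        # $ per 1M input tokens (from the pricing table)
INFLATION = 0.35         # "up to 35% more tokens" (worst case)
WORKLOAD = 200_000_000   # hypothetical monthly input tokens

price = effective_price(LIST_PRICE, INFLATION)
before = monthly_cost(WORKLOAD, LIST_PRICE, 0.0)
after = monthly_cost(WORKLOAD, LIST_PRICE, INFLATION)

print(f"effective price: ${price:.2f} per 1M old-tokenizer tokens")
print(f"monthly bill: ${before:,.2f} before, ${after:,.2f} after")
```

Under these assumptions the sticker price is unchanged, but the same text costs up to 35% more to process.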
These cases share a common thread: governments are struggling to maintain oversight and access as AI capabilities outpace the bureaucratic processes that govern them.
- CISA does not have access to Anthropic's Mythos despite being the nation's lead cyber defense agency
- The NSA is using Mythos despite a Pentagon blacklist, creating a governance contradiction
- The UK government is considering ending Palantir's £330 million NHS contract after only 3-4 of 13 capabilities were delivered
- Jeff Bezos's Project Prometheus raised $10 billion for physical-world AI, adding another powerful system that will require government oversight frameworks
- Reasoning-powered generation means the model plans composition before drawing, reducing the "random nonsense in the background" problem
- Non-Latin script support now includes Japanese, Korean, Hindi, and Bengali with legible typography
- Multi-image consistency generates up to 8 related images from one prompt, useful for design systems and storyboards
- Cost: approximately $0.40 per high-resolution image through the API
- ERNIE-Image and ERNIE-Image-Turbo both appeared in the top 10 trending models on HuggingFace
- 5,950 downloads for Turbo and 4,520 for the standard version in the first wave
- Unsloth published a GGUF version of ERNIE-Image-Turbo with 35,300 downloads, enabling local use
A Reddit user (414 upvotes on r/ClaudeAI) asked Claude to help set up monitoring on their NAS (Network-Attached Storage, a home server), and the AI identified a hidden cryptocurrency miner that had been running undetected for approximately two years. The post sparked discussion about AI's growing role in identifying security issues that slip past human attention during routine system administration.
A privacy researcher discovered that Claude Desktop placed Native Messaging manifest files across seven Chromium-based browsers (including Chrome, Brave, Edge, Arc, Vivaldi, and Opera) without user consent. The bridge enables sharing browser login state and extracting page data. Four of the seven browsers weren't even installed on the test machine. The researcher's post describing the findings drew 106 upvotes on r/ClaudeAI.
Multiple r/ClaudeAI posts describe Opus 4.7 as having "more ego than any prior model," with 315 upvotes on a post titled "I genuinely hate the conversation tone" and 150 upvotes on a post warning that Claude can now run shell commands with sandboxing disabled. One user summarized it: "Claude said, 'So am I.'"
A screenshot of OpenRouter usage rankings (185 upvotes on r/LocalLLaMA) showed that non-coding use cases dominate token consumption. This challenges the narrative that AI tools are primarily developer productivity aids.
An r/MachineLearning post (57 upvotes) documents building a 235-million-parameter diffusion language model (DLM) on a single RTX 5080. DLMs generate text by starting with noise and refining it, rather than predicting one word at a time like standard models. If this approach scales, it could offer faster parallel generation for certain tasks. The fact that a solo developer can build one from scratch signals the architecture is maturing.
Ternary Bonsai's 8B model at 1.75 GB scoring 75.5 on average benchmarks is remarkable, but the real question is whether ternary quantization works at 70B+ parameters. If it does, models that currently require server clusters could run on gaming PCs. PrismML is Apache 2.0 licensed and actively publishing, so we should know within months.
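A back-of-the-envelope check shows what those numbers imply. The sketch assumes 1 GB = 10^9 bytes, exactly 8 billion parameters, and that storage per weight stays constant at larger scale; all three are assumptions for illustration, not figures published by PrismML.

```python
import math

def bits_per_weight(size_gb, n_params):
    """Average storage per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / n_params

def projected_size_gb(n_params, bpw):
    """File size if the same bits-per-weight held at another scale."""
    return n_params * bpw / 8 / 1e9

bpw = bits_per_weight(1.75, 8e9)
ideal_ternary = math.log2(3)           # information in one {-1, 0, +1} weight
size_70b = projected_size_gb(70e9, bpw)

print(f"observed: {bpw:.2f} bits/weight")
print(f"theoretical ternary minimum: {ideal_ternary:.3f} bits/weight")
print(f"70B model at the same density: {size_70b:.1f} GB")
```

At that density a 70B model would need roughly 15 GB, which is within reach of a high-end consumer GPU. The calculation says nothing about whether accuracy holds up at that scale, which is the open question.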
Project Prometheus is building models that simulate material fatigue and aerodynamics, fundamentally different from the text-based AI that dominates today. With 120+ hires from major AI labs and a potential $100 billion manufacturing vehicle, this could become the first serious attempt to apply frontier AI to physical-world problems at scale.
No single company is doing anything illegal. But the pattern - track how humans use computers, register hooks in browsers, meter every interaction - creates an ecosystem where enormous amounts of behavioral data flow to AI companies. The privacy frameworks governing this data are years behind the technology collecting it.
QIMMA evaluates LLM performance specifically on Arabic language tasks, while NVIDIA's Nemotron Personas project focuses on grounding Korean AI agents in real demographics. These are early signals that the next phase of AI development will prioritize linguistic and cultural specificity rather than treating non-English languages as an afterthought.
📜 License: MIT · 👤 By: Startup
🎯 Time to value: 10 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| Free alternative to Bloomberg/Refinitiv | Data sources may be less comprehensive |
| Terminal-native with rich visualizations | Steep learning curve for non-CLI users |
| Active development with rapid feature adds | Still pre-1.0 stability |
📜 License: MIT · 👤 By: Independent developer
🎯 Time to value: 30 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| No cameras needed - pure WiFi sensing | Accuracy varies with environment layout |
| Works through walls and obstacles | Requires compatible WiFi hardware |
| Vital sign monitoring without wearables | Complex calibration for precise readings |
📜 License: MPL-2.0 · 👤 By: Mozilla Foundation
🎯 Time to value: 5 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| Choose any AI provider or run locally | Still in early development |
| No vendor lock-in or data sharing | Thunderbird UI may feel dated |
| Mozilla Foundation backing | Limited model options vs cloud-native tools |
📜 License: MIT · 👤 By: Zilliz Tech (company)
🎯 Time to value: 10 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| 40% token reduction claimed | Requires Zilliz Cloud or Milvus setup |
| Incremental indexing stays fresh | Additional dependency for code search |
| AST-based intelligent chunking | Vector DB adds infrastructure complexity |
📜 License: MIT · 👤 By: Microsoft
🎯 Time to value: 15 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| Well-structured beginner curriculum | May lag behind latest agent frameworks |
| Hands-on Jupyter exercises | Microsoft-centric tool choices |
| 57k stars = strong community | Some lessons assume Azure familiarity |
📜 License: MIT · 👤 By: University research group
🎯 Time to value: 15 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| Handles all document types natively | Academic origin may mean rough edges |
| Unified pipeline reduces integration work | Performance at scale not yet proven |
| Active development and community | Documentation still catching up |
📜 License: MIT · 👤 By: Independent developer
🎯 Time to value: 10 minutes
| ✓ Pros | ✗ Cons |
|---|---|
| Multi-platform aggregation | Requires API keys for each platform |
| Intelligent trend detection | Alert fatigue if not tuned carefully |
| 53k stars with active community | Resource-intensive for real-time monitoring |
👤 By: Alibaba Cloud · 🎯 Task: Image-Text-to-Text
📐 Size: 36B (3B active)
| ✓ Pros | ✗ Cons |
|---|---|
| Only 3B active params = fast inference | Chinese company origin may concern some |
| Native image understanding | Smaller active size limits complex reasoning |
| Apache 2.0 = full commercial use | MoE can be unpredictable on edge cases |

👤 By: Moonshot AI (startup) · 🎯 Task: Image-Text-to-Text
📐 Size: 1.1T (32B active)
| ✓ Pros | ✗ Cons |
|---|---|
| Matches frontier models on key benchmarks | 1T total params needs significant hardware |
| Free and open under Apache 2.0 | Newer model with less community testing |
| 256K context with multimodality | MoE routing can produce inconsistent outputs |

👤 By: Unsloth (open-source project) · 🎯 Task: Image-Text-to-Text
📐 Size: 35B
| ✓ Pros | ✗ Cons |
|---|---|
| Runs on consumer hardware via llama.cpp | Some quality loss from quantization |
| Multiple quant levels available | Still needs 8-16GB RAM minimum |
| Most downloaded version of the model | GGUF format updates may lag original |

👤 By: Google · 🎯 Task: Image-Text-to-Text
📐 Size: 33B
| ✓ Pros | ✗ Cons |
|---|---|
| 4.47M downloads = proven community | Gemma license more restrictive than Apache |
| 256K context with native vision | 33B requires decent hardware |
| Built from Gemini 3 research | Safety filtering can be overly cautious |

👤 By: MiniMax AI (startup) · 🎯 Task: Text Generation
📐 Size: 229B
| ✓ Pros | ✗ Cons |
|---|---|
| 229B params = strong reasoning | Too large for consumer hardware |
| Apache 2.0 fully open license | Less community support than Qwen/Gemma |
| 358k downloads show real adoption | Chinese company origin |

💰 Pricing: Freemium · 🏷 Category: Analytics

💰 Pricing: Paid · 🏷 Category: Productivity

| Provider | Model | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| Anthropic | Claude Opus 4.7 | $5.00 | $25.00 | 1M tokens |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200K tokens |
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200K tokens |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | 128K tokens |
| OpenAI | o3 | $2.00 | $8.00 | 200K tokens |
| OpenAI | o4-mini | $1.10 | $4.40 | 200K tokens |
| Google | Gemini 3 Pro | $2.00 | $12.00 | 2M tokens |
| Google | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M tokens |
| Groq | Llama 3.3 70B | $0.59 | $0.79 | 128K tokens |
| Groq | Llama 3.1 8B | $0.05 | $0.08 | 128K tokens |
Notable: Anthropic now offers Claude Mythos Preview with a 1M token context at standard pricing, matching Google's long-context advantage. Opus 4.7 also added a "fast mode" at 6x standard rates ($30/$150 per million tokens) for applications needing lower latency.
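Per-million prices are hard to compare by eye, so here is a small sketch that turns the table into a per-request cost. The prices are copied from the table above; the request size (10,000 input tokens, 1,000 output tokens) and the helper name `request_cost` are hypothetical choices for illustration.

```python
# (input $/1M tokens, output $/1M tokens), copied from the table above
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4": (2.50, 15.00),
    "o3": (2.00, 8.00),
    "o4-mini": (1.10, 4.40),
    "Gemini 3 Pro": (2.00, 12.00),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Llama 3.3 70B (Groq)": (0.59, 0.79),
    "Llama 3.1 8B (Groq)": (0.05, 0.08),
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of one request at the listed per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical request: 10,000 tokens in, 1,000 tokens out
costs = {m: request_cost(m, 10_000, 1_000) for m in PRICES}
cheapest = min(costs, key=costs.get)

for model, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model:<24} ${cost:.5f}")
print(f"cheapest for this request shape: {cheapest}")
```

Because output tokens cost several times more than input tokens on most rows, the ranking can shift for output-heavy workloads; changing the two token counts is enough to test that.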
Key finding: The architecture achieves state-of-the-art results on algorithmic reasoning tasks while maintaining the learning flexibility of standard neural networks.
Why practitioners should care: If neural computers can reliably learn and execute algorithms, it could eliminate the need for many hand-coded post-processing steps in AI pipelines. The practical impact would be AI systems that are both more capable and more predictable on structured tasks.