-
BarrierSteer: LLM Safety via Learning Barrier Steering · 8 sources
tool-release · research-paper · safety-research · model-release · product-launch · infrastructure
-
PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting · 5 sources
research-paper
-
Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models · 7 sources
research-paper · controversy
-
Partner-Aware Hierarchical Skill Discovery for Robust Human-AI Collaboration · 4 sources
application · research-paper · safety-research · benchmark · tool-release · commentary
-
CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning · 2 sources
research-paper · tool-release
-
Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security · 2 sources
safety-research · research-paper · application · model-release · tool-release
-
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning · 2 sources
research-paper · regulation · benchmark · infrastructure · tool-release
-
VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos · 3 sources
research-paper · benchmark · tool-release · commentary
-
VeriTrace: Evolving Mental Models for Deep Research Agents · 4 sources
application · regulation · tool-release · model-release · research-paper · benchmark
-
Lipschitz Optimization for Formal Verification of Homographies · 2 sources
safety-research · application · research-paper · model-release · product-launch · benchmark · tool-release
-
Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m · 2 sources
safety-research · research-paper · tool-release
-
AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning · 2 sources
research-paper · model-release · product-launch · benchmark · tool-release
-
A Spectral Framework for Graph Neural Operators: Convergence Guarantees and Tradeoffs · 7 sources
tool-release · research-paper
-
Explainable Retinal Imaging for Prediction of Multi-Organ Dysfunction in Type 2 Diabetes · 2 sources
tool-release · model-release · research-paper · product-launch · safety-research · commentary
-
Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation · 2 sources
research-paper · benchmark
-
LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images · 6 sources
tool-release · model-release · research-paper · product-launch
-
When Search Becomes Memory: Turning Robot Design Trials into Transferable Skills
application · tool-release · model-release · research-paper
-
Adaptive Graph Refinement and Label Propagation with LLMs for Cost-Effective Entity Resolution
research-paper · benchmark · tool-release
-
Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment
infrastructure · research-paper
-
SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models
model-release · research-paper · controversy · regulation · benchmark
-
Beyond Killer Robots: General AI Attitudes and Public Support for Military AI in Nine Countries
application · commentary · research-paper · regulation
-
APT-Agent: Automated Penetration Testing using Large Language Models
application · research-paper · tool-release
-
World-State Transformations for Neuro-symbolic Interactive Storytelling
research-paper · tool-release
-
The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models
regulation · model-release · research-paper · product-launch · tool-release
-
Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs
tool-release · research-paper
-
How Many Tools Should an LLM Agent See? A Chance-Corrected Answer
application · research-paper · regulation
-
PILOT: Policy-Informed Learned Optimization for Adaptive Deep Network Training
regulation · tool-release · model-release · research-paper · product-launch
-
Momentum Streams for Optimizer-Inspired Transformers
research-paper
-
ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training
research-paper · benchmark · infrastructure
-
Benchmarking Patent Embeddings: A Multi-Task Evaluation of 22 Models Across Retrieval, Classification, and Clustering
research-paper · tool-release · model-release · product-launch · safety-research · benchmark · commentary
-
Unlocking Apple's Private Cloud Compute: An Analysis of Privacy-Preserving Artificial Intelligence
research-paper · commentary · model-release · product-launch · tool-release
-
Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning
application · research-paper · safety-research · commentary · benchmark · infrastructure · tool-release
-
Teaching Through Analogies: A Modular Pipeline for Educational Analogy Generation
research-paper · benchmark · commentary
-
PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection
research-paper · tool-release
-
Overcoming "Physics Shock" in Earth Observation: A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference
tool-release · infrastructure · research-paper
-
The Time is Here for Just-in-Time Systems: Challenges and Opportunities
application · research-paper · benchmark
-
Feature Lottery? A Bifurcation Theory of Concept Emergence
research-paper · safety-research
-
Mode-as-Sequence: Translating Multimodal Motion Prediction into Unified Sequential Mode Modeling
research-paper · tool-release · infrastructure
-
Machine Intelligence that Understands Visual and Linguistic Information and Interacts with Humans and Environments
application · research-paper · benchmark · infrastructure · tool-release
-
A World Model of Radiologist Reading for Medical Image Representation Learning
research-paper · benchmark · infrastructure
-
AI-Driven Controlled Environment Agriculture as Resilient Infrastructure for U.S. Fresh-Produce Supply Chains
research-paper · tool-release · regulation · application · commentary
-
High-Risk AI Systems and the Problem of Identity in the European AI Act
regulation · research-paper · tool-release
-
Authority Signals in Claude AI Health Citations: A Descriptive Analysis Using the Authority Signals Framework
tool-release · commentary · research-paper
-
Learning to Search and Searching to Learn for Generalization in Planning
research-paper · regulation · tool-release
-
Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures
application · tool-release · research-paper
-
Context-CoT: Enhancing Context Learning via High-Quality Reasoning Synthesis
research-paper · benchmark
-
FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization
research-paper · application · benchmark · model-release · product-launch · tool-release
-
Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems
application · research-paper · tool-release
-
DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs
application · research-paper · regulation · benchmark · infrastructure · tool-release
-
Representation Without Control: Testing the Realization Effect in Language Models
research-paper
Generated 2026-06-05T21:04:39Z · 50 stories shown.