BarrierSteer: LLM Safety via Learning Barrier Steering 8 sources

via export.arxiv.org · Global · 7d ago

tool-release · research-paper · safety-research · model-release · product-launch · infrastructure
PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting 5 sources

via export.arxiv.org · Global · 8d ago

research-paper
Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models 7 sources

via export.arxiv.org · Global · 8d ago

research-paper · controversy
Partner-Aware Hierarchical Skill Discovery for Robust Human-AI Collaboration 4 sources

via export.arxiv.org · Global · 8d ago

application · research-paper · safety-research · benchmark · tool-release · commentary
CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning 2 sources

via export.arxiv.org · Global · 10d ago

research-paper · tool-release
Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security 2 sources

via export.arxiv.org · Global · 10d ago

safety-research · research-paper · application · model-release · tool-release
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning 2 sources

via export.arxiv.org · Global · 10d ago

research-paper · regulation · benchmark · infrastructure · tool-release
VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos 3 sources

via export.arxiv.org · Global · 10d ago

research-paper · benchmark · tool-release · commentary
VeriTrace: Evolving Mental Models for Deep Research Agents 4 sources

via export.arxiv.org · Global · 10d ago

application · regulation · tool-release · model-release · research-paper · benchmark
Lipschitz Optimization for Formal Verification of Homographies 2 sources

via export.arxiv.org · Global · 10d ago

safety-research · application · research-paper · model-release · product-launch · benchmark · tool-release
Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m 2 sources

via export.arxiv.org · Global · 10d ago

safety-research · research-paper · tool-release
AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning 2 sources

via export.arxiv.org · Global · 10d ago

research-paper · model-release · product-launch · benchmark · tool-release
A Spectral Framework for Graph Neural Operators: Convergence Guarantees and Tradeoffs 7 sources

via export.arxiv.org · Global · 10d ago

tool-release · research-paper
Explainable Retinal Imaging for Prediction of Multi-Organ Dysfunction in Type 2 Diabetes 2 sources

via export.arxiv.org · Global · 10d ago

tool-release · model-release · research-paper · product-launch · safety-research · commentary
Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation 2 sources

via export.arxiv.org · Global · 10d ago

research-paper · benchmark
LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images 6 sources

via export.arxiv.org · Global · 10d ago

tool-release · model-release · research-paper · product-launch
When Search Becomes Memory: Turning Robot Design Trials into Transferable Skills

via export.arxiv.org · Global · 10d ago

application · tool-release · model-release · research-paper
Adaptive Graph Refinement and Label Propagation with LLMs for Cost-Effective Entity Resolution

via export.arxiv.org · Global · 10d ago

research-paper · benchmark · tool-release
Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

via export.arxiv.org · Global · 10d ago

infrastructure · research-paper
SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models

via export.arxiv.org · Global · 10d ago

model-release · research-paper · controversy · regulation · benchmark
Beyond Killer Robots: General AI Attitudes and Public Support for Military AI in Nine Countries

via export.arxiv.org · Global · 10d ago

application · commentary · research-paper · regulation
APT-Agent: Automated Penetration Testing using Large Language Models

via export.arxiv.org · Global · 10d ago

application · research-paper · tool-release
World-State Transformations for Neuro-symbolic Interactive Storytelling

via export.arxiv.org · Global · 10d ago

research-paper · tool-release
The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models

via export.arxiv.org · Global · 10d ago

regulation · model-release · research-paper · product-launch · tool-release
Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs

via export.arxiv.org · Global · 10d ago

tool-release · research-paper
How Many Tools Should an LLM Agent See? A Chance-Corrected Answer

via export.arxiv.org · Global · 10d ago

application · research-paper · regulation
PILOT: Policy-Informed Learned Optimization for Adaptive Deep Network Training

via export.arxiv.org · Global · 10d ago

regulation · tool-release · model-release · research-paper · product-launch
Momentum Streams for Optimizer-Inspired Transformers

via export.arxiv.org · Global · 10d ago

research-paper
ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training

via export.arxiv.org · Global · 10d ago

research-paper · benchmark · infrastructure
Benchmarking Patent Embeddings: A Multi-Task Evaluation of 22 Models Across Retrieval, Classification, and Clustering

via export.arxiv.org · Global · 10d ago

research-paper · tool-release · model-release · product-launch · safety-research · benchmark · commentary
Unlocking Apple's Private Cloud Compute: An Analysis of Privacy-Preserving Artificial Intelligence

via export.arxiv.org · Global · 10d ago

research-paper · commentary · model-release · product-launch · tool-release
Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

via export.arxiv.org · Global · 10d ago

application · research-paper · safety-research · commentary · benchmark · infrastructure · tool-release
Teaching Through Analogies: A Modular Pipeline for Educational Analogy Generation

via export.arxiv.org · Global · 10d ago

research-paper · benchmark · commentary
PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection

via export.arxiv.org · Global · 10d ago

research-paper · tool-release
Overcoming "Physics Shock" in Earth Observation: A Heteroscedastic Uncertainty Framework for PINN-based Flood Inference

via export.arxiv.org · Global · 10d ago

tool-release · infrastructure · research-paper
The Time is Here for Just-in-Time Systems: Challenges and Opportunities

via export.arxiv.org · Global · 10d ago

application · research-paper · benchmark
Feature Lottery? A Bifurcation Theory of Concept Emergence

via export.arxiv.org · Global · 10d ago

research-paper · safety-research
Mode-as-Sequence: Translating Multimodal Motion Prediction into Unified Sequential Mode Modeling

via export.arxiv.org · Global · 10d ago

research-paper · tool-release · infrastructure
Machine Intelligence that Understands Visual and Linguistic Information and Interacts with Humans and Environments

via export.arxiv.org · Global · 10d ago

application · research-paper · benchmark · infrastructure · tool-release
A World Model of Radiologist Reading for Medical Image Representation Learning

via export.arxiv.org · Global · 10d ago

research-paper · benchmark · infrastructure
AI-Driven Controlled Environment Agriculture as Resilient Infrastructure for U.S. Fresh-Produce Supply Chains

via export.arxiv.org · Global · 10d ago

research-paper · tool-release · regulation · application · commentary
High-Risk AI Systems and the Problem of Identity in the European AI Act

via export.arxiv.org · Global · 10d ago

regulation · research-paper · tool-release
Authority Signals in Claude AI Health Citations: A Descriptive Analysis Using the Authority Signals Framework

via export.arxiv.org · Global · 10d ago

tool-release · commentary · research-paper
Learning to Search and Searching to Learn for Generalization in Planning

via export.arxiv.org · Global · 10d ago

research-paper · regulation · tool-release
Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures

via export.arxiv.org · Global · 10d ago

application · tool-release · research-paper
Context-CoT: Enhancing Context Learning via High-Quality Reasoning Synthesis

via export.arxiv.org · Global · 10d ago

research-paper · benchmark
FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

via export.arxiv.org · Global · 10d ago

research-paper · application · benchmark · model-release · product-launch · tool-release
Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems

via export.arxiv.org · Global · 10d ago

application · research-paper · tool-release
DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs

via export.arxiv.org · Global · 10d ago

application · research-paper · regulation · benchmark · infrastructure · tool-release
Representation Without Control: Testing the Realization Effect in Language Models

via export.arxiv.org · Global · 10d ago

research-paper

Generated 2026-06-05T21:04:39Z · 50 stories shown.
