AutoXiv

Efficient LLM Training.

Research on improving reinforcement learning, reasoning generalization, and optimization efficiency for training and fine-tuning large language models under resource constraints.

13 papers

Papers.

260421.0038
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Alshammari · Wen · Zainal +5
MathNet is a large-scale, multilingual dataset of 30,676 Olympiad-level math problems from 47 countries spanning two decades, designed to benchmark both mathematical reasoning in generative models and mathematical retrieval in embedding systems. The benchmark reveals that even state-of-the-art models struggle with these problems, with top models achieving only 78.4% accuracy, and that retrieval quality significantly impacts retrieval-augmented generation performance.
Formal Sciences
260421.0041
When Can LLMs Learn to Reason with Weak Supervision?
Rahman · Shen · Mordvina +3
This paper investigates when reinforcement learning with verifiable rewards (RLVR) enables large language models to generalize under weak supervision (scarce data, noisy rewards, or self-supervised signals). The key finding is that models generalize when they exhibit prolonged pre-saturation training dynamics, which is predicted by reasoning faithfulness—the degree to which intermediate reasoning steps logically support final answers.
Formal Sciences
260421.0042
Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale
Koepke · Zverev · Ginosar +1
This paper challenges the Platonic Representation Hypothesis by showing that apparent alignment between vision and language models is an artifact of small-scale evaluation. When tested at scale with millions of samples and realistic many-to-many settings, cross-modal alignment degrades substantially, suggesting different modalities learn different representations of reality.
Formal Sciences
260421.0049
FUSE: Ensembling Verifiers with Zero Labeled Data
Lee · Ma · Zhao +4
FUSE is a method that combines multiple imperfect AI verifiers to better judge model outputs without needing any labeled training data. By modeling how the verifiers' errors depend on one another with spectral algorithms, it matches or beats semi-supervised methods across diverse benchmarks.
Formal Sciences
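The idea of combining unlabeled verifiers with a spectral method can be illustrated by the classic spectral meta-learner for binary judgments. This is a minimal sketch of that general technique, not the FUSE algorithm itself: given ±1 votes from several verifiers, the leading eigenvector of the off-diagonal vote covariance recovers relative verifier reliabilities under a conditional-independence assumption, which FUSE explicitly relaxes.

```python
import numpy as np

def spectral_ensemble(votes):
    """votes: (n_items, n_verifiers) array of +/-1 judgments.
    Estimates per-verifier reliability weights from the leading
    eigenvector of the off-diagonal vote covariance (assuming
    verifiers err independently given the true label), then
    returns a weighted-majority decision per item."""
    q = np.cov(votes, rowvar=False)
    np.fill_diagonal(q, 0.0)        # diagonal reflects variance, not agreement
    vals, vecs = np.linalg.eigh(q)  # eigenvalues in ascending order
    w = vecs[:, -1]                 # leading eigenvector ~ reliabilities
    if w.sum() < 0:                 # resolve global sign ambiguity
        w = -w
    return np.sign(votes @ w)
```

On simulated verifiers of varying accuracy, the weighted vote typically tracks the most reliable verifiers without ever seeing a label.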
260421.0051
Duality for the Adversarial Total Variation
Bungert · Schmitt
This paper establishes a mathematical duality framework for adversarial total variation, showing that adversarial training of binary classifiers can be understood through nonlocal calculus of variations. The work provides rigorous characterizations of subdifferentials using dual representations and integration by parts formulas in both metric and Euclidean spaces.
Formal Sciences
260421.0052
IDOBE: Infectious Disease Outbreak forecasting Benchmark Ecosystem
Adiga · Chou · Chiranth +7
IDOBE is a standardized benchmark dataset containing over 10,000 infectious disease outbreak segments from a century of surveillance data across 13 diseases, designed to evaluate and compare epidemic forecasting methods. The authors test 11 baseline models and find MLP-based methods perform most robustly, with statistical methods excelling in pre-peak phases.
Formal Sciences
260421.0054
Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data
Liang · Zhou · Lu +3
This paper addresses a critical problem in reinforcement learning for large language models: when base models are already highly accurate on their training benchmarks, standard RL methods fail, since too few errors remain to learn from and models collapse into repetitive solutions. The authors propose CUTS, a novel sampling strategy that maintains solution diversity even at high accuracy, improving generalization on challenging out-of-domain math problems by up to 15.1%.
Formal Sciences
260421.0056
Faster by Design: Interactive Aerodynamics via Neural Surrogates Trained on Expert-Validated CFD
Thumiger · Bartezzaghi · Rigotti +5
This paper introduces GIST, a neural network surrogate that predicts race-car aerodynamics 10,000× faster than traditional CFD simulations while maintaining accuracy suitable for early-stage design. The work includes a new high-fidelity dataset of LMP2 race-car aerodynamics validated by industry experts at Dallara, enabling interactive design exploration in motorsport.
Formal Sciences
260421.0060
Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
Morrison · Adhikesaven · Bhagia +3
BAR (Branch-Adapt-Route) trains separate domain experts independently and combines them via Mixture-of-Experts, enabling modular updates to language models without retraining everything or degrading existing capabilities. This approach matches monolithic retraining performance while scaling linearly instead of quadratically when adding new domains.
Formal Sciences
260421.0067
AutoPPA: Automated Circuit PPA Optimization via Contrastive Code-based Rule Library Learning
Li · Jin · Huang +14
AutoPPA is an automated framework for optimizing circuit performance, power, and area (PPA) that learns optimization rules by contrasting code pairs rather than relying on manual rules. It outperforms existing methods including manual optimization and state-of-the-art automated approaches.
Formal Sciences
260421.0068
ProtoCLIP: Prototype-Aligned Latent Refinement for Robust Zero-Shot Chest X-Ray Classification
Kittler · Bhat · Maier
ProtoCLIP refines CLIP-style vision-language models for chest X-ray classification by using curated training data and prototype-aligned distillation to reduce co-occurrence bias and improve zero-shot performance. The method achieves 2-10 percentage point AUC improvements over baseline CLIP on unseen chest X-ray datasets without large-scale retraining.
Formal Sciences
260421.0074
Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes
Bauer · Walshe · Pham +4
This paper investigates how well small language models can learn reasoning tasks through reinforcement learning when training data and compute are limited. The study finds that mixing easy and hard problems during training provides up to 5x better sample efficiency than training on easy problems alone.
Formal Sciences
260421.0087
Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
Wang · Shen · Ding +3
AdaLeZO accelerates zeroth-order optimization for fine-tuning large language models by intelligently sampling layers based on their sensitivity rather than uniformly perturbing all parameters. This adaptive approach achieves 1.7-3.0x speedup over existing methods while maintaining memory efficiency and acting as a universal plug-in for existing ZO optimizers.
Formal Sciences
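The mechanism behind zeroth-order fine-tuning with layer-wise sampling can be sketched with a toy SPSA-style step. This is an illustrative sketch, not the AdaLeZO implementation: here `probs` is a stand-in for the paper's sensitivity-based sampling distribution, and the "model" is just a list of parameter arrays with a scalar loss.

```python
import numpy as np

def zo_layerwise_step(layers, loss_fn, probs, lr=0.05, eps=1e-3, rng=None):
    """One SPSA-style zeroth-order step that perturbs a single sampled
    layer instead of all parameters.  `layers` is a list of np.ndarray
    parameter blocks; `probs` gives the per-layer sampling distribution
    (a placeholder for an adaptive, sensitivity-based schedule).
    Uses two loss evaluations and no backpropagation."""
    if rng is None:
        rng = np.random.default_rng()
    i = rng.choice(len(layers), p=probs)       # sample one layer to probe
    z = rng.standard_normal(layers[i].shape)   # random probe direction
    layers[i] += eps * z
    loss_plus = loss_fn(layers)
    layers[i] -= 2 * eps * z
    loss_minus = loss_fn(layers)
    layers[i] += eps * z                       # restore parameters
    g = (loss_plus - loss_minus) / (2 * eps)   # estimated directional derivative
    layers[i] -= lr * g * z                    # descend along the probe direction
    return i, g
```

Run in a loop on a simple quadratic loss, the iterates drift toward the minimum using forward passes only, which is why such methods are attractive when gradients are too memory-hungry to store.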