- Diffusion Models Are Real-Time Game Engines
- Distributed Training Over-the-Internet
- Reward Modeling as Next-Token Prediction
- A Law of Next-Token Prediction in LLMs
- The Mamba in the Llama
- Physics of Language Models
- BaichuanSEED
- Long Context as a New Modality
- PolyRouter
- Training-Free Activation Sparsity in LLMs
- Pluto and Charon
- RAGLAB
- Fire-Flyer AI-HPC
- Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
- Text2SQL is Not Enough
- Unsupervised-to-Online Reinforcement Learning
- LlamaDuo
- Performance Law of Large Language Models
MLLM/VLM/CV