A high-profile AI startup
Large Model Post-Training Algorithms
Information Technology
Digital Technology
Hangzhou
Experience: open
Bachelor's degree
¥40–50K × 14-month salary
Job Description
Responsibilities
- Lead the design and development of post-training algorithms for large language models, including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), DPO, PPO, and other alignment techniques (see the illustrative DPO sketch after this list).
- Build and optimize data pipelines for instruction tuning, preference modeling, reward modeling, and safety alignment.
- Develop scalable training strategies to improve model helpfulness, safety, reasoning ability, and robustness across diverse tasks.
- Conduct experiments to evaluate model behavior, diagnose failure cases, and iterate on training methods to improve performance.
- Collaborate with data, infrastructure, and product teams to define post-training objectives and integrate aligned models into production systems.
- Research and apply state-of-the-art techniques in alignment, distillation, preference optimization, and model evaluation.
- Establish evaluation frameworks and benchmarks for reasoning, factuality, safety, and user experience.
- Mentor junior researchers and contribute to long-term technical planning for model alignment and post-training.
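
Since the role centers on preference-optimization methods such as DPO, here is a minimal sketch of the DPO objective in PyTorch, assuming per-sequence log-probabilities have already been computed for the policy and a frozen reference model. The function name, input shapes, and beta default are illustrative assumptions, not this company's code.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Illustrative DPO objective (Rafailov et al., 2023): inputs are summed
    # per-sequence log-probabilities, each of shape (batch,). beta limits
    # how far the policy may drift from the frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Push the implicit reward of the preferred response above the
    # dispreferred one.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy check: random numbers stand in for real model log-probs.
if __name__ == "__main__":
    b = 4
    print(dpo_loss(torch.randn(b), torch.randn(b),
                   torch.randn(b), torch.randn(b)).item())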
Job Requirements
Qualifications
- Master's degree or PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field.
- Strong expertise in machine learning, deep learning, and large-scale model training.
- Hands-on experience with LLM post-training methods such as SFT, RLHF, DPO, PPO, or reward modeling.
- Proficiency in Python and deep learning frameworks such as PyTorch.
- Solid understanding of transformer architectures, attention mechanisms, and large-scale distributed training.
- Experience working with large datasets, data quality control, and instruction-tuning data design (a toy loss-masking sketch follows this list).
- Ability to design experiments, analyze results, and iterate quickly.
- Strong communication skills and ability to collaborate across research, engineering, and product teams.
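
On instruction-tuning data design: one common detail is masking prompt tokens out of the cross-entropy loss so only response tokens contribute to the SFT gradient. A minimal sketch, assuming a single shared prompt length per batch; all names and shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def sft_loss(logits, input_ids, prompt_len):
    # logits: (batch, seq, vocab); input_ids: (batch, seq), prompt followed
    # by response. Standard next-token shift: position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()
    # Mask prompt tokens so the loss covers only the response
    # (simplified: assumes every row shares the same prompt length).
    shift_labels[:, : prompt_len - 1] = -100  # ignored by cross_entropy
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Toy check with random tensors standing in for real model outputs.
if __name__ == "__main__":
    batch, seq, vocab, prompt_len = 2, 16, 100, 6
    logits = torch.randn(batch, seq, vocab)
    ids = torch.randint(0, vocab, (batch, seq))
    print(sft_loss(logits, ids, prompt_len).item())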
Preferred Qualifications
- Experience training or aligning large-scale foundation models (LLMs, multimodal models).
- Background in human-AI interaction, safety alignment, or evaluation methodology.
- Familiarity with reinforcement learning, preference learning, or probabilistic modeling.
- Experience with model compression, distillation, or inference optimization (a toy soft-label distillation loss follows this list).
- Publications in top AI/ML conferences or contributions to open-source LLM projects.
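
On distillation: a minimal sketch of soft-label knowledge distillation (Hinton et al., 2015), matching temperature-softened student and teacher token distributions via KL divergence. The names and temperature default are illustrative assumptions.

import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes stable.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

# Toy check with random logits standing in for real model outputs.
if __name__ == "__main__":
    student = torch.randn(4, 100)
    teacher = torch.randn(4, 100)
    print(distill_loss(student, teacher).item())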
Consultant
Cassie Lin
Section Manager (Industry Manager)