A high-profile AI startup
Large Model Post-Training Algorithms
Information Technology
Digital Technology
Hangzhou
Experience: open
Bachelor's degree
¥40–50K × 14-month salary
Job Description
Responsibilities
- Lead the design and development of post-training algorithms for large language models, including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), DPO, PPO, and other alignment techniques (see the illustrative DPO sketch after this list).
- Build and optimize data pipelines for instruction tuning, preference modeling, reward modeling, and safety alignment.
- Develop scalable training strategies to improve model helpfulness, safety, reasoning ability, and robustness across diverse tasks.
- Conduct experiments to evaluate model behavior, diagnose failure cases, and iterate on training methods to improve performance.
- Collaborate with data, infrastructure, and product teams to define post-training objectives and integrate aligned models into production systems.
- Research and apply state-of-the-art techniques in alignment, distillation, preference optimization, and model evaluation.
- Establish evaluation frameworks and benchmarks for reasoning, factuality, safety, and user experience.
- Mentor junior researchers and contribute to long-term technical planning for model alignment and post-training.
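
Since the role centers on preference-optimization methods such as DPO, here is a minimal sketch of the DPO objective in PyTorch, assuming per-sequence log-probabilities have already been computed for the policy and a frozen reference model. The function name, input shapes, and beta default are illustrative assumptions, not this company's code.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Illustrative DPO objective (Rafailov et al., 2023): inputs are summed
    # per-sequence log-probabilities, each of shape (batch,). beta limits
    # how far the policy may drift from the frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Push the implicit reward of the preferred response above the
    # dispreferred one.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy check: random numbers stand in for real model log-probs.
if __name__ == "__main__":
    b = 4
    print(dpo_loss(torch.randn(b), torch.randn(b),
                   torch.randn(b), torch.randn(b)).item())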
Job Requirements
Qualifications
- Master's degree or PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related field.
- Strong expertise in machine learning, deep learning, and large-scale model training.
- Hands-on experience with LLM post-training methods such as SFT, RLHF, DPO, PPO, or reward modeling.
- Proficiency in Python and deep learning frameworks such as PyTorch.
- Solid understanding of transformer architectures, attention mechanisms, and large-scale distributed training.
- Experience working with large datasets, data quality control, and instruction-tuning data design (a toy loss-masking sketch follows this list).
- Ability to design experiments, analyze results, and iterate quickly.
- Strong communication skills and ability to collaborate across research, engineering, and product teams.
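
On instruction-tuning data design: one common detail is masking prompt tokens out of the cross-entropy loss so only response tokens contribute to the SFT gradient. A minimal sketch, assuming a single shared prompt length per batch; all names and shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def sft_loss(logits, input_ids, prompt_len):
    # logits: (batch, seq, vocab); input_ids: (batch, seq), prompt followed
    # by response. Standard next-token shift: position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()
    # Mask prompt tokens so the loss covers only the response
    # (simplified: assumes every row shares the same prompt length).
    shift_labels[:, : prompt_len - 1] = -100  # ignored by cross_entropy
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Toy check with random tensors standing in for real model outputs.
if __name__ == "__main__":
    batch, seq, vocab, prompt_len = 2, 16, 100, 6
    logits = torch.randn(batch, seq, vocab)
    ids = torch.randint(0, vocab, (batch, seq))
    print(sft_loss(logits, ids, prompt_len).item())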
Preferred Qualifications
- Experience training or aligning large-scale foundation models (LLMs, multimodal models).
- Background in human-AI interaction, safety alignment, or evaluation methodology.
- Familiarity with reinforcement learning, preference learning, or probabilistic modeling.
- Experience with model compression, distillation, or inference optimization (a toy soft-label distillation loss follows this list).
- Publications in top AI/ML conferences or contributions to open-source LLM projects.
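
On distillation: a minimal sketch of soft-label knowledge distillation (Hinton et al., 2015), matching temperature-softened student and teacher token distributions via KL divergence. The names and temperature default are illustrative assumptions.

import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes stable.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

# Toy check with random logits standing in for real model outputs.
if __name__ == "__main__":
    student = torch.randn(4, 100)
    teacher = torch.randn(4, 100)
    print(distill_loss(student, teacher).item())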
Consultant
Cassie Lin
Section Manager (Industry Manager)