AI Post-Training

Design, build, and optimize post‑training workflows, including SFT, RLHF, RLAIF, DPO, and other preference‑based training methods.
Develop high‑quality training datasets, including instruction data, preference data, and safety‑related data.
Implement and maintain scalable training pipelines for fine‑tuning and alignment of large language models.
Collaborate with research teams to experiment with new post‑training techniques and evaluate their effectiveness.
Build automated evaluation frameworks for model quality, safety, robustness, and user experience.
Analyze model outputs, identify weaknesses, and propose targeted improvements.
Work with product teams to understand real‑world use cases and translate them into training and evaluation requirements.
Ensure compliance with safety, ethics, and responsible AI guidelines throughout the post‑training process.
Optimize model performance through prompt engineering, data curation, and iterative training cycles.

Job Description

Job Requirements

Fiona Lu