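An AI company
AIGC Data Engineer
IT Development
Digital Technology
Beijing
3-5 years
Bachelor's degree
¥40K-50K per month, 14 months' pay per year
About the Company
Our client is a fast-growing AI hardware company.
Job Description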
Responsibilities
-Build and optimize data pipelines for AIGC (Generative AI) workflows, including data collection, cleaning, annotation, augmentation, and quality evaluation.
-Develop and maintain high‑quality datasets to support training, fine‑tuning, and evaluation of large models (LLMs, diffusion models, multimodal models).
-Design scalable and efficient data processing frameworks to improve throughput, stability, and reliability.
-Collaborate closely with AI/ML engineers to understand model training requirements and develop data strategies (sampling, distribution control, alignment data, RLHF data, etc.).
-Establish data quality monitoring systems and continuously improve data standards, governance processes, and security practices.
-Research and apply advanced data augmentation and synthetic data generation techniques to enhance model performance.
-Support AIGC product development with data‑related tasks, including prompt optimization and inference‑time data improvements.
Job Requirements
Qualifications
-Bachelor’s degree or above in Computer Science, Data Science, Artificial Intelligence, or related fields.
-Strong proficiency in Python and experience with data processing libraries such as Pandas, PySpark, or similar.
-Familiarity with large‑scale data processing and distributed computing frameworks (Hadoop, Spark, Flink) and data lake/warehouse architectures.
-Understanding of LLMs, diffusion models, embeddings, and vector search libraries and databases (Faiss, Milvus, Pinecone).
-Experience with data annotation, cleaning, augmentation, and dataset construction for machine learning.
-Knowledge of model training workflows (pre‑training, fine‑tuning, alignment, evaluation) is a strong plus.
-Solid engineering skills with experience using Git, Docker, Kubernetes, or similar tools.
-Strong communication skills, cross‑team collaboration ability, and a self‑driven mindset.
Preferred Qualifications
-Hands‑on experience with LLM training, SFT, RLHF, or RAG systems.
-Experience working with multimodal data (image, audio, video).
-Background in building data platforms or data governance systems.
-Contributions to open‑source projects or technical writing.
Recruitment Consultant
Cassie Lin
Section Manager