company

Senior R&D Engineer, Distributed Platform for Large AI Models

Job Responsibilities

  • Optimize platform performance to improve throughput and response speed. For large AI model prediction and inference, implement accurate traffic prediction and fast cold start of inference instances to enable on-demand elastic scaling of serverless inference.
  • Ensure high availability and scalability of the platform. Provide simple APIs that can be flexibly extended and customized; for example, AI model training code can be written as native framework code.
  • Support distributed loading and processing of AI data, public AI datasets, and out-of-the-box data augmentation operators. Focus on efficient development, high resource utilization, high performance, and high availability of large-scale heterogeneous clusters.

Job Requirements

  • Familiar with Linux systems and proficient in at least one of C++, Go, or Python;
  • Familiar with distributed middleware such as Redis, Kafka, etcd, and ZooKeeper; hands-on development experience with them is preferred;
  • Familiar with distributed computing and cloud computing technologies; experience using distributed computing frameworks such as Ray to process and analyze data is preferred.