
SRE (Site Reliability Engineering)

Job Responsibilities

  • Continuously reduce risk in system operation and maintenance, and handle emergency incidents on a 24/7 basis;
  • Communicate with the project team and R&D, provide regular feedback on problems in the business operating environment, and drive improvements;
  • Drive iterative upgrades of systems and processes based on user feedback and business development needs, and respond quickly to business requirements;
  • Assist in the planning, design, implementation, and optimization of the automated operation and maintenance platform.

Job Requirements

  • A strong passion for automation and repeatable processes
  • Hands-on experience with containerization, including Kubernetes (EKS or similar), Docker, and other container technologies
  • Deep exposure to at least one of the following cloud providers: AWS, GCP, or Alibaba Cloud
  • Experience working with a mainstream programming language such as Java, Go, or Python
  • Broad experience with modern CI/CD pipelines (GitLab, Jenkins, etc.)
  • Demonstrated ability to declaratively automate cloud-based IaaS/PaaS deployments and applications using modern GitOps and DevOps techniques and technologies
  • A fervent enthusiasm for infrastructure-as-code
  • A love for operational metrics, including MTTD and MTTR, and for the capabilities and practices that allow them to be continually improved
  • Excellent understanding of OS, platform, and network security practices, patterns, and frameworks
  • Knowledge of network topology and infrastructure architectures on Linux-based platforms
  • Data-driven, with a passion for observability