IT Infrastructure Engineer

Job ID: c2046

TÜV Rheinland

Permanent, full-time

Shanghai

As of now

We are looking for a senior AI Infrastructure Engineer to design, build, and operate robust AI/ML infrastructure on public cloud platforms. The ideal candidate has deep hands-on experience with cloud-native environments, container orchestration, Infrastructure as Code, CI/CD, and observability, and can keep AI workloads scalable, secure, and efficient.

  • Design, deploy, and operate AI/ML infrastructure on public cloud platforms (AWS/Azure/GCP or domestic clouds such as Alibaba Cloud/Tencent Cloud).
  • Build and maintain containerized environments using Docker, and manage large-scale workloads with Kubernetes.
  • Use Infrastructure as Code tools (e.g., Terraform, Ansible) to automate environment provisioning, configuration, and change management.
  • Design, implement, and optimize CI/CD pipelines to support frequent, reliable, and secure deployment of AI and backend services.
  • Implement and maintain monitoring, logging, and alerting systems to ensure high availability and fast incident response.
  • Collaborate closely with AI/ML engineers and backend teams to ensure the infrastructure meets performance, security, and compliance requirements.
  • Continuously optimize the cost, performance, and reliability of the infrastructure, and drive the adoption of cloud-native and DevOps best practices.
  • Cloud & Operations
    • Senior-level hands-on experience deploying and operating systems on public cloud platforms (AWS/Azure/GCP or domestic platforms such as Alibaba Cloud/Tencent Cloud), with the ability to do so independently.
  • Container & Orchestration
    • Proficient in containerization (Docker) and container orchestration (Kubernetes), with hands-on production experience.
  • Infrastructure as Code
    • Skilled in using Infrastructure as Code tools (e.g., Terraform, Ansible) for environment management and automated operations.
  • CI/CD (Continuous Integration and Continuous Delivery)
    • Practical experience building, maintaining, and optimizing CI/CD pipelines; familiar with tools such as GitHub Actions, GitLab CI, and Jenkins.
  • Monitoring & Observability
    • Familiar with monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK Stack); able to independently build and optimize a monitoring stack.
  • Networking Fundamentals
    • Senior-level knowledge of computer networking fundamentals, including the principles and configuration of DNS, CDN, and related technologies.

    We accept applications only through our online application system. Applications sent by email cannot be processed.

    Other things you should know

    Job ID: c2046
    Contract type: Permanent
    Employment type: Full-time
    Work model: Not specified
    Company name: TÜV Rheinland

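    As long as the position is listed on our careers page, we are still looking for suitable candidates (all genders). We look forward to your application!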

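    Your application process

    1. Online application

    You can apply only online, through our careers page. The application process is simple and takes just a few minutes.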
