Cloud & GenAI Ops Architect

KPMG’s Digital & GenAI team builds practical, secure cloud and AI solutions for enterprise clients. We combine consulting, engineering and product delivery to help organizations adopt scalable, cost-effective AI and cloud platforms.

About the job

Role summary

We’re hiring a hands-on Cloud & GenAI Ops Architect who designs and runs production multi-cloud architectures (AWS / Azure / GCP), productionizes GenAI/ML systems and RAG pipelines, and leads client-facing delivery. This is an implementation + advisory role: you’ll produce architectures and runbooks, help teams implement them, and enable clients to operate and evolve the solution.

Key responsibilities

  • Design and deliver end-to-end multi-cloud solutions (architecture diagrams, operational playbooks, and runbooks for incident response and rollback).
  • Choose and implement the right compute patterns across clouds: static hosting, PaaS web apps, container platforms (AKS/GKE/EKS + Fargate), serverless (Azure Functions / Cloud Run / Lambda), VM scale sets and managed app services, and justify the trade-offs to clients.
  • Productionize GenAI / MLOps: model serving, inference scaling, A/B / canary rollouts, versioning, drift detection, model metrics, and embedding/RAG pipelines.
  • Work with cloud AI platforms & model catalogs (e.g., Vertex AI / Model Garden and Azure AI Foundry) to evaluate, deploy and integrate models into products.
  • Implement observability and diagnostics for ML and platform services (metrics, logs, traces, dashboards, alerts) using native tooling (AWS CloudWatch, Azure Monitor / Application Insights, Google Cloud Monitoring) and integrate with SIEM/incident workflows.
  • Build repeatable IaC and GitOps patterns (Terraform modules, Bicep/ARM, Helm, GitOps flows) and author CI/CD pipelines (Azure DevOps, GitHub Actions, GitLab CI, Jenkins) for secure, auditable deployments.
  • Integrate security & compliance with Dev and SecOps teams (IAM design, least-privilege, key management, network segmentation, encryption, audit trails).
  • Drive FinOps and cost governance (resource tagging, cost allocation, rightsizing, reservation strategies and budgets).
  • Mentor and enable client/internal teams with architecture reviews, runbooks, workshops, and handovers.
  • Participate in presales/technical scoping and produce pragmatic proposals and effort estimates.

Requirements

  • Deep, demonstrable experience across AWS, Azure and GCP designing and operating production systems.
  • Strong production experience with Kubernetes (cluster ops, autoscaling, CNI networking, storage, multi-tenant considerations) and Terraform (module design, remote state, testing).
  • Broad knowledge of compute patterns (static web hosting, managed web apps, containers, serverless, VM scale sets) and of when to use each one on AWS, Azure and GCP.
  • Hands-on GenAI / MLOps experience: model serving, monitoring (model and infra), drift detection, embeddings, vector stores, and RAG pipelines.
  • Familiarity with cloud model platforms / model catalogs and deployment flows (for example Google’s Model Garden on Vertex AI and Microsoft’s Azure AI Foundry).
  • Observability competence: designing alerts, dashboards and SLOs/SLIs, with experience in CloudWatch, Azure Monitor / Application Insights, or Google Cloud Monitoring.
  • Strong background in CI/CD (Azure DevOps, GitHub Actions, GitLab CI), container registries, image scanning, and secure build pipelines.
  • Security & compliance awareness: IAM, KMS/HSM, VNet/VPC design, audit logging and regulatory constraints.
  • Solid scripting/automation skills (Python, Bash; Go / PowerShell a plus).
  • Excellent client communication: can present trade-offs, lead workshops, and translate technical decisions to business stakeholders.
  • Proven problem solver with a can-do attitude: creative, pragmatic, and able to design unconventional solutions under constraints and pressure.

Nice to have

  • Cloud / platform certifications (AWS/Azure/GCP Architect, CKA/CKS, HashiCorp Terraform, FinOps).
  • Enterprise consulting experience and prior delivery in regulated industries.
  • Prior exposure to large LLM deployments, inference cost-management and hybrid/sovereign AI deployments.

©2024 All rights reserved to KPMG Somekh Chaikin, a partnership registered in Israel and a member firm of the KPMG global organization of independent member firms affiliated with KPMG International Limited, a private English company limited by guarantee.