Cloud & GenAI Ops Architect
- Tel Aviv
- Technology consulting
KPMG’s Digital & GenAI team builds practical, secure cloud and AI solutions for enterprise clients. We combine consulting, engineering and product delivery to help organizations adopt scalable, cost-effective AI and cloud platforms.
About the job
Role summary
We’re hiring a hands-on Cloud & GenAI Ops Architect who designs and runs production multi-cloud architectures (AWS / Azure / GCP), productionizes GenAI/ML systems and RAG pipelines, and leads client-facing delivery. This is an implementation + advisory role: you’ll produce architectures and runbooks, help teams implement them, and enable clients to operate and evolve the solution.
Key responsibilities
- Design and deliver end-to-end multi-cloud solutions (architecture diagrams, operational playbooks, and runbooks for incident response and rollback).
- Choose and implement the right compute patterns across clouds: static hosting, PaaS web apps, container platforms (AKS/GKE/EKS + Fargate), serverless (Azure Functions / Cloud Run / Lambda), VM scale sets and managed app services — and justify trade-offs to clients.
- Productionize GenAI / MLOps: model serving, inference scaling, A/B / canary rollouts, versioning, drift detection, model metrics, and embedding/RAG pipelines.
- Work with cloud AI platforms & model catalogs (e.g., Vertex AI / Model Garden and Azure AI Foundry) to evaluate, deploy and integrate models into products.
- Implement observability and diagnostics for ML and platform services (metrics, logs, traces, dashboards, alerts) using native tooling (AWS CloudWatch, Azure Monitor / Application Insights, Google Cloud Monitoring) and integrate with SIEM/incident workflows.
- Build repeatable IaC and GitOps patterns (Terraform modules, Bicep/ARM, Helm, GitOps flows) and author CI/CD pipelines (Azure DevOps, GitHub Actions, GitLab CI, Jenkins) for secure, auditable deployments.
- Integrate security & compliance with Dev and SecOps teams (IAM design, least-privilege, key management, network segmentation, encryption, audit trails).
- Drive FinOps and cost governance (resource tagging, cost allocation, rightsizing, reservation strategies and budgets).
- Mentor and enable client/internal teams with architecture reviews, runbooks, workshops, and handovers.
- Participate in presales/technical scoping and produce pragmatic proposals and effort estimates.
Requirements
- Deep, demonstrable experience across AWS, Azure and GCP designing and operating production systems.
- Strong production experience with Kubernetes (cluster ops, autoscaling, CNI networking, storage, multi-tenant considerations) and Terraform (module design, remote state, testing).
- Broad knowledge of compute patterns (static web hosting, managed web apps, containers, serverless, VM scale sets) and of when to use each on AWS, Azure and GCP.
- Hands-on GenAI / MLOps experience: model serving, monitoring (model and infra), drift detection, embeddings, vector stores, and RAG pipelines.
- Familiarity with cloud model platforms / model catalogs and deployment flows (for example Google’s Model Garden on Vertex AI and Microsoft’s Azure AI Foundry).
- Observability competence: designing alerts, dashboards and SLOs/SLIs, with experience in CloudWatch, Azure Monitor / Application Insights, or Google Cloud Monitoring.
- Strong background in CI/CD (Azure DevOps, GitHub Actions, GitLab CI), container registries, image scanning, and secure build pipelines.
- Security & compliance awareness: IAM, KMS/HSM, VNet/VPC design, audit logging and regulatory constraints.
- Solid scripting/automation skills (Python, Bash; Go / PowerShell a plus).
- Excellent client communication: can present trade-offs, lead workshops, and translate technical decisions to business stakeholders.
- Proven problem solver with a can-do spirit — creative, pragmatic, able to design out-of-the-box solutions under constraints and pressure.
Nice to have
- Cloud / platform certifications (AWS/Azure/GCP Architect, CKA/CKS, HashiCorp Terraform, FinOps).
- Enterprise consulting experience and prior delivery in regulated industries.
- Prior exposure to large LLM deployments, inference cost-management and hybrid/sovereign AI deployments.