Staff/Senior Software Consultant - Azure, AKS, MLflow & Kubeflow
10pearls · Karachi, Lahore, Islamabad · Active · JazzHR / ApplyToJob
Job facts
| Field | Value |
|---|---|
| Company | 10pearls |
| Title | Staff/Senior Software Consultant - Azure, AKS, MLflow & Kubeflow |
| Normalized title | - |
| Department / team | - |
| Location | Karachi, Lahore, Islamabad |
| Work model | - |
| Employment type | Full Time |
| Salary | - |
| Status | active |
| ATS provider | JazzHR / ApplyToJob |
| Posted / first seen | 2026-05-16 / 2026-05-30 |
| Changed / last seen | 2026-05-30 / 2026-06-06 |
Linked records
| Field | Value |
|---|---|
| Company | 10pearls |
| Source | ecc85604-d4af-4971-b467-d3e9f14798bc |
| ATS provider | JazzHR / ApplyToJob |
Description
Company Overview
10Pearls is an award-winning end-to-end digital innovation company that helps businesses imagine and build the future. We are proud to announce that 10Pearls was named as winner of the Best Tech Work Culture Timmy Award in Washington DC by Tech in Motion, recognized on the Inc. 5000 Fastest-Growing Companies List, and was ranked the #1 Most Diverse Midsize Company in Greater Washington. We partner with businesses to help them transform, scale, and accelerate by adopting digital and exponential technologies. Our work has ranged from creating highly usable, secure digital experiences, mobile and software products, to helping businesses modernize through cloud adoption and development and the digitalization of their business processes. Our clientele is highly diverse, including Global 1000 enterprises, mid-market businesses, and high-growth start-ups. But those are just the facts. What makes us unique is that we have true heart and soul. We have a strong focus on a double bottom line and actively support and engage with the communities where we live and work to make the world a better place. In a nutshell, we believe in doing well, while doing good, and know how to balance the two.
Role
10Pearls is seeking a Staff/Senior MLOps Engineer – Azure ML Platform & LLMOps to design, build, and operate production-grade machine learning and LLM infrastructure at scale. This role is ideal for an experienced MLOps engineer who can move machine learning and generative AI systems from experimentation into secure, reliable, and scalable production environments. In this highly hands-on engineering role, you will lead the development of core platform capabilities across Azure-based AI systems, with a strong emphasis on performance, safety, and cost efficiency.
• Own key platform areas, including ML infrastructure, deployment automation, monitoring, observability, scalability, and operational excellence.
• Enable fast, secure, and cost-effective ML operations across production environments.
• Partner closely with ML Engineers, Data Engineers, and platform teams to support successful delivery and operation of AI systems.
Responsibilities
• Design and operate end-to-end ML infrastructure on Microsoft Azure, including training environments, model registries, deployment workflows, and scalable inference systems on Azure Kubernetes Service (AKS)
• Own and evolve MLflow and Kubeflow platforms, including experiment tracking, model registry management, reproducible training workflows, and pipeline orchestration
• Build and maintain robust CI/CD pipelines in GitLab for ML models and AI services, including validation gates, canary deployments, progressive delivery, and automated rollback strategies
• Design scalable inference systems using AKS autoscaling, GPU scheduling, Redis caching, asynchronous processing with Azure Service Bus, and cost-aware infrastructure planning
• Implement comprehensive monitoring and observability for ML and LLM systems, covering infrastructure metrics, latency, drift detection, token usage, quality metrics, and operational cost tracking
• Define and enforce platform-level security controls including IAM policies, secrets management, network segmentation, audit logging, dependency scanning, and model access governance
• Build highly available and fault-tolerant ML serving infrastructure with strong focus on scalability, disaster recovery, resilience, and platform reliability
• Define and maintain platform SLOs for ML services, including incident response processes, postmortems, and operational improvement initiatives
• Partner closely with ML Engineers to productionize new ML models, LLM systems, and agentic AI workflows with safe rollout and evaluation patterns
• Optimize infrastructure utilization and operational cost across compute, GPU workloads, and LLM provider usage through batching, caching, autoscaling, and routing strategies
• Ensure all production ML and AI services have actionable dashboards, alerts, observability standards, and operational playbooks for on-call readiness
Requirements
• Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)
• 5+ years of professional experience in MLOps, DevOps, SRE, Platform Engineering, or ML Infrastructure roles
• Minimum 3 years of hands-on experience supporting production-grade ML systems and AI platforms
• Strong hands-on experience with Microsoft Azure, including Azure Kubernetes Service (AKS), Azure Service Bus, Azure Storage, networking, identity management, and cloud cost optimization
• Strong Kubernetes operational expertise including Helm, Ingress Controllers, autoscaling (HPA/VPA/KEDA), GPU scheduling, workload troubleshooting, and large-scale container orchestration
• Production experience with MLflow, Kubeflow, or equivalent ML platform tooling for experiment tracking, model registries, and ML pipeline orchestration
• Strong expertise in GitLab CI/CD or equivalent CI/CD tooling for automated deployments, validation gates, rollback workflows, and progressive delivery patterns
• Hands-on experience with monitoring and observability platforms including Prometheus, Grafana, OpenTelemetry, Azure Monitor, Datadog, New Relic, or Elastic
• Experience monitoring ML/LLM systems including latency, model performance, drift, token usage, infrastructure health, and operational costs
• Strong proficiency in Python and shell scripting for automation and operational tooling
• Experience with Infrastructure-as-Code tools such as Terraform, Bicep, or ARM templates
• Strong troubleshooting, debugging, and incident response capabilities across distributed systems and cloud-native environments
• Excellent written and verbal communication skills, including technical documentation, runbooks, and incident reporting
Nice to Have
• Experience operating production-grade LLM or Generative AI systems, including prompt versioning, evaluation frameworks, routing layers, and vector store operations
• Experience with Azure AI Foundry, AWS AgentCore, SageMaker, or similar AI platform services
• Exposure to GPU infrastructure and inference tooling such as NVIDIA GPU Operator, Triton Inference Server, vLLM, or TGI
• Familiarity with model observability and evaluation platforms such as Arize, Fiddler, WhyLabs, or Evidently
• Experience implementing security and compliance controls for enterprise ML environments
• Experience working with vector databases, semantic search systems, or Retrieval-Augmented Generation (RAG) architectures
Full job record
| Field | Value |
|---|---|
| Job ID | 6a13c861c20d95ec00ab3288d248643ae25125d5 |
| Org ID | e69a6fcc-024f-4d99-ada4-5630f4f934d3 |
| Source ID | ecc85604-d4af-4971-b467-d3e9f14798bc |
| Board ID | ecc85604-d4af-4971-b467-d3e9f14798bc |
| Provider | jazzhr |
| Provider Job Key | LQxHJjfMrD |
| Title | Staff/Senior Software Consultant - Azure, AKS, MLflow & Kubeflow |
| Normalized Title | — |
| Status | active |
| Active | yes |
| Location Text | Karachi, Lahore, Islamabad |
| Department | — |
| Team | — |
| Employment Type | full_time |
| Workplace Type | — |
| Remote Policy | — |
| Country | — |
| Region | — |
| City | — |
| Salary Raw | — |
| Salary Min | — |
| Salary Max | — |
| Salary Currency | — |
| Salary Period | — |
| Source URL | https://10pearls.applytojob.com/apply/LQxHJjfMrD/StaffSenior-Software-Consultant-Azure-AKS-MLflow-Kubeflow |
| Apply URL | https://10pearls.applytojob.com/apply/LQxHJjfMrD/StaffSenior-Software-Consultant-Azure-AKS-MLflow-Kubeflow |
| First Seen At | 2026-05-30 06:11:52Z |
| Last Seen At | 2026-06-06 10:53:47Z |
| Last Checked At | 2026-06-06 10:53:47Z |
| Last Changed At | 2026-05-30 06:11:52Z |
| Inactive At | — |
| Source Posted At | 2026-05-16 00:00:00Z |
| Source Updated At | — |
| Raw Payload Uri | s3://job-postings-prod-raw-590183727216/raw/provider=jazzhr/board=10pearls/date=2026-06-06/2026-06-06T10-53-46-778Z-dd1be45ea8392f07e55ccd9e692a28f4fb6f163fbfc9b33232e4c22b3d4a5683.json |
Event Fields
{
  "content_hash": "9d8f4045351f1f95cf1f7ecb56a318037d864015250e14f9ca4472d5030c834f",
  "source_hash": "0322c412bdfc8181eb9267cad9b962857e9af68055e44bd78a54b6011b5f3af4",
  "last_changed_at": "2026-05-30T06:11:52.568Z",
  "active_status": "active"
}
Parsed Structured
{
  "language": "en",
  "location": {
    "raw": "Karachi, Lahore, Islamabad",
    "city": null,
    "region": null,
    "country": null,
    "is_remote": false,
    "confidence": 0.8
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T10:53:47.684Z",
  "launch_scope": {
    "reason": "jazzhr_production_catalog",
    "included": true,
    "location": {
      "raw": "Karachi, Lahore, Islamabad",
      "city": null,
      "region": null,
      "country": null,
      "is_remote": false,
      "confidence": 0.8
    },
    "countries": []
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}
Extensions
{}
Native Structured
{
"detail": {
"url": "https://10pearls.applytojob.com/apply/jobs/details/LQxHJjfMrD?&",
"heading": "Company Overview 10Pearls is an award-winning end-to-end digital innovation company that helps businesses imagine and build the future. We are proud to announce that 10Pearls was named as winner of the Best Tech Work Culture Timmy Award in Washington DC by Tech in Motion, recognized on the Inc. 5000 Fastest-Growing Companies List, and was ranked the #1 Most Diverse Midsize Company in Greater Washington. We partner with businesses to help them transform, scale, and accelerate by adopting digital and exponential technologies. Our work has ranged from creating highly usable, secure digital experiences, mobile and software products, to helping businesses modernize through cloud adoption and development and the digitalization of their business processes. Our clientele is highly diverse, including Global 1000 enterprises, mid-market businesses, and high-growth start-ups. But those are just the facts. What makes us unique is that we have true heart and soul. We have a strong focus on a double bottom line and actively support and engage with the communities where we live and work to make the world a better place. In a nutshell, we believe in doing well, while doing good, and know how to balance the two. Role 10Pearls is seeking a Staff\\/Senior MLOps Engineer – Azure ML Platform & LLMOps to design, build, and operate production-grade machine learning and LLM infrastructure at scale. This role is ideal for an experienced MLOps engineer who can move machine learning and generative AI systems from experimentation into secure, reliable, and scalable production environments. In this highly hands-on engineering role, you will lead the development of core platform capabilities across Azure-based AI systems, with a strong emphasis on performance, safety, and cost efficiency. Own key platform areas, including ML infrastructure, deployment automation, monitoring, observability, scalability, and operational excellence. 
Enable fast, secure, and cost-effective ML operations across production environments. Partner closely with ML Engineers, Data Engineers, and platform teams to support successful delivery and operation of AI systems. Responsibilities • Design and operate end-to-end ML infrastructure on Microsoft Azure, including training environments, model registries, deployment workflows, and scalable inference systems on Azure Kubernetes Service (AKS)\n• Own and evolve MLflow and Kubeflow platforms, including experiment tracking, model registry management, reproducible training workflows, and pipeline orchestration\n• Build and maintain robust CI\\/CD pipelines in GitLab for ML models and AI services, including validation gates, canary deployments, progressive delivery, and automated rollback strategies\n• Design scalable inference systems using AKS autoscaling, GPU scheduling, Redis caching, asynchronous processing with Azure Service Bus, and cost-aware infrastructure planning\n• Implement comprehensive monitoring and observability for ML and LLM systems, covering infrastructure metrics, latency, drift detection, token usage, quality metrics, and operational cost tracking\n• Define and enforce platform-level security controls including IAM policies, secrets management, network segmentation, audit logging, dependency scanning, and model access governance\n• Build highly available and fault-tolerant ML serving infrastructure with strong focus on scalability, disaster recovery, resilience, and platform reliability\n• Define and maintain platform SLOs for ML services, including incident response processes, postmortems, and operational improvement initiatives\n• Partner closely with ML Engineers to productionize new ML models, LLM systems, and agentic AI workflows with safe rollout and evaluation patterns\n• Optimize infrastructure utilization and operational cost across compute, GPU workloads, and LLM provider usage through batching, caching, autoscaling, and routing strategies\n• 
Ensure all production ML and AI services have actionable dashboards, alerts, observability standards, and operational playbooks for on-call readiness Requirements • Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)\n• 5+ years of professional experience in MLOps, DevOps, SRE, Platform Engineering, or ML Infrastructure roles\n• Minimum 3 years of hands-on experience supporting production-grade ML systems and AI platforms\n• Strong hands-on experience with Microsoft Azure, including Azure Kubernetes Service (AKS), Azure Service Bus, Azure Storage, networking, identity management, and cloud cost optimization\n• Strong Kubernetes operational expertise including Helm, Ingress Controllers, autoscaling (HPA\\/VPA\\/KEDA), GPU scheduling, workload troubleshooting, and large-scale container orchestration\n• Production experience with MLflow, Kubeflow, or equivalent ML platform tooling for experiment tracking, model registries, and ML pipeline orchestration\n• Strong expertise in GitLab CI\\/CD or equivalent CI\\/CD tooling for automated deployments, validation gates, rollback workflows, and progressive delivery patterns\n• Hands-on experience with monitoring and observability platforms including Prometheus, Grafana, OpenTelemetry, Azure Monitor, Datadog, New Relic, or Elastic\n• Experience monitoring ML\\/LLM systems including latency, model performance, drift, token usage, infrastructure health, and operational costs\n• Strong proficiency in Python and shell scripting for automation and operational tooling\n• Experience with Infrastructure-as-Code tools such as Terraform, Bicep, or ARM templates\n• Strong troubleshooting, debugging, and incident response capabilities across distributed systems and cloud-native environments\n• Excellent written and verbal communication skills, including technical documentation, runbooks, and incident reporting Nice to Have • Experience operating production-grade LLM or Generative AI systems, including prompt 
versioning, evaluation frameworks, routing layers, and vector store operations\n• Experience with Azure AI Foundry, AWS AgentCore, SageMaker, or similar AI platform services\n• Exposure to GPU infrastructure and inference tooling such as NVIDIA GPU Operator, Triton Inference Server, vLLM, or TGI\n• Familiarity with model observability and evaluation platforms such as Arize, Fiddler, WhyLabs, or Evidently\n• Experience implementing security and compliance controls for enterprise ML environments\n• Experience working with vector databases, semantic search systems, or Retrieval-Augmented Generation (RAG) architectures \",\n \"datePosted\": \"2026-05-16\",\n \"validThrough\": \"2026-08-14\",\n \"employmentType\": \"FULL_TIME\",\n \"hiringOrganization\": {\n \"@type\": \"Organization\",\n \"name\": \"10Pearls\",\n \"sameAs\": \"https:\\/\\/10pearls.com\\/\",\n \"logo\": \"https:\\/\\/s3.amazonaws.com\\/resumator\\/customer_20200617142926_FIFOKRA3QXMMR03Z\\/logos\\/20230316120547_10plogo_2_x50.png\"\n },\n \"jobLocation\": {\n \"@type\": \"Place\",\n \"address\": {\n \"@type\": \"PostalAddress\",\n \"addressLocality\": \"Karachi, Lahore, Islamabad\",\n \"addressRegion\": \"\",\n \"postalCode\": \"\"\n }\n },\n \"experienceRequirements\": \"Experienced\",\n \"uniqueJobCode\": \"job_20260516101234_TTZFZDOW0EMMJ8ML\"\n}\n « Back to Jobs\n 10Pearls\n Staff/Senior Software Consultant - Azure, AKS, MLflow & Kubeflow",
"html_title": "JazzHR » Job Listings",
"canonical_url": "https://10pearls.applytojob.com/apply/LQxHJjfMrD/StaffSenior-Software-Consultant-Azure-AKS-MLflow-Kubeflow",
"description_html": "<h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Company Overview</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">10Pearls is an award-winning end-to-end digital innovation company that helps businesses imagine and build the future. We are proud to announce that 10Pearls was named as winner of the Best Tech Work Culture Timmy Award in Washington DC by Tech in Motion, recognized on the Inc. 5000 Fastest-Growing Companies List, and was ranked the #1 Most Diverse Midsize Company in Greater Washington. We partner with businesses to help them transform, scale, and accelerate by adopting digital and exponential technologies. Our work has ranged from creating highly usable, secure digital experiences, mobile and software products, to helping businesses modernize through cloud adoption and development and the digitalization of their business processes. Our clientele is highly diverse, including Global 1000 enterprises, mid-market businesses, and high-growth start-ups. But those are just the facts. What makes us unique is that we have true heart and soul. We have a strong focus on a double bottom line and actively support and engage with the communities where we live and work to make the world a better place. In a nutshell, we believe in doing well, while doing good, and know how to balance the two.</span></span></p><h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Role</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">10Pearls is seeking a Staff/Senior MLOps Engineer – Azure ML Platform & LLMOps to design, build, and operate production-grade machine learning and LLM infrastructure at scale. 
This role is ideal for an experienced MLOps engineer who can move machine learning and generative AI systems from experimentation into secure, reliable, and scalable production environments. </span></span><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">In this highly hands-on engineering role, you will lead the development of core platform capabilities across Azure-based AI systems, with a strong emphasis on performance, safety, and cost efficiency. </span></span></p><ul><li><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">Own key platform areas, including ML infrastructure, deployment automation, monitoring, observability, scalability, and operational excellence. </span></span></p></li></ul><ul><li><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">Enable fast, secure, and cost-effective ML operations across production environments. </span></span></p></li></ul><ul><li><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">Partner closely with ML Engineers, Data Engineers, and platform teams to support successful delivery and operation of AI systems. 
</span></span></p></li></ul><h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Responsibilities</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">• Design and operate end-to-end ML infrastructure on Microsoft Azure, including training environments, model registries, deployment workflows, and scalable inference systems on Azure Kubernetes Service (AKS)<br>• Own and evolve MLflow and Kubeflow platforms, including experiment tracking, model registry management, reproducible training workflows, and pipeline orchestration<br>• Build and maintain robust CI/CD pipelines in GitLab for ML models and AI services, including validation gates, canary deployments, progressive delivery, and automated rollback strategies<br>• Design scalable inference systems using AKS autoscaling, GPU scheduling, Redis caching, asynchronous processing with Azure Service Bus, and cost-aware infrastructure planning<br>• Implement comprehensive monitoring and observability for ML and LLM systems, covering infrastructure metrics, latency, drift detection, token usage, quality metrics, and operational cost tracking<br>• Define and enforce platform-level security controls including IAM policies, secrets management, network segmentation, audit logging, dependency scanning, and model access governance<br>• Build highly available and fault-tolerant ML serving infrastructure with strong focus on scalability, disaster recovery, resilience, and platform reliability<br>• Define and maintain platform SLOs for ML services, including incident response processes, postmortems, and operational improvement initiatives<br>• Partner closely with ML Engineers to productionize new ML models, LLM systems, and agentic AI workflows with safe rollout and evaluation patterns<br>• Optimize infrastructure utilization and operational cost across compute, GPU workloads, and LLM provider usage through batching, 
caching, autoscaling, and routing strategies<br>• Ensure all production ML and AI services have actionable dashboards, alerts, observability standards, and operational playbooks for on-call readiness</span></span></p><h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Requirements</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">• Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)<br>• 5+ years of professional experience in MLOps, DevOps, SRE, Platform Engineering, or ML Infrastructure roles<br>• Minimum 3 years of hands-on experience supporting production-grade ML systems and AI platforms<br>• Strong hands-on experience with Microsoft Azure, including Azure Kubernetes Service (AKS), Azure Service Bus, Azure Storage, networking, identity management, and cloud cost optimization<br>• Strong Kubernetes operational expertise including Helm, Ingress Controllers, autoscaling (HPA/VPA/KEDA), GPU scheduling, workload troubleshooting, and large-scale container orchestration<br>• Production experience with MLflow, Kubeflow, or equivalent ML platform tooling for experiment tracking, model registries, and ML pipeline orchestration<br>• Strong expertise in GitLab CI/CD or equivalent CI/CD tooling for automated deployments, validation gates, rollback workflows, and progressive delivery patterns<br>• Hands-on experience with monitoring and observability platforms including Prometheus, Grafana, OpenTelemetry, Azure Monitor, Datadog, New Relic, or Elastic<br>• Experience monitoring ML/LLM systems including latency, model performance, drift, token usage, infrastructure health, and operational costs<br>• Strong proficiency in Python and shell scripting for automation and operational tooling<br>• Experience with Infrastructure-as-Code tools such as Terraform, Bicep, or ARM templates<br>• Strong troubleshooting, debugging, and incident 
response capabilities across distributed systems and cloud-native environments<br>• Excellent written and verbal communication skills, including technical documentation, runbooks, and incident reporting</span></span></p><h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Nice to Have</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">• Experience operating production-grade LLM or Generative AI systems, including prompt versioning, evaluation frameworks, routing layers, and vector store operations<br>• Experience with Azure AI Foundry, AWS AgentCore, SageMaker, or similar AI platform services<br>• Exposure to GPU infrastructure and inference tooling such as NVIDIA GPU Operator, Triton Inference Server, vLLM, or TGI<br>• Familiarity with model observability and evaluation platforms such as Arize, Fiddler, WhyLabs, or Evidently<br>• Experience implementing security and compliance controls for enterprise ML environments<br>• Experience working with vector databases, semantic search systems, or Retrieval-Augmented Generation (RAG) architectures</span></span></p>",
"description_text": "Company Overview\n 10Pearls is an award-winning end-to-end digital innovation company that helps businesses imagine and build the future. We are proud to announce that 10Pearls was named as winner of the Best Tech Work Culture Timmy Award in Washington DC by Tech in Motion, recognized on the Inc. 5000 Fastest-Growing Companies List, and was ranked the #1 Most Diverse Midsize Company in Greater Washington. We partner with businesses to help them transform, scale, and accelerate by adopting digital and exponential technologies. Our work has ranged from creating highly usable, secure digital experiences, mobile and software products, to helping businesses modernize through cloud adoption and development and the digitalization of their business processes. Our clientele is highly diverse, including Global 1000 enterprises, mid-market businesses, and high-growth start-ups. But those are just the facts. What makes us unique is that we have true heart and soul. We have a strong focus on a double bottom line and actively support and engage with the communities where we live and work to make the world a better place. In a nutshell, we believe in doing well, while doing good, and know how to balance the two.\n Role\n 10Pearls is seeking a Staff/Senior MLOps Engineer – Azure ML Platform & LLMOps to design, build, and operate production-grade machine learning and LLM infrastructure at scale. This role is ideal for an experienced MLOps engineer who can move machine learning and generative AI systems from experimentation into secure, reliable, and scalable production environments. 
In this highly hands-on engineering role, you will lead the development of core platform capabilities across Azure-based AI systems, with a strong emphasis on performance, safety, and cost efficiency.\n Own key platform areas, including ML infrastructure, deployment automation, monitoring, observability, scalability, and operational excellence.\n Enable fast, secure, and cost-effective ML operations across production environments.\n Partner closely with ML Engineers, Data Engineers, and platform teams to support successful delivery and operation of AI systems.\n Responsibilities\n • Design and operate end-to-end ML infrastructure on Microsoft Azure, including training environments, model registries, deployment workflows, and scalable inference systems on Azure Kubernetes Service (AKS)\n• Own and evolve MLflow and Kubeflow platforms, including experiment tracking, model registry management, reproducible training workflows, and pipeline orchestration\n• Build and maintain robust CI/CD pipelines in GitLab for ML models and AI services, including validation gates, canary deployments, progressive delivery, and automated rollback strategies\n• Design scalable inference systems using AKS autoscaling, GPU scheduling, Redis caching, asynchronous processing with Azure Service Bus, and cost-aware infrastructure planning\n• Implement comprehensive monitoring and observability for ML and LLM systems, covering infrastructure metrics, latency, drift detection, token usage, quality metrics, and operational cost tracking\n• Define and enforce platform-level security controls including IAM policies, secrets management, network segmentation, audit logging, dependency scanning, and model access governance\n• Build highly available and fault-tolerant ML serving infrastructure with strong focus on scalability, disaster recovery, resilience, and platform reliability\n• Define and maintain platform SLOs for ML services, including incident response processes, postmortems, and operational 
improvement initiatives\n• Partner closely with ML Engineers to productionize new ML models, LLM systems, and agentic AI workflows with safe rollout and evaluation patterns\n• Optimize infrastructure utilization and operational cost across compute, GPU workloads, and LLM provider usage through batching, caching, autoscaling, and routing strategies\n• Ensure all production ML and AI services have actionable dashboards, alerts, observability standards, and operational playbooks for on-call readiness\n Requirements\n • Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)\n• 5+ years of professional experience in MLOps, DevOps, SRE, Platform Engineering, or ML Infrastructure roles\n• Minimum 3 years of hands-on experience supporting production-grade ML systems and AI platforms\n• Strong hands-on experience with Microsoft Azure, including Azure Kubernetes Service (AKS), Azure Service Bus, Azure Storage, networking, identity management, and cloud cost optimization\n• Strong Kubernetes operational expertise including Helm, Ingress Controllers, autoscaling (HPA/VPA/KEDA), GPU scheduling, workload troubleshooting, and large-scale container orchestration\n• Production experience with MLflow, Kubeflow, or equivalent ML platform tooling for experiment tracking, model registries, and ML pipeline orchestration\n• Strong expertise in GitLab CI/CD or equivalent CI/CD tooling for automated deployments, validation gates, rollback workflows, and progressive delivery patterns\n• Hands-on experience with monitoring and observability platforms including Prometheus, Grafana, OpenTelemetry, Azure Monitor, Datadog, New Relic, or Elastic\n• Experience monitoring ML/LLM systems including latency, model performance, drift, token usage, infrastructure health, and operational costs\n• Strong proficiency in Python and shell scripting for automation and operational tooling\n• Experience with Infrastructure-as-Code tools such as Terraform, Bicep, or ARM templates\n• 
Strong troubleshooting, debugging, and incident response capabilities across distributed systems and cloud-native environments\n• Excellent written and verbal communication skills, including technical documentation, runbooks, and incident reporting\n Nice to Have\n • Experience operating production-grade LLM or Generative AI systems, including prompt versioning, evaluation frameworks, routing layers, and vector store operations\n• Experience with Azure AI Foundry, AWS AgentCore, SageMaker, or similar AI platform services\n• Exposure to GPU infrastructure and inference tooling such as NVIDIA GPU Operator, Triton Inference Server, vLLM, or TGI\n• Familiarity with model observability and evaluation platforms such as Arize, Fiddler, WhyLabs, or Evidently\n• Experience implementing security and compliance controls for enterprise ML environments\n• Experience working with vector databases, semantic search systems, or Retrieval-Augmented Generation (RAG) architectures",
"jsonld_jobposting": {
"url": "https://10pearls.applytojob.com/apply/LQxHJjfMrD/StaffSenior-Software-Consultant-Azure-AKS-MLflow-Kubeflow",
"@type": "JobPosting",
"title": "Staff/Senior Software Consultant - Azure, AKS, MLflow & Kubeflow",
"@context": "http://schema.org/",
"datePosted": "2026-05-16",
"description": "<h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Company Overview</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">10Pearls is an award-winning end-to-end digital innovation company that helps businesses imagine and build the future. We are proud to announce that 10Pearls was named as winner of the Best Tech Work Culture Timmy Award in Washington DC by Tech in Motion, recognized on the Inc. 5000 Fastest-Growing Companies List, and was ranked the #1 Most Diverse Midsize Company in Greater Washington. We partner with businesses to help them transform, scale, and accelerate by adopting digital and exponential technologies. Our work has ranged from creating highly usable, secure digital experiences, mobile and software products, to helping businesses modernize through cloud adoption and development and the digitalization of their business processes. Our clientele is highly diverse, including Global 1000 enterprises, mid-market businesses, and high-growth start-ups. But those are just the facts. What makes us unique is that we have true heart and soul. We have a strong focus on a double bottom line and actively support and engage with the communities where we live and work to make the world a better place. In a nutshell, we believe in doing well, while doing good, and know how to balance the two.</span></span></p><h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Role</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">10Pearls is seeking a Staff/Senior MLOps Engineer – Azure ML Platform & LLMOps to design, build, and operate production-grade machine learning and LLM infrastructure at scale. 
This role is ideal for an experienced MLOps engineer who can move machine learning and generative AI systems from experimentation into secure, reliable, and scalable production environments. </span></span><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">In this highly hands-on engineering role, you will lead the development of core platform capabilities across Azure-based AI systems, with a strong emphasis on performance, safety, and cost efficiency. </span></span></p><ul><li><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">Own key platform areas, including ML infrastructure, deployment automation, monitoring, observability, scalability, and operational excellence. </span></span></p></li></ul><ul><li><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">Enable fast, secure, and cost-effective ML operations across production environments. </span></span></p></li></ul><ul><li><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">Partner closely with ML Engineers, Data Engineers, and platform teams to support successful delivery and operation of AI systems. 
</span></span></p></li></ul><h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Responsibilities</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">• Design and operate end-to-end ML infrastructure on Microsoft Azure, including training environments, model registries, deployment workflows, and scalable inference systems on Azure Kubernetes Service (AKS)<br>• Own and evolve MLflow and Kubeflow platforms, including experiment tracking, model registry management, reproducible training workflows, and pipeline orchestration<br>• Build and maintain robust CI/CD pipelines in GitLab for ML models and AI services, including validation gates, canary deployments, progressive delivery, and automated rollback strategies<br>• Design scalable inference systems using AKS autoscaling, GPU scheduling, Redis caching, asynchronous processing with Azure Service Bus, and cost-aware infrastructure planning<br>• Implement comprehensive monitoring and observability for ML and LLM systems, covering infrastructure metrics, latency, drift detection, token usage, quality metrics, and operational cost tracking<br>• Define and enforce platform-level security controls including IAM policies, secrets management, network segmentation, audit logging, dependency scanning, and model access governance<br>• Build highly available and fault-tolerant ML serving infrastructure with strong focus on scalability, disaster recovery, resilience, and platform reliability<br>• Define and maintain platform SLOs for ML services, including incident response processes, postmortems, and operational improvement initiatives<br>• Partner closely with ML Engineers to productionize new ML models, LLM systems, and agentic AI workflows with safe rollout and evaluation patterns<br>• Optimize infrastructure utilization and operational cost across compute, GPU workloads, and LLM provider usage through batching, 
caching, autoscaling, and routing strategies<br>• Ensure all production ML and AI services have actionable dashboards, alerts, observability standards, and operational playbooks for on-call readiness</span></span></p><h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Requirements</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">• Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)<br>• 5+ years of professional experience in MLOps, DevOps, SRE, Platform Engineering, or ML Infrastructure roles<br>• Minimum 3 years of hands-on experience supporting production-grade ML systems and AI platforms<br>• Strong hands-on experience with Microsoft Azure, including Azure Kubernetes Service (AKS), Azure Service Bus, Azure Storage, networking, identity management, and cloud cost optimization<br>• Strong Kubernetes operational expertise including Helm, Ingress Controllers, autoscaling (HPA/VPA/KEDA), GPU scheduling, workload troubleshooting, and large-scale container orchestration<br>• Production experience with MLflow, Kubeflow, or equivalent ML platform tooling for experiment tracking, model registries, and ML pipeline orchestration<br>• Strong expertise in GitLab CI/CD or equivalent CI/CD tooling for automated deployments, validation gates, rollback workflows, and progressive delivery patterns<br>• Hands-on experience with monitoring and observability platforms including Prometheus, Grafana, OpenTelemetry, Azure Monitor, Datadog, New Relic, or Elastic<br>• Experience monitoring ML/LLM systems including latency, model performance, drift, token usage, infrastructure health, and operational costs<br>• Strong proficiency in Python and shell scripting for automation and operational tooling<br>• Experience with Infrastructure-as-Code tools such as Terraform, Bicep, or ARM templates<br>• Strong troubleshooting, debugging, and incident 
response capabilities across distributed systems and cloud-native environments<br>• Excellent written and verbal communication skills, including technical documentation, runbooks, and incident reporting</span></span></p><h1><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\"><strong>Nice to Have</strong></span></span></h1><p><span style=\"font-size:14px;\"><span style=\"font-family:Arial, Helvetica, sans-serif;\">• Experience operating production-grade LLM or Generative AI systems, including prompt versioning, evaluation frameworks, routing layers, and vector store operations<br>• Experience with Azure AI Foundry, AWS AgentCore, SageMaker, or similar AI platform services<br>• Exposure to GPU infrastructure and inference tooling such as NVIDIA GPU Operator, Triton Inference Server, vLLM, or TGI<br>• Familiarity with model observability and evaluation platforms such as Arize, Fiddler, WhyLabs, or Evidently<br>• Experience implementing security and compliance controls for enterprise ML environments<br>• Experience working with vector databases, semantic search systems, or Retrieval-Augmented Generation (RAG) architectures</span></span></p>",
"jobLocation": {
"@type": "Place",
"address": {
"@type": "PostalAddress",
"postalCode": "",
"addressRegion": "",
"addressLocality": "Karachi, Lahore, Islamabad"
}
},
"validThrough": "2026-08-14",
"uniqueJobCode": "job_20260516101234_TTZFZDOW0EMMJ8ML",
"employmentType": "FULL_TIME",
"hiringOrganization": {
"logo": "https://s3.amazonaws.com/resumator/customer_20200617142926_FIFOKRA3QXMMR03Z/logos/20230316120547_10plogo_2_x50.png",
"name": "10Pearls",
"@type": "Organization",
"sameAs": "https://10pearls.com/"
},
"experienceRequirements": "Experienced"
}
},
"list_job": {
"id": "LQxHJjfMrD",
"title": "Staff/Senior Software Consultant - Azure, AKS, MLflow & Kubeflow",
"detailUrl": "https://10pearls.applytojob.com/apply/jobs/details/LQxHJjfMrD?&"
},
"detail_errors": []
}
Get this page with API
Rendered from the bluedoor Job Postings API. Reproduce it:
GET https://api.bluedoor.sh/job-postings/v1/jobs/6a13c861c20d95ec00ab3288d248643ae25125d5?include=description (JSON)
GET https://api.bluedoor.sh/job-postings/v1/orgs/e69a6fcc-024f-4d99-ada4-5630f4f934d3 (JSON)
GET https://api.bluedoor.sh/job-postings/v1/sources/ecc85604-d4af-4971-b467-d3e9f14798bc (JSON)
GET https://api.bluedoor.sh/job-postings/v1/jobs/6a13c861c20d95ec00ab3288d248643ae25125d5/events (JSON)
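The requests listed above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library, not an official client: the helper names `build_urls` and `fetch_json` are hypothetical, and it assumes each endpoint answers a plain GET with a JSON body. Whether the API requires a key or other authentication is not stated on this page, so `fetch_json` simply accepts optional headers.

```python
import json
import urllib.parse
import urllib.request

# Base path and identifiers copied from the request URLs shown on this page.
BASE_URL = "https://api.bluedoor.sh/job-postings/v1"
JOB_ID = "6a13c861c20d95ec00ab3288d248643ae25125d5"
ORG_ID = "e69a6fcc-024f-4d99-ada4-5630f4f934d3"
SOURCE_ID = "ecc85604-d4af-4971-b467-d3e9f14798bc"


def build_urls(job_id, org_id, source_id):
    """Return the four request URLs from this page, keyed by a short label."""
    query = urllib.parse.urlencode({"include": "description"})
    return {
        "job": f"{BASE_URL}/jobs/{job_id}?{query}",
        "org": f"{BASE_URL}/orgs/{org_id}",
        "source": f"{BASE_URL}/sources/{source_id}",
        "events": f"{BASE_URL}/jobs/{job_id}/events",
    }


def fetch_json(url, headers=None, timeout=10):
    """GET a URL and decode the body as JSON (assumes a JSON response).

    Any authentication headers are an assumption left to the caller.
    """
    request = urllib.request.Request(url, headers=headers or {}, method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    for label, url in build_urls(JOB_ID, ORG_ID, SOURCE_ID).items():
        print(label, url)
        # payload = fetch_json(url)  # uncomment to call the live API
```

Keeping URL construction separate from the network call lets the URLs be checked offline before any request is sent.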