
MLOps Engineer @ RemoDevs
- Warszawa, mazowieckie
- Permanent
- Full-time
- PaaS (Platform as a Service): Our core AI platform that improves workflows, finds insights, and supports value creation across portfolios.
- SaaS (Software as a Service): A cloud platform that delivers strong performance, intelligence, and execution at scale.
- S&C (Solutions and Consulting Suite): Modular technology playbooks that help companies manage, grow, and improve performance.
- Create and maintain infrastructure-as-code for AI services (Terraform, Pulumi, AWS CDK).
- Build and run CI/CD pipelines for AI APIs, RAG pipelines, MCP services, and LLM agent workflows.
- Set up monitoring, alerting, and LLM observability for AI systems.
- Track metrics like latency, error rates, drift detection, and hallucination monitoring (a minimal instrumentation sketch follows this list).
- Optimize inference workloads and manage distributed AI serving frameworks (Ray Serve, BentoML, vLLM, Hugging Face TGI).
- Work with ML Engineers and Python Developers to define safe, scalable, and automated deployment processes.
- Follow standards for AI system security, data governance, and compliance.
- Keep up to date with new AIOps and LLM observability tools and best practices.
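To illustrate the kind of metrics work mentioned above, here is a minimal sketch of exposing latency and error-rate metrics for an AI service with `prometheus_client`. It is not part of the role description: the service, the `demo-llm` model name, and the `call_model` stub are hypothetical placeholders for a real inference backend such as a vLLM endpoint.

```python
"""Illustrative sketch only: latency and error-rate metrics for an AI service
using prometheus_client. Names and the call_model stub are hypothetical."""
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Histogram for request latency and a counter for failures, labelled by model.
REQUEST_LATENCY = Histogram(
    "llm_request_latency_seconds", "Latency of LLM inference requests", ["model"]
)
REQUEST_ERRORS = Counter(
    "llm_request_errors_total", "Failed LLM inference requests", ["model"]
)


def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real inference call (e.g. a vLLM endpoint)."""
    time.sleep(random.uniform(0.05, 0.2))
    if random.random() < 0.05:
        raise RuntimeError("upstream inference error")
    return f"{model} response to: {prompt}"


def serve_request(model: str, prompt: str) -> str:
    """Wrap an inference call so latency and errors land in the metrics above."""
    start = time.perf_counter()
    try:
        return call_model(model, prompt)
    except Exception:
        REQUEST_ERRORS.labels(model=model).inc()
        raise
    finally:
        REQUEST_LATENCY.labels(model=model).observe(time.perf_counter() - start)


if __name__ == "__main__":
    start_http_server(9100)  # metrics scrapeable at http://localhost:9100/metrics
    while True:
        try:
            serve_request("demo-llm", "ping")
        except RuntimeError:
            pass  # already counted in REQUEST_ERRORS
```

In practice the same histogram and counter would back the dashboards and alerts described in the requirements (for example, alerting on p99 latency or error-rate thresholds in Prometheus/Grafana).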
- Good knowledge of cloud infrastructure (AWS, Azure, or GCP) and container orchestration (Docker, Kubernetes, ECS/EKS).
- Hands-on experience running AI/ML services in production.
- Experience with CI/CD pipelines for AI, LLM workflows, and model deployments.
- Knowledge of distributed AI serving frameworks and inference optimization.
- Understanding of monitoring, observability, and incident response for AI.
- Experience setting up AI system health metrics, dashboards, and alerts.
- Awareness of AI security, data protection, and compliance needs.
- Interest in learning and using new AIOps and AI observability tools.