
MLOps Engineer @ RemoDevs
- Warszawa, mazowieckie
- Permanent
- Full-time
- PaaS (Platform as a Service): Our core AI platform that improves workflows, finds insights, and supports value creation across portfolios.
- SaaS (Software as a Service): A cloud platform that delivers strong performance, intelligence, and execution at scale.
- S&C (Solutions and Consulting Suite): Modular technology playbooks that help companies manage, grow, and improve performance.
- Create and maintain infrastructure-as-code for AI services (Terraform, Pulumi, AWS CDK).
- Build and run CI/CD pipelines for AI APIs, RAG pipelines, MCP services, and LLM agent workflows.
- Set up monitoring, alerting, and LLM observability for AI systems.
- Track metrics like latency, error rates, drift detection, and hallucination monitoring (a minimal instrumentation sketch follows this list).
- Optimize inference workloads and manage distributed AI serving frameworks (Ray Serve, BentoML, vLLM, Hugging Face TGI).
- Work with ML Engineers and Python Developers to define safe, scalable, and automated deployment processes.
- Follow standards for AI system security, data governance, and compliance.
- Keep up to date with new AIOps and LLM observability tools and best practices.
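To illustrate the kind of metrics work mentioned above, here is a minimal sketch of exposing latency and error-rate metrics for an AI service with `prometheus_client`. It is not part of the role description: the service, the `demo-llm` model name, and the `call_model` stub are hypothetical placeholders for a real inference backend such as a vLLM endpoint.

```python
"""Illustrative sketch only: latency and error-rate metrics for an AI service
using prometheus_client. Names and the call_model stub are hypothetical."""
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Histogram for request latency and a counter for failures, labelled by model.
REQUEST_LATENCY = Histogram(
    "llm_request_latency_seconds", "Latency of LLM inference requests", ["model"]
)
REQUEST_ERRORS = Counter(
    "llm_request_errors_total", "Failed LLM inference requests", ["model"]
)


def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real inference call (e.g. a vLLM endpoint)."""
    time.sleep(random.uniform(0.05, 0.2))
    if random.random() < 0.05:
        raise RuntimeError("upstream inference error")
    return f"{model} response to: {prompt}"


def serve_request(model: str, prompt: str) -> str:
    """Wrap an inference call so latency and errors land in the metrics above."""
    start = time.perf_counter()
    try:
        return call_model(model, prompt)
    except Exception:
        REQUEST_ERRORS.labels(model=model).inc()
        raise
    finally:
        REQUEST_LATENCY.labels(model=model).observe(time.perf_counter() - start)


if __name__ == "__main__":
    start_http_server(9100)  # metrics scrapeable at http://localhost:9100/metrics
    while True:
        try:
            serve_request("demo-llm", "ping")
        except RuntimeError:
            pass  # already counted in REQUEST_ERRORS
```

In practice the same histogram and counter would back the dashboards and alerts described in the requirements (for example, alerting on p99 latency or error-rate thresholds in Prometheus/Grafana).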
- Good knowledge of cloud infrastructure (AWS, Azure, or GCP) and container orchestration (Docker, Kubernetes, ECS/EKS).
- Hands-on experience running AI/ML services in production.
- Experience with CI/CD pipelines for AI, LLM workflows, and model deployments.
- Knowledge of distributed AI serving frameworks and inference optimization.
- Understanding of monitoring, observability, and incident response for AI.
- Experience setting up AI system health metrics, dashboards, and alerts.
- Awareness of AI security, data protection, and compliance needs.
- Interest in learning and using new AIOps and AI observability tools.