
Lead Problem Manager - Platform Azure
- Warsaw (Warszawa), Masovian Voivodeship (mazowieckie)
- Permanent
- Full-time
- Lead deep-dive investigations into major and recurring issues. Facilitate root cause analysis (RCA) sessions, coordinate across engineering and operations, and maintain thorough problem records.
- Ensure RCAs are completed and delivered within the timeframes set by service level agreements (SLAs) to meet compliance requirements and stakeholder expectations.
- Identify systemic risks and mitigate them through infrastructure changes, configuration updates, or automation to prevent high-impact issues from recurring.
- Develop and maintain integrations (in collaboration with our Service Enablement team) across incident, change, and problem management systems. Ensure seamless data flow and traceability from incidents to RCAs and preventive actions.
- Partner with Service Enablement specialists, SREs and platform engineers to design and implement automation that detects, mitigates, or resolves known errors and platform vulnerabilities.
- Create and maintain clear RCA documentation, known error databases, and self-service materials to upskill engineering and support teams.
- Provide visibility into recurring issues, mean time to resolution (MTTR), and problem trends through dashboards and reports. Use these insights to influence the prioritization of technical debt and improvement initiatives.
- Drive post-incident reviews and retrospectives. Champion a culture of learning and accountability through feedback loops and operational best practices.
- 5+ years of experience in Problem Management, IT Service Management (ITIL) more broadly, or platform operations, preferably in cloud environments.
- Well-versed in ITIL practices, especially in problem, incident, and change management
- Experience with cloud monitoring and alerting platforms (e.g., Opsgenie, Grafana, Prometheus)
- Proficiency in RCA methodologies (e.g., 5 Whys, Fishbone, Pareto) and problem tracking systems (e.g., Salesforce, Cadalys, ServiceNow, Jira)
- Familiarity with automation and self-healing frameworks for cloud operations
- Experience with reporting and data analysis tools (e.g., Power BI, Azure Log Analytics)
- Strong communication, facilitation, and cross-functional collaboration skills
- Ability to prioritize effectively in a diverse, global, and multicultural environment.