Data Engineer

Warsaw, Masovian Voivodeship
Permanent
Full-time

Posted 17 hours ago

Fetcherr, experts in deep learning, algorithmic trading, e-commerce, and digitization, is disrupting traditional systems with its cutting-edge AI technology. At its core is the Large Market Model (LMM), an adaptable AI engine that forecasts demand and market trends with precision, empowering real-time decision-making. Specializing initially in the airline industry, Fetcherr aims to revolutionize industries with dynamic AI-driven solutions.

Fetcherr is seeking a Data Engineer to build large-scale, optimized data pipelines using cutting-edge technology and tools. We're looking for someone with advanced Python skills and a deep understanding of memory and CPU optimization in distributed environments. This is a high-impact role whose responsibilities directly influence the company's strategic decisions and data-driven initiatives.

Key Responsibilities:

Design and build scalable, cross-client data pipelines and transformation workflows using modern ELT tools, ensuring high performance, reusability, and cost-efficiency across diverse data products. Leverage orchestration frameworks like Dagster to manage dependencies, retries, and monitoring.
Develop and operate distributed data processing systems that handle large-scale workloads efficiently, adapting to dynamic data volumes and infrastructure constraints. Apply frameworks such as Dask or Spark to unlock parallelism and optimize compute resource utilization.
Deliver robust, maintainable Python solutions by applying sound software engineering principles, including modular architecture, reusable components, and shared libraries. Ensure code quality and operational resilience through CI/CD best practices and containerized deployments.
Collaborate with data scientists, engineers, and product teams to deliver validated, analytics-ready data that aligns with business requirements. Support team-wide adoption of data modeling standards and efficient data access patterns.
Proactively safeguard data quality and reliability by implementing anomaly detection, validation frameworks, and statistical or ML-based techniques to forecast trends and catch regressions early. Enforce backward compatibility and data contract integrity across pipeline changes.
Document workflows, interfaces, and architectural decisions in a clear and structured manner to support long-term maintainability. Maintain up-to-date data contracts, system runbooks, and onboarding guides for effective cross-team collaboration.

You’ll be a great fit if you have...

4+ years of hands-on experience building and maintaining production-grade data pipelines at scale
Expertise in Python, with a strong grasp of data structures, performance optimization, and modern data processing libraries (e.g. pandas, NumPy)
Practical experience with distributed computing frameworks such as Dask or Spark, including performance tuning and memory management
Proficiency in SQL, with a deep understanding of query optimization, analytical functions, and cost-efficient query design
Experience designing and managing transformation logic using dbt, with a focus on modular development, testability, and scalable performance across large datasets
Strong understanding of ETL/ELT architecture, data modeling principles, and data validation
Familiarity with cloud platforms (e.g. GCP, AWS) and modern data storage formats and platforms (e.g. Parquet, Delta Lake, BigQuery)
Experience with CI/CD workflows, Docker, and orchestrating workloads in Kubernetes

Nice to Have:

Experience with Dagster or similar workflow orchestration tools
Familiarity with automated testing frameworks for data workflows, such as pytest, Great Expectations, Pandera, Behave, Hamilton, dbt tests or similar
Deep interest in performance optimization and vectorized computation, especially in Dask/pandas-based pipelines
Ability to design cross-client, cost-efficient solutions that prioritize scalability, modularity, and minimal resource consumption
Strong grounding in software architecture best practices, including adherence to SOLID, YAGNI, KISS, DRY, CoC, OOP, CoI and LoD principles, and code reuse through shared libraries (a strong plus)

Fetcherr