AI and ML Ops Engineer (all genders)
Job-ID: 15955; Location(s): Krakow
We are seeking a highly skilled and experienced AI and ML Ops Engineer (all genders) to join our dynamic technology team. In this crucial role, you will be responsible for designing, implementing, managing, and optimizing our cloud infrastructure, primarily on Microsoft Azure, with a strong emphasis on Kubernetes (AKS). You will champion automation, build robust CI/CD pipelines, enhance system reliability and scalability, and collaborate closely with development teams to streamline the software delivery lifecycle.
This position is based exclusively at our IT site in Krakow, Poland.
Key Responsibilities:
- Infrastructure as Code (IaC): Design, build, and maintain resilient, scalable, and secure Azure infrastructure using IaC principles (e.g., Terraform, Bicep, ARM Templates)
- Kubernetes Management: Deploy, manage, scale, and troubleshoot Kubernetes clusters (specifically Azure Kubernetes Service - AKS). Implement best practices for cluster security, monitoring, networking, and governance
- CI/CD Pipeline Automation: Develop, manage, and optimize CI/CD pipelines (using tools like Azure DevOps Pipelines, GitHub Actions, Jenkins, GitLab CI, Scalr) to enable rapid, reliable, and automated software releases
- Azure Cloud Services: Manage and configure a wide range of Azure services, including Compute (VMs, App Services), Storage (Blob, Files, Disks), Networking (VNet, Load Balancers, Application Gateway, Firewall), Databases (Azure SQL, Cosmos DB), Monitoring (Azure Monitor, Log Analytics, Application Insights), and Security (Azure AD, Key Vault, Security Center)
- Containerization: Work extensively with Docker for containerizing applications and manage container registries (e.g., Azure Container Registry)
- Monitoring, Logging & Alerting: Implement and manage comprehensive monitoring, logging, and alerting solutions (e.g., Azure Monitor, Prometheus, Grafana, ELK/EFK Stack) to ensure system health, performance, and availability
- Automation & Scripting: Automate manual operational tasks using scripting languages (e.g., PowerShell, Bash, Python)
- Collaboration & Support: Work closely with software developers, QA engineers, and other IT staff to troubleshoot issues, optimize application performance, and ensure smooth deployments. Provide operational support and participate in on-call rotations if required
- Security: Implement and enforce security best practices across the infrastructure and deployment pipelines (DevSecOps principles). Manage secrets and ensure compliance requirements are met
- Cost Optimization: Monitor Azure resource utilization and implement strategies for cost optimization
- Documentation: Maintain clear and accurate documentation for infrastructure configurations, processes, and procedures
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field
- 3-5 years of experience in a DevOps, SRE, or similar infrastructure-focused role
- Proven hands-on experience managing production workloads on Microsoft Azure
- Deep understanding and practical experience with Kubernetes (preferably AKS), including cluster administration, Helm charts, ingress controllers, and networking
- Strong experience designing, building, and maintaining automated CI/CD pipelines
- Proficiency with Infrastructure as Code tools (Terraform strongly preferred; Bicep/ARM acceptable)
- Solid experience with containerization technologies (Docker)
- Proficiency in scripting languages such as PowerShell, Bash, or Python
- Experience with monitoring and logging tools (e.g., Azure Monitor, Prometheus, Grafana, ELK)
- Solid understanding of networking concepts (TCP/IP, DNS, VPNs, Load Balancing, Firewalls)
- Familiarity with version control systems (Git)
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration skills
Preferred Qualifications:
- Azure Certifications (e.g., AZ-400: Designing and Implementing Microsoft DevOps Solutions, AZ-104: Microsoft Azure Administrator)
- Kubernetes Certifications (e.g., CKA, CKAD)
- Experience with other cloud platforms (AWS, GCP)
- Experience with configuration management tools (e.g., Ansible, Chef, Puppet)
- Experience implementing DevSecOps practices
- Familiarity with service mesh technologies (e.g., Istio, Linkerd)
- Experience managing databases (SQL and NoSQL)
What We Offer:
- A dynamic working environment with exciting projects in the field of artificial intelligence and machine learning
- The opportunity to utilize innovative technologies and actively contribute to their further development
- A motivated team that looks forward to working with you, and a strong team spirit that helps us achieve our common goals
- Stability and career growth: As a growing family-owned company with an international focus, Viega offers both
- Comprehensive onboarding and training through our Viega Academy to support you in your role and personal development
Your contact person:
Anne Ferchau - Anne.Ferchau@viega.de - +49 (2722) 61 - 5893