Senior DevOps Engineer (NXJ-171)

Category

AWS · Cybersecurity · DevOps

Type

Remote

Location

Ukraine

The Role

The platform runs a large-scale, multi-tenant cyber intelligence system fully operated on AWS, and the infrastructure roadmap is yours to own. The core technical challenge is moving past traditional infrastructure maintenance to design resilience, cost efficiency, and automated intelligence directly into a high-scale environment. This role calls for an engineer who will define the automation and reliability goals for the entire R&D organization, using both infrastructure-as-code and emerging AI-driven operations.

About the Product

The product is a global cyber intelligence SaaS platform serving leading enterprises worldwide. It functions as a high-scale, multi-tenant data platform that requires continuous uptime to deliver threat intelligence insights. The system handles complex data streams, meaning the underlying infrastructure must remain highly resilient, secure, and optimized for rapid data processing under real-world security demands.

Technology Stack: The entire platform is built on AWS, relying heavily on EKS for container orchestration and Karpenter for autoscaling. The data and messaging backbone uses Kafka, Redis, RDS, S3, and Lambda functions. Everything is managed as code via Terraform, Helm, and CloudFormation, while delivery pipelines run through GitHub Actions, Jenkins, or ArgoCD. Observability is handled natively through CloudWatch and via open-source tools like Prometheus, Grafana, Loki, and ELK, with an increasing focus on integrating AIOps for anomaly detection and alert classification.

What You’ll Be Doing

  • Define the long-term architecture, automation strategy, and reliability goals for the entire R&D infrastructure footprint
  • Optimize the scalability and cost efficiency of the production Kubernetes (EKS) clusters using advanced scaling mechanisms
  • Architect and scale the underlying data streaming and caching layers, focusing on Kafka scaling and Redis clustering
  • Build out advanced observability frameworks using Prometheus and Grafana to establish proactive alerts, SLOs, and anomaly detection
  • Integrate AI-assisted tooling into daily operations to drive automated incident remediation and predictive cost-optimization
  • Establish production-readiness standards by leading root-cause analyses, capacity planning, and incident response operations

What We Expect

Must-have

  • 5+ years of experience in DevOps, SRE, or Infrastructure roles supporting production systems
  • Proven track record of managing high-scale, multi-tenant SaaS environments running on AWS
  • Deep production-level experience with Kubernetes (EKS) architecture, container orchestration, and Karpenter
  • Strong hands-on proficiency with Terraform, Helm, and CI/CD automation pipelines
  • Solid scripting capabilities in Python, Bash, or Go for custom automation tools
  • Practical experience implementing monitoring, logging, and alerting stacks
  • Experience utilizing or integrating AI-assisted tools to improve observability or developer productivity

Nice-to-have

  • Domain experience in the cybersecurity or threat intelligence industry
  • Foundational knowledge of AI/ML data pipelines or predictive AIOps concepts
  • Practical experience applying FinOps frameworks to large-scale cloud infrastructure
  • Deep knowledge of AWS service-level tuning

Why This Role Is Worth Your Time

  • You are taking ownership of the critical infrastructure behind a global cyber intelligence platform, where your architectural decisions directly impact live enterprise data security
  • This isn’t a “ticket-taking” operations job: you will have the mandate to experiment with and deploy AI-driven operations tools for auto-remediation and prediction
  • You will collaborate with mature architects in a highly technical R&D group that prioritizes advanced infrastructure engineering over manual workarounds
