Newxel Senior DevOps Engineer (NXJ-171)

Type: Remote

Location: Ukraine

The Role

The platform is a large-scale, multi-tenant cyber intelligence system operated entirely on AWS, and the infrastructure roadmap is yours to own. The core technical challenge is moving past traditional infrastructure maintenance to design resilience, cost efficiency, and automated intelligence directly into a high-scale environment. This role calls for an engineer who will define the automation and reliability goals for the entire R&D organization, using both infrastructure-as-code and emerging AI-driven operations.

About the Product

The product is a global cyber intelligence SaaS platform serving leading enterprises worldwide. It functions as a high-scale, multi-tenant data platform that requires continuous uptime to deliver threat intelligence insights. The system handles complex data streams, meaning the underlying infrastructure must remain highly resilient, secure, and optimized for rapid data processing under real-world security demands.

Technology Stack: The entire platform is built on AWS, relying heavily on EKS for container orchestration and Karpenter for node autoscaling. The data and messaging backbone uses Kafka, Redis, RDS, S3, and Lambda functions. Everything is managed as code via Terraform, Helm, and CloudFormation, while delivery pipelines run through GitHub Actions, Jenkins, or ArgoCD. Observability combines AWS-native CloudWatch with open-source tools such as Prometheus, Grafana, Loki, and ELK, with an increasing focus on integrating AIOps for anomaly detection and alert classification.

What You’ll Be Doing

Define the long-term architecture, automation strategy, and reliability goals for the entire R&D infrastructure footprint
Optimize the scalability and cost efficiency of the production Kubernetes (EKS) clusters using advanced scaling mechanisms
Architect and scale the underlying data streaming and caching layers, focusing on Kafka scaling and Redis clustering
Build out advanced observability frameworks using Prometheus and Grafana to establish proactive alerts, SLOs, and anomaly detection
Integrate AI-assisted tooling into daily operations to drive automated incident remediation and predictive cost-optimization
Establish production-readiness standards by leading root-cause analyses, capacity planning, and incident response operations

What We Expect

Must-have

5+ years of experience in DevOps, SRE, or Infrastructure roles supporting production systems
Proven track record of managing high-scale, multi-tenant SaaS environments running on AWS
Deep production-level experience with Kubernetes (EKS) architecture, container orchestration, and Karpenter
Strong hands-on proficiency with Terraform, Helm, and CI/CD automation pipelines
Solid scripting capabilities in Python, Bash, or Go for custom automation tools
Practical experience implementing monitoring, logging, and alerting stacks
Experience utilizing or integrating AI-assisted tools to improve observability or developer productivity

Nice-to-have

Domain experience within cybersecurity or threat intelligence industries
Foundational knowledge of AI/ML data pipelines or predictive AIOps concepts
Practical experience applying FinOps frameworks to large-scale cloud infrastructure
Deep knowledge of AWS service-level tuning

Why This Role Is Worth Your Time

You are taking ownership of the critical infrastructure behind a global cyber intelligence platform, where your architectural decisions directly impact live enterprise data security
This isn’t a “ticket-taking” operations job: you will have the mandate to experiment with and deploy AI-driven operations tools for auto-remediation and prediction
You will collaborate with mature architects in a highly technical R&D group that prioritizes advanced infrastructure engineering over manual workarounds