The Role
Cloud billing data is inherently messy: it spans multiple clouds and multiple structures, with billing models that don’t normalize cleanly and usage signals that are noisy and inconsistent. You’ll own the data infrastructure that untangles this, namely the pipelines, models, and backend services that turn raw AWS, GCP, and Azure billing exports into reliable product capabilities. This is a production ownership role: architecture, code, monitoring, and stability all land on you.
About the Product
The platform ingests and processes large-scale cloud billing, usage, and operational data across AWS, Azure, and GCP — and turns it into cost visibility, recommendations, forecasting, and anomaly detection for enterprise customers. The core engineering challenge is scale and reliability: cloud billing structures are complex, volumes are high, and the data directly drives product decisions and customer spend outcomes. This is a product company, not a consulting engagement — the infrastructure you build runs in production and affects real customers.
Technology Stack: Cloud billing and usage data from AWS, Azure, and GCP is processed with Python and SQL, orchestrated with Airflow, and run across modern data platforms including Spark, ClickHouse, BigQuery, Databricks, and Snowflake. The stack was chosen for scale and cost-awareness: the same discipline applied to customer cloud spend applies internally. AWS is the primary cloud environment.
What You’ll Be Doing
- Design and maintain production ETL/ELT pipelines that ingest, normalize, and model cloud billing and usage data at scale across multiple cloud providers.
- Own the performance, reliability, and cost-efficiency of the data platform — query optimization, storage architecture, processing costs, and production stability.
- Build backend data services in Python and SQL that power product capabilities: cost recommendations, usage forecasting, anomaly detection, and customer-facing insights.
- Work with cloud billing source data such as AWS Cost and Usage Reports (CUR), Azure Cost Management exports, and GCP billing exports, including complex structures like marketplace billing and partner models.
- Architect and improve orchestration flows using Airflow or equivalent, across platforms such as Spark, Databricks, Snowflake, BigQuery, or ClickHouse.
- Own data quality, monitoring, and observability — not just the pipeline, but what comes out of it.
- Review architecture and code, mentor other engineers, and drive engineering standards within the data domain.
- Use AI/LLM tools (Cursor, GitHub Copilot, Claude, ChatGPT, or equivalent) as a daily development accelerator — for coding, debugging, testing, documentation, and technical research — while maintaining full engineering ownership of the output.
What We Expect
Must-have
- 7+ years in data engineering, data platform engineering, or backend engineering with a heavy data focus.
- Production-grade Python and SQL — not notebooks, not scripts. Code that runs reliably in production at scale.
- Strong experience building and maintaining ETL/ELT pipelines with real ownership: design, deployment, monitoring, and incident response.
- Experience with large-scale data processing using Spark or equivalent frameworks.
- Workflow orchestration with Apache Airflow or similar.
- Cloud experience, primarily AWS. GCP and/or Azure experience is a strong advantage.
- Hands-on experience with at least one cloud data warehouse or query engine: Redshift, Athena, BigQuery, Snowflake, Databricks, ClickHouse, or equivalent.
- Strong understanding of data modeling, data quality, and production monitoring.
- Demonstrated experience optimizing query performance, storage usage, or infrastructure costs.
- Ability to lead technical discussions, own domains end to end, and mentor other engineers.
- Hands-on experience using AI/LLM tools as part of the software development workflow — and the engineering judgment to validate what they produce.
Nice to have
- Experience with cloud billing data specifically: AWS CUR, Azure Cost Management, GCP billing exports, marketplace billing, or partner billing models.
- Background in FinOps, cloud cost optimization, or usage-based billing data.
- Experience building data products: recommendations, forecasting flows, dashboards, or anomaly-detection pipelines.
- Experience leading a small team or acting as a technical owner for a data domain.
- AWS services experience: S3, Glue, Lambda, Athena, Redshift, EMR, ECS, EKS.
- Track record in a startup or product-company environment.
Why This Role Is Worth Your Time
- The domain has real technical depth. Cloud billing data is genuinely complex: multi-source, inconsistently structured, high-volume, and directly tied to business outcomes. These aren’t CRUD pipelines over clean data.
- You’ll own infrastructure that shapes the product. The data platform isn’t a support function — it’s the core layer that makes cost recommendations, forecasting, and anomaly detection possible. What you build determines what the product can do.
- AI tooling is a first-class expectation, not a novelty. The team uses AI tools as daily productivity multipliers. You won’t be explaining why you use Cursor or Claude — you’ll be expected to.
- End-to-end ownership. From design to production behavior, you’re accountable. The role suits engineers who want to see their decisions run in the real world rather than hand them off to someone else.