Team
Join the Global Cloud Services organization as the founding member of our Cloud Analytics & FinOps Engineering Platform team. You will be instrumental in establishing the technical foundation and architectural direction for ServiceNow's next-generation FinOps governance platform.
Role
We are building a modern, secure, and highly scalable multi-cloud data platform infrastructure powering next-generation analytics to support ServiceNow's Cloud and AI growth. As our Senior Staff DevOps Engineer for Cloud Analytics & FinOps Engineering Platform, you will architect, secure, and operationalize our hybrid cloud data platform infrastructure spanning AWS, GCP, Azure, and on-premises systems. You will have ownership over CI/CD pipelines, infrastructure-as-code, platform security, cost optimization, observability, and data source integrations across our complex ecosystem while navigating ServiceNow's enterprise infrastructure standards and compliance requirements.
This is a unique opportunity to build enterprise-grade platform infrastructure from the ground up, establish DevOps best practices for modern data platforms, and work within a Fortune 500 enterprise environment with global scale requirements.
What you get to do in this role:
Platform Infrastructure & Architecture
- Design and implement secure, scalable Kubernetes clusters across AWS EKS, GCP GKE, and Azure AKS supporting complex data platform workloads.
- Architect hybrid cloud infrastructure with unified management and governance, building infrastructure-as-code solutions using Terraform, AWS CDK, and CloudFormation for repeatable deployments.
- Establish multi-cloud networking including VPC design, cross-cloud connectivity, Transit Gateway configurations, and secure service mesh implementations while navigating ServiceNow enterprise standards and approval processes.
Security & Compliance
- Implement comprehensive security frameworks across the multi-cloud data platform stack, adhering to enterprise security standards.
- Design identity and access management across cloud providers following the principle of least privilege, orchestrate secrets management using cloud-native solutions, and establish security scanning for container images and infrastructure.
- Ensure compliance with SOC 2, FedRAMP, and regulatory requirements while working with security teams to implement platform controls and data governance.
CI/CD Pipeline Engineering & GitOps
- Design sophisticated CI/CD pipelines using Jenkins, GitHub Actions, TeamCity, and Argo CD for GitOps workflows.
- Manage artifact repositories with automated image scanning and promotion, create Helm charts for complex data platform services (Trino, Airflow, Lightdash, Grafana), and establish automated testing pipelines for infrastructure changes with drift detection and remediation.
Observability & Site Reliability Engineering
- Architect comprehensive monitoring using Grafana, Prometheus, and CloudWatch with advanced alerting and incident response frameworks.
- Design SLIs/SLOs/SLAs for data platform services with error budget management, establish SRE practices including toil reduction and capacity planning, and create operational dashboards for platform health and performance metrics.
- Implement automated remediation workflows and capacity forecasting with predictive analytics.
Data Platform Operations & Integration
- Design secure data ingestion pipelines from disparate systems across multi-cloud and on-premises environments.
- Implement data source connectors for billing systems, ServiceNow internal systems, SaaS platforms, and ML platforms.
- Manage hybrid cloud connectivity and orchestrate complex data workflows using Apache Airflow with high availability across multiple cloud environments.
Platform Automation & Developer Experience
- Implement automated scaling and resource management across cloud providers.
- Establish a Cloud Development Environment (CDE) platform using Coder to provision on-demand development workspaces via Terraform templates for globally distributed teams, with enterprise compliance and cost optimization.
Enterprise Navigation & Global Operations
- Work within ServiceNow enterprise processes for technology approvals and infrastructure changes.
- Mentor junior engineers across global time zones on SRE best practices, establish operational runbooks for 24/7 platform support with automated incident response, and implement SRE organizational practices including error budget policies and reliability reviews.