Please Note: This position will include supporting our US Federal & Public Sector customers.
This position requires passing a ServiceNow background screening, USFedPASS (US Federal Personnel Authorization Screening Standards). This includes a credit check, a criminal/misdemeanor check, and a drug test. Any employment is contingent upon passing the screening. Due to Federal requirements, only US citizens, US naturalized citizens, or US Permanent Residents holding a green card will be considered.
Keep the ServiceNow cloud running. Write the code that makes it automatic.
Cloud Network Services (CNS) designs, delivers, and operates the global infrastructure behind our products—every data center, every network path, every watt. When something’s hard to scale or keep reliable, we don’t throw people at the problem—we engineer it away with software. You’ll join Network Reliability & Resiliency (NR2): a diverse crew of network, software, hardware, and operations pros who reduce mean time to mitigate and remediate by building automation, not runbooks. We partner across Global Cloud Services (GCS) to deliver safety, security, and seemingly infinite capacity at the lowest possible cost—and we ship boldly, own outcomes end‑to‑end, and learn like mad. If you’re a builder who loves production, incidents, and code that turns chaos into consistency, this is your team.
Location: Flexible within U.S. time zones with periodic data center/site visits as needed
On‑call: Rotating pager for production incidents (including some weekends) with strong focus on automation to reduce pages
Why this role exists
Our network is expanding in scale and complexity. We need a senior leader and builder who can own design through operations, write the software that deploys and heals the network, and coach others to do the same. You’ll turn tribal knowledge into pipelines, toil into bots, and postmortems into platform features.
What you get to do in this role:
- Own the lifecycle of production IP networks—design, engineering, implementation, and operations escalation—with a bias to automate everything you touch.
- Architect and evolve highly available, hyper‑scale network segments; drive from concept through launch and long‑term reliability.
- Lead with code: build tooling in Python/Go/Bash; ship IaC (Terraform/Ansible) and CI/CD to make changes safe, fast, and observable. (The team already leverages PowerShell, shell scripting, Perl/Python, Ansible, Terraform, and CI/CD via DevOps/Bitbucket/GitHub—your job is to raise the bar.)
- Operational excellence: instrument SLIs/SLOs, shrink MTTR, and eliminate classes of incidents via automation, config hygiene, and safe change patterns.
- Incident leadership: run or advise bridges for complex outages; perform detailed diagnosis; drive deep, blameless learnings into automated prevent/restore paths.
- Change management at scale: review, test, and roll out network changes with progressive delivery and automated verification.
- Mentor & multiply: coach engineers across networking and software; model high standards in design docs, code reviews, and post‑incident reviews.
Tech you’ll touch
- Protocols & platforms: BGP, OSPF, IS-IS, HSRP/VRRP, IPsec, SNMP; deep TCP/IP analysis (Wireshark, etc.); vendor stacks including Cisco IOS, Junos, and Cisco ASA; container-based NGINX ADC (Application Delivery Controller) on Linux and Docker.
- Observability & monitoring: Splunk, Cacti, ThousandEyes (plus the tooling you’ll help us build).
- Cloud & app surfaces: Azure core (Compute/Storage/Networking) and Web Apps; multi‑cloud familiarity a plus.
- Automation & pipelines: PowerShell, Python, Go, Terraform/Ansible; CI/CD with DevOps/Bitbucket/GitHub.