Lead & Senior Site Reliability Engineers
5 days ago
Xero is a beautiful, easy-to-use platform that helps small businesses and their accounting and bookkeeping advisors grow and thrive.
At Xero, our purpose is to make life better for people in small business, their advisors, and communities around the world. This purpose sits at the centre of everything we do. We support our people to do the best work of their lives so that they can help small businesses succeed through better tools, information and connections. Because when they succeed they make a difference, and when millions of small businesses are making a difference, the world is a more beautiful place.
**About the team**
Xero’s Incident and Problem Management team is part of the Site Reliability Engineering (SRE) organization and is responsible for building, delivering and maintaining robust process and tooling around incident management.
The team drives enduring reliability at Xero through robust, consistent and fast response to high-severity incidents. It is responsible for building a world-class process and ensuring that process matures as the demands of the business grow.
**About the roles**
We're hiring for multiple roles at Lead Engineer and Senior Engineer level. These positions require experienced SRE professionals with a strong technical background, a passion for building and delivering robust processes, and extensive experience leading the technical response to high-severity cloud issues.
They will drive best practice across the business and contribute to the ongoing transformation of the Xero SRE culture. As expert communicators, they will lead technical discussions to identify and track the actions that arise during incidents.
Across our SRE function, we're looking for people who are keen to dig deep into the causes of incidents and proactively examine the potential causes of future ones, working with engineering teams to remove the risk of each failure scenario. Ultimately, they will build playbooks and automation to ensure quick and effective responses. They will also provide ongoing training across the business to ensure the process is well understood and followed.
These roles will form the backbone of a new team, providing a Technical Duty Officer (TDO) function within the business. TDOs are incident commanders who use SRE skillsets to drive fast mitigation and enduring resolution of impactful events.
**What you'll do**:
- Own the incident management process, ensuring it drives enduring reliability across all products and services within Xero.
- Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
- Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
- Promote a customer-focused approach by addressing and mitigating global customer environment issues, and fostering a culture of continuous learning and technical excellence within the SRE team.
- Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability.
- Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.
**What you'll bring**:
- Previous career experience as a Site Reliability Engineer, in an Operations or Engineering environment
- Hands-on experience troubleshooting AWS hosted services
- Networking knowledge and the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues
- Coding experience (preferably Python) building tools, scripting, or automation
- Strong communication (oral & written) skills including the ability to translate technical issues/concepts into agreed actions
**Why Xero?**
We offer very generous paid leave to use however you’d like (plus statutory holidays), dedicated paid leave to care for your physical and mental wellbeing, and an Employee Assistance Program that gives you and your family access to mental health care. You’ll also get free medical insurance, wellbeing and sports programmes, employee resource groups, 26 weeks of paid parental leave for primary caregivers, an Employee Share Plan, beautiful offices, flexible working, career development, and many other benefits that reflect our human value. You’ll do the best work of your life at Xero.