Intermediate Site Reliability Engineer, Tenant Scale: Tenant Services
GitLab
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating human progress. Our platform unites teams and organizations, breaking down barriers and redefining what's possible in software development. Thanks to products like Duo Enterprise and Duo Agent Platform, customers get AI benefits at every stage of the SDLC.
The same principles built into our products are reflected in how our team works: we embrace AI as a core productivity multiplier, with all team members expected to incorporate AI into their daily workflows to drive efficiency, innovation, and impact. GitLab is where careers accelerate, innovation flourishes, and every voice is valued. Our high-performance culture is driven by our values and continuous knowledge exchange, enabling our team members to reach their full potential while collaborating with industry leaders to solve complex problems. Co-create the future with us as we build technology that transforms how the world develops software.
An overview of this role
As a Site Reliability Engineer (SRE) at GitLab, you keep GitLab.com and other production systems running smoothly for millions of users by combining pragmatic operations with strong software engineering practices. You focus on the systems layer (operating systems, storage, networking), edge services, and Kubernetes workloads, designing and operating highly scalable, reliable, and secure infrastructure that supports one of the largest single-tenancy open source SaaS sites on the Internet. You’ll work across the Infrastructure organization to automate away toil, improve availability and performance, and respond to incidents during your local daytime hours as part of a globally distributed on-call rotation. In this role, you’ll help Tenant Services safeguard and scale customer data while increasing automation so GitLab can continue to grow with enterprise-level expectations for reliability and availability.
What you’ll do
- Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
- Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
- Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the Infrastructure department.
- Participate in a global on-call rotation during your local daytime hours, respond to production incidents, and contribute to clear, constructive incident reviews.
- Reduce toil by automating operational tasks and building tools that improve reliability, availability, and scalability.
- Apply infrastructure as code and configuration management practices to manage cloud resources and environments consistently.
- Write and maintain production-quality code, preferably in Go or Ruby, to enhance our systems and automation toolchain.
What you’ll bring
- Background working with the Kubernetes ecosystem, including tools such as Helm, and running production workloads.
- Experience operating cloud infrastructure on platforms like Google Cloud Platform or Amazon Web Services, especially networking, hosted Kubernetes services, and scaling.
- Hands-on practice with infrastructure as code and configuration management tools such as Ansible or Chef.
- Strong programming skills in a modern language, preferably Go or Ruby, applied to automation and reliability problems.
- Ability to clearly define problems, think beyond short-term fixes, and design solutions that improve systems over time.
- Consistent focus on reducing toil through automation and thoughtful system design.
- Independent, proactive working style with a bias for action and comfort operating as a “manager of one” in a distributed, asynchronous environment.
- Clear written and verbal communication skills, with openness to candidates who bring transferable experience from related reliability, infrastructure, or platform roles.
About the team
Tenant Services is the team responsible for safeguarding and securing customer data stored by the GitLab application and for setting clear guidelines for how that data is accessed. The team runs the largest GitLab instance in existence, and one of the largest single-tenancy open source SaaS sites on the Internet, which means you’ll work on unique scale and reliability challenges that impact users every day. As an all-remote, globally distributed group, Tenant Services collaborates asynchronously across time zones and leans heavily on automation to meet enterprise expectations for reliability, availability, and data protection while continuing to scale. For more on how this team works, see our Team Handbook page.
How GitLab will support you
- Benefits to support your health, finances, and well-being
- Flexible Paid Time Off
- Team Member Resource Groups
- Equity Compensation & Employee Stock Purchase Plan
- Growth and Development Fund
- Parental leave
- Home office support
Please note that we welcome interest from candidates with varying levels of experience; many successful candidates do not meet every single requirement. Additionally, studies have shown that people from underrepresented groups are less likely to apply to a job unless they meet every single qualification. If you're excited about this role, please apply and allow our recruiters to assess your application.
Country Hiring Guidelines: GitLab hires new team members in countries around the world. All of our roles are remote; however, some roles may carry specific location-based eligibility requirements. Our Talent Acquisition team can help answer any questions about location after starting the recruiting process.
Privacy Policy: Please review our Recruitment Privacy Policy. Your privacy is important to us.
GitLab is proud to be an equal opportunity workplace and is an affirmative action employer. GitLab’s policies and practices relating to recruitment, employment, career development and advancement, promotion, and retirement are based solely on merit, regardless of race, color, religion, ancestry, sex (including pregnancy, lactation, sexual orientation, gender identity, or gender expression), national origin, age, citizenship, marital status, mental or physical disability, genetic information (including family medical history), discharge status from the military, protected veteran status (which includes disabled veterans, recently separated veterans, active duty wartime or campaign badge veterans, and Armed Forces service medal veterans), or any other basis protected by law. GitLab will not tolerate discrimination or harassment based on any of these characteristics. See also GitLab’s EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know during the recruiting process.