Senior Application Site Reliability Engineer
Integral
Software Engineering
United States
Posted on Jul 1, 2025
Senior Site Reliability Engineer (Application SRE)
About The Role
Integral is committed to delivering best-in-class service reliability and performance. As part of this commitment, we are expanding our Site Reliability Engineering (SRE) team to ensure the reliability, performance, and availability of our software applications. We are looking for a highly motivated and technically talented Senior Application SRE to support our 24x7 FX trading environment. This role will focus on application monitoring, automation, and optimization to enhance system stability, minimize downtime, and improve overall user experience. The ideal candidate will bring strong problem-solving skills, experience in large-scale distributed systems, and a deep understanding of software and infrastructure reliability principles.
Responsibilities
- Ensure the reliability, performance, and availability of Integral’s applications through proactive monitoring and automation.
- Develop and maintain real-time monitoring, alerting, and logging systems to detect and resolve issues before they impact customers.
- Automate manual operations, including application deployment, configuration, scaling, and recovery.
- Collaborate with software engineering teams to integrate reliability best practices into the development lifecycle.
- Conduct root cause analysis (RCA) and implement preventive measures to mitigate recurring issues.
- Support a 24x7 distributed enterprise environment across multiple global data centers.
- Work closely with Support to enhance incident response processes, ensuring fast and effective resolution of technical escalations.
- Participate in on-call rotations to support critical application issues and outages.
- Maintain and optimize CI/CD pipelines to ensure fast and reliable application releases.
- Enhance system security by managing SSL certificates, encryption, and authentication mechanisms.
- Foster a culture of continuous improvement by evaluating new tools, frameworks, and methodologies to enhance system reliability.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 5+ years of experience in a similar role, focusing on application reliability, automation, and performance optimization.
- Strong expertise in Linux and Windows system administration.
- Proficiency in at least one scripting language (e.g., Python, Shell, Perl, JavaScript).
- Experience with Docker, Kubernetes, or containerization technologies.
- Familiarity with CI/CD tools like Jenkins and deployment automation frameworks.
- Hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK Stack, New Relic, Datadog).
- Understanding of networking concepts (TCP/IP, DNS, load balancing, firewalls).
- Experience with configuration management tools like Ansible, Salt, or Puppet.
- Strong debugging and troubleshooting skills across application, database, and infrastructure layers.
- Ability to work in a fast-paced, high-pressure environment with multiple priorities.
- Excellent communication and collaboration skills to work effectively with engineering and support teams.
Preferred Qualifications
- Experience in the financial services or trading industry.
- Knowledge of distributed computing and cloud platforms (AWS, GCP, Azure).
- Exposure to security best practices and compliance standards.
- Familiarity with incident management frameworks (ITIL, SRE best practices, or similar methodologies).
Be a key player in shaping Integral’s SRE strategy and improving mission-critical trading systems. Work in a collaborative, fast-paced environment with top engineering talent. Enjoy career growth opportunities in an organization that values technical excellence and innovation. We offer a competitive compensation and benefits package. If you’re passionate about site reliability, automation, and scaling highly available applications, we’d love to hear from you! Apply now and help us build the future of reliable trading technology.