Position Overview: As the SRE Lead, you will oversee the reliability and performance of our systems, ensuring they meet the high standards our customers expect. You will lead a team of skilled engineers, guiding them in implementing best practices for reliability, automation, and operational excellence. This role requires a blend of technical expertise, leadership skills, and a strong commitment to continuous improvement.
Key Responsibilities:
Team Leadership: Manage and mentor a team of Site Reliability Engineers, fostering a culture of collaboration, innovation, and accountability.
System Reliability: Drive initiatives to improve system reliability, availability, scalability, and performance.
Incident Management: Lead the response to critical incidents, coordinating efforts across teams to ensure swift resolution and minimal customer impact.
Automation: Champion automation efforts to streamline operational processes, reduce manual intervention, and increase efficiency.
Monitoring and Alerting: Establish and maintain robust monitoring and alerting systems to proactively identify issues and prevent service disruptions.
Capacity Planning: Collaborate with cross-functional teams to forecast capacity requirements and optimize resource utilization.
Continuous Improvement: Promote a culture of continuous improvement through regular retrospectives, post-incident reviews, and knowledge-sharing sessions.
Documentation: Ensure comprehensive documentation of systems, processes, and procedures to facilitate knowledge transfer and training.
Qualifications:
Technical Expertise: Strong background in Linux/Unix systems administration, networking, and cloud infrastructure (AWS, GCP, Azure).
Leadership Skills: Proven experience leading and developing high-performing engineering teams.
Problem Solving: Ability to troubleshoot complex issues, prioritize tasks, and make data-driven decisions under pressure.
Automation Tools: Proficiency in automation tools and configuration management (e.g., Terraform, Ansible, Chef, Puppet).
Monitoring and Logging: Experience with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and log management solutions.
CI/CD: Familiarity with CI/CD pipelines and practices.
Communication: Excellent communication skills with the ability to articulate technical concepts to non-technical stakeholders.
Education and Experience:
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
10+ years of experience in a Site Reliability Engineering or similar role.
5+ years of experience in a leadership or managerial position.
About Virtusa
Teamwork, quality of life, professional and personal development: values that Virtusa is proud to embody. When you join us, you join a team of 30,000 people globally that cares about your growth, one that seeks to provide you with exciting projects, opportunities, and work with state-of-the-art technologies throughout your career with us.
Great minds, great potential: it all comes together at Virtusa. We value collaboration and the team environment of our company, and seek to provide great minds with a dynamic place to nurture new ideas and foster excellence.
Virtusa was founded on principles of equal opportunity for all, and so does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need.