08 feb
Epam Systems
Nuevo Casas Grandes
We are looking for an experienced **Site Reliability Engineer (SRE)** to take a leadership role in ensuring the stability, scalability, and performance of our cloud infrastructure on **Google Cloud Platform (GCP)**.
As an SRE, you will be at the forefront of optimizing system reliability, automating processes, and collaborating with engineering teams to enhance operational excellence.
If you're passionate about **infrastructure-as-code, automation, and building resilient systems**, we'd love to hear from you.
**Responsibilities**

- Lead reliability initiatives to optimize system performance, scalability, and cost efficiency
- Manage and participate in on-call rotations, providing 24/7 support for critical infrastructure
- Troubleshoot incidents, conduct root cause analysis (RCA), and implement long-term solutions
- Deploy and manage microservices in alignment with release cycles
- Design and maintain infrastructure-as-code solutions using Terraform
- Collaborate with development teams to improve system reliability, performance, and cloud resource management
- Oversee incident response and ticket management using ServiceNow and Jira
- Maintain and expand internal knowledge bases on infrastructure and monitoring

**Requirements**

- 5+ years of experience in SRE, DevOps, or system administration roles
- Expertise in Google Cloud Platform (GCP) and cloud-native architectures
- Hands-on experience with incident management and monitoring tools (ServiceNow, Cloud Monitoring, etc