Site Reliability Engineer | [HCY-378]

03 Feb | Converge Technology Solutions | Xico

The IP4G Site Reliability Engineer is a key member of the team, responsible for ensuring the reliability, scalability, and performance of IP4G workloads. This role combines expertise in IBM Power Systems, cloud infrastructure, and site reliability engineering principles to design, implement, and maintain resilient and efficient solutions for clients. This role collaborates closely with cross-functional teams to monitor, optimize, and automate IP4G, striving for continuous improvement and operational excellence.

Essential Functions:
- Design, implement, and maintain highly available and resilient architectures for IP4G workloads on Google Cloud Platform, leveraging fault-tolerant designs and redundancy strategies.
- Monitor system performance, availability, and reliability metrics to proactively identify and address potential issues before they impact service uptime or performance.
- Implement disaster recovery solutions and failover mechanisms to ensure business continuity and minimize service disruptions.
- Optimize IP4G workloads for performance, scalability, and cost-efficiency in the Google Cloud environment, leveraging auto-scaling, load balancing, and caching strategies.
- Conduct capacity planning exercises and performance tuning activities to ensure optimal resource utilization and performance of IP4G systems and applications.
- Collaborate with cloud architects and DevOps teams to implement CI/CD pipelines and automation workflows for seamless deployment and scaling of IP4G workloads.
- Respond to and resolve critical incidents impacting the availability or performance of IP4G systems and applications on Google Cloud, following established incident response procedures and SLAs.
- Document incident response procedures, post-mortem reports, and lessons learned to improve incident management processes and enhance system reliability.
- Develop automation scripts and infrastructure as code (IaC) templates to automate routine tasks, streamline deployment processes, and improve operational efficiency.
- Continuously evaluate and adopt emerging technologies and best practices in automation and DevOps to enhance the reliability and scalability of IBM Power environments.
- Implement comprehensive monitoring and alerting solutions for IP4G workloads on Google Cloud, utilizing monitoring tools such as Stackdriver, Prometheus, and Grafana.
- Define and configure alerting thresholds, notifications, and escalation policies to ensure timely detection and response to anomalous behavior or performance degradation.

Skills and Qualifications:
- Excellent verbal and written communication skills.
- Ethical and critical thinking.
- Excellent interpersonal and customer service skills.
- Excellent organizational skills and attention to detail.
- Excellent time management skills with a proven ability to meet deadlines.
- Strong analytical and problem-solving skills.
- Strong supervisory and leadership skills.
- Ability to prioritize tasks and to delegate them when appropriate.

The original posting can be found on Kit Empleo:
https://www.kitempleo.com.mx/empleo/138455644/site-reliability-engineer-hcy-378-xico/?utm_source=html
