Site Reliability Engineer | [HCY-378]

03 feb

Converge Technology Solutions

Xico

The IP4G Site Reliability Engineer is a key member of the team, responsible for ensuring the reliability, scalability, and performance of IP4G workloads. This role combines expertise in IBM Power Systems, cloud infrastructure, and site reliability engineering principles to design, implement, and maintain resilient and efficient solutions for clients. This role collaborates closely with cross-functional teams to monitor, optimize, and automate IP4G, striving for continuous improvement and operational excellence.

Essential Functions:

- Design, implement, and maintain highly available and resilient architectures for IP4G workloads on Google Cloud Platform, leveraging fault-tolerant designs and redundancy strategies.
- Monitor system performance, availability, and reliability metrics to proactively identify and address potential issues before they impact service uptime or performance.
- Implement disaster recovery solutions and failover mechanisms to ensure business continuity and minimize service disruptions.
- Optimize IP4G workloads for performance, scalability, and cost-efficiency in the Google Cloud environment, leveraging auto-scaling, load balancing, and caching strategies.
- Conduct capacity planning exercises and performance tuning activities to ensure optimal resource utilization and performance of IP4G systems and applications.
- Collaborate with cloud architects and DevOps teams to implement CI/CD pipelines and automation workflows for seamless deployment and scaling of IP4G workloads.
- Respond to and resolve critical incidents impacting the availability or performance of IP4G systems and applications on Google Cloud, following established incident response procedures and SLAs.
- Document incident response procedures, post-mortem reports, and lessons learned to improve incident management processes and enhance system reliability.
- Develop automation scripts and infrastructure as code (IaC) templates to automate routine tasks, streamline deployment processes, and improve operational efficiency.
- Continuously evaluate and adopt emerging technologies and best practices in automation and DevOps to enhance the reliability and scalability of IBM Power environments.
- Implement comprehensive monitoring and alerting solutions for IP4G workloads on Google Cloud, utilizing monitoring tools such as Stackdriver, Prometheus, and Grafana.
- Define and configure alerting thresholds, notifications, and escalation policies to ensure timely detection and response to anomalous behavior or performance degradation.

Skills and Qualifications:

- Excellent verbal and written communication skills.
- Ethical and critical thinking.
- Excellent interpersonal and customer service skills.
- Excellent organizational skills and attention to detail.
- Excellent time management skills with a proven ability to meet deadlines.
- Strong analytical and problem-solving skills.
- Strong supervisory and leadership skills.
- Ability to prioritize tasks and to delegate them when appropriate.

Senior Site Reliability Engineer (Hybrid From [M159]

21 feb

Datalogics

Monterrey

**Senior Site Reliability Engineer (Hybrid from Monterrey)**

- **MXN $1,020,000 - $1,260,000/year (gross)**
- **Equity and comprehensive health benefits**
- **Hybrid from Monterrey, Mexico**

[...]

Site Reliability Engineer S-320

14 feb

Freshbooks

Monterrey

**About FreshBooks**: FreshBooks is a leading cloud-based SaaS accounting software designed with one goal: to help small business owners grow. We reached unicorn status after raising our valuation t [...]

[EJ-983] Site Reliability Engineer

14 feb

Freshbooks

Monterrey

(GLM873) - Site Reliability Engineer

01 feb

Blue Yonder

Monterrey

Title: Site Reliability Engineer Location: Remote in Country (Mexico) Blue Yonder is seeking an experienced Site Reliability Engineer (SRE) to join their team. The SRE will collaborate with softwa [...]
