(UQO-657) | Site Reliability Engineer (SRE)

09 Jan | Leading company | Mexico

Site Reliability Engineer (SRE)
REMOTE / Mexico
Description

Fast Dolphin is an international staffing company with over two decades of experience recruiting bilingual and multilingual IT experts across the Americas, as well as providing payroll and customized staffing solutions. We take pride in advancing the careers and goals of our professionals.

We are currently looking for a Site Reliability Engineer (SRE), Cloud & Automation, for a hybrid position with possible travel to São Paulo, Brazil, on a 6+ month engagement.

Job Title: Site Reliability Engineer (SRE) Cloud & Automation

Location: Remote

Start Date: ASAP

Length: 6+ months

Responsibilities

Site Reliability Engineer (SRE) Role Summary

We are looking for a seasoned Site Reliability Engineer with expertise in Kafka, Kubernetes, and MongoDB to ensure infrastructure reliability, scalability, and performance. The role focuses on designing and maintaining resilient systems, optimizing performance, and automating processes, while collaborating with other teams to improve application deployment.

Key Responsibilities:

- Design and maintain resilient infrastructure for Kafka, Kubernetes, and MongoDB.
- Monitor system performance using modern observability tools.
- Develop monitoring, alerting, and logging frameworks for effective failure detection.
- Perform root cause analysis and resolve incidents to minimize downtime.
- Automate tasks to improve efficiency and reduce manual effort.
- Collaborate on application performance and deployment strategies.
- Define best practices for scaling, capacity planning, and disaster recovery.
- Continuously improve infrastructure as code (IaC) and deployment pipelines.

Required Skills:

- Bachelor's degree in Computer Science or equivalent experience.
- 3+ years in SRE, DevOps, or a similar role.
- Expertise in managing Kafka, Kubernetes, and MongoDB in production.
- Strong knowledge of distributed systems, networking, and database tuning.
- Experience with monitoring tools (e.g., Prometheus, Grafana, ELK Stack).
- Proficiency in scripting (Python, Bash) and CI/CD tools (Jenkins, ArgoCD).
- Familiarity with cloud platforms (AWS, Azure, or GCP).
- Excellent problem-solving, communication, and collaboration skills.

Preferred Skills:

- Familiarity with MongoDB Atlas and Kubernetes Operators.
- Knowledge of Red Hat OpenShift and configuration or provisioning tools such as Ansible or Terraform.
- Understanding of SLAs, SLOs, and error budgets in SRE practice.

Languages:

- English (B2)
- Spanish

If you meet these requirements and are interested in this position, please send your resume, along with your availability to start on this project, to the following email address: ***********@fastdolphin.com

Rosa Trinidad Romero Mancilla

IT Recruiting Master

Fast Dolphin, Inc.

www.fastdolphin.com

12555 Orange Drive, Suite 4059

Ft. Lauderdale, FL 33330

Phone: +1 (954) 233-0647

WhatsApp: +52 (554) 164-9564

Skype: rostry2000

Publication date: 08-01-2025

The original listing can be found on Kit Empleo:
https://www.kitempleo.com.mx/empleo/132439995/uqo-657-site-reliability-engineer-sre-mexico/?utm_source=html