You will be responsible for the overall uptime and availability of all applications owned by SnappBox. The Senior Site Reliability Engineer (SRE) works cross-functionally with the DevOps, Technology, and Security teams. This position also requires weekend shifts.
Responsibilities
- Monitor services
- Manage incidents
- Extend and improve current monitoring systems
- Automate the current monitoring process
- Deploy services to the production environment
- Communicate with other teams to resolve issues
- Troubleshoot system problems in production
- Deploy and modify the production environment
Requirements
- Academic background in a computer or IT-related field
- Between 4 and 10 years of work experience
- Experience in team management
- Development experience in Java/Spring, accounting for more than half of total professional experience
- Proficiency with Grafana and Prometheus
- Strong understanding of TCP/IP
- Proficiency with Linux (LPIC-2)
- Familiarity with log shipping and management tools (ELK Stack)
- Familiarity with Docker containers
- Familiarity with microservice architecture
- Familiarity with CI/CD
- Understanding of REST APIs
- Knowledge of Kubernetes
- Knowledge of software engineering and development principles