Name: Xintas

The goal of this role is to ensure the reliability, scalability, monitoring coverage, and performance of our on-premises services. Responsibilities include designing and managing our infrastructure and implementing best practices. The role involves working with cross-functional teams to improve systems and processes and to ensure uptime and efficiency.

Design and maintain monitoring infrastructure
Create custom dashboards, alerts, and visualization solutions
Implement distributed tracing and log aggregation systems
Establish monitoring best practices and SLI/SLO frameworks
Maintain security compliance for on-premises monitoring tools
Automate deployment and configuration management
Collaborate with development teams on application instrumentation
Participate in on-call rotations

Requirements

Core Technologies
- Advanced Grafana
- Prometheus (PromQL)
- OpenTelemetry
- Elasticsearch
Infrastructure
- Linux administration
- Networking
- On-premises security
Programming
- Python, Bash, or Go for automation
Experience
- 3+ years in monitoring/observability
- 2+ years with Grafana/Prometheus in production
- Strong Linux system administration experience
- Proven track record with on-premises infrastructure solutions
Security
- Enterprise security practices
- Compliance requirements
Ability to balance technical trade-offs with business needs and prioritize effectively.
Willingness to participate in on-call rotations (24/7 incident support)

Key Deliverables

Reduced MTTD/MTTR through effective monitoring
Comprehensive observability across all systems
Automated monitoring, deployment, and management
Security-compliant monitoring practices

Languages

English (C1)
Additional languages: German, French, Dutch

Vacancy

Site Reliability Engineer

Brussels

Interested in this vacancy?

Want to work together?

We're in control of IT!