100% Remote Site Reliability Engineer
Job ID: 24-00045
Location: Remote
Basic Qualifications:
- 6+ years of technical experience in systems integration.
- Thorough understanding of coding best practices in both structured and object-oriented programming (OOP) paradigms (e.g., Python, Golang, Ruby, C/C++).
- Proficiency in programming languages for automation (e.g., Python) and in shell scripting (e.g., Bash).
- Strong, relevant experience with cloud technologies and services, including IaaS, PaaS, and SaaS offerings.
- Strong experience with automation and CI/CD tools (e.g., Jenkins, Ansible).
- Experience developing SLOs, SLIs, and SLAs.
- Strong background designing, deploying, and maintaining monitoring tools such as Splunk.
- Experience with system data observability (the ability to collect data about programs’ execution, modules’ internal states, and the communication among components).
Primary Responsibilities:
- The successful candidate will be a member of a cross-functional team of well-rounded engineers who learn new skills rapidly and work across multiple functional domains to deliver infrastructure services end to end.
- Support the full system life cycle of Splunk across geographically dispersed enterprise data centers.
- Monitor system stability and performance and ensure system availability, reliability, and usability.
- Troubleshoot complex problems, including resolving operational issues, diagnosing software faults, and working with vendors.
- Develop SLIs, SLOs, and SLAs.
- Perform concept exploration and assessment, systems integration, system-of-systems integration, performance management, technology assessment, and testing and validation.