100% Remote Site Reliability Engineer

Job ID: 24-00045

Basic Qualifications:

  • 6+ years of technical experience with systems integration.
  • Thorough understanding of coding best practices in both structured and object-oriented programming (OOP) styles (e.g., Python, Golang, Ruby, C/C++).
  • Proficient in programming languages for automation (e.g., Python) and shell scripting (e.g., Bash).
  • Strong, relevant experience with cloud technologies and services, including IaaS, PaaS, and SaaS offerings.
  • Strong experience with automation and CI/CD tools (e.g., Jenkins, Ansible).
  • Experience developing SLOs, SLIs, and SLAs.
  • Strong background designing, deploying, and maintaining monitoring tools such as Splunk.
  • Experience with system data observability (the ability to collect data about programs’ execution, modules’ internal states, and the communication among components).

Primary Responsibilities:

  • The successful candidate will be a member of a cross-functional team composed of well-rounded engineers who can learn new skills rapidly and work across multiple functional domains to carry out end-to-end delivery of infrastructure services.
  • Support the full system lifecycle of Splunk across geographically dispersed enterprise datacenters.
  • Monitor system stability and performance and ensure system availability, reliability, and usability.
  • Troubleshoot complex problems, resolve operational issues, diagnose software faults, and interact with vendors.
  • Develop SLIs, SLOs, and SLAs.
  • Perform concept exploration and assessment, systems integration, systems-of-systems integration, performance management, technology assessment, and testing and validation.

