Cloud Systems Engineer
Job ID: 24-00020
The Cloud Systems Engineer works closely with the Program team to manage, maintain, and optimize the application data and infrastructure that support CMS and the public. You will deliver solutions that ultimately ensure that the functions of Medicare, Medicaid, and Marketplace are carried out for U.S. citizens and contribute to efforts to reduce healthcare costs.
With a "no downtime, zero outages" vision and mantra, we support a range of data center and cloud-based application needs, from self-service to white-glove services, all of which are based on our customer's required level of support.
The role of a Systems Engineer Sr. will require you to develop highly innovative solutions through research and the integration of best practices. You will influence the development of solutions that impact strategic project/program goals and business results while also leading the work of other technical staff. You will resolve highly complex problems using significant application of technical knowledge, conceptualizing, reasoning, and interpretation. You will interact daily with various technical resources across different vendors that are fulfilling technical requirements for the customer.
The current work environment is remote, leveraging tools such as Slack, Microsoft Teams, and Zoom.
· The successful candidate will be a member of a cross-functional team composed of well-rounded engineers who can learn new skills rapidly and work across multiple functional domains to carry out end-to-end delivery of infrastructure services.
· The Systems Engineer Sr. works closely with the Integrated Service Delivery (ISD) teams to manage, maintain, and optimize the application data and infrastructure that support CMS and public health. You will maintain and deliver solutions that ultimately ensure that the functions of Medicare, Medicaid, and the Healthcare.gov Marketplace are carried out for U.S. citizens and contribute to efforts to reduce healthcare costs.
· Work closely with Client Engineering and Operations staff as well as the customer's application owners to solve technical problems at the network, system, and application levels.
· Lead the team in all areas of telemetry and observability.
· Help design, test, and deploy innovative technical solutions that leverage both new technologies and new methods to shape customer operations.
· You will have the opportunity to grow in your career as you help us to grow in our value and breadth of services to CMS.
· Responsible and accountable for managing and following up on incidents, changes, and application release problems through the management channels.
· Participate in on-call rotation and respond to incident alerts.
· Build software and systems while managing the platform infrastructure and applications.
· Create and maintain various continuous integration/continuous delivery (CI/CD) pipelines.
· Focus on proactivity and enablement of self-healing systems.
· Serve as the expert in the creation of KPIs and alerting thresholds for meaningful metrics relative to the health and performance of the applications the team manages.
· Ensure the availability, reliability, security, and performance of all resources across various applications, and report on them to owners in a timely manner.
· Must be a team player, but able to work independently on large, complex projects and assignments in a fast-paced environment.
· Provide leadership in problem determination/analysis, isolating system problems utilizing diagnostic and system management tools.
· Always provide professional and courteous service with excellent verbal and written communication skills.
· Model inclusive leadership to teammates by building diversity into activities and meetings.
Must Have Qualifications:
· BS degree in computer science or an equivalent, highly technical discipline. Experience may be substituted in lieu of a degree.
· 5+ years in technical engineering relative to the responsibilities of the Site Reliability Engineer position.
· Thorough understanding of microservice-based architecture.
· Thorough understanding of coding best practices, including knowing how to code, typically in a variety of languages, in both a structured and an OOP way (e.g., Python, Golang, Ruby, C/C++).
· Proficient in programming languages for automation (e.g., Python) and shell scripting (e.g., Bash).
· Deep knowledge of version control (e.g., Git) and ability to create GitOps practices.
· Extensive experience with configuring and maintaining monitoring and alerting tools such as Nagios, CloudWatch, Grafana, Prometheus, Splunk ITSI.
· Proficient in incident management tools (e.g., Splunk On-Call, PagerDuty).
· Experience with a variety of relational and non-relational databases/RDS (e.g., DynamoDB, MongoDB, Cosmos DB, PostgreSQL).
· Strong and relevant experience in cloud technologies, cloud services, IaC, cloud storage, cloud networking and cloud security.
· Strong knowledge and experience with Cloud IaaS, PaaS, and SaaS offerings.
· Strong experience with automation and CI/CD tools (e.g., Argo, Jenkins, Travis, Ansible).
· Knowledge of cloud-based security tools, best practices and policies including demonstrated experience protecting all layers of the application stack.
· Knowledge of the Software Development Life Cycle (SDLC).
· Excellent writing and verbal communication skills.
· Ability to manage conflict effectively.
· Ability to adapt and be productive in a fast-paced dynamic environment.
· Excellent communication and collaboration skills supporting multiple stakeholders and business operations.
· Self-starter, self-managed, and a team player.
· Cloud certification (e.g., AWS Solutions Architect Associate, Azure Administrator).
· Monitoring certification (e.g., Splunk, Prometheus, DataDog).
· Experience with containerization and orchestration tools (e.g., Kubernetes, Docker).
· Experience with orchestrating ChatOps.
· Experience with setting up self-healing components within an application's infrastructure.
· Agile-based knowledge and skill, including experience with Scrum ceremonies and work management tools (e.g., Jira, Confluence).
· Security skills: knowledge of information assurance compliance and information security basics within CMS.
· Ability to obtain a Public Trust clearance.
· All candidates supporting the CMS programs must have lived in the United States for at least three (3) of the last five (5) years in order to be considered.
Candidates may be U.S. citizens, Green Card holders, or H-1B visa holders.