Job Description
Kubernetes Platform Systems Engineer
The Department of Energy facility delivers scientific discoveries and technical breakthroughs necessary to realize solutions in energy and national security, providing economic benefits to the nation. This premier research institution, located near Knoxville in Oak Ridge, TN, addresses national needs through impactful research and world-leading research centers. This position can be remote, but it requires on-site visits twice a year.
- Must be eligible for a federal security clearance (US Citizen)
Job Responsibilities:
- Use advanced knowledge and experience to ensure our Kubernetes platform remains reliable, available, and fast
- Use advanced knowledge and experience to identify problems and provide solutions to improve the reliability, scalability, performance, and efficiency of our services
- Respond to, investigate, and fix service issues all the way from bare metal through the OS to the application code, including technically complex problems spanning a diverse set of areas
- Work as part of a team to define and implement best practices and standards within the organization
- Coordinate with vendors to resolve hardware and software problems
- Participate in an on-call rotation providing 24-hour, 7-day support and off-hours maintenance windows
- Work with scientific and technical users to help them use Kubernetes
Basic Qualifications:
- 5+ years of experience working as an SRE / Systems Administrator / Systems Engineer
- Bachelor’s degree or an equivalent combination of education and experience
- At least five years of relevant technical experience
Preferred Qualifications:
- 10+ years of experience working as an SRE / Systems Administrator / Systems Engineer
- Excellent interpersonal and communication skills, and the ability to work as part of a team
- Experience with Docker or Kubernetes
- Experience using image registries such as Quay or Harbor
- Understanding of networked computing environment concepts
- Working knowledge of Unix systems fundamentals and common network protocols
- Ability to develop and maintain programs and scripts that aid in the operation and automation of tasks using various shell and scripting languages (primarily bash, Python, and Go)
- Working knowledge of tools such as Prometheus, Nagios, and Grafana to monitor systems and metrics and to create dashboards
- Working knowledge of Infrastructure-as-Code tooling such as Terraform, Helm, and Puppet
- Working knowledge of CI/CD tooling and GitOps
- Experience with code review and familiarity with tools like Git, GitHub, and GitLab