Lawrence Berkeley National Lab's (LBNL) NERSC Division has an opening for a DevOps Engineer to join the team.
In this exciting role, you will serve as a DevOps-oriented System Administrator / Software Engineer (Computer Systems Engineer 3 / 4) at the National Energy Research Scientific Computing Center (NERSC) to help architect, deploy, configure, and operate large-scale, leading-edge high-performance computing (HPC) systems. You will work collaboratively to develop and operate large-scale compute and storage platforms to support NERSC's mission of accelerating scientific discovery through high-performance computing and data analysis. Working with teams at NERSC, other national laboratories, HPC vendors, and open-source communities, you will develop innovative solutions that enable science and improve the state of HPC practice on an international stage. Your focus will be to improve and operate NERSC's largest HPC resources, Perlmutter and Doudna, and to work with the rest of the HPC community to develop and maintain world-class system software.
The selected candidate(s) will be hired at the Computer Systems Engineer 3 or 4 (CSE3 or CSE4) level, depending on their level of skills and experience.
What You Will Do if hired at a Level 3:
Participate in team-oriented agile development and management process for HPC systems using languages like Go, Rust and Python
Develop and maintain APIs to securely expose system functionality to end users
Automate common tasks and processes to continuously improve HPC systems management
Analyze system issues and develop solutions to improve end-user experience
Be part of a team that installs, tests, maintains and manages HPC systems
Assist with technology evaluation of systems and system architecture
Work with vendors to prioritize, develop and enhance their technologies in order to better meet the needs of our users
Be part of a team providing on-call rotation for 24x7 HPC system support
Work on and resolve complex issues where analysis of situations or data requires an in-depth evaluation of variable factors.
Exercise judgment in selecting methods, techniques and evaluation criteria for obtaining results.
Determine methods and procedures on new assignments and may coordinate activities of other personnel.
Network with key contacts outside own area of expertise.
Additional Responsibilities if hired at a Level 4:
Provide leadership and technical guidance to group members, and members of other groups at NERSC.
Recommend and lead implementation and deployment efforts for system improvements that enhance reliability, stability, usability, performance and security.
Identify and evaluate emerging HPC technologies and explore new features that would create new capabilities and enhance system performance and usability.
Participate in working, user, and advocacy groups and represent NERSC and its interests to the broader HPC community.
Work at a higher level of independence while carrying out work assignments.
Work on and solve significant and unique issues where analysis of situations or data requires an in-depth evaluation of variable factors.
What is Required to be hired at a Level 3:
Typically requires a minimum of 8 years of related experience with a Bachelor's degree; or 6 years and a Master's degree; or equivalent experience.
Minimum of 2 years of experience with systems programming in Linux environment or management of large-scale Linux-based systems in a high-performance computing, cloud computing, or hyper-scale environment.
Experience with C, Bourne shell, and Python 3 programming languages.
Additional Requirements to be hired at a Level 4:
Typically requires a minimum of 12 years of related experience with a Bachelor's degree; or 8 years and a Master's degree; or equivalent experience.
Demonstrated excellent systems programming skills and strong knowledge of Linux internals.
Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active and respectful intellectual environment.
Excellent oral and written communication skills.
Ability to resolve complex issues in creative and effective ways and derive technical solutions in a collaborative environment to meet end user requirements or needs.
Ability to network and collaborate with key contacts outside own area of expertise.
Ability to work on and resolve significant and unique issues where analysis of situations or data requires an evaluation of intangibles.
Ability to exercise independent judgment in methods, techniques and evaluation criteria for obtaining results.
Desired Qualifications:
Development of Kubernetes microservices using technologies like Helm or Loftsman for deployment.
Operation of Kubernetes and etcd.
Infrastructure-as-code solutions like Argo, Terraform, Ansible, Puppet, Salt.
Rust or Go programming language.
GitLab or GitHub Continuous Integration and Project Management.
Agile process, Scrum.
Linux kernel interfaces, cgroups, eBPF.
Installation, configuration, monitoring, and tuning of workload management systems such as Slurm, PBSPro, or GridEngine.
Monitoring solutions such as Grafana, Prometheus, LDMS.
HPC systems administration.
HPC applications analysis, MPI.
Specialized networking (InfiniBand, Slingshot or other high-speed networks).
Lustre, Spectrum Scale (GPFS) or other parallel file systems.
Notes:
This is a full-time career appointment, exempt (monthly paid) from overtime pay.
This position will involve access to hardware, commodities, and technical information subject to export control regulations including, but not limited to, the Export Administration Regulations ("EAR") and/or International Traffic in Arms Regulations ("ITAR"). Accordingly, any hiring decision may depend in part on Berkeley Lab's ability to obtain or rely on federal government authorizations as required, if you are not a U.S. citizen, lawful permanent resident of the U.S. ("green card holder"), asylee, refugee, or other qualifying protected individual as defined by 8 U.S.C. 1324b(a)(3).
This position will be hired at a level commensurate with the business needs and the skills, knowledge, and abilities of the successful candidate.
Level 3: The full salary range of this position is $129,948.00 to $219,276.00 per year, and the position is expected to pay within a targeted range of $146,184.00 to $178,668.00 per year, depending upon candidates' full skills, knowledge, and abilities, including education, certifications, and years of experience.
Level 4: The full salary range of this position is $147,984.00 to $249,732.00 per year, and the position is expected to pay within a targeted range of $166,476.00 to $203,484.00 per year, depending upon candidates' full skills, knowledge, and abilities, including education, certifications, and years of experience.
This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
This position requires substantial on-site presence, but is eligible for a flexible work mode, and hybrid schedules may be considered. Hybrid work is a combination of performing work on-site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA and some telework. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Work schedules are dependent on business needs. In rare cases, full-time telework or remote work modes may be considered.
Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov
Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.
Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.
Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.