Talent.com
Senior HPC Storage Systems Engineer

Xcel Engineering • Oak Ridge, TN, US
9 days ago
Job type
  • Full-time
Job description

COMPANY OVERVIEW

XCEL Engineering, Inc. is an award-winning small business that provides trusted information technology, engineering, consulting and project management solutions and services to federal agencies and organizations. Originally founded in 1971 by professional engineers at the University of Tennessee, XCEL was acquired in 2003 by U.S. Army and Navy veterans and in 2023 became a MartinFed company.

XCEL Engineering is a part of IT Lab Partners (ITLP), which was created to support a leading research facility in the East Tennessee region in recruiting the best and the brightest technical talent. Consider joining our impressive team today!

JOB OVERVIEW

Xcel Engineering is seeking a Senior HPC Storage Systems Engineer to design, operate, and maintain storage for the clusters, servers, and workstations that support the services where science happens at ORNL! This position resides in the Emerging Technologies & Computing team in the Research Computing group in the Information Technology Services Directorate at Oak Ridge National Laboratory (ORNL).

The Emerging Technology Computational Group facilitates research goals through HPC systems engineering, integration, and support for the research community. By providing design, deployment, optimization, monitoring, and tooling support across multiple clustered storage infrastructures, we facilitate Lab-wide R&D projects. Our HPC clusters range in scope from just a handful of nodes to over fifty thousand cores.

We partner with ORNL research organizations to enable research excellence and delivery. We work with other clustered computing and HPC groups to help research programs identify the best solutions for their needs. When we build our customers' environments, our team collaborates to design, implement, and maintain the systems from inception to retirement.

ESSENTIAL FUNCTIONS

  • Architect, deploy, and manage large-scale HPC storage systems, including parallel file systems such as Lustre, GPFS/Spectrum Scale, BeeGFS, and WEKA.
  • Design, implement, and operate large-scale Ceph storage clusters for HPC and research workloads, delivering reliable, high-performance object, block, and file storage services.
  • Ensure the availability, performance, scalability, and security of production storage environments.
  • Administer and optimize enterprise storage platforms such as Qumulo and NetApp in support of HPC and research workloads.
  • Design, deploy, and maintain archival storage solutions including Spectra Logic BlackPearl and large-scale tape libraries to ensure long-term data preservation and accessibility.
  • Integrate high-performance, enterprise, and archival storage layers into cohesive tiered storage architectures that balance cost, scalability, and performance for diverse scientific workflows.
  • Leverage automation and monitoring solutions to minimize day-to-day maintenance while identifying opportunities to optimize system performance and management.
  • Collaborate with researchers and technical POCs to support large data workflows and optimize I/O performance for scientific workloads.
  • Automate storage provisioning, monitoring, and maintenance using scripting and configuration management tools.
  • Diagnose and resolve complex storage and I/O-related issues in high-throughput, low-latency HPC environments.
  • Evaluate emerging storage technologies (NVMe, object storage, hierarchical storage management, burst buffers) and contribute to strategic planning for future HPC systems.
  • Work with 24/7 operations staff to streamline monitoring and troubleshooting, significantly reducing the need for off-hours support.
  • Deliver ORNL's mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service. Promote equal opportunity by fostering a respectful workplace.

BASIC QUALIFICATIONS

  • A BS degree in computer science, computer engineering, information technology, information systems, science, engineering, or related discipline and 8-12 years of relevant professional experience; or an equivalent combination of education and experience.
  • Master's degree holders: 7-10 years of relevant experience.
  • PhD holders: 4-6 years of relevant experience.
  • Five (5) or more years managing UNIX/Linux systems.
  • Demonstrated experience managing HPC storage and large-scale enterprise storage systems.
  • Three (3) or more years working with configuration management and automation tools such as Git, Jenkins, Ansible, or Puppet.
  • Proficiency with at least one scripting language (Bash, Python, Perl, etc.).
  • Strong Linux administration and advanced troubleshooting experience.
  • Experience supporting large data systems and/or HPC scientific workloads.
  • Strong desire to innovate and evaluate new technologies for HPC and storage environments.
  • Collaborative approach and ability to become a trusted advisor to research teams.

DESIRED QUALIFICATIONS

  • Active DOE Q, DoD Top Secret, or TS/SCI clearance is strongly preferred.
  • Solid understanding of multiple operating systems and HPC cluster technologies.
  • Experience with Rocky/CentOS/RHEL, Ubuntu, and VMware.
  • Understanding of HPC job schedulers (SLURM) and user support workflows.
  • Experience with container technologies in HPC environments.
  • Experience with multiple system deployment mechanisms (Warewulf, PXE boot, Cobbler, Bright).
  • Experience with GPU clusters (NVIDIA, AMD) for AI/ML and scientific workloads.
  • Deep expertise with high-performance parallel file systems (Lustre, GPFS/Spectrum Scale, BeeGFS, WEKA).
  • Knowledge of storage networking (InfiniBand, NVMe-oF, SAN/NAS architectures).
  • Familiarity with RAID, ZFS, and object storage technologies.
  • Strong background in performance monitoring, benchmarking, and I/O optimization.
  • Experience with monitoring systems such as Grafana, CheckMK, Nagios, Zabbix, or Ganglia.
  • Previous experience working in a government, scientific, or other highly technical environment.
  • Strong documentation skills and ability to prepare web-based documentation.

PHYSICAL REQUIREMENTS & ENVIRONMENTAL CONDITIONS

  • Inside office environment.
  • Working on a computer for long periods of time.
  • May involve long periods of sitting at a desk.
  • The work environment is fast-paced and sometimes involves extreme deadline pressures.

OTHER DUTIES

    This job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities and activities may change at any time with or without notice.

    Xcel Engineering is an Equal Opportunity / Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, gender, sexual orientation, gender identity, gender expression, transgender status, pregnancy, marital status, national origin, ancestry, citizenship status, age, disability, protected veteran status, genetics, or any other characteristics protected by applicable federal, state, or local law.

    If you are a qualified individual with a disability or disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access Xcel Engineering's current openings as a result of your disability. You can request reasonable accommodations by calling 855.212.1810. Thank you for your interest in Xcel Engineering.

    All positions at Xcel Engineering, Inc. are contingent upon passing both a background check and drug screening prior to a start date and are subject to random drug screenings during the employment period. In addition, Xcel Engineering is an E-Verify employer.

    Job Posted by ApplicantPro
