Talent.com
System Analyst II - Site Reliability Engineer

Duke Clinical Research Institute
Durham, NC, United States
25 days ago
Job type
  • Full-time
Job description

At Duke Health, we're driven by a commitment to compassionate care that changes the lives of patients, their loved ones, and the greater community. No matter where your talents lie, join us and discover how we can advance health together.

About Duke Health Technology Solutions

Pursue your passion for caring and innovation with Duke Health Technology Solutions, which is dedicated to the transformation, development, and management of enterprise information technology solutions across Duke Health. By harnessing the power of innovative technologies like cloud computing and artificial intelligence, and pairing them with a forward-thinking approach, Duke Health Technology Solutions is revolutionizing the future of health care at Duke Health and beyond.

Occupational Summary

The DHTS Systems Analyst-Site Reliability Engineer (SRE) is responsible for designing, implementing, and maintaining large-scale distributed systems with a focus on reliability, scalability, and performance. The SRE collaborates with development teams to ensure that applications and services are designed and operated to meet reliability targets and scale efficiently. This role involves working with OpenShift for on-premises environments and Azure Kubernetes Service (AKS) for cloud-based solutions.

Essential Tasks / Responsibilities

Level 1 (DHTS System Analyst 1)

Under direct supervision, assist in monitoring and maintaining production systems to ensure high availability and performance, including OpenShift clusters on-premises and AKS in the cloud.

Participate in on-call rotations to respond to system alerts and incidents.

Assist in troubleshooting and resolving system issues and outages across both on-premises and cloud environments.

Help implement and maintain automation scripts for routine tasks and deployments in OpenShift and AKS.

Contribute to the creation and maintenance of documentation for systems and processes.

Assist in capacity planning and performance tuning of systems in both OpenShift and AKS environments.

Participate in post-incident reviews and help implement recommendations.

Learn and apply SRE best practices and methodologies specific to container orchestration platforms.

Collaborate with development teams to improve system reliability and efficiency across on-premises and cloud infrastructures.

Level 2 (DHTS System Analyst 2)

In addition to the duties described for Level 1, the Level 2 SRE will:

  • Independently design and implement monitoring solutions for complex systems in OpenShift and AKS environments.

  • Lead incident response efforts and coordinate with multiple teams during outages, considering the nuances of both on-premises and cloud infrastructures.

  • Develop and implement automation solutions to improve system reliability and efficiency across OpenShift and AKS platforms.

  • Conduct thorough root cause analysis for incidents and propose long-term solutions that align with the organization's hybrid infrastructure strategy.

  • Contribute to the design and implementation of disaster recovery and business continuity plans, leveraging both on-premises and cloud resources.

  • Mentor junior team members and provide technical guidance on OpenShift and AKS best practices.

  • Participate in the evaluation and implementation of new technologies and tools that complement OpenShift and AKS environments.

  • Collaborate with development teams to define and implement SLIs, SLOs, and SLAs across both platforms.

  • Contribute to the development of architectural improvements to enhance system reliability and scalability in a hybrid infrastructure model.

Level 3 (DHTS System Analyst 3)

In addition to the duties described for Level 2, the Level 3 SRE will:

  • Function as a technical leader and subject matter expert in reliability engineering, with deep expertise in both OpenShift and AKS.

  • Lead the design and implementation of large-scale, complex distributed systems across on-premises OpenShift and cloud-based AKS environments.

  • Develop and implement strategies for continual improvement of system reliability, performance, and efficiency in a hybrid infrastructure model.

  • Lead cross-functional projects to improve overall system architecture and reliability, considering the strengths and limitations of both OpenShift and AKS.

  • Provide advanced troubleshooting and problem-solving for critical production issues in both on-premises and cloud environments.

  • Develop and maintain relationships with key stakeholders across the organization to align SRE practices with business objectives.

  • Drive the adoption of SRE best practices and methodologies across the organization, tailored to the specific needs of OpenShift and AKS platforms.

  • Contribute to the definition of technical standards and best practices for the SRE team, ensuring consistency across on-premises and cloud environments.

  • Mentor and provide technical leadership to junior and mid-level SREs in both OpenShift and AKS technologies.

  • Participate in strategic planning for infrastructure and reliability improvements, considering the long-term evolution of the hybrid infrastructure model.

  • Represent the SRE team in high-level technical discussions and decision-making processes related to container orchestration and cloud strategy.

Advancement to the next level requires that the employee, at a minimum, successfully attain the following:

1. Proven ability to work at the next level: This involves demonstrating the skills and competencies required for the next level of responsibility. Employees should have demonstrated that they can handle tasks and challenges that are typically associated with the higher position.

2. Potential to serve beyond the next level: This measure looks at the employee's long-term potential and their ability to grow within the organization. The employee should have the vision, ambition, and capability to take on even greater responsibilities in the future.

3. Consistently demonstrates a values-based approach in how they work: Employees should consistently exhibit behaviors and decision-making processes that align with DUHS values. The exhibited values are integrity, teamwork, diversity, excellence, and safety. Being patient-focused is also critical to success.

4. Is considered one of the top performers at their level across the organization: This measure evaluates the employee's overall performance and reputation within DHTS. Top performers are often recognized for their exceptional contributions, reliability, and ability to exceed expectations.

We will select the best and not the best available.

Required Qualifications at this Level

Education

Bachelor's degree in a related field is preferred, or equivalent work experience.

Experience

  • Level 1 (DHTS System Analyst 1): 0-4 years of software development experience and/or IT solutions engineering.

  • Level 2 (DHTS System Analyst 2): Minimum 5 years of software development experience and/or IT solutions engineering.

  • Level 3 (DHTS System Analyst 3): Minimum 10 years of software development experience and/or IT solutions engineering.

Required Skills and Knowledge

Level 1 (DHTS System Analyst 1)

  • Basic understanding of Application Development Lifecycle, ideally with DevOps focus
  • Familiarity with script writing (e.g., Ansible Playbooks, Helm Charts)
  • Basic knowledge of containerization and orchestration technologies (Docker, Kubernetes, OpenShift)
  • Familiarity with CI/CD technologies like GitLab CI or GitHub Actions
  • Basic understanding of server administration (preferably Linux)
  • Understanding of networking topologies, firewall rules, and certificate management
  • Ability to analyze customer requirements and translate into effective solutions
  • Critical thinking and problem-solving skills
  • Strong customer service orientation
  • Basic troubleshooting and root cause analysis skills
  • Familiarity with project management and Agile/SCRUM methodologies
  • Proficiency in at least one programming language (e.g., Python, Go, Java)
  • Familiarity with version control systems (e.g., Git)

Level 2 (DHTS System Analyst 2)

All Level 1 skills, plus:

  • Strong experience with Application Development Lifecycle, with a DevOps focus
  • Proficiency in script writing (e.g., Ansible Playbooks, Helm Charts)
  • Extensive experience with containerization and orchestration technologies (Docker, Kubernetes, OpenShift)
  • Strong experience with CI/CD technologies and practices
  • Advanced knowledge of server administration (preferably Linux)
  • Solid understanding of networking topologies, firewall rules, and certificate management
  • Proven ability to analyze complex customer requirements and translate into effective solutions
  • Advanced troubleshooting and root cause analysis skills
  • Strong project management skills, including Agile/SCRUM experience
  • Experience with cloud platforms (AWS, Azure, GCP) and services (SaaS, IaaS, PaaS, FaaS)
  • Knowledge of Enterprise Architecture best practices
  • Familiarity with AI and ML concepts

Level 3 (DHTS System Analyst 3)

All Level 2 skills, plus:

  • Technical leadership in application development with a DevOps/CI focus
  • Technical leadership in automation (Ansible, Terraform, Bash)
  • Extensive experience with Continuous Integration/Continuous Delivery
  • Extensive experience with server administration
  • Expert knowledge of network and security concepts
  • Proven ability to lead and mentor teams in adopting and optimizing container orchestration practices
  • Expert knowledge of cloud platforms (AWS, Azure, GCP) and services (SaaS, IaaS, PaaS, FaaS)
  • Expert knowledge of Enterprise Architecture best practices
  • Advanced knowledge of AI and ML concepts and their application in SRE practices

Desired Skills (All Levels)

  • Red Hat OpenShift certifications
  • CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) certifications
  • Experience with multi-cloud environments
  • Knowledge of FHIR APIs and healthcare-specific technologies
  • Excellent time management, organizational, and task prioritization skills
  • Strong presentation skills
  • Ability to communicate effectively with non-technical staff and members of interdisciplinary teams
  • Ability to interact well and effectively communicate with all levels of leadership
  • Experience with data and system flow diagramming
  • Familiarity with vulnerability management and patching for application containers

Additional Responsibilities (All Levels)

  • Provide application system support for team apps, including rotating 24x7 support
  • Develop relationships with vendors to ensure customer needs are met in a timely manner
  • Author and update system documentation to share all knowledge acquired in the developer guide
  • Ensure systems conform to Duke Information Security Office policies and procedures
  • Assist in oral and written presentations to project teams, customers, and management
  • Coordinate and perform application testing
  • Follow established Change Management processes
  • Provide feedback on departmental processes and procedures and suggest improvements
  • Plan and coordinate system and application upgrades
  • Identify internal resources to build project teams as required
  • Perform detailed analysis and documentation of customer workflows
  • Collaborate with Administrative, Clinical, and Research customers to understand and meet needs
  • Develop relationships with key customer management representatives

Intent:

The intent of this job description is to provide a representative summary and level of the types of duties and responsibilities that will be required of positions given this title and shall not be construed as a declaration of the total of the specific duties and responsibilities of any particular position. Employees may be directed to perform job-related tasks other than those specifically presented in this description.

Equal Opportunity:

Duke University is an Affirmative Action/Equal Opportunity Employer committed to providing employment opportunity without regard to an individual's age, color, disability, gender, gender expression, gender identity, genetic information, national origin, race, religion, sex, sexual orientation, or veteran status.

Duke aspires to create a community built on collaboration, innovation, creativity, and belonging. Our collective success depends on the robust exchange of ideas, an exchange that is best when the rich diversity of our perspectives, backgrounds, and experiences flourishes. To achieve this exchange, it is essential that all members of the community feel secure and welcome, that the contributions of all individuals are respected, and that all voices are heard. All members of our community have a responsibility to uphold these values.

Essential Job Function:

Certain jobs at Duke University and Duke University Health System may include essential job functions that require specific physical and/or mental abilities. Additional information and provision for requests for reasonable accommodation will be provided by each hiring department.

Duke is an Equal Opportunity Employer committed to providing employment opportunity without regard to an individual's age, color, disability, gender, gender expression, gender identity, genetic information, national origin, race, religion, sex (including pregnancy and pregnancy-related conditions), sexual orientation, or military status.

