Talent.com
Platform Site Reliability Engineer

Nexthink, Phoenix, AZ, US
30+ days ago
Job type
  • Full-time
Job description

Company Description

Nexthink is the leader in digital employee experience management software. The company provides IT leaders with unprecedented insight, allowing them to see, diagnose, and fix issues at scale that impact employees anywhere, with any application or network, before employees notice the issue. As the first solution to allow IT to progress from reactive problem solving to proactive optimization, Nexthink enables its more than 1,200 customers to provide better digital experiences to more than 15 million employees. Dual-headquartered in Lausanne, Switzerland, and Boston, Massachusetts, Nexthink has 9 offices worldwide.

Job Description

Nexthink is looking for a strong Platform Engineer with SRE operations experience to strengthen our infrastructure and accelerate our ability to deploy, monitor, and scale systems effectively. As a SaaS provider, our customers rely on us to deliver a seamless, reliable, and scalable experience 24/7. This role must be located in the West or Mountain time zone.

Join Nexthink's vibrant team where cutting-edge technology meets innovation. Be a part of Nexthink's Digital Employee Experience technological revolution, ensuring our global customers enjoy a seamless user experience. Embrace the future with Nexthink in the US; apply now and become a key player in our dynamic Platform Engineering / SRE organization.

What You'll Do:

  • Design, build, and maintain the infrastructure powering our multi-tenant SaaS platform with reliability, security, and scalability in mind.
  • Implement and manage cloud-native systems (AWS) using best-in-class tools and automation.
  • Operate and enhance Kubernetes clusters, deployment pipelines, and service meshes to support continuous delivery.
  • Establish and enforce SLOs, SLAs, and error budgets, and proactively address availability and performance issues.
  • Develop infrastructure as code (Terraform or similar) for repeatable and auditable provisioning.
  • Program solutions for platform tooling, such as automation, monitoring, and provisioning.
  • Solid understanding of the network stack (TCP/IP, VPN, HTTP, SSL, routing, etc.), cloud topologies (VPC, virtual subnets, NACLs, NSG, ILB, ELB, etc.), and storage (S3, EBS, Azure Files, etc.).
  • Monitor system health, application performance, and user-facing SLAs using tools like Datadog, Prometheus, and Grafana.
  • Play a leading role in improving incident response practices and help reduce mean time to detect (MTTD) and mean time to recover (MTTR). Experience coordinating teams and individuals to maintain an SLA.
  • Ability to troubleshoot, narrow down, and fix incidents with minimal intervention from other functions.
  • Participate in a shared on-call rotation, responding to incidents, troubleshooting outages, and driving timely resolution and communication.
  • Work closely with software engineers to embed reliability and observability into every service.
  • Develop automated runbooks, health checks, and alerting to support reliable operations with minimal manual intervention.
  • Support automated testing, canary deployments, and rollback strategies to ensure safe, fast, and reliable releases.
  • Contribute to security best practices, compliance automation, and cost optimization.

Qualifications

  • Minimum BS in Computer Science / Engineering
  • 5+ years in an SRE / platform engineering role supporting SaaS platforms.
  • Strong hands-on experience with public cloud services (AWS, GCP, Azure).
  • Proficiency with Kubernetes, container-based deployment and related ecosystems (e.g., Helm), and containerized microservices.
  • Strong programming or scripting skills (Python, Go, Bash, etc.).
  • Experience with CI/CD pipelines (e.g., GitHub Actions, GitLab CI, ArgoCD).
  • Experience with observability stacks (Prometheus, ELK/EFK, Datadog, etc.).
  • Comfort with being part of a rotating on-call schedule, including handling critical incidents and conducting post-incident reviews.
  • Strong system-level troubleshooting skills and a proactive mindset toward incident prevention.
  • Deep understanding of Linux systems, networking, and common troubleshooting practices.
  • Experience supporting multi-tenant microservices architectures.
  • Familiarity with a service mesh, e.g., Istio.
  • Knowledge of zero-downtime deployment strategies, such as blue/green and canary releases.
  • Exposure to compliance standards such as SOC 2, ISO 27001, or HIPAA. FedRAMP experience is a big plus.
  • Experience with chaos engineering or resilience testing practices.

Additional Information

We are the pioneers and trailblazers of a global IT Market Category (DEX) that is shaping the future of how the world works, giving our customers’ IT Teams total digital visibility across their enterprise. Our innovative solutions integrate real-time analytics, automation, and employee feedback across all endpoints. This enables our IT teams to solve complex technical challenges, create ever more productive workplaces, and deliver happy, satisfied employees in the digital workplace.

With over 1000 employees across 5 continents, Nexthink operates as One Team, connecting, collaborating and innovating to continuously grow. We call our employees ‘Nexthinkers’ and our commitment to diversity, inclusion, and equity is second to none. We currently have over 75 nationalities working with us, from all cultures and backgrounds, speaking many different languages.

Total Rewards @ Nexthink

At Nexthink, we offer one of the most comprehensive and generous benefits plans. Your total rewards compensation package includes base salary and may also include a commission or performance bonus plan. We provide our US employees with 100% covered company benefits that consist of health, dental, vision as well as access to life insurance, long-term disability, and accidental death / personal loss coverage.

In addition, we offer:

  • Flexible hours and unlimited vacation (employees have unlimited paid time off on top of the 15 days of holidays we offer), 11 company-paid holidays, and 3 extra days for volunteering.
  • Hybrid work model that balances office and remote work, with structured onboarding to foster connections and team integration.
  • Free access to professional training platforms to explore your interests and enhance your skills.
  • Up to 16 weeks of paid leave for birthing parents / primary caregivers, 6 weeks for secondary caregivers.
  • Plan for the future with a 401(k) plan featuring up to 4% company matching contributions, vesting immediately, to grow your retirement savings.
  • Bonuses for referring successful hires after three months of continuous employment.
  • Base salary ranges are determined by country, role, level, experience, and skills. The range displayed on each job posting reflects Nexthink’s good faith determination of the minimum and maximum targets for new hire salaries across all US locations. Individual pay is determined by related factors, including job skills, experience, and relevant education or training, which may impact a final offer. Your Talent Acquisition Partner can share more about the specific salary range during the hiring process.
