Talent.com
Senior Site Reliability Engineer
SHEIN Technology LLC • San Diego, CA, United States
1 day ago
Job type
  • Full-time
Job description

This range is provided by SHEIN Technology LLC. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.

Base pay range

$107,600.00/yr - $180,200.00/yr

About SHEIN

SHEIN is a global online fashion and lifestyle retailer, offering SHEIN branded apparel and products from a global network of vendors, all at affordable prices. Headquartered in Singapore, with more than 15,000 employees operating from offices around the world, SHEIN is committed to making the beauty of fashion accessible to all, promoting its industry-leading, on-demand production methodology, for a smarter, future-ready industry.

Position Summary

We are looking for an experienced Senior Site Reliability Engineer (official title: Senior Site Reliability Engineer I) for our San Diego, CA-based corporate office. Site Reliability Engineers at SHEIN are hybrid software/systems engineers whose overarching goal is to ensure that production services are always on. They strive to build the most reliable and performant systems on the planet.

SREs work closely with cross-functional teams to ensure we have the right set of tools to generate, collect, analyze, visualize and alert on operational data; knowing exactly what happens across the ecosystem, identifying problems before they occur and addressing them as quickly as possible. They are also responsible for improving operational efficiency, utilization and system resiliency of the platform. They own critical open-source software that our platform relies on and they are core participants in every significant engineering effort underway in the platform.

Additionally, SREs are tasked with driving forward the operability of the platform to reduce the number of incidents while reducing MTTR. To accomplish this, the team combines software development, networking and systems engineering expertise along with a desire to be challenged by issues of scale and complexity to make our service better for our customers.

Job Responsibilities

  • Participate in an on-call rotation to ensure 24/7/365 availability of SHEIN's production systems.
  • Supervise capacity and utilization, and work closely with cross-functional teams to orchestrate scaling services up or down.
  • Own and operate critical open-source services like Elasticsearch, Kafka, RabbitMQ, Redis.
  • Build tools and design processes that help improve observability and system resiliency of the platform.
  • Triage site availability incidents and proactively work towards reducing MTTR for customer impacting incidents.
  • Partner with service owners to implement service level metrics and service level objectives that act as service level health indicators.
  • Establish design patterns for monitoring, benchmarking and deploying new features for the backend services.
  • Develop and maintain technical documentation, network diagrams, runbooks, and procedures.
  • Drive initiatives to evolve our current platform to increase efficiency and keep it in line with current standards and best practices.
  • Respond to production incidents by leveraging experience in software development, systems engineering, and networking to proactively prevent recurring issues.
  • Provide relief and sustainable resolution to issues within our infrastructure.
  • Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design.
  • Join a culture that does not tolerate manual activity, resulting in a highly automated environment that delivers scalable solutions.
  • Drive efficiencies through software improvement and root cause analysis resulting in service delivery, maturity, and scalability.

Job Requirements

  • Bachelor's degree in Computer Science or Information Systems or equivalent technical discipline.
  • 5+ years of working experience in an enterprise 24/7 production environment supporting mission-critical, real-time, high-traffic applications, especially in cloud environments.
  • Systematic problem-solving approach, combined with a sense of ownership and drive.
  • Full-stack debugging and performance optimization ability, including knowledge of cloud systems (load balancing, caching, content distribution, etc.), continuous integration/build systems, Java, and SQL and NoSQL databases.
  • Track record of monitoring and analyzing system performance and isolating issues or bottlenecks that could impact reliability, performance and scalability.
  • Strong experience with observability tools such as Grafana, Prometheus, Zabbix, etc.
  • Experience with scripting/programming languages such as Python, Golang, etc.
  • Familiarity with container technologies such as Docker, Kubernetes, Mesos, etc.
  • Strong verbal and written communication skills; able to work effectively with geographically remote teams.
  • Experience with one or more OSS technologies like Elasticsearch, Kafka and Redis.
  • Proficient with SRE concepts and practices, including advocating for the elimination of toil and driving simple solutions.

Nice to Have

  • Experience operating and maintaining big data components (Hadoop, YARN, HBase, Hive, Spark, etc.).
  • Solid understanding of Linux systems.

Benefits

  • Bonus and RSU eligible.
  • Health Savings Account with Employer Funding
  • Flexible Spending Accounts (Healthcare and Dependent care)
  • Company-Paid Basic Life/AD&D insurance
  • Company-Paid Short-Term and Long-Term Disability
  • Voluntary Benefit Offerings (Voluntary Life/AD&D, Hospital Indemnity, Critical Illness, and Accident)
  • Employee Assistance Program
  • Business Travel Accident Insurance
  • 401(k) Savings Plan with discretionary company match and access to a financial advisor
  • Vacation, paid holidays, floating holiday and sick days
  • Free weekly catered lunch
  • Dog-friendly office (available at select locations)
  • Free gym access (available at select locations)
  • Free swag giveaways
  • Annual Holiday Party
  • Invitations to pop-ups and other company events
  • Complimentary daily office snacks and beverages
  • Pay Range: $107,600 USD - $180,200 USD

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Engineering and Information Technology
Industries
  • Computer and Network Security, Software Development, and Retail Apparel and Fashion