Talent.com
Site Reliability Engineering
TechniPros • Atlanta, Georgia, USA
9 days ago
Job type
  • Full-time
Job description

Job Title : Site Reliability Engineering (SRE) Architect

Location : Atlanta, Georgia (Hybrid)

Long Term Contract

Looking for W2 Candidates. No C2C

Role Summary :

As an SRE Architect, you will be a pivotal technical leader responsible for designing, building, and evolving the foundational systems and practices that ensure the reliability, scalability, performance, and efficiency of our critical services. Moving beyond day-to-day operations, you will focus on the strategic architectural direction of the SRE function, defining standards, blueprints, and frameworks that enable development teams and the SRE operations team to build and operate highly resilient systems. You will leverage deep expertise in software engineering, distributed systems, cloud infrastructure, and SRE principles to influence technology choices, establish best practices, and foster a proactive culture of reliability across the organization, extending well beyond the observability pillar.

Key Responsibilities :

Reliability Strategy & Design :

  • Architect and design highly available, scalable, secure, and cost-effective infrastructure and application patterns on AWS
  • Define and evangelize SRE best practices, standards, and blueprints for service design, deployment, monitoring, and operational readiness across the engineering organization
  • Review the current observability implementation to identify gaps, and define the steps needed to reach the next level of observability maturity, providing deep insight into system health and behaviour
  • As overall maturity improves, lead the definition and implementation strategy for Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets for critical services

Platform Architecture & Automation :

  • Design solutions to systematically reduce operational toil through automation and improved system design
  • Evaluate current SRE tools and automation frameworks (e.g., CI/CD pipelines, Infrastructure as Code modules, automated incident remediation, chaos engineering platforms) and suggest enhancements that improve overall capability
  • Evaluate, prototype, and recommend new technologies, tools, and methodologies to enhance system reliability, developer productivity, and operational efficiency

Technical Leadership & Consultation :

  • Act as a senior technical advisor and subject matter expert on reliability, scalability, and performance for development and platform teams
  • Provide architectural guidance during the design phase of new services and features to ensure reliability principles are embedded early (shift-left)
  • Mentor and coach other SREs and engineers, fostering technical excellence and adherence to SRE principles
  • Lead architectural reviews and production readiness assessments for critical systems

Resilience :

  • Lead blameless postmortems for significant incidents, ensuring root causes are identified and systemic architectural improvements are prioritized and implemented
  • Architect and advocate for resilience patterns (e.g., circuit breaking, rate limiting, graceful degradation, chaos engineering) within applications and infrastructure

Required Qualifications :

  • Proven experience in an architectural role designing solutions for reliability, scalability, and performance
  • Deep understanding and practical application of SRE principles (SLIs/SLOs, error budgets, toil reduction, automation, incident management, postmortems)
  • Expertise in cloud computing platforms (e.g., AWS), including infrastructure, networking, and security services
  • Strong experience with containerization and orchestration technologies (Kubernetes, Docker, serverless computing)
  • Solid experience designing and implementing observability solutions (e.g., Dynatrace, Prometheus, Grafana, ELK/EFK Stack, Jaeger, OpenTelemetry)
  • Strong programming/scripting skills (e.g., Python, Go, Bash) for automation and tool development
  • Excellent analytical, problem-solving, and strategic thinking skills
  • Strong communication, collaboration, and leadership skills, with the ability to influence technical direction across teams

Preferred Qualifications :

  • Experience designing and implementing chaos engineering practices and platforms

Best Regards : Jahnavi G

Phone : 1-

Email :

Key Skills

Kubernetes, FMEA, Continuous Improvement, Elasticsearch, Go, Root Cause Analysis, Maximo, CMMS, Maintenance, Mechanical Engineering, Manufacturing, Troubleshooting

Employment Type : Full Time

Experience : years

Vacancy : 1
