Talent.com
Site Reliability Engineer

Tax Analysts, Falls Church, VA, US
30+ days ago
Job type
  • Full-time
Job description

Tax Analysts is seeking a Site Reliability Engineer (SRE) to help establish and shape our reliability engineering practice from the ground up. This is a unique opportunity to join a mission-driven organization and play a key role in ensuring the reliability, scalability, and performance of our AWS-hosted business applications.

As part of a cross-functional engineering team, you will work to improve observability, automate operational processes, and lead incident response and continuous improvement efforts. This role is ideal for a mid-level engineer with cloud and software engineering experience who is eager to deepen their expertise in site reliability engineering, learn from senior staff, and help build a culture of reliability.

ESSENTIAL DUTIES AND RESPONSIBILITIES:

  • Help define and implement service-level indicators (SLIs) and objectives (SLOs) for cloud-based applications.
  • Build, configure, and maintain monitoring, alerting, and dashboarding solutions using AWS CloudWatch, X-Ray, and third-party tools such as DataDome.
  • Leverage advanced AWS observability tools (e.g., CloudWatch Synthetics, Contributor Insights) to proactively monitor system health.
  • Contribute to the development and implementation of a structured on-call support process as our reliability practice evolves.
  • Implement, monitor, and maintain site protection and bot mitigation solutions, including DataDome, to defend against automated attacks and ensure application availability, and analyze their performance during incident postmortems.
  • Investigate and resolve incidents, security events, and operational anomalies; perform root cause analysis; and run the postmortem process.
  • Identify repetitive or manual operational tasks (‘toil’) and design scripts or automations using AWS Lambda and CloudFormation to improve efficiency and reliability.
  • Assist in the maintenance and enhancement of CI/CD pipelines and automated deployment processes.
  • Work closely with development, QA, cloud, and DevOps teams to ensure reliability, scalability, and security are integrated into system and application designs.
  • Contribute to the documentation of systems, processes, incident learnings, compliance, and reliability best practices.
  • Stay current with emerging AWS, SRE, and observability technologies, and make recommendations to adopt new tools or approaches that improve system resilience and operational excellence.
  • Participate in the evaluation and rollout of new AWS services and features that can benefit system reliability or team efficiency.
  • Perform other related duties as assigned to support the team and organizational objectives.

KNOWLEDGE & SKILLS:

  • Strong analytical, troubleshooting, and problem-solving abilities.
  • Hands-on experience with AWS CloudWatch (metrics, logs, dashboards, alarms) for proactive monitoring and alerting.
  • Familiarity with AWS X-Ray for distributed tracing and in-depth troubleshooting of microservices architectures.
  • Experience leveraging tools like CloudWatch Synthetics and Contributor Insights for canary testing and log analytics.
  • Knowledge of AWS CloudTrail for auditing and investigating API calls and security events.
  • Experience using AWS Athena for ad-hoc querying and analysis of logs during incident investigations and postmortems.
  • Proficiency with AWS CloudFormation for reliable and repeatable infrastructure provisioning.
  • Experience automating operational tasks and workflows using AWS Lambda or similar event-driven services.
  • Understanding of AWS services such as API Gateway, CloudFront, and Elastic Load Balancing (ELB) to ensure availability, scalability, and optimal performance of distributed systems.
  • Experience working with site protection and bot mitigation solutions (such as DataDome or Cloudflare).
  • Working knowledge of scripting or programming languages such as Python, Bash, or Node.js for automation and tooling.
  • Excellent communication and documentation skills; ability to collaborate effectively with cross-functional teams.
  • Eagerness to learn and adopt new tools, technologies, and best practices in cloud reliability and operations.

REQUIREMENTS:

  • Bachelor’s degree in computer science, engineering, or a related field; equivalent professional experience considered.
  • 3+ years of professional experience in cloud engineering, DevOps, infrastructure, or observability roles (AWS required).
  • Experience implementing SRE principles (prior work in an SRE role is a plus).
  • Experience with monitoring, incident response, or reliability work in a production environment.
  • Experience working in an Agile development environment, collaborating within cross-functional teams.
  • Eagerness to help establish and improve site reliability practices while learning and applying best practices.

BENEFITS:

  • Health / Dental / Vision
  • 401(k): immediately vested
  • Tuition assistance
  • Qualified employer under the Public Service Loan Forgiveness (PSLF) program
  • Generous Paid Time Off
  • Dog-friendly office
  • Private gym onsite
  • Medical, Dental, Vision Insurance
  • Health Savings Account (HSA)
  • Flexible Spending Account (FSA)
  • Employee Assistance Program (EAP)
  • Life and AD&D Insurance
  • Disability Insurance
  • Pet Insurance
  • Trade Publication / News Subscription Reimbursement
  • Exercise Room
  • Paid Holidays
  • Vacation and Sick Leave
  • Parental Leave

Tax Analysts is an Equal Employment Opportunity Employer.
