Senior Site Reliability Engineer

Mindlance
Reston, VA
Full-time

Position Title : Senior Site Reliability Engineer

Location : Reston, VA / Fully Remote

Duration : Long Term Assignment

Must Have Skills :

  • APIs and Microservices.
  • Atlassian Suite of products (Jira, Confluence, Bitbucket, Crowd).
  • AWS CodePipeline.
  • Bachelor’s Degree - CS or Engineering.
  • DevOps.
  • Git and Bitbucket, including branching workflows.
  • Infrastructure as code using AWS CDK, CloudFormation, or similar scripting techniques.
  • IP Networking, VPCs, DNS, Load Balancing, and firewalls.
  • Linux-based systems administration.
  • Programming language and its frameworks and design patterns.
  • On call rotations.
  • Scalable production environment.
  • Scripting experience in a Cloud-based environment.

Nice To Have :

  • Monitoring suites (e.g., New Relic, Splunk, Sumo Logic).

ESSENTIAL FUNCTIONS / RESPONSIBILITIES :

  • Design, develop and implement automated solutions, based on a set of standards and processes which establish consistency across the enterprise, to reduce risk and promote efficiencies in support of the organization’s goals and objectives.
  • Take responsibility for the quality of your work: develop and implement quality criteria and the associated validation methods to ensure that every deliverable meets the quality levels our customers expect, use quality management standards / metrics to maintain those levels, seek new approaches and techniques to improve them, and analyze the impact of quality control and quality assurance on project performance.
  • Actively review Observability custom and COTS products and implement improvements seen within the industry to drive continuous improvement of the Observability products’ efficiency, scalability, and quality.
  • Manage and resolve incidents, conduct incident reviews, and manage problems proactively.
  • Incident management - Act in key response roles during major incidents. Participate in an on-call rotation with other team members. Participate in the post-mortem review of incidents for Root Cause Analysis (RCA).
  • Participate in system design consulting, AWS platform management, and capacity planning.
  • Provide support (coaching and mentoring) for teammates' work activities on a regular basis.
  • Use product SLAs and enterprise standards / metrics to ensure that product availability and user experience quality levels are maintained, seek innovative approaches and techniques to improve quality levels, and analyze the impact of product changes on application performance and availability.
  • Design and develop tools and processes to aid in improving infrastructure reliability and allow for monitoring and reporting.
  • Write complex code, build infrastructure as code, work with serverless cloud environments, and build the automated toolsets needed to support the continuous metric collection pipeline.
  • Integrate COTS products across the continuous delivery pipeline to provide a comprehensive automated system, from epic definition through development, testing, and deployment of CB applications within our data center and Amazon.
  • Be a hands-on engineer who leads by doing. Take responsibility for creating design specifications, unit testing, and preparing technical documentation. Develop solutions from business initiation through operational integrity.
  • Support the development of Observability standards by creating templates that make Observability capabilities easier to use and increase their adoption.
  • Foster and build a community of practice for collective learning of the Observability tools and systems across all development teams.
  • Be in an on-call rotation to respond to incidents that impact Client's availability and provide support for Development team engineers with customer related incidents.
  • Use your on-call experience to analyze incidents and prevent them from recurring.

Qualifications needed for the role :

  • A bachelor’s degree preferably in Computer Science, Engineering or MIS.
  • years of experience in software systems, programming, and infrastructure development and administration.

Preferred skills and attributes for the role :

  • Strong, proven experience as a DevOps engineer in a scalable production environment, administering one or more of the following: Atlassian Suite of products (Jira, Confluence, Bitbucket, Crowd).
  • Ability to operate in a high-pressure environment, quickly troubleshoot complex issues, and successfully handle multiple priorities.
  • Strong practical Linux-based systems administration skills and scripting experience in a Cloud-based environment.
  • Experience with a programming language and its frameworks and design patterns.
  • Experience working with APIs and Microservices.
  • Working knowledge of IP Networking, VPCs, DNS, Load Balancing, and Firewalls.
  • Experience building infrastructure as code using AWS CDK, CloudFormation, or similar scripting techniques.
  • Experience managing releases into production using AWS CodePipeline.
  • Expertise with Git and Bitbucket, including branching workflows.
  • Experience with monitoring suites (e.g., New Relic, Splunk, Sumo Logic) is a plus.
  • Excellent interpersonal and collaboration skills with the ability to work with a diverse set of colleagues.
  • Strong decision-making, problem-solving, critical-thinking, and testing skills.
  • Self-starter with the ability to set priorities, work independently, and attain goals.
  • An ethos of continuous improvement and an interest in learning new things.
  • Strong ability to understand and internalize the big picture and broader implications.