Description
Overview
As a DevOps Site Reliability Engineer (SRE), you will be responsible for designing, implementing, and maintaining our application infrastructure to ensure that these systems are highly available, scalable, and reliable.
You will work closely with our development and operations teams to implement automation, monitor performance, and identify and resolve issues before they affect our users and customers. In this role, you will directly support cutting-edge software solutions for Rogue Fitness, from our retail website to the systems that support our manufacturing and warehousing operations.
Rogue Fitness is the leading manufacturer of strength & conditioning equipment and the official supplier to the CrossFit Games, USA Weightlifting, the World’s Strongest Man, and the Arnold Classic.
The DevOps Site Reliability Engineer is a fully onsite role in Columbus, Ohio. Remote work is not available.
Applicants must be authorized to work in the United States for any employer.
Responsibilities
Design, implement, and maintain our infrastructure and applications to ensure they are highly available, scalable, and reliable
Collaborate with our development and operations teams to implement automation, monitor performance, and identify and resolve issues before they affect our customers
Implement best practices for application deployment, configuration, management, and security
Plan and coordinate deployment processes for infrastructure upgrades with minimal downtime
Monitor and analyze system performance metrics to identify and address issues
Develop and maintain infrastructure as code using tools like Terraform
Troubleshoot, determine the root cause of issues, and conduct postmortem analysis
Implement and maintain CI/CD pipelines for our applications
Support disaster recovery and business continuity planning
Provide coverage to respond to production issues and incidents
Qualifications
Bachelor's degree in Computer Science, Information Systems, Computer Engineering, or a related area
5+ years of experience in a DevOps and/or SRE role
Expert-level knowledge of containerization and orchestration tools like Docker, Kubernetes, and Helm
Prior experience with automation tools like Azure DevOps or Jenkins
Experience with GCP and Azure (required)
Experience using monitoring tools like Prometheus, Grafana, Application Insights, and GCP Cloud Monitoring
Scripting proficiency in Bash, PowerShell, and other scripting languages
Demonstrated ability to apply programming skills to automation tools and processes
Knowledge of Git, Bitbucket, Azure DevOps, and other source/version control platforms
Strong networking knowledge including firewalls, load balancing, and reverse proxy products
Experience with Cloudflare configuration and zero trust implementations is a plus
Strong and proactive communication skills are required along with a team-oriented mindset
By applying to Rogue, regardless of the platform you choose to use, you are agreeing to Rogue's preferred methods of communication (i.e., text message). Submitting an application, through whatever online forum is ultimately used, constitutes a knowing and voluntary agreement to send and receive text messages during the recruitment process.
Site Reliability Engineer • Columbus, Ohio