Aumni - Site Reliability Engineer III - MLOPS

JPMorgan Chase & Co.
Salt Lake City, UT, United States
Full-time

There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.

As a Site Reliability Engineer III at JPMorgan Chase within Digital Private Markets / Aumni (a JPMorgan Chase company), you will solve complex and broad business problems with simple and straightforward solutions.

Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions.

You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.

As an MLOps Engineer, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize the models produced by our data science teams and their associated infrastructure.

You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability in the AI / ML space.

Job responsibilities

  • Guides and assists others in the areas of designing and deploying new AI / ML models in the cloud, gaining consensus from peers where appropriate
  • Designs and implements automated continuous integration and continuous delivery pipelines for the Data Science teams to develop and train AI / ML models
  • Writes and deploys infrastructure as code for the models and pipelines you support
  • Collaborates with technical experts, key stakeholders, and team members to resolve complex technical problems
  • Understands the importance of monitoring and observability in the AI / ML space, including service level indicators, and utilizes service level objectives
  • Proactively resolves issues before they impact internal and external stakeholders of deployed models
  • Supports the adoption of MLOps best practices within your team

Required qualifications, capabilities, and skills

  • Formal training or certification on site reliability engineering concepts and 3+ years applied experience
  • Understanding of MLOps culture and principles and familiarity with how to implement associated concepts at scale
  • Domain knowledge of machine learning applications and technical processes within the AWS ecosystem
  • Experience with infrastructure as code tooling such as Terraform or CloudFormation
  • Experience with containers and container orchestration tools such as Docker, ECS, and Kubernetes
  • Knowledge of continuous integration and continuous delivery tools like Jenkins, GitLab, or GitHub Actions
  • Proficiency in the following programming languages: Python, Bash
  • Hands-on knowledge of Linux and networking internals
  • Understanding of the different roles served by data engineers, data scientists, machine learning engineers, and system architects, and how MLOps contributes to each of these workstreams
  • Ability to identify new technologies and relevant solutions to ensure design constraints are met by the Data Science and Machine Learning teams

Preferred qualifications, capabilities, and skills

  • Experience with model training and deployment pipelines, including managing scoring endpoints
  • Familiarity with observability concepts and telemetry collection using tools such as Datadog, Grafana, Prometheus, Splunk, and others
  • Understanding of data engineering platforms such as Databricks or Snowflake, and machine learning platforms such as AWS SageMaker
  • Comfortable troubleshooting common containerization technologies and issues
  • Ability to proactively recognize roadblocks and a demonstrated interest in learning technology that facilitates innovation