Senior Principal Software Engineer - Cluster Networks (JoinOCI-SDE)

Oracle
Santa Clara, CA, United States
$96.8K-$251.6K a year
Full-time

Cloud Engineering Infrastructure Development

Oracle Cloud Infrastructure (OCI) Cluster Networking team is building an ultra-high performance network required to support AI / ML / HPC workloads.

This is your opportunity to join the AI revolution and design systems that allow customers to scale from tens to thousands of GPUs without compromising on performance.

This team will be responsible for designing, developing, and performance-tuning the networking stack required to run distributed AI / ML / HPC workloads across thousands of GPUs, leveraging technologies like RoCE or InfiniBand.

This is your opportunity to build innovative solutions for our customers from the ground up. These are exciting times and our team is still young and growing fast, working on ambitious new initiatives.

We are looking for adaptable, self-motivated engineers with the ability to learn quickly. You should be both a rock-solid developer and a distributed systems generalist, able to dive deep into any part of the stack and low-level systems, as well as design broad distributed system interactions.

You should value simplicity and scale, work comfortably in a collaborative, agile environment, and be excited to learn.

Career Level -

Required Qualifications :

  • 10+ years of experience with software (systems / application) development
  • 3+ years of experience with RDMA over InfiniBand networks (including setup, troubleshooting, tuning, and scaling)
  • 3+ years of experience with collective communications libraries like NCCL, RCCL, MPI and GPU frameworks like CUDA and ROCm.
  • Proficient with data structures, algorithms, and operating systems
  • Excellent organizational, verbal, and written communication skills
  • Bachelor's degree in Computer Science and Engineering or related engineering fields

Preferred Qualifications :

  • Master's / PhD degree in Computer Science or related engineering fields
  • Experience with distributed workload managers like Slurm or K8s
  • Experience with ML training frameworks like PyTorch, TensorFlow
  • Experience with Linux Performance tools
  • Experience in SDN, NFV, Cloud Networking
  • Experience in Infrastructure-as-a-Service, viz. OpenStack, AWS, GCP, Azure

Disclaimer :

Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only.

US : Hiring Range : from $96,800 to $251,600 per annum. May be eligible for bonus, equity, and compensation deferral.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle’s differing products, industries and lines of business.

Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.

Oracle US offers a comprehensive benefits package which includes the following :

1. Medical, dental, and vision insurance, including expert medical opinion

2. Short term disability and long term disability

3. Life insurance and AD&D

4. Supplemental life insurance (Employee / Spouse / Child)

5. Health care and dependent care Flexible Spending Accounts

6. Pre-tax commuter and parking benefits

7. 401(k) Savings and Investment Plan with company match

8. Paid time off : Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position.

Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment.

Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

9. 11 paid holidays

10. Paid sick leave : 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.

11. Paid parental leave

12. Adoption assistance

13. Employee Stock Purchase Plan

14. Financial planning and group legal

15. Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

30+ days ago