Mercor is engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals possess a deep mental model of how modern GPU architectures execute deep learning workloads. They are comfortable translating algorithmic concepts into finely tuned kernels that maximize throughput while maintaining correctness and reproducibility.

### 2) Key Responsibilities

- Develop, tune, and benchmark CUDA kernels for tensor and operator workloads.
- Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling.
- Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools.
- Report performance metrics, analyze speedups, and propose architectural improvements.
- Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks.
- Produce well-documented, reproducible benchmarks and performance write-ups.

### 3) Ideal Qualifications

- Deep expertise in CUDA programming, GPU architecture, and memory optimization.
- Proven ability to achieve quantifiable performance improvements across hardware generations.
- Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations.
- Familiarity with frameworks like PyTorch, TensorFlow, or Triton (not required but beneficial).
- Strong communication skills and independent problem-solving ability.
- Demonstrated open-source, research, or performance benchmarking contributions.

### 4) More About the Opportunity

- Ideal for independent contractors who thrive in performance-critical, systems-level work.
- Engagements focus on measurable, high-impact kernel optimizations and scalability studies.
- Work is fully remote and asynchronous; deliverables are outcome-driven.
- Access to shared benchmarking infrastructure and reproducibility tooling via Mercor support resources.

### 5) Compensation & Contract Terms

- Typical range: $120–$250 per hour, depending on scope, specialization, and results achieved. Payments are based on accepted task output rather than a flat hourly rate.
- Structured as a contract-based engagement, not an employment relationship.
- Compensation tied to measurable deliverables or agreed milestones.
- Confidentiality, IP, and NDA terms as defined per engagement.

### 6) Application Process

- Submit a brief overview of prior CUDA optimization experience, profiling results, or performance reports.
- Include links to relevant GitHub repos, papers, or benchmarks if available.
- Indicate your hourly rate, time availability, and preferred engagement length.
- Selected experts may complete a small, paid pilot kernel optimization project.

### 7) About Mercor

- Mercor connects domain experts with top AI research and technology organizations through project-based contracts.
- Contractors operate independently, with full flexibility over methods, timelines, and tools.
- Our mission is to help top engineers and researchers access frontier technical work without rigid employment structures.