Pantomath is an automated data operations platform that addresses the challenges organizations face with data reliability, where manual, time-consuming processes and reliance on tribal knowledge often hinder effective problem resolution.
By automating data monitoring and impact analysis, Pantomath streamlines operations and improves data confidence, quality, and reliability. Enterprises using Pantomath significantly reduce mean time to acknowledgement, mean time to root cause, and mean time to resolution of all data issues across their entire data ecosystem.
The Platform Engineer at Pantomath leads the development, maintenance, administration, and continuous evolution of our core infrastructure and platform layer. Beyond day-to-day operational upkeep, the role covers strategic initiatives to improve scalability, strengthen security, optimize cost efficiency, and accelerate developer velocity across the organization.
The ideal candidate has a deep understanding of modern cloud architectures and infrastructure-as-code principles, along with a proven track record of designing and implementing robust, reliable, and performant platform solutions. This role offers the opportunity to shape the future of our technology stack, helping our development teams deliver products faster and more securely while maintaining high uptime and efficient resource utilization.
Key Responsibilities
Platform & Infrastructure Operations
- Own and maintain Pantomath’s cloud infrastructure (AWS), including EC2, EKS, IAM, ALB, RDS, and S3
- Automate, optimize, and fine-tune infrastructure using IaC (Terraform & CDK) and CI/CD workflows (GitHub Actions, NX)
- Support cross-functional teams (Dev, Customer Service, QA) in resolving infra-related issues
- Manage BAU operations: backups, credential rotation, log retention, sysadmin tasks, etc.
- Respond to platform-related incidents and own incident resolution runbooks
- Partner with leadership to ensure SOC2-compliant infrastructure practices
- Ensure security and compliance adherence by applying least privilege and zero trust design patterns to authorization, authentication, networking, and runtime threat detection
Cost Optimization & Efficiency
- Set up cost dashboards, run bi-weekly reviews, and identify inefficiencies
- Implement cost-saving strategies (resource right-sizing, idle shutdowns, soft limit avoidance)
- Lead migration to shared ALB patterns and optimize EKS autoscaling
Developer Experience & Tooling
- Refactor and clean up IaC repositories (remove unused resources, consolidate stacks)
- Migrate manual provisioning to fully automated CI/CD flows
- Reduce friction in local development and dev/staging environments
Observability & Monitoring
- Evaluate and implement observability tools (e.g., Datadog, CloudWatch, Prometheus)
- Help define standards for logging, metrics, and alerting across the platform
- Support agent observability for connector services
Platform Strategy & Scalability
- Contribute to our multi-region readiness strategy and AWS scaling plans
- Proactively address AWS service soft limits and scalability bottlenecks
- Drive infrastructure roadmap and platform strategy in partnership with leadership
Qualifications
Education and Experience
- Bachelor's degree or equivalent, preferably in Computer Science, Information Systems, or a related field
- 4+ years of experience in platform, DevOps, or SRE roles, ideally with a startup
Required Skills and Competencies
- Strong AWS expertise across core services
- Deep hands-on experience with Terraform or similar IaC tools
- Solid CI/CD knowledge (preferably GitHub Actions or similar)
- Experience with observability tooling (e.g., Datadog, Prometheus, CloudWatch)
- Strong understanding of security best practices (least-privilege, secret management, etc.)
- Ability to own initiatives end-to-end and work cross-functionally
Preferred Skills and Competencies
- Experience with multi-region architectures
- Prior work in a SOC2-compliant environment
- Track record of reducing cloud spend at scale
- Familiarity with container networking, ALB/NGINX routing, and EKS tuning