We are seeking a skilled Datadog Engineer with expertise in instrumentation across Infrastructure, Application Performance Monitoring (APM), Synthetic monitoring, Database monitoring, and Real User Monitoring (RUM). The ideal candidate will have a strong background in Cloud (AWS and Azure), Terraform, and CI / CD implementation for Datadog, along with proficiency in creating dashboards, monitors, and log pipelines. Knowledge of other monitoring tools and concepts, as well as experience with ServiceNow and ITIL, is highly desirable.
Job Title : Sr. Software Engineer
Job Location : Richardson, Texas
Start Date : As soon as possible
Key Responsibilities :
- Work closely with development, operations, and product teams to ensure monitoring solutions align with business goals.
- Create and maintain scripts and automation tools to streamline monitoring and alerting processes.
- Produce and maintain clear documentation on monitoring setups, best practices, and troubleshooting procedures.
- Train team members and stakeholders on effective use and management of Datadog tools and features.
Incident Management :
- Follow the incident management process, ensuring timely resolution and minimizing service disruptions.
- Conduct root cause analysis and implement preventive measures to reduce recurring incidents.
- Develop and maintain incident response procedures and communication protocols.
Change Management :
- Manage the change management process, ensuring controlled and efficient implementation of changes.
- Assess the impact of proposed changes and mitigate potential risks.
- Ensure compliance with change management policies and procedures.
Metrics and Reporting :
- Generate regular reports and dashboards to provide insights into service performance.
- Use data-driven insights to identify trends and drive continuous improvement.
Transformation and Automation :
- Identify opportunities for process automation and implement solutions to improve efficiency.
- Evaluate and implement new monitoring tools.
Key Requirements :
- Proven experience as a Datadog Engineer or in a similar role.
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum of 5 years of experience in monitoring.
- Proven experience in incident management, change management, and problem management.
- Strong understanding of ITIL frameworks and best practices.
- Proven expertise in Datadog instrumentation and monitoring.
- Implement and manage Datadog instrumentation for infrastructure, APM, synthetic monitoring, database monitoring, and RUM.
- Create and manage Datadog dashboards, monitors, and log pipelines.
- Collaborate with cross-functional teams to ensure comprehensive monitoring coverage.
- Develop and maintain Terraform scripts for Datadog configuration.
- Design and implement CI / CD pipelines for Datadog integrations.
- Provide expertise in other monitoring tools and concepts.
- Proficiency in creating Datadog dashboards, monitors, and log pipelines.
- Familiarity with other monitoring tools and concepts.
- Experience with automation tools and technologies.
- Excellent analytical and problem-solving skills.
- Strong communication and interpersonal skills.
- Experience with cloud-based enterprise applications.
Must have Skills :
- Excellent analytical and troubleshooting skills to diagnose and resolve complex issues.
- Effective communication skills to collaborate with cross-functional teams and convey technical information clearly.
- Ability to thrive in a fast-paced environment, managing multiple tasks and projects simultaneously.
- Previous experience in a similar role or relevant industry experience is highly preferred.
- Knowledge of cloud platforms like AWS, Azure, or Google Cloud.
Preferred Qualifications :
- Certifications in AWS, Azure, Datadog, or ITIL / ITSM.
- Experience in a DevOps or SRE role.
- Familiarity with scripting languages such as Python or Bash.