Datadog SME

Kutir Technologies
Norfolk, Massachusetts, USA
1 day ago
Job type
  • Full-time
Job description

Role Title: Datadog SME

Location: Norfolk, VA / Richmond, VA / Atlanta, GA / Texas (Hybrid)

Monitoring Strategy & Implementation

Design and deploy Datadog dashboards for both application and database domains.

Configure alerting logic using Datadog monitors, composite alerts, and anomaly detection.

Develop custom scripts (Python, Bash, PowerShell) to support dynamic alerting and data enrichment.

Application Monitoring

Integrate Datadog APM with microservices ( ) and front-end layers.

Monitor service health, latency, error rates, and throughput.

Enable distributed tracing and log correlation for root cause analysis.

Configure synthetic tests and real-user monitoring (RUM) for critical endpoints.

Database Monitoring (Oracle & SQL Server)

Set up Datadog integrations for Oracle and SQL Server to track:

Query performance, blocking sessions, and deadlocks.

Connection pool usage, replication lag, and backup status.

Tablespace utilization, I/O latency, and cache hit ratios.

Automate alerting for threshold breaches and unusual patterns using scripting.

Scripting & Automation

Build reusable scripts to:

Generate dynamic dashboards based on metadata.

Auto-adjust alert thresholds based on historical trends.

Integrate Datadog with ServiceNow for incident creation.

Maintain version-controlled script repositories and CI/CD pipelines for observability assets.

Stakeholder Collaboration

Engage with application and database teams to gather monitoring requirements.

Conduct workshops and knowledge transfer (KT) sessions for support teams on dashboard usage and alert triage.

Partner with SREs to align monitoring with SLIs/SLOs and reliability goals.

Reporting & Governance

Generate weekly/monthly reliability reports using Datadog analytics.

Maintain SOPs, runbooks, and RCA documentation.

Ensure compliance with enterprise monitoring standards and audit readiness.

Skill Set

5 years of experience in cloud monitoring and reliability engineering.

Proven expertise in Datadog (APM, Infrastructure, Logs, Monitors, Dashboards).

Hands-on experience with Oracle and SQL Server monitoring.

Strong SQL and PL/SQL skills for Oracle and SQL Server.

Proficiency in scripting (Python, Bash, PowerShell) for automation.

Experience in configuring complex alerting logic using Datadog monitors.

Understanding of application architectures ( ).

Familiarity with ITSM tools (ServiceNow), CI/CD (Jenkins, Git), and cloud platforms (Azure, AWS).

Excellent communication and documentation capabilities.

Nice-to-Have Skills:

Datadog certification.

Knowledge of healthcare domain and compliance standards.

Experience with CI/CD tools and infrastructure as code (Terraform, Ansible).

Employment Type: Full Time

Experience: years

Vacancy: 1
