Job Description
We are seeking a detail-oriented and technically skilled Observability Engineer to join our Reliability Engineering & Operations (REO) team. In this role, you will be responsible for configuring, maintaining, and continuously improving the observability stack — spanning monitoring, dashboarding, alerting, and application performance management (APM) across our AWS-hosted production environments. You will lead the standup of our observability capabilities during an active proof-of-concept (PoC) period, establishing the tooling, instrumentation patterns, and alerting standards that the broader REO organization will rely on. This is a practitioner role for someone who is passionate about making complex systems legible — turning raw telemetry into actionable insight.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
Required Qualifications
• 2–4 years of experience in an observability, monitoring, SRE, or DevOps engineering role with a strong focus on instrumentation and telemetry
• Hands-on experience with AWS observability tooling including CloudWatch (metrics, logs, alarms, dashboards) and X-Ray (distributed tracing)
• Experience configuring and administering at least one third-party APM or observability platform (e.g., Datadog, Dynatrace, New Relic, Grafana, or similar)
• Working knowledge of log management and aggregation (e.g., CloudWatch Logs, OpenSearch, Splunk, or similar)
• Experience building operational dashboards that serve diverse audiences including NOC, engineering, and leadership
• Familiarity with distributed systems, microservices architectures, and the instrumentation challenges they present
• Scripting proficiency in Python, Bash, or similar for automation and telemetry configuration
• Strong analytical skills and a methodical approach to signal-to-noise optimization in alerting
• Experience working in a HIPAA-regulated or compliance-driven environment
Nice to Have Skills & Experience
• AWS certifications (Cloud Practitioner, SysOps Administrator, or equivalent)
• Experience with OpenTelemetry (OTel) for vendor-agnostic instrumentation
• Familiarity with infrastructure-as-code tooling (Terraform, CloudFormation) for managing observability configuration as code
• Experience supporting observability for Java- or .NET-based enterprise SaaS applications
• Exposure to SLI/SLO frameworks and reliability engineering practices
• Experience in healthcare IT or payer technology environments
Benefit packages for this role will start on the first day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.