Job Description
Insight Global is seeking an SRE (Site Reliability Engineer) for a top data and analytics client. This role focuses on ensuring platforms and services are truly production-ready by identifying gaps, risks, and opportunities for improvement as solutions move toward release. The SRE will provide deep visibility into system health through dashboards and observability tooling, enabling teams to clearly understand how platforms and services are performing in production. The ideal candidate brings strong experience with monitoring and reliability, a clear point of view on operational excellence, and the ability to communicate findings effectively across engineering and leadership teams.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
5+ years of experience as an SRE, Platform Engineer, or Reliability-focused role
Strong experience supporting production environments
Ability to identify gaps and risks when moving solutions from pre-prod to production
Experience evaluating production readiness and determining what is being done well and what is not
Strong experience building and maintaining observability dashboards
Hands-on experience with monitoring, logging, and alerting
Experience with Coralogix (the default observability tool)
Strong communication skills; able to clearly articulate findings and risks
Experience working with platforms and shared services, not just single applications
Solid understanding of reliability, scalability, and operational best practices
Nice to Have Skills & Experience
Experience with Grafana
Openness to, and experience with, alternative observability tools that increase platform visibility
Experience with distributed systems and microservices
Exposure to cloud-based production environments
Experience working alongside engineering and QA teams during release cycles
Background supporting high-traffic, enterprise-scale systems
Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.