Job Description
The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and performance of large-scale, cloud‑native services operating within a Google Cloud Platform (GCP) environment. This role partners closely with engineering teams to design resilient systems, define and measure service reliability using SLOs and SLIs, and manage error budgets to balance innovation with stability. The SRE leads incident management efforts, including on‑call response, incident coordination, root cause analysis, and post‑incident reviews, with a strong focus on reducing mean time to recovery and preventing recurrence through automation and engineering improvements. The ideal candidate brings deep experience in GCP services, infrastructure as code, monitoring and observability, and a calm, structured approach to operating high‑availability systems under pressure.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
3+ years of experience with DevOps and SRE principles, including CI/CD pipelines, infrastructure as code, and proactive monitoring.
3+ years of experience with Google Cloud Platform services (preferably data-related services such as BigQuery, Dataproc, Cloud Composer, and Cloud Storage).
3+ years of experience with scripting and automation tools (e.g., Python, Bash).
Benefits
Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.