Job Description
Insight Global is looking for an on-prem Site Reliability Engineer to join one of our largest clients in the Bay Area. This person will:
-Provide day-to-day operational support for production and pre-production environments
-Administer and support Jenkins (user management, pipeline reliability, upgrades, troubleshooting)
-Manage and operate Kubernetes clusters at scale (deployments, scaling, upgrades, monitoring)
-Maintain system reliability, uptime, and performance through proactive monitoring and incident response
-Participate in on-call rotations to support critical systems and resolve production issues
-Work closely with engineering teams to support CI/CD and platform reliability
-Work on-site 3–4 days per week, supporting operations alongside the infrastructure and hardware teams
This role pays between $60 and $75/hour, depending on years of experience and skill set.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Required Skills & Experience
-5+ years of experience in SRE, DevOps, or Systems Engineering roles
-Strong hands-on experience with Kubernetes in production
-Strong hands-on experience with Jenkins administration and CI/CD operations
-Experience supporting Linux-based systems in high-availability environments
-Comfort operating and troubleshooting complex infrastructure under SLA pressure
Nice to Have Skills & Experience
-Experience supporting GPU-based infrastructure (NVIDIA GPUs or similar)
-Hands-on exposure to GPU scheduling, health monitoring, and workload reliability
-Experience supporting ML/AI, compute-intensive, or accelerator-based workloads is a strong plus
-Familiarity with GPU drivers, firmware, and their integration within Kubernetes environments is preferred
Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.