Site Reliability Engineer

Post Date

Jun 11, 2026

Location

Chandler, Arizona

ZIP/Postal Code

85224
US
Aug 09, 2026 Insight Global

Job Type

Contract

Category

Computer Engineering

Req #

CLT-c05ae9dd-5315-41e1-91e5-4c310cc1900c

Pay Rate

$56 - $70 (hourly estimate)

Job Description

Site Reliability Engineer (SRE) for Google Cloud Platform (GCP), focused on building, maintaining, and improving the reliability, scalability, and performance of cloud infrastructure using Infrastructure as Code (IaC) and Terraform Enterprise. Supports the delivery of secure, compliant, and highly available cloud environments aligned with enterprise standards and regulatory requirements.
Works closely with engineering and platform teams to develop and maintain reusable IaC modules, Terraform configurations, and automated cloud services, enabling consistent and efficient infrastructure provisioning. Contributes to the implementation of standardized platform patterns, including networking, identity, logging, and monitoring capabilities.
Participates in the end-to-end lifecycle of cloud infrastructure, including deployment, monitoring, incident response, and continuous improvement. Helps implement and maintain CI/CD pipelines, policy-as-code frameworks, and automation solutions to ensure reliable and repeatable deployments.
Applies SRE principles and practices, including monitoring, alerting, incident management, and root cause analysis, to improve system reliability and reduce operational risk. Supports the definition and tracking of service performance through metrics such as availability and latency.
Collaborates with architecture, security, and engineering teams to ensure infrastructure is secure, compliant, and operationally resilient. Contributes to DevSecOps practices by integrating security and compliance controls into automated workflows.
Continuously identifies opportunities to improve system reliability, reduce manual effort, and enhance automation. Leverages emerging tools and technologies, including AI/ML where applicable, to support proactive operations, observability, and platform stability.
Key Responsibilities
• Design, develop, and maintain Google Cloud Platform (GCP) infrastructure using Infrastructure as Code (IaC) with Terraform Enterprise
• Contribute to the implementation of scalable, secure, and compliant cloud solutions aligned with enterprise standards
• Develop and maintain reusable Terraform modules and standardized infrastructure patterns to enable consistent and automated provisioning of GCP resources
• Follow and contribute to code quality standards, design patterns, and peer review practices to ensure reliable and maintainable infrastructure code
• Support the adoption and use of Terraform Enterprise for automated provisioning, policy enforcement, and infrastructure governance
• Implement and maintain cloud automation workflows, including provisioning, configuration management, and environment setup
• Build and enhance CI/CD pipelines for infrastructure delivery, ensuring automated testing, validation, and compliance checks
• Implement policy-as-code and security controls, ensuring infrastructure meets regulatory and enterprise compliance requirements
• Participate in the end-to-end lifecycle of infrastructure delivery, including deployment, monitoring, and continuous improvement
• Collaborate with architecture, security, and engineering teams to ensure secure, resilient, and compliant cloud configurations
• Apply DevSecOps and cloud-native practices to improve automation, security, and deployment efficiency
• Contribute to observability, logging, and monitoring solutions to support proactive incident detection and response
• Execute testing and validation of IaC modules, including integration and deployment verification
• Identify opportunities to automate manual processes and improve operational efficiency
• Support reliability, scalability, and performance of cloud platforms through automation and standardization
• Troubleshoot and resolve infrastructure and platform issues, contributing to root cause analysis and continuous improvement
• Work with stakeholders to implement infrastructure solutions that meet technical and business requirements
• Evaluate and adopt emerging tools and technologies to enhance automation, reliability, and platform performance
• Conduct performance testing and capacity planning to ensure systems scale reliably under load
• Optimize system performance, latency, and resource utilization across cloud environments
• Design and implement observability solutions, including metrics, logs, traces, and alerting strategies
• Reduce alert fatigue and improve signal quality through meaningful alert design and tuning
• Develop dashboards and monitoring frameworks aligned to SLOs

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience

• 5+ years of experience in cloud infrastructure engineering, site reliability engineering, or platform engineering, with a strong focus on Google Cloud Platform (GCP)
• Proven experience delivering and supporting large-scale, highly available cloud systems, with a focus on reliability, scalability, and performance
• Deep hands-on expertise in Infrastructure as Code (IaC), with strong proficiency in Terraform Enterprise and automated infrastructure provisioning
• Strong background in software engineering practices, including code quality, modular design, version control, and automated testing for infrastructure code
• Extensive experience designing and implementing reusable Terraform modules and standardized infrastructure patterns
• Strong experience building and maintaining CI/CD pipelines for infrastructure, including automated validation, testing, and deployment
• Demonstrated expertise in SRE and DevSecOps practices, including monitoring, alerting, incident response, and embedding security and compliance into automated workflows
• Advanced knowledge of GCP services, cloud architecture patterns, and networking concepts (VPCs, IAM, load balancing, hybrid connectivity)
• Experience implementing policy-as-code, governance frameworks, and compliance controls in regulated enterprise environments (financial services preferred)
• Strong understanding of observability, monitoring, logging, and performance tuning to ensure system reliability and operational excellence
• Hands-on experience managing production incidents, performing root cause analysis, and implementing long-term reliability improvements
• Proven ability to drive automation, standardization, and reliability improvements across cloud platforms
• Experience mentoring and supporting engineers, contributing to team capability and engineering best practices
• Ability to collaborate effectively with engineering, architecture, security, and risk teams to deliver secure and resilient solutions
• Strong analytical and problem-solving skills, with the ability to make sound technical decisions under pressure
• Excellent communication skills, with the ability to articulate technical issues, trade-offs, and solutions to diverse stakeholders
• Experience evaluating and integrating emerging technologies (including AI/ML-driven operations) to enhance reliability, automation, and platform efficiency

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.