Site Reliability Engineer

Post Date

Sep 03, 2024

Location

San Diego, California

ZIP/Postal Code

92108

Job Type

Perm

Job Description

We are looking for a Senior Site Reliability Engineer with experience working in large-scale, mission-critical environments with zero downtime. This SRE team is a mix of DBA-, SRE-, and DBRE-oriented engineers whose overarching goal is to provide highly available data services at scale. They strive to build an extremely reliable, performant, and secure database infrastructure through the skillful use of automation. The team is responsible for providing new architectures and scalability solutions for ever-growing business and data processing needs.

Job Responsibilities

Work closely with cross-functional teams to ensure the company has the right set of tools to generate, collect, analyze, visualize, and alert on operational data.
Participate in an on-call rotation to ensure 24/7/365 availability of the company's production systems.
Own and operate critical open-source services such as Elasticsearch, Kafka, RabbitMQ, and Redis.
Build tools and design processes that help improve observability and system resiliency of the platform.
Triage Site Availability Incidents and proactively work towards reducing MTTR for customer-impacting incidents.
Partner with service owners to implement Service Level Metrics and Service Level Objectives.
Establish design patterns for monitoring, benchmarking, and deploying new features for the backend services.
Develop and maintain technical documentation, network diagrams, runbooks, and procedures.
Increase efficiency, respond to production incidents, prevent recurring issues, and improve the reliability and performance of the infrastructure.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com.

To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/

Required Skills & Experience

MUST SPEAK MANDARIN
Bachelor's degree or a foreign equivalent in Computer Science, Information Systems, or a related field, plus 3 years of experience in the job offered or as a Computer Systems Engineer, Software Engineer, or in a related role
3 years of experience supporting mission-critical, real-time, high-traffic applications in cloud environments
Knowledge of cloud systems, continuous integration/build systems, Java, and SQL and NoSQL databases

Nice to Have Skills & Experience

Experience with observability tools such as Grafana, Prometheus, or Zabbix
Proficiency in scripting/programming languages (Python or Go)
Experience with one or more OSS technologies (Elasticsearch, Kafka, or Redis)
Experience with container technologies such as Docker, Kubernetes, or Mesos

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.