Service Reliability Engineer - 16547 (BBBH325) Sunnyvale, California

Salary: Market Rates


Do you have a passion for solving technical problems, from the network layer to the application? Do you spend time figuring out how something works, rather than stopping at knowing that it does? Do you want to make real web applications and back-end systems faster, more reliable, and more efficient?

The Service Reliability Engineering team is seeking a talented Rapid Response Engineer to play a vital role in a team that runs critical operations and systems engineering for our most popular internet sites. This position requires an aggressive troubleshooter who can multitask on problems of varying difficulty, priority, and time sensitivity. It is a versatile role that requires familiarity with all the support concepts of busy web sites: systems and database administration, networking, process troubleshooting, and QA and rollout automation.

Responsibilities:

  • Assess the priority and criticality of incoming alerts and triage them accordingly.
  • Diagnose and repair issues using in-depth knowledge of Apache, UNIX processes, MySQL, and related technologies across the OSI stack.
  • Track issues through the ticketing systems and follow through to resolution.
  • Use monitoring tools to proactively identify issues and trends.
  • Write clear and concise operational runbooks.
  • Escalate significant issues to service, network, or other operations engineers.
  • Lead by example, deliver results, and eliminate missed opportunities.
  • The ideal candidate will possess a broad range of computer science skills and must be persistent, results-oriented, and a self-starter.


Basic skills:

  • The candidate should have 2 or more years of experience in technical operations and additional exposure to tool/product development.
  • Knowledge of Unix/Linux, Apache, performance tuning concepts, and web applications is a must.
  • SQL experience (MySQL, Oracle) is a plus.