A leading global financial digital consulting firm is looking for a Site Reliability Engineer to keep all user-facing services and other production systems running smoothly.
The Satisfaction Engineering Team exists to ensure and deliver satisfaction across the company's solutions and services. The reliability of our service offerings is the foundation of customer satisfaction and trust in our brand.
You will be a problem-solver, with the following responsibilities:
- Use your shift to prevent incidents from happening.
- Document the actions you take so that your findings turn into repeatable actions, and then into automation.
- Design, build and maintain core infrastructure pieces that enable support of hundreds of thousands of concurrent users.
- Debug production issues across services and levels of the stack.
- Contribute improvements to the codebase to resolve issues.
- Identify changes to the product architecture from a reliability, performance and availability perspective, using a data-driven approach.
- Identify parts of the system that do not scale, apply immediate palliative measures and drive long-term resolution of these issues.
- Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.
- Know a domain really well and radiate that knowledge through recorded demos, discussions in DNA meetings, or Incident Reviews.
- Perform and run blameless RCAs on incidents and outages, aggressively looking for answers that can prevent the incident from ever happening again.
- Be able to de-escalate conflicts inside the team.
【会社概要 | Company Details】
Our client is a global consulting firm that established its Japanese corporation in 2014. Its strengths are in M&A and business strategy, and it boasts an error rate close to zero.
【就業時間 | Working Hours】
9:15 - 17:15 (Mon - Fri)
(Working from home during the COVID-19 state of emergency.)
【休日休暇 | Holidays】
Saturdays, Sundays, and national holidays; year-end and New Year holidays; paid leave; other special leave
【待遇・福利厚生 | Services / Benefits】
Full social insurance (employees' pension insurance, health insurance, workers' accident compensation insurance, employment insurance), commuting expenses paid, no smoking indoors in principle (designated smoking area outdoors), casual clothes acceptable, etc.
【Requirements】
- Experience as a Cloud, DevOps or Reliability engineer
- Work closely with engineering teams to create and improve containerized technologies
- Able to collaborate in a global team environment, actively engage subject matter experts, and follow through on commitments
- Strong problem-solving (debugging) skills: the ability to dissect, divide and conquer platform problems and find the root cause
- Knowledge of at least one of Microsoft Azure, AWS or GCP is a must (Azure preferred)
- Scripting knowledge in PowerShell and/or Python
- Version control experience (Git)
- Knowledge of container orchestration technologies (Kubernetes)
- Knowledge of container technologies (Docker)
- CI/CD knowledge