Skillhouse Careers

Lead Site Reliability Engineer (Incentive Platform)

Position:

Lead Site Reliability Engineer (Incentive Platform)

Employment type:

Permanent employee (full-time)

Salary:

12000000.00

Language requirements:

Japanese Level - None, English Level - Advanced (TOEIC 860)

Job Description

A global IT services firm is looking for a Lead Site Reliability Engineer for its Loyalty Points Platform Department.
As a Lead Site Reliability Engineer (Lead SRE), you will set the technical direction for the stable operation and continuous improvement of our mission-critical services.
You will define and drive initiatives to enhance service reliability, scalability, and performance across the team, spanning incident response, automation, monitoring, capacity planning, and observability strategy.

Responsibilities:
- Service Quality Definition & Achievement: Define Service Level Objectives (SLOs) and Service Level Agreements (SLAs). Lead the planning and execution of improvement activities to achieve them
- Performance & Latency Improvement: Identify bottlenecks in service performance and latency. Lead the team in proposing and implementing solutions, setting technical standards for performance work
- Incident Management & Troubleshooting: Act as incident commander during production outages, leading rapid restoration efforts. Drive Root Cause Analysis (RCA) processes and the implementation of systemic preventative measures
- Operational Efficiency & Automation: Champion the automation of operational processes to reduce toil. Architect scalable operational frameworks and establish best practices for the team
- Technical Leadership & Mentorship: Provide technical guidance and mentorship to SRE team members. Conduct technical design reviews, define engineering standards, and contribute to the overall skill development of the team
- Cross-functional Collaboration: Lead collaboration with product development teams, infrastructure teams, security teams, and other relevant departments. Foster a DevOps culture and drive alignment on reliability goals across the organization
- On-call: Participate in and help shape the 24/7 on-call rotation, including refining escalation paths and runbooks

Required Skills:
- More than 5 years of hands-on experience in SRE, infrastructure engineering, or a related field, with demonstrated technical leadership experience
- Experience building and operating production systems in public cloud (AWS, GCP, Azure, etc.) or private cloud environments
- Extensive experience designing, building, operating, and scaling Kubernetes environments
- Deep knowledge and hands-on experience building and operating modern monitoring, alerting, and logging tools (e.g., Prometheus, Grafana, ELK Stack, Datadog)
- In-depth knowledge of UNIX-like operating system internals and/or networking
- Deep knowledge of IP network systems and protocols (TCP/IP, HTTP, etc.) and hands-on troubleshooting experience
- Experience building automated workflows using CI/CD tools (e.g., Jenkins, CircleCI, GitLab CI/CD)
- Experience developing operational automation tools and scripts using scripting languages such as Shell, Python, etc.
- Proven track record of leading production incident handling end-to-end (detection, triage, short-term / long-term fix, root cause analysis)
- Experience in system performance tuning and capacity planning
- Proficiency with Git and GitHub for version control and collaboration
- Strong communication, negotiation, and collaboration skills to articulate complex technical issues and align with internal and external stakeholders

Why should you apply:
- Opportunity to work for a well-established company
- Collaborate across departments in an international environment
- Drive innovation and maintain high standards for products used at very large scale

Company Overview: This is a global company with a strong presence in multiple business sectors. It has achieved sustained growth both domestically and internationally, including in the U.S. and Europe. The company prides itself on its diverse and international environment, providing ample career opportunities and a commitment to equal opportunity. With a wide range of business activities, the company also works with various technologies. You can choose your preferred working environment, whether Windows or Mac! Meals at the employee cafeteria are free, and the chef regularly comes up with new menus, ensuring you never get bored of the meals!

Work Hours: 9:00 AM – 5:30 PM (Flextime or staggered working hours possible)
Work Style: Hybrid (Typically 4 days in the office, 1 day working from home)
Holidays: Saturdays, Sundays, public holidays, New Year holidays, paid leave, bereavement leave, and other special leave
Benefits: Full social insurance (employee pension insurance, health insurance, workers’ accident compensation insurance, unemployment insurance), DC pension plan, transportation allowance, childcare and caregiving support, cafeteria, retirement benefit system, welfare services (Relo Club), health counseling services, relocation support (visa support, moving), employee discounts (moving, language classes, etc.), and more
Interview Process: 3 to 4 interviews

Why not grow, learn, and pursue success together with us at Skillhouse?

If you are interested in this position at Skillhouse, please enter your contact details and attach your résumé.

Note: The personal information you provide will be used solely for recruitment screening.

Tomoecho Annex No. 2, 3-8-27 Toranomon, Minato-ku, Tokyo

03-5408-5070

internalcareers@skillhouse.co.jp
