The team is responsible for infrastructure systems, including storage, computing, and databases. We aim to be the leading SRE team in the industry. As a member of the SRE team, you will tackle complex challenges of scale, drawing on expertise in coding, algorithms, complexity analysis, and large-scale system design. We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We also encourage ownership, self-governance, and independence across a variety of projects, and we provide the support and mentorship needed to learn and grow as an engineer.
Responsibilities
- Reliability: Ensuring the reliability and efficiency of our core infrastructure, focusing on system capacity and stability; setting up reliability standards and recovery SOPs.
- Reliability: Troubleshooting and locating technical issues; analyzing bottlenecks; managing high-availability architecture transformations and upgrades.
- Efficiency: Building automated operation solutions for large-scale systems; partnering with system development teams for system iteration.
- Efficiency: Designing and implementing software platforms and monitoring frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance.
- Cost: Building delivery standards and monitoring and budgeting systems to optimize the company's costs across a fleet of millions of CPUs.
- Compliance: Designing and setting up new IDCs; designing and implementing data protection plans to meet the required standards.