Site Reliability Engineer

Walmart Stores SUNNYVALE, CA

About the Job

Position Description


Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the uptime and availability of the VUDU service. Our goal is to build, scale, and guard the systems that delight our customers.
• Running a 24/7 high-quality video streaming service in a fast-paced, startup-like environment.
• Designing, writing, and building tools to improve reliability, security, availability, and scalability.
• Preventing the recurrence of problems by identifying and fixing root causes.
• Participating in capacity planning, performance analysis, and systems tuning.
• Developing a deep understanding of the various services and applications that come together to deliver the VUDU service.
• Influencing, designing, and creating new architectures, standards, and methods.
• Participating in the on-call rotation.
• Experience with configuration management tools such as Ansible, SaltStack, Chef, or Puppet.
• Building and driving automated systems that maintain system health.

Minimum Qualifications


• Bachelor’s degree in Computer Science or a related field and at least 4 years of experience in large-scale infrastructure management and automation.
• Programming knowledge – proven Python experience.
• Mastery of the Linux shell and system internals.
• Hands-on experience with container technologies (Docker, Mesos, etc.).
• Hands-on experience with databases.
• Hands-on experience with network internals.
• Love for debugging: troubleshooting Tomcat, HAProxy, Nginx, Apache software, Java, databases, etc.
• Ability to dive into and navigate unfamiliar codebases.
• Strong analytical and troubleshooting skills, and good intuition about probable root causes.

Company Summary


The Walmart eCommerce team is rapidly innovating to evolve and define the future state of shopping. As the world’s largest retailer, we are on a mission to help people save money and live better. With the help of some of the brightest minds in technology, merchandising, marketing, supply chain, talent and more, we are reimagining the intersection of digital and physical shopping to help achieve that mission.
