Site Reliability Engineer
NeuralFabric
Please note this posting is to advertise potential job opportunities. This exact role may not be open today but could open in the near future. When you apply, a Cisco representative may contact you directly if a relevant position opens.
Meet the Team
Cisco’s Collaboration Business Unit empowers people and organizations across the globe to connect, communicate, and innovate seamlessly. As an SRE supporting the backend services of the BU’s SaaS products, you will play a critical role in ensuring reliable, scalable, and resilient collaboration experiences for calling, messaging, meetings, and contact center. Your work will directly impact the quality, availability, and performance of our services, helping deliver outstanding collaboration experiences to millions of users worldwide.
You will work with a team of software engineers responsible for delivering world-class collaboration experiences to millions of users globally. Our SRE team ensures the reliability, scalability, and performance of backend services deployed worldwide. We offer strong professional software product development experience in the friendly atmosphere of a startup-like team culture.
Webex is powering the shift to the hybrid workforce! People spend more and more time using our digital collaboration tools, and we want to make sure that they feel truly connected! We hire great people who are eager to contribute to our culture, and we empower them to do just that. We take pride in thinking beyond our day-to-day job descriptions and encourage you to actively seek opportunities to build the type of work environment that you want to be a part of.
Your Impact
You are a natural problem solver who thrives in fast-paced incident response situations, thinking on your feet to resolve issues quickly and effectively. You excel at monitoring complex systems, interpreting metrics, and proactively identifying opportunities to improve reliability and performance. Creative and resourceful, you are passionate about building innovative tools that use AI-first approaches to streamline SRE functions. Friendly, independent, and eager to learn, you bring both technical excellence and a collaborative spirit to every challenge.
On a day-to-day basis, you will:
- Deploy and operate collaboration core services in cloud and hybrid environments.
- Develop, maintain, and execute automated CI/CD pipelines.
- Monitor and troubleshoot production systems, ensuring high availability and rapid incident response (including on-call rotations via PagerDuty).
- Perform root cause analysis and proactively address system reliability and performance issues.
- Create, interpret, and act on system and application metrics, logs, and traces.
- Plan capacity: forecast usage trends, optimize scaling, and manage resource allocations.
- Author and maintain runbooks, best practices, and operational documentation.
- Leverage AI-first development tools and prompt-based design/implementation strategies to build tools and automation for deployment, monitoring, testing, and incident response.
- Champion a culture of reliability, continuous improvement, and operational excellence.
Minimum Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience) with 3+ years of experience in Site Reliability Engineering, Cloud Operations, or Systems Engineering.
- Hands-on experience deploying and managing services using containers (Docker), orchestration (Kubernetes), and virtualized platforms.
- Strong scripting or programming skills (Python, Bash, Go, etc.).
- Expertise in monitoring, observability, and alerting tools (Prometheus, Grafana, ELK, etc.).
- Solid understanding of networking, security, distributed systems, Linux systems, databases (SQL/NoSQL/in-memory), and web application architectures.
- Experience with infrastructure as code (Terraform), CI/CD pipelines, and Git.
- Experience managing on-call rotations, incident response, and post-mortems.
Preferred Qualifications
- Knowledge of AI-first development workflows and ability to leverage prompt engineering for faster solutioning.
Why Cisco?
At Cisco, we’re revolutionizing how data and infrastructure connect and protect organizations in the AI era – and beyond. We’ve been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.
Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you’ll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.
We are Cisco, and our power starts with you.