Site Reliability Engineer II

NeuralFabric

Software Engineering, Other Engineering
Galway, Ireland
Posted on Feb 5, 2026

Meet the Team

Cisco’s Collaboration Business Unit empowers people and organizations worldwide to connect, communicate, and innovate seamlessly.

You will collaborate with a global team of software engineers and SREs responsible for delivering world-class collaboration experiences at scale. Our team supports backend services deployed worldwide and works closely with development, product, and operations partners to ensure reliability and performance.

Webex is powering the shift to the hybrid workforce, helping people stay connected in a rapidly evolving digital world. We foster a startup-like culture that values innovation, ownership, and collaboration, while offering the scale and impact of a global technology leader.

Your impact

As a Site Reliability Engineer (SRE) supporting backend services for Cisco’s SaaS collaboration products, you will play a critical role in delivering reliable, scalable, and resilient experiences across calling, messaging, meetings, and contact center solutions.

Your work will directly impact the availability, performance, and quality of services used by millions of users globally.

Specifically:

  • Own the deployment and operation of critical collaboration services across cloud and hybrid environments, driving reliability and scalability.

  • Design, evolve, and optimize CI/CD pipelines and automation, including AI-first tooling for deployment, monitoring, and incident response.

  • Lead incident response for complex production issues, perform root cause analysis, and drive systemic reliability and performance improvements.

  • Use observability data to guide capacity planning, scaling strategies, and resource optimization across services.

  • Define and champion operational best practices, documentation standards, and a culture of reliability and operational excellence.

Minimum Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience), plus 3–5 years of experience in Site Reliability Engineering, Cloud Operations, or Systems Engineering.

  • Strong hands-on experience operating production services using Docker and Kubernetes in cloud or hybrid environments.

  • Proficiency in one or more programming or scripting languages (e.g., Python, Go, Bash) to build automation and operational tooling.

  • Experience with monitoring, observability, and incident response in production environments, including on-call participation and post-incident reviews.

  • Working knowledge of Linux systems, networking, distributed systems, CI/CD pipelines, infrastructure-as-code, and Git-based workflows.

Preferred Qualifications

  • Experience operating large-scale, globally distributed SaaS platforms.

  • Familiarity with hybrid cloud environments and multi-region deployments.

  • Experience applying AI-assisted or automation-first approaches to SRE tooling and workflows.

  • Strong written communication skills for creating clear operational documentation and runbooks.

Why Cisco?

At Cisco, we’re revolutionizing how data and infrastructure connect and protect organizations in the AI era – and beyond. We’ve been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.

Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you’ll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.

We are Cisco, and our power starts with you.