Software Engineer

NeuralFabric

Software Engineering
Bengaluru, Karnataka, India
Posted on Jan 22, 2026

Meet the Team

The AI Software & Platform team delivers AI products and platforms for Cisco Secure portfolios, enabling customers to defend against threats and safeguard their most critical assets through security resilience. Our passion lies in simplifying security with zero compromise: delivering secure, scalable, and reliable AI-driven experiences across Cisco.

Your Impact

As a Performance Engineer, you will be the technical driver for performance, scalability, and reliability across our AI platforms, cloud-native services, and user-facing experiences. You will partner deeply with engineering, SRE/DevOps, QA, and AI/ML teams to identify bottlenecks, define performance standards, and build reusable tooling that enables teams to ship fast without compromising latency, throughput, cost, or UI responsiveness. You’ll be hands-on in diagnosing complex distributed systems, building capacity models, and creating self-serve performance engineering capabilities that scale across teams.

  • Own end-to-end performance strategy across backend services, AI/ML workflows, and UI surfaces, including load, stress, soak, scalability, and reliability validation in production-like environments.
  • Lead capacity engineering by building demand forecasts and capacity models (RPS/concurrency, queue depth, saturation signals), defining headroom targets, and guiding scaling plans for GA readiness, peak events, and customer growth (a minimal headroom calculation is sketched after this list).
  • Drive deep performance analysis and profiling (CPU, memory, I/O, network) to identify bottlenecks and optimize application code, runtime behavior, and infrastructure efficiency across distributed systems.
  • Build and standardize performance observability using Splunk, SignalFx, Datadog, Grafana, and OpenTelemetry (metrics/logs/traces), creating dashboards that surface latency breakdowns, saturation signals, error rates, and critical dependency health.
  • Lead UI performance engineering by establishing measurable standards and regression checks for web apps (e.g., Core Web Vitals: LCP/INP/CLS), using browser profiling tools, and correlating UI responsiveness with backend latency and API behavior (see the budget-check sketch after this list).
  • Drive Kubernetes and AWS performance tuning, including autoscaling behavior, resource requests/limits, node sizing, networking, and cost/performance tradeoffs.
  • Design and implement performance test frameworks (e.g., Locust in Python, with k6 or JMeter as options) with reusable scenarios, datasets, and workflow libraries for realistic end-to-end user journeys (UI → API → event-driven backend → data stores); a minimal Locust scenario is sketched after this list.
  • Evaluate performance of distributed workflows including microservices and event-driven pipelines (e.g., Kafka producers/consumers, retries/backoff, queue lag), caching layers (e.g., Redis), and multi-tenant databases (e.g., PostgreSQL/MySQL) with a focus on latency and throughput at scale.
  • Create internal tools and self-service platforms (Performance-as-a-Service) to automate benchmarks, integrate performance regression checks into CI/CD, and generate standardized reports, dashboards, and runbooks for engineering teams.
  • Develop AI/ML performance testing practices for LLM/agentic workflows: stage-level latency decomposition, token/throughput behavior, streaming response performance, rate-limit handling, retries/backoff tuning, and dependency bottleneck analysis.
  • Partner in design and architecture reviews, providing performance/capacity guidance on new services, data flows, authentication/authorization paths, and tenant isolation behaviors that impact scalability and reliability.
  • Communicate insights and recommendations to technical and leadership stakeholders through clear reports, postmortems, performance reviews, and data-backed prioritization.
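
To make the capacity-engineering expectation concrete, here is a minimal headroom calculation in Python. All numbers (peak RPS forecast, per-replica saturation throughput, the 30% headroom target) are illustrative assumptions, not figures from this role.

```python
# Minimal capacity-headroom sketch. All numbers are illustrative
# placeholders, not values from this posting.
import math

def required_replicas(peak_rps: float, per_replica_rps: float,
                      headroom: float = 0.30) -> int:
    """Replicas needed so that peak load leaves `headroom` spare capacity.

    With a 30% headroom target, each replica should serve at most
    70% of its measured saturation throughput at forecast peak.
    """
    usable_rps = per_replica_rps * (1.0 - headroom)
    return math.ceil(peak_rps / usable_rps)

if __name__ == "__main__":
    # Hypothetical forecast: 12,000 RPS peak; one pod saturates near 900 RPS.
    print(required_replicas(peak_rps=12_000, per_replica_rps=900))  # -> 20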
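
The UI performance bullet mentions regression checks against performance budgets. Below is a minimal sketch of such a check: the budget values follow the published "good" Core Web Vitals thresholds (LCP ≤ 2500 ms, INP ≤ 200 ms, CLS ≤ 0.1), while the measured numbers and the CI wiring are hypothetical.

```python
# Minimal performance-budget regression check for Core Web Vitals.
# In practice the measured values would come from Lighthouse runs or
# RUM percentiles; here they are hard-coded for illustration.
import sys

BUDGETS = {"LCP_ms": 2500, "INP_ms": 200, "CLS": 0.1}  # "good" thresholds

def check(measured: dict) -> list[str]:
    """Return a list of budget violations for one measured page."""
    return [
        f"{metric}: {measured[metric]} exceeds budget {limit}"
        for metric, limit in BUDGETS.items()
        if measured.get(metric, 0) > limit
    ]

if __name__ == "__main__":
    # Hypothetical p75 lab measurement for one page.
    violations = check({"LCP_ms": 2890, "INP_ms": 140, "CLS": 0.04})
    for v in violations:
        print("FAIL:", v)
    sys.exit(1 if violations else 0)  # non-zero exit fails the CI gate
```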
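
The posting names Locust as the primary performance-testing framework. A minimal reusable user-journey scenario might look like the sketch below; the endpoint paths, payloads, and task weights are hypothetical placeholders, not details of any actual service.

```python
# Minimal Locust sketch of a reusable user-journey scenario.
# Endpoint paths and payloads are hypothetical placeholders.
from locust import HttpUser, task, between

class DashboardUser(HttpUser):
    wait_time = between(1, 3)  # simulated think time between actions

    def on_start(self):
        # Authenticate once per simulated user (hypothetical endpoint).
        self.client.post("/login", json={"user": "demo", "password": "demo"})

    @task(3)
    def view_dashboard(self):
        # Weighted 3x: the most common step in this journey.
        self.client.get("/api/dashboard", name="GET /api/dashboard")

    @task(1)
    def search(self):
        self.client.get("/api/search?q=alerts", name="GET /api/search")
```

Run with, for example, `locust -f locustfile.py --host https://staging.example.com --users 200 --spawn-rate 20` against a production-like environment; the `name=` labels keep latency stats grouped per logical endpoint.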

Minimum Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent practical experience).
  • 8+ years of experience in performance engineering, systems engineering, SRE, or related roles supporting distributed, cloud-native systems.
  • Strong experience with AWS and performance tuning in cloud environments (compute, networking, storage, load balancing).
  • Hands-on experience with Kubernetes, microservices architectures, and Docker.
  • Proven experience with observability/monitoring tools such as Splunk, SignalFx, Datadog, Grafana (metrics, logs, traces, dashboards, alerting).
  • Strong programming/scripting ability (e.g., Python; familiarity with services written in Go/Java is a plus).
  • Demonstrated ability to drive cross-team technical execution, influence architecture decisions, and operate effectively with geographically distributed stakeholders.

Preferred Qualifications

  • Experience testing and optimizing UI performance for enterprise web applications, including performance budgets, regression checks, and profiling using browser dev tools (and/or RUM where available).
  • Experience with capacity planning and forecasting, including headroom targets, scaling playbooks, and performance risk assessment.
  • Experience testing and optimizing AI/ML systems, including LLM-based workflows, orchestration layers, and RAG/agent pipelines.
  • Experience with OpenTelemetry instrumentation and end-to-end distributed tracing in microservices.
  • Familiarity with production profiling approaches such as pprof (Go), JFR/async-profiler (Java), py-spy (Python), flamegraphs, and/or eBPF-based profiling.
  • Experience with event-driven and data-intensive architectures (e.g., Kafka, async processing, streaming pipelines) and performance characteristics at scale.
  • Experience building internal developer platforms and frameworks that increase productivity (standard templates, test harnesses, automated reporting, dashboard generators).
  • Familiarity with multi-tenant SaaS concerns that affect performance (authN/authZ flows, tenant isolation, noisy-neighbor prevention).

Tooling & Technology Expectations

  • Cloud & Runtime: AWS, Kubernetes, Docker, Linux
  • Observability: Splunk, SignalFx, Datadog, Grafana, OpenTelemetry
  • Performance Testing: Locust (Python), k6/JMeter (optional), synthetic + E2E testing
  • UI Performance: Core Web Vitals (LCP/INP/CLS), browser profiling, performance budgets, API-to-UI correlation
  • Distributed Systems: microservices, queues/streaming (Kafka), caches (Redis), databases (PostgreSQL/MySQL)
  • AI/ML Performance: LLM/agent workflows, token/throughput monitoring, rate limits, retries, streaming latency decomposition (see the sketch after this list)
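
As one sketch of streaming latency decomposition, the snippet below times time-to-first-token (TTFT) and steady-state token throughput over any token iterator. The `stream_tokens` stub stands in for a real streaming LLM client, which is an assumption; the measurement logic is the point.

```python
# Sketch of stage-level latency decomposition for a streaming LLM response.
# `stream_tokens` is a stand-in for any client that yields tokens as they
# arrive; the real call (hosted API or internal gateway) is assumed.
import time
from typing import Iterable, Iterator

def stream_tokens() -> Iterator[str]:
    # Stub: ~200 ms to first token, then roughly 50 tokens/sec.
    time.sleep(0.2)
    for i in range(100):
        yield f"tok{i} "
        time.sleep(0.02)

def decompose(stream: Iterable[str]) -> dict:
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        count += 1
        if first is None:
            first = time.perf_counter()
    end = time.perf_counter()
    if first is None:  # no tokens arrived at all
        return {"ttft_s": None, "tokens": 0, "tokens_per_s": None,
                "total_s": round(end - start, 3)}
    gen = end - first
    return {
        "ttft_s": round(first - start, 3),  # time to first token
        "tokens": count,
        "tokens_per_s": round((count - 1) / gen, 1) if gen > 0 else None,
        "total_s": round(end - start, 3),
    }

if __name__ == "__main__":
    print(decompose(stream_tokens()))
```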

Why Cisco?

At Cisco, we’re revolutionizing how data and infrastructure connect and protect organizations in the AI era – and beyond. We’ve been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.

Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you’ll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.

We are Cisco, and our power starts with you.