NextOps
Slack-native runbooks for DB access, job termination, feature toggles, and on-call — with live RBAC and Temporal-backed execution.
Member of Technical Staff at Salesforce with 5.5+ years building production ops platforms. I architected NextOps (Slack ChatOps), Cosmic AI (manifest-driven LLM agents on Temporal), SSP (governed change execution), and Ops-Sage (alert auto-remediation), replacing ticket queues with safe, auditable self-service. Deep in Kubernetes, Terraform, Temporal, and FinOps-aware platform design.
These are not concept slides — they are interactive replicas of the systems I delivered in production. Explore how I combine Slack-native ChatOps, Temporal orchestration, LLM agents, and governed GitOps to replace ticket queues with safe, auditable self-service.
What I bring to your team: faster incident resolution, 50% faster emergency releases, FinOps-aware infra design, and platforms that engineers actually adopt because they live in Slack and Teams, not in another portal.
Four products I shipped — NextOps and Cosmic AI for intake, SSP for governed execution, Ops-Sage for automated alert response — on Temporal, RBAC, and a complete audit trail.
NextOps: Slack-native runbooks for DB access, job termination, feature toggles, and on-call, with live RBAC and Temporal-backed execution.
Cosmic AI: YAML-defined agents compiled into Temporal workers. Engineers ask in plain English, and the LLM generates safe queries under capability-level RBAC.
SSP: Change requests flow through JIRA and CAB approval before any production mutation; approvals gate Temporal workflows and deployment pipelines.
Ops-Sage: Matches each on-call alert to a runbook, validates live context, executes remediation through SSP, and only pages a human when validation fails.
A Slack bot that lets engineers and support staff safely run production operations on the CloudStack platform: provision database users, kill stuck jobs, toggle features, and triage incidents, all from Slack, with role-based access and a full audit trail. No direct production access required.
Run a quick action to see a simulated workflow card with validation steps, progress, and audit confirmation.
| Workflow ID | Type | Status | Duration | Operator | Time |
|---|---|---|---|---|---|
| wf-8812 | Take Thread Dump | Complete | 3.4s | Priya Nair | 4 hrs ago |
| wf-8794 | Release DB Lock | Complete | 1.8s | Sam Chen | 1 day ago |
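The role check and audit trail described above can be sketched roughly as follows. This is a minimal illustration, not the production implementation: the role names, the `ROLE_PERMISSIONS` mapping, and the `run_action` helper are all hypothetical, and the step where a real bot would start a Temporal workflow is reduced to a comment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-action policy; the real policy is not shown in the source.
ROLE_PERMISSIONS = {
    "support": {"take_thread_dump"},
    "engineer": {"take_thread_dump", "release_db_lock", "toggle_feature"},
}


@dataclass
class AuditLog:
    """In-memory stand-in for an audit trail."""
    entries: list = field(default_factory=list)

    def record(self, operator, action, allowed):
        self.entries.append({
            "operator": operator,
            "action": action,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def run_action(operator, role, action, audit):
    """Check the role, record the attempt (allowed or not), and return a status."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit.record(operator, action, allowed)
    if not allowed:
        return "Denied"
    # Assumption: a real bot would start a Temporal workflow here.
    return "Complete"
```

Recording the attempt before returning means denied requests also appear in the audit log, which is one reasonable reading of "a full audit trail".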
An AI-powered "senior SRE" that lives in Slack. Talk to it in plain English about any infrastructure or incident problem and it investigates, reasons step-by-step, runs the right operations, and remembers everything — available 24/7. Powered by GPT-5.
Welcome to the Cosmic AI diagnostics terminal. I can check pod performance, query status, run safe operations, and recall past incident notes. What should we investigate?
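One way to picture manifest-driven agents with capability-level RBAC is the sketch below. It assumes a manifest shape that is not documented in the source: `MANIFEST` is written as the dictionary a YAML loader might produce, and the capability names, role names, and `dispatch` helper are illustrative only. The LLM step and the Temporal worker are left out.

```python
# Hypothetical agent manifest, shown as the dict a YAML loader would return.
MANIFEST = {
    "agent": "diagnostics",
    "capabilities": {
        "check_pod_performance": {"roles": ["engineer", "support"]},
        "recall_incident_notes": {"roles": ["engineer", "support"]},
        "run_safe_operation": {"roles": ["engineer"]},
    },
}


def allowed_capabilities(manifest, role):
    """Return the sorted capability names the given role may invoke."""
    return sorted(
        name
        for name, spec in manifest["capabilities"].items()
        if role in spec["roles"]
    )


def dispatch(manifest, role, capability):
    """Refuse unknown or unauthorized capabilities before anything runs."""
    spec = manifest["capabilities"].get(capability)
    if spec is None:
        raise KeyError(f"unknown capability: {capability}")
    if role not in spec["roles"]:
        raise PermissionError(f"{role} may not use {capability}")
    # Assumption: a real agent would hand off to a Temporal activity here.
    return f"{manifest['agent']}:{capability}"
```

Checking the capability against the manifest before any execution keeps the authorization decision outside the LLM, so a generated request can only reach operations the manifest grants to the caller's role.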
The governed automation engine behind the scenes. Every operational action is validated, gets a JIRA change ticket, waits for the right approvals (Change Advisory Board), then executes against production and notifies everyone — fully audited, every time.
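The validate, ticket, approve, execute, notify sequence described above could be modeled as a small gated pipeline. The sketch below is an assumption-laden illustration: `ChangeRequest`, `REQUIRED_APPROVALS`, and `process` are hypothetical names, the JIRA ticket and notifications are represented only as audit entries, and the `execute` callback stands in for the production mutation.

```python
from dataclasses import dataclass, field

# Assumption: a single CAB approval is required; real policies may differ.
REQUIRED_APPROVALS = {"cab"}


@dataclass
class ChangeRequest:
    action: str
    target: str
    approvals: set = field(default_factory=set)
    audit: list = field(default_factory=list)


def process(request, execute):
    """Run one pass of the governed pipeline and return the resulting status."""
    if not request.action or not request.target:
        request.audit.append("rejected")
        return "rejected"
    request.audit.append("validated")
    request.audit.append("ticket_created")  # placeholder for a JIRA change ticket
    if not REQUIRED_APPROVALS <= request.approvals:
        # The production mutation is never reached without approval.
        request.audit.append("awaiting_approval")
        return "awaiting_approval"
    request.audit.append("approved")
    execute(request)
    request.audit.append("executed")
    request.audit.append("notified")
    return "complete"
```

Every branch appends to the audit list before returning, so rejected and pending requests leave a trace as well as completed ones.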
Governed Feature Toggle execution triggered via SSP. Verification required.
The requested workflow has been approved and executed successfully. Output files have been compiled and encrypted using standard Fernet symmetric keys. Access credentials will expire in 6 hours.
gAAAAABmX_k9R...
To decrypt the contents locally from a terminal with Python installed, run the following command:
python decrypt.py logs.enc <fernet_key>
Ops-Sage watches incoming on-call alerts and acts on them automatically. For each alert type you configure a validation checklist and an approved remediation action. When an alert fires, Ops-Sage validates the signal against live context, runs the action through the governed SSP engine if checks pass, and only escalates to a human when validation fails — removing pager fatigue for known failure patterns.
Simulated alert queue. Click Process to watch Ops-Sage validate an alert and execute the configured action without paging on-call.
| Alert | Severity | Pod | Runbook | Status | |
|---|---|---|---|---|---|
| JVM heap utilization above 90% | High | cloudstack-prod-use2 | heap-scale-v2 | Waiting | |
| API latency p99 > 2500ms on profiling endpoints | High | cloudstack-prod-use2 | conn-pool-scale | Waiting | |
| Disk usage > 85% on worker node pool | Medium | cloudstack-prod-apse1 | log-rotate-sweep | Waiting | |
Recent automated remediations executed by Ops-Sage (simulated audit trail).
| Time | Alert | Action | Pod | Result | On-call paged? |
|---|---|---|---|---|---|
| 06:42 UTC | Stale deployment lock on org org-acme-8842 | release-lock | cloudstack-stg-pod1 | Complete | No |
| 05:18 UTC | Certificate expiry warning (< 14 days) | ssl-renotify | cloudstack-prod-use2 | Complete | No |
Each alert type maps to validation checks and an approved action. Ops-Sage only escalates when a check fails.
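heap-scale-v2 (JVM heap utilization above 90%)
Validate: confirm single pod spike, no active deploy, heap trend > 5 min.
Action: apply JVM heap multiplier via SSP workflow.
conn-pool-scale (API latency p99 > 2500ms on profiling endpoints)
Validate: DB pool at capacity, no Sev-1 open.
Action: scale connection pool ConfigMap + rolling restart.
log-rotate-sweep (Disk usage > 85% on worker node pool)
Validate: log volume growth, not data disk.
Action: trigger log rotation job on node pool.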
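The validate-then-act rule above can be expressed as a lookup from runbook to checks and action. The following is a minimal sketch under stated assumptions: the context keys (`pods_spiking`, `deploy_active`, and so on) and the `handle_alert` helper are hypothetical, and the action is returned as a label rather than executed through SSP.

```python
# Hypothetical runbook table; check inputs come from an assumed "live context" dict.
RUNBOOKS = {
    "heap-scale-v2": {
        "checks": [
            lambda ctx: ctx["pods_spiking"] == 1,        # single pod spike
            lambda ctx: not ctx["deploy_active"],        # no active deploy
            lambda ctx: ctx["heap_trend_minutes"] > 5,   # heap trend > 5 min
        ],
        "action": "apply JVM heap multiplier",
    },
    "log-rotate-sweep": {
        "checks": [
            lambda ctx: ctx["log_volume_growing"],       # log volume growth
            lambda ctx: not ctx["is_data_disk"],         # not data disk
        ],
        "action": "trigger log rotation job",
    },
}


def handle_alert(runbook_name, context):
    """Run the action only if every check passes; otherwise page a human."""
    runbook = RUNBOOKS.get(runbook_name)
    if runbook is None or not all(check(context) for check in runbook["checks"]):
        return {"action": None, "paged": True}
    return {"action": runbook["action"], "paged": False}
```

An unknown runbook is treated the same as a failed check, so anything the table does not cover still reaches on-call.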
A Slack-native self-service tool that automates production operations with high reliability.
A manifest-driven AI agent engine automating multi-tenant cloud infrastructure management via chat interfaces.
Chennai Institute of Technology
Graduated: Jan 2020
IIPE Laxmi Raman Higher Secondary School
Completed: Jan 2016
Engaged in local community empowerment, educational drives, and youth guidance programs.
Contributed updates and enhancements to JsonQ, a query-like library for JSON data structures in Python/Golang.