Executive Summary

Member of Technical Staff at Salesforce with 5.5+ years building production ops platforms. I architected NextOps (Slack ChatOps), Cosmic AI (manifest-driven LLM agents on Temporal), SSP (governed change execution), and Ops-Sage (alert auto-remediation) — replacing ticket queues with safe, auditable self-service. Deep in Kubernetes, Terraform, Temporal, and FinOps-aware platform design.

Shipped at Salesforce & Informatica

Production platforms I architected and built

These are not concept slides — they are interactive replicas of the systems I delivered in production. Explore how I combine Slack-native ChatOps, Temporal orchestration, LLM agents, and governed GitOps to replace ticket queues with safe, auditable self-service.

  • NextOps: self-service prod ops from Slack, with RBAC, runbooks, and on-call automation
  • Cosmic AI: YAML-defined agents that run infra actions via natural language
  • SSP: JIRA + CAB approvals wired into GitOps change execution
  • Ops-Sage: config-driven alert validation and auto-remediation

What I bring to your team: faster incident resolution, 50% faster emergency releases, FinOps-aware infra design, and platforms engineers actually adopt — because they live in Slack and Teams, not another portal.


My Production Ops Portfolio v2.4-stable
Interactive replicas of production systems I built — no live cloud infrastructure is connected.

CloudStack Ops Automation Suite

Four products I shipped — NextOps and Cosmic AI for intake, SSP for governed execution, Ops-Sage for automated alert response — on Temporal, RBAC, and a complete audit trail.

02 · ChatOps Self-service in seconds

NextOps

Slack-native runbooks for DB access, job termination, feature toggles, and on-call — with live RBAC and Temporal-backed execution.

Temporal ChatOps RBAC
02 · ChatOps Natural-language infra ops

Cosmic AI

YAML-defined agents compiled into Temporal workers. Engineers ask in plain English, and the LLM generates safe queries under capability-level RBAC.

Azure OpenAI Kubernetes GitOps
03 · Govern JIRA + CAB wired to GitOps

SSP

Change requests flow through JIRA and CAB approval before any production mutation — approvals gate Temporal workflows and deployment pipelines.

JIRA GitOps CAB
05 · Respond On-call relief in seconds

Ops-Sage

Matches each on-call alert to a runbook, validates live context, executes remediation through SSP, and only pages a human when validation fails.

Alerting Runbooks SSP
Shared foundation
Slack-native Temporal MongoDB audits Azure OpenAI JIRA RBAC Full audit trail

NextOps Bot

Secure Self-Service

What it is / Why it matters

A Slack bot that lets engineers and support staff safely run production operations on the CloudStack platform — provision database users, kill stuck jobs, toggle features, and triage incidents — all from Slack, with role-based access and a full audit trail. No direct production access required.

BUSINESS IMPACT
Self-service in seconds, not tickets
Default-deny RBAC checks
Every action fully audited
Zero direct prod terminal access
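
The default-deny check and audit trail listed above can be illustrated with a minimal Python sketch. The role names, action names, and helper functions here are hypothetical and not taken from NextOps itself; the sketch only shows the idea that an action runs when a role explicitly grants it, and that denied attempts are recorded too.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-action grants; the real NextOps roles are not described here.
ROLE_GRANTS = {
    "operator": {"take_thread_dump", "release_db_lock"},
    "admin": {"take_thread_dump", "release_db_lock", "toggle_feature"},
}


@dataclass
class AuditLog:
    """In-memory stand-in for a persistent audit store."""
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, allowed: bool) -> None:
        self.entries.append({
            "user": user,
            "action": action,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def is_allowed(role: str, action: str) -> bool:
    """Default deny: unknown roles and unlisted actions are rejected."""
    return action in ROLE_GRANTS.get(role, set())


def run_action(user: str, role: str, action: str, audit: AuditLog) -> str:
    """Check the grant, audit the decision, and report the outcome."""
    allowed = is_allowed(role, action)
    audit.record(user, action, allowed)  # denied attempts are audited as well
    return "executed" if allowed else "denied"
```

A role that is missing from the grant table gets an empty set, so nothing is permitted unless it is listed explicitly.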
NextOps Console
NextOps Automation Center
Good evening, Alex Rivera · Admin
Role: Operator Admin · Workspace: demo-ops
Active Users: 16
Security Groups: 3
Registered Services: 24
Temp Access Grants: 2 active
DB operational
Temporal connected
3 pending
On call: US-West · 4 engineers
Quick Actions
Live execution feed

Run a quick action to see a simulated workflow card with validation steps, progress, and audit confirmation.

Provisioning & Diagnostics Catalog
Audit History Log
Workflow ID | Type | Status | Duration | Operator | Time
wf-8812 | Take Thread Dump | Complete | 3.4s | Priya Nair | 4 hrs ago
wf-8794 | Release DB Lock | Complete | 1.8s | Sam Chen | 1 day ago

Cosmic AI

Autonomous agent

What it is / Why it matters

An AI-powered "senior SRE" that lives in Slack. Talk to it in plain English about any infrastructure or incident problem and it investigates, reasons step-by-step, runs the right operations, and remembers everything — available 24/7. Powered by GPT-5.

BUSINESS IMPACT
Replaces 24/7 manual toil
Investigates and diagnoses in parallel
Learns from incident history
Mandatory human approvals for write operations
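
One way to picture capability-level RBAC combined with the approval rule for write operations is the sketch below. It is an illustration under stated assumptions, not the Cosmic AI implementation: the manifest is a plain dict standing in for a parsed YAML file, and the agent name, capability names, and `authorize` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical manifest, shown as a dict in place of a parsed YAML file.
MANIFEST = {
    "agent": "pod-diagnostics",
    "capabilities": [
        {"name": "get_pod_status", "mode": "read"},
        {"name": "restart_pod", "mode": "write"},
    ],
}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize(manifest: dict, capability: str, granted: set,
              approved: bool = False) -> Decision:
    """Allow a capability only if declared, granted, and (for writes) approved."""
    declared = {c["name"]: c["mode"] for c in manifest["capabilities"]}
    if capability not in declared:
        return Decision(False, "capability not declared in manifest")
    if capability not in granted:
        return Decision(False, "caller lacks this capability")
    if declared[capability] == "write" and not approved:
        return Decision(False, "write operation needs human approval")
    return Decision(True, "ok")
```

Read capabilities pass as soon as the caller holds the grant, while write capabilities stay blocked until the `approved` flag is set by a human reviewer.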
SRE Copilot Hub
Cosmic AI Diagnostics Copilot
Status: Connected | Agent: Idle
Cosmic AI Just now

Welcome to the Cosmic AI diagnostics terminal. I can check pod performance, query status, run safe operations, and recall past incident notes. What should we investigate?

Cosmic AI Diagnostics Parameters
AI Queries Sent 1,248
Active Watchlists 4 Pods
Granted Scope 18 SRE Agents
Available Console CLI Commands
/sreagent help: List available agents, parameters, and command controls.
/sreagent memory recall <query>: Search past incident conversations and resolution details.
/sreagent watch <pod>: Configure monitoring alerts and notify channels on latency.

SSP

Governed Self-Service

What it is / Why it matters

The governed automation engine behind the scenes. Every operational action is validated, gets a JIRA change ticket, waits for the right approvals (Change Advisory Board), then executes against production and notifies everyone — fully audited, every time.

BUSINESS IMPACT
No direct production logins
Mandatory Change Advisory Board (CAB) gate
Temporary, auto-expiring user parameters
Complete Slack + JIRA + Email audit ledger
SSP Governance Stepper
1. Submit & Validate: Pending form inputs...
2. Change request ticket: Not started
3. CAB Notification: Not started
4. Apply Change: Not started
5. Deliverable & Notify: Not started
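
The five stepper stages above can be read as an ordered state machine in which the apply step is gated on CAB approval. The following Python sketch makes that ordering explicit; the `ChangeRequest` class and its attributes are hypothetical names chosen for illustration, and a real system would drive these steps from a workflow engine and JIRA rather than in-memory state.

```python
from enum import Enum


class Step(Enum):
    """Stages in the order shown by the stepper."""
    SUBMIT_AND_VALIDATE = 1
    CHANGE_TICKET = 2
    CAB_NOTIFICATION = 3
    APPLY_CHANGE = 4
    DELIVERABLE_AND_NOTIFY = 5


class ChangeRequest:
    def __init__(self, summary: str):
        self.summary = summary
        self.completed = []        # steps finished so far, in order
        self.cab_approved = False  # set once the CAB signs off

    def next_step(self):
        """Return the first step not yet completed, or None when done."""
        remaining = [s for s in Step if s not in self.completed]
        return remaining[0] if remaining else None

    def advance(self) -> Step:
        """Complete the next step, refusing to apply without CAB approval."""
        step = self.next_step()
        if step is None:
            raise RuntimeError("change request already finished")
        if step is Step.APPLY_CHANGE and not self.cab_approved:
            raise PermissionError("CAB approval is required before applying the change")
        self.completed.append(step)
        return step
```

Because steps can only be completed in order, there is no path to `APPLY_CHANGE` that skips validation, the ticket, or the CAB notification.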
SSP Governed Execution request
Projects / CloudStack Change Advisory

CHG-447561

Summary: [Cloud-SSP] Feature Toggle | Pod cloudstack-prod-use2 | Org org-acme-8842
Description: Governed Feature Toggle execution triggered via SSP. Verification required.
Labels: cloud-govportal, cloud-change-governance
Status: READY FOR CAB APPROVAL
Assignee: CloudStack Prod Ops
Reporter: Alex Rivera
Feature Toggle Request: AWAITING APPROVAL

Ops-Sage

On-Call Alert Automation

What it is / Why it matters

Ops-Sage watches incoming on-call alerts and acts on them automatically. For each alert type you configure a validation checklist and an approved remediation action. When an alert fires, Ops-Sage validates the signal against live context, runs the action through the governed SSP engine if checks pass, and only escalates to a human when validation fails — removing pager fatigue for known failure patterns.

BUSINESS IMPACT
Automated response in seconds, not minutes
Reduces on-call toil for repeat alerts
Validation gate before any production change
Every auto-action fully audited

Simulated alert queue. Click Process to watch Ops-Sage validate an alert and execute the configured action without paging on-call.

Alert | Severity | Pod | Runbook | Status
JVM heap utilization above 90% | High | cloudstack-prod-use2 | heap-scale-v2 | Waiting
API latency p99 > 2500ms on profiling endpoints | High | cloudstack-prod-use2 | conn-pool-scale | Waiting
Disk usage > 85% on worker node pool | Medium | cloudstack-prod-apse1 | log-rotate-sweep | Waiting

Recent automated remediations executed by Ops-Sage (simulated audit trail).

Time | Alert | Action | Pod | Result | On-call paged?
06:42 UTC | Stale deployment lock on org org-acme-8842 | release-lock | cloudstack-stg-pod1 | Complete | No
05:18 UTC | Certificate expiry warning (< 14 days) | ssl-renotify | cloudstack-prod-use2 | Complete | No

Each alert type maps to validation checks and an approved action. Ops-Sage only escalates when a check fails.

heap-scale-v2: JVM heap > 90%

Validate: confirm single pod spike, no active deploy, heap trend > 5 min.
Action: apply JVM heap multiplier via SSP workflow.

conn-pool-scale: Latency p99 high

Validate: DB pool at capacity, no Sev-1 open.
Action: scale connection pool ConfigMap + rolling restart.

log-rotate-sweep: Disk > 85%

Validate: log volume growth, not data disk.
Action: trigger log rotation job on node pool.
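
The runbook mappings above lend themselves to a small config-driven dispatcher. The sketch below encodes only the heap-scale-v2 entry and is an illustration rather than the Ops-Sage code: the alert-type key, the context field names, and the `handle_alert` function are assumptions, and the "remediate" outcome merely names the action instead of calling SSP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runbook:
    name: str
    checks: tuple  # (label, predicate) pairs evaluated against live context
    action: str


# Hypothetical alert-type key and context fields, modelled on heap-scale-v2.
RUNBOOKS = {
    "jvm_heap_high": Runbook(
        name="heap-scale-v2",
        checks=(
            ("single pod spike", lambda ctx: ctx["pods_affected"] == 1),
            ("no active deploy", lambda ctx: not ctx["deploy_in_progress"]),
            ("heap trend > 5 min", lambda ctx: ctx["minutes_above_threshold"] > 5),
        ),
        action="apply JVM heap multiplier via SSP workflow",
    ),
}


def handle_alert(alert_type: str, ctx: dict) -> dict:
    """Remediate only when every check passes; otherwise escalate to a human."""
    runbook = RUNBOOKS.get(alert_type)
    if runbook is None:
        return {"outcome": "escalate", "reason": "no runbook configured"}
    failed = [label for label, check in runbook.checks if not check(ctx)]
    if failed:
        return {"outcome": "escalate", "reason": "failed checks: " + ", ".join(failed)}
    return {"outcome": "remediate", "runbook": runbook.name, "action": runbook.action}
```

Adding a new alert type then means adding one entry to the table, with no change to the dispatch logic; unknown alert types fall through to escalation.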

Engineering Repositories

NextOps ChatOps Platform

Salesforce

A Slack-native self-service platform that automates production operations reliably.

  • Built event-driven workflows powered by Temporal Orchestration for reliable long-running operations.
  • Engineered a live Slack-administered RBAC engine enabling secure access directly in conversation.
  • Integrated on-call alert automation (Ops-Sage) and AI-assisted diagnostics to minimize MTTR.
Temporal ChatOps AI/LLM Slack Python

Cosmic AI Agent Platform

Salesforce

A manifest-driven AI agent engine automating multi-tenant cloud infrastructure management via chat interfaces.

  • Auto-compiles YAML specification files into running Temporal workflow workers.
  • Uses an LLM to generate queries from context, constrained by fine-grained, capability-level RBAC.
  • Supports multi-tenant architectures, translating natural language into infrastructure actions.
LLMs Azure OpenAI Temporal Kubernetes GitOps

Career Timeline

Member of Technical Staff

Salesforce

Mar 2026 - Present Bangalore, IN
  • Architected and built NextOps, a Slack-native ChatOps/self-service platform for automating production operations, featuring Temporal-backed workflows, a live Slack-administered RBAC engine, on-call automation, and AI-driven Root Cause Analysis.
  • Designed and implemented Cosmic AI, a manifest-driven AI agent platform that manages multi-tenant cloud infrastructure via natural-language Slack/Teams chat, leveraging YAML specs for auto-compilation into Temporal workers with LLM query generation and capability-level RBAC.

Senior DevOps Engineer

Informatica

Apr 2024 - Mar 2026 Bengaluru, IN
  • Designed and developed a scalable AI Agent Orchestration Platform, incorporating dynamic task chaining, parallel execution, intelligent data flow, and a central agent knowledge layer for improved automation accuracy and accelerated multi-step workflows.
  • Built Ops-Sage, an on-call alert automation system that validates incoming alerts against configured runbooks and executes remediations through SSP — reducing pager load for repeat failure patterns.
  • Developed multiple Microsoft Teams bots to support team operations and streamline incident management processes.
  • Created a self-service portal using Python and Temporal, enabling engineering teams to independently run scripts and perform operational tasks, reducing reliance on DevOps/Platform support.
  • Automated Emergency Bug Fix (EBF) deployments, optimizing the release process and achieving a 50% increase in deployment speed by minimizing manual intervention.
  • Partnered with customers to diagnose and resolve critical production issues, ensuring high satisfaction and reliable service delivery.
  • Contributed to Disaster Recovery (DR) drills, driving process enhancements that reduced recovery times year over year.
  • Collaborated with the FinOps team to monitor and optimize infrastructure costs, implementing cost-efficient architectures that significantly lowered cloud expenses.

DevOps Engineer

Informatica

Sep 2022 - Apr 2024 Bengaluru, IN
  • Led the onboarding of a multi-cloud data platform service across AWS, Azure, and GCP, optimizing data management workflows.
  • Collaborated with cross-functional teams to implement GitOps methodologies, enhancing version control and deployment accuracy.
  • Utilized Kubernetes to orchestrate containerized applications, improving scalability and resource utilization.
  • Engineered infrastructure as code (IaC) using Terraform to ensure consistent and repeatable environment provisioning.
  • Employed Chef for configuration management, automating software deployments and system configurations.
  • Developed Python and Bash scripts to automate routine tasks, reducing manual effort by 80%.
  • Played a key role in cloud migrations, ensuring minimal downtime and seamless data transition.
  • Swiftly addressed production incidents, performing root cause analysis and implementing corrective actions.
  • Contributed to the development of internal tools, streamlining team processes and enhancing productivity.

Infrastructure Consultant

Thoughtworks

Dec 2021 - Aug 2022 Bangalore, IN
  • Implemented automation solutions using Python to streamline infrastructure management tasks.
  • Managed application deployments within Kubernetes environments, ensuring smooth operations and scalability.
  • Designed and implemented robust CI/CD pipelines to accelerate software delivery and improve release reliability.
  • Conducted security vulnerability checks across infrastructure to maintain high security standards.
  • Performed cost-cutting research for AWS cloud resources, identifying and implementing strategies to optimize expenditure.

Assistant System Engineer

Tata Consultancy Services

Aug 2020 - Nov 2021 Chennai, IN
  • Integrated CI/CD pipelines to automate build, test, and deployment processes for microservices, enhancing workflow efficiency.
  • Configured and managed Kubernetes clusters for container orchestration and application deployment, ensuring reliable application performance.
  • Automated preparation and configuration of testing platforms for IBM MQ software across cloud providers (AWS, Azure, IBM Cloud, GCP) and on-premise servers using Ansible, streamlining deployment processes.
  • Built microservices into Docker images, facilitating containerization and deployment.

Education

BE / Mechanical Engineering

Chennai Institute of Technology

Graduated: Jan 2020

HSLC / Bio-Math

IIPE Laxmi Raman Higher Secondary School

Completed: Jan 2016

Credentials

  • Microsoft: Azure Fundamentals
  • IBM: Containers & Kubernetes Essentials
  • IBM: MQ Developer Essentials
  • Dassault: SolidWorks Associate

Activities

Volunteer / APJ Youth Club

Engaged in local community empowerment, educational drives, and youth guidance programs.

Contributor / JsonQ

Contributed updates and enhancements to JsonQ, a query-like library for JSON data structures in Python/Golang.