Dwarak | Site Reliability Engineer

T-Mobile

Jan 2019 – present · 7+ years

Atlanta, GA

Senior Site Reliability Engineer (current)

Mar 2026 – present

Embedded with the Akamai delivery team to modernize a fully manual, ClickOps-driven operation, introducing GitLab pipelines, Terraform, and Akamai PAPI as the foundation for repeatable, auditable infrastructure.
Designed and implemented end-to-end GitLab pipelines for deploying and managing 200+ Akamai properties and cloudlets — replacing ad-hoc manual changes with versioned, reviewable, pipeline-driven delivery.
Built pipeline-native rollback capability for every property and cloudlet deployment — eliminating the risk of manual reversal under pressure and giving engineers a safe, one-click recovery path.
Architected reusable, modular pipeline and Terraform components to standardize every new property from day one — reducing onboarding friction and config drift across the estate.
Drove engineer adoption through hands-on training and design walkthroughs — securing buy-in from managers and principal engineers on GitLab and IaC-based workflows.
Integrated Claude (Anthropic API / MCP) directly into SRE workflows to accelerate pipeline design, documentation, and automation tasks, bringing AI-augmented practices into the team's day-to-day delivery.

Site Reliability Engineer → Senior SRE (promoted)

Mar 2024 – Mar 2026

Designed the enterprise OpenTelemetry integration pattern for 50+ customer care applications across Java/Spring Boot and Node.js stacks — publishing documentation and guiding DevOps engineers through adoption across varying deployment strategies.
Delivered RED metrics (Rate, Errors, Duration) out of the box after integration, giving instant, query-free observability into service health across the entire customer care application estate.
Modernized certificate lifecycle management by migrating from legacy tooling to Keyfactor and cert-manager — enabling automated, low-friction cert rotation and eliminating manual renewal risk across the estate.
Designed and executed custom chaos engineering experiments targeting customer care applications across multiple data centers — proactively validating resilience posture and surfacing failure modes before they became production incidents.

Platform Engineer

May 2022 – Mar 2024

Led the end-to-end migration of 50+ customer care applications from PCF to Kubernetes (APIs) and AWS S3/CloudFront/WAF (SPAs) — designed the migration pattern, pipelines, runbooks, and documentation, then directed an offshore team to replicate the pattern at scale.
Engineered the load balancer configuration that serves as the single entry point for 100+ customer-facing applications: designed routing rules, health checks, and failover policies across multiple data centers, and cut per-environment YAML from 5,000 to 500 lines with Jinja templating.
Designed and implemented a Vault secrets migration for 50–100 applications — architected the secrets structure, automated the migration process, and presented design documentation to principal engineers and architects for sign-off.
Migrated application configuration management from Bitbucket to Vault (sensitive) and GitLab (non-sensitive) — establishing a clean, auditable separation of config and secrets across the org.
Developed reusable CI/CD pipeline templates, practices, and runbooks enabling consistent, repeatable deployments across a large application estate — serving as the technical authority for a distributed offshore delivery team.

Cloud DevOps Engineer

Oct 2019 – May 2022

Supported deployment operations for 50+ applications on PCF — triaging CI/CD failures, resolving pipeline issues, and ensuring reliable delivery for development teams.
Contributed to the GitLab migration from Jenkins — porting pipelines and validating delivery workflows as part of a broader org-wide toolchain modernization.
Implemented near-far resiliency and HAProxy configurations — managing routing rules and request patching to maintain availability across data centers.
Delivered feature development on a critical Groovy/Grails monolith — including changes to the device purchase flow shipped directly to production at national carrier scale.

AWS DevOps Engineer

Jan 2019 – Oct 2019 · Seattle

Worked as part of a lean two-person team to design and stand up Ansible Tower on AWS as a company-wide automation platform for T-Mobile — collaborating directly with a senior architect on the full infrastructure build.
Provisioned enterprise AWS infrastructure spanning EC2, API Gateway, Lambda, and KMS — building hands-on depth across core AWS services while delivering the end-to-end Ansible Tower setup.
Authored Ansible playbooks and configured Tower workflows for org-wide automation consumption — building foundational infrastructure-as-code proficiency that carried forward through every subsequent role.

Raymond James Financial

Jan 2018 – Dec 2018

St. Petersburg, FL

DevOps Engineer

Built and maintained CI/CD pipelines and deployment infrastructure in a regulated financial services environment — gaining foundational DevOps experience across the full delivery lifecycle.
Managed infrastructure operations and supported development teams with deployments — establishing core practices in pipeline management and environment stability.

GoFrugal

Jun 2015 – Dec 2015

Chennai, India

Software Developer

Developed features for a SaaS ERP platform serving retail and distribution businesses — building early software engineering foundations across the full development lifecycle.
