Engineered a real-time batch rule evaluation system processing over 22,000 queries per second (QPS) with 99.98% uptime and a P99 latency of 100ms, effectively preventing fraudulent transactions and saving millions of rupees monthly.
Designed and implemented asynchronous orchestration for periodic rule evaluations using Spark, Airflow, and Kafka event streams, improving accuracy and performance by cutting load bursts on live services by 17%.
Enabled critical business use cases, including trade validations, compliance checks, business validations, and Anti-Money Laundering (AML) fraud monitoring.
Developed a tooling platform with an automation framework for schema generation and migration, resulting in a 90% reduction in build time and a 70% increase in onboarding speed.
Engineered a scalable Experimentation Suite for rapid iteration of rules and ML models, incorporating a real-time confusion matrix, controlled rollouts, and A/B testing, which reduced false positives by 32%, false negatives by 5%, and runtime errors by 37%.
Architected an AI-powered interview feedback tool utilizing Gen-AI for transcription and metric-based discussion tagging, reducing review feedback time by 80%.
Led a comprehensive revamp of failure handling mechanisms, unifying logging, metric ingestion, and error management, which decreased upstream errors by 7% and reduced data ingestion by 20% through improved code structure and design patterns.
Implemented robust failure handling, including fallback mechanisms for data operations, auto-rollback for system failures, and circuit breakers, ensuring high system stability and data integrity.
Standardized and consolidated critical workflows, including test rule archival, Kafka event schema generation, data backfill, production mirroring, and analytical pipelines for OLAP DBs (sync and async).
Spearheaded critical application upgrades, including the migration from Java 8 to Java 17, improving maintainability and reducing JAR sizes through dependency cleanups and API deprecation.
Restructured and unified Data Access Object (DAO) and database interaction layers for Aerospike, establishing new team-wide architectural patterns.
Guided the design and implementation of rule failure alerting, integrating with monitoring tools to detect abnormal spikes and block rates, and re-modeled API layers for broader service adoption.
Led the design for multi-vendor investigation, transitioning from a reactive human-driven approach to an automated, scheduled system, optimizing cost and efficiency in data sourcing.
Designed a comprehensive testing suite with CSV upload support and real-time context enrichment capabilities via data mutators.
Developed a robust storage management strategy for efficient handling of result information and downstream triggers.
Provided technical leadership by running disaster recovery (DR) drills, overseeing database migration activities, mentoring team members, and conducting interviews for platform engineering roles.