Hi, I’m Nirav!

I’m a fourth-year Ph.D. student in the Computer Science Department (CSD) at Carnegie Mellon University (CMU), where I’m fortunate to be advised by Prof. Justine Sherry. My research area is Networking, and I’m part of the Systems, Networking, and Performance Lab (SNAP) at CMU; I’m secondarily affiliated with CONIX and CMU’s Parallel Data Lab (PDL). My research interests broadly lie at the intersection of systems and performance modeling (i.e., building high-performance networked systems and proving theoretical properties about their performance and stability).

Prior to starting graduate school, I completed my undergraduate degree (B.A.Sc.) in Computer Engineering at the University of Toronto, Canada, in 2018. Despite my apparent penchant for cold, capricious weather, I spent most of my life in Mumbai, India. In my spare time, I enjoy reading (fiction, philosophy), imbibing copious amounts of espresso, and playing chess, golf, and soccer.


Kernel-bypass network APIs, which allow applications to circumvent the kernel and interface directly with the NIC hardware, have recently emerged as one of the main tools for improving application network performance. However, allowing applications to circumvent the kernel makes it impossible to use tools (e.g., tcpdump) or impose policies (e.g., QoS and filters) that need to consider traffic sent by different applications running on a host. This makes maintainability and manageability a challenge for kernel-bypass applications. In response, we propose Kernel On-Path Interposition (KOPI), in which traditional kernel dataplane functionality is retained but implemented in a fully programmable SmartNIC. We hypothesize that KOPI can support the same tools and policies as the kernel stack while retaining the performance benefits of kernel bypass.
  author = {Sadok, Hugo and Zhao, Zhipeng and Choung, Valerie and Atre, Nirav and Berger, Daniel S. and Hoe, James C. and Panda, Aurojit and Sherry, Justine},
  title = {We Need Kernel Interposition over the Network Dataplane},
  year = {2021},
  isbn = {},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3458336.3465281},
  doi = {10.1145/3458336.3465281},
  booktitle = {Proceedings of the Workshop on Hot Topics in Operating Systems},
  pages = {152--158},
  month = may,
  series = {HotOS '21}
Intrusion Detection and Prevention Systems (IDS/IPS) are among the most demanding stateful network functions. Today's network operators are faced with securing 100Gbps networks with 100K+ concurrent connections by deploying IDS/IPSes to search for 10K+ rules concurrently. In this paper we set an ambitious goal: Can we do all of the above in a single server? Through the Pigasus IDS/IPS, we show that this goal is achievable, perhaps for the first time, by building on recent advances in FPGA-capable SmartNICs. Pigasus' design takes an FPGA-first approach, where the majority of processing, and all state and control flow are managed on the FPGA. However, doing so requires careful design of algorithms and data structures to ensure fast common-case performance while densely utilizing system memory resources. Our experiments with a variety of traces show that Pigasus can support 100Gbps using an average of 5 cores and 1 FPGA, using 38x less power than a CPU-only approach.
  author = {Zhao, Zhipeng and Sadok, Hugo and Atre, Nirav and Hoe, James and Sekar, Vyas and Sherry, Justine},
  title = {Achieving 100Gbps Intrusion Prevention on a Single Server},
  booktitle = {14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
  year = {2020},
  address = {Banff, Alberta},
  url = {https://www.usenix.org/conference/osdi20/presentation/zhao-zhipeng},
  publisher = {USENIX Association},
  month = nov,
Caches are at the heart of latency-sensitive systems. In this paper, we identify a growing challenge for the design of latency-minimizing caches called delayed hits. Delayed hits occur at high throughput, when multiple requests to the same object queue up before an outstanding cache miss is resolved. This effect increases latencies beyond the predictions of traditional caching models and simulations; in fact, caching algorithms are designed as if delayed hits simply didn’t exist. We show that traditional caching strategies – even so-called ‘optimal’ algorithms – can fail to minimize latency in the presence of delayed hits. We design a new, latency-optimal offline caching algorithm called BELATEDLY which reduces average latencies by up to 45% compared to the traditional, hit-rate optimal Belady’s algorithm. Using BELATEDLY as our guide, we show that incorporating an object’s ‘aggregate delay’ into online caching heuristics can improve latencies for practical caching systems by up to 40%. We implement a prototype, Minimum-AggregateDelay (MAD), within a CDN caching node. Using a CDN production trace and backends deployed in different geographic locations, we show that MAD can reduce latencies by 12-18% depending on the backend RTTs.
  author = {Atre, Nirav and Sherry, Justine and Wang, Weina and Berger, Daniel S.},
  title = {Caching with Delayed Hits},
  year = {2020},
  isbn = {9781450379557},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3387514.3405883},
  doi = {10.1145/3387514.3405883},
  booktitle = {Proceedings of the 2020 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM)},
  pages = {495--513},
  numpages = {19},
  keywords = {Caching, Belatedly, Delayed hits},
  location = {Virtual Event, USA},
  series = {SIGCOMM ’20}
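
To make the delayed-hits effect described in the abstract above concrete, here is a toy sketch in Python. It is not BELATEDLY or MAD, and it is not taken from the paper; it is just a small LRU cache simulation under assumptions chosen for illustration: one request arrives per time step, every fetch takes a fixed `fetch_latency` steps, and the names `simulate`, `cache_size`, and `fetch_latency` are hypothetical.

```python
from collections import OrderedDict


def simulate(trace, cache_size, fetch_latency):
    """Toy LRU cache with a fixed fetch latency (illustrative assumptions only).

    One request arrives per time step. Returns a tuple
    (true_hits, delayed_hits, misses, total_latency).
    """
    cache = OrderedDict()  # cached objects, least recently used first
    in_flight = {}         # object -> time step at which its fetch completes
    hits = delayed = misses = 0
    total_latency = 0

    for now, obj in enumerate(trace):
        # Admit every fetch that has completed by this time step.
        for fetched, ready in list(in_flight.items()):
            if ready <= now:
                del in_flight[fetched]
                cache[fetched] = None
                cache.move_to_end(fetched)
                while len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the LRU object

        if obj in cache:
            # True hit: served immediately.
            hits += 1
            cache.move_to_end(obj)
        elif obj in in_flight:
            # Delayed hit: wait for the outstanding fetch to finish.
            delayed += 1
            total_latency += in_flight[obj] - now
        else:
            # Miss: start a fetch and pay the full latency.
            misses += 1
            in_flight[obj] = now + fetch_latency
            total_latency += fetch_latency

    return hits, delayed, misses, total_latency


if __name__ == "__main__":
    print(simulate(["a", "a", "a", "a", "b"], cache_size=1, fetch_latency=3))
```

On this five-request trace, a model that treats every request after the first miss on `a` as a free hit would charge only the two misses, for 2 × 3 = 6 time steps of latency. The sketch instead reports two delayed hits that wait 2 and 1 steps, for 9 steps in total.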

Research Talks

Caching with Delayed Hits

Industry Experience

Summer, 2020 (4 months)
Summer, 2018 (4 months)
Sep, 2016 - Aug, 2017 (1 year)
Summer, 2016 (4 months)