Hi, I’m Nirav!

I’m a fourth-year Ph.D. student in the Computer Science Department (CSD) at Carnegie Mellon University (CMU), where I’m fortunate to be advised by Prof. Justine Sherry. My research area is Networking, and I’m part of the Systems, Networking, and Performance Lab (SNAP) at CMU; I’m secondarily affiliated with CONIX and CMU’s Parallel Data Lab (PDL). My research interests broadly lie at the intersection of systems and performance modeling (i.e. building high-performance networked systems and proving theoretical properties about their performance and stability).

Prior to starting graduate school, I completed my undergraduate degree (B.A.Sc.) in Computer Engineering at the University of Toronto, Canada, in 2018. Despite my apparent penchant for cold, capricious weather, I spent most of life in Mumbai, India. In my spare time, I enjoy reading (fiction, philosophy), imbibing copious amounts of espresso, and playing chess, golf, and soccer.


Publications

Conference Papers

Abstract
Algorithmic complexity attacks (ACAs) are a class of denial-of-service (DoS) attacks where an attacker uses a small amount of adversarial traffic to induce a large amount of work in the target system, pushing the system into overload and causing it to drop packets from innocent users. ACAs are particularly dangerous because, unlike volumetric DoS attacks, ACAs don't require a significant network bandwidth investment from the attacker. Today, network functions (NFs) on the Internet must be painstakingly designed and engineered on a case-by-case basis to mitigate the debilitating impact of ACAs. Further, the resulting designs tend to be overly conservative in their attack mitigation strategy, limiting the innocent traffic that the NF can serve under common-case operation.

In this work, we propose a general framework to make any NF more resilient to ACAs without the limitations of prior approaches. Our framework, SurgeProtector, uses the NF's scheduler to mitigate the impact of ACAs using a very traditional scheduling algorithm: weighted-shortest-job-first (WSJF). To evaluate SurgeProtector, we propose a new metric of vulnerability called the Displacement Factor (DF), which quantifies the 'harm per unit effort' which an adversary can inflict on the system. We provide novel, adversarial analysis of WSJF and show that any system using this policy has a worst-case DF of only a small constant, where traditional schedulers place no upper bound on the DF. Illustrating that SurgeProtector is not only theoretically, but practically robust, we integrate SurgeProtector into an open source intrusion detection system (IDS). Under simulated attack, the SurgeProtector-augmented IDS suffers 90-99% lower innocent traffic loss than the original system.
BibTeX
@inproceedings{atre-sigcomm2022,
  author = {Atre, Nirav and Sadok, Hugo and Chiang, Erica and Wang, Weina and Sherry, Justine},
  title = {SurgeProtector: Mitigating Temporal Algorithmic Complexity Attacks using Adversarial Scheduling},
  year = {2022},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  booktitle = {Proceedings of the 2022 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM)}
}
Abstract
Kernel-bypass network APIs, which allow applications to circumvent the kernel and interface directly with the NIC hardware, have recently emerged as one of the main tools for improving application network performance. However, allowing applications to circumvent the kernel makes it impossible to use tools (e.g., tcpdump) or impose policies (e.g., QoS and filters) that need to consider traffic sent by different applications running on a host. This makes maintainability and manageability a challenge for kernel-bypass applications. In response we propose Kernel On-Path Interposition (KOPI), in which traditional kernel dataplane functionality is retained but implemented in a fully programmable SmartNIC. We hypothesize that KOPI can support the same tools and policies as the kernel stack while retaining the performance benefits of kernel bypass.
BibTeX
@inproceedings{sadok-hotos21,
  author = {Sadok, Hugo and Zhao, Zhipeng and Choung, Valerie and Atre, Nirav and Berger, Daniel S. and Hoe, James C. and Panda, Aurojit and Sherry, Justine},
  title = {We Need Kernel Interposition over the Network Dataplane},
  year = {2021},
  isbn = {},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3458336.3465281},
  doi = {10.1145/3458336.3465281},
  booktitle = {Proceedings of the Workshop on Hot Topics in Operating Systems},
  pages = {152--158},
  month = may,
  series = {HotOS '21}
}
Abstract
Intrusion Detection and Prevention Systems (IDS/IPS) are among the most demanding stateful network functions. Today's network operators are faced with securing 100Gbps networks with 100K+ concurrent connections by deploying IDS/IPSes to search for 10K+ rules concurrently. In this paper we set an ambitious goal: Can we do all of the above in a single server? Through the Pigasus IDS/IPS, we show that this goal is achievable, perhaps for the first time, by building on recent advances in FPGA-capable SmartNICs. Pigasus' design takes an FPGA-first approach, where the majority of processing, and all state and control flow are managed on the FPGA. However, doing so requires careful design of algorithms and data structures to ensure fast common-case performance while densely utilizing system memory resources. Our experiments with a variety of traces show that Pigasus can support 100Gbps using an average of 5 cores and 1 FPGA, using 38x less power than a CPU-only approach.
BibTeX
@inproceedings{zhao-osdi20,
  author = {Zhao, Zhipeng and Sadok, Hugo and Atre, Nirav and Hoe, James and Sekar, Vyas and Sherry, Justine},
  title = {Achieving 100Gbps Intrusion Prevention on a Single Server},
  booktitle = {14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
  year = {2020},
  address = {Banff, Alberta},
  url = {https://www.usenix.org/conference/osdi20/presentation/zhao-zhipeng},
  publisher = {USENIX Association},
  month = nov,
}
Abstract
Caches are at the heart of latency-sensitive systems. In this paper, we identify a growing challenge for the design of latency-minimizing caches called delayed hits. Delayed hits occur at high throughput, when multiple requests to the same object queue up before an outstanding cache miss is resolved. This effect increases latencies beyond the predictions of traditional caching models and simulations; in fact, caching algorithms are designed as if delayed hits simply didn’t exist. We show that traditional caching strategies – even so called ‘optimal’ algorithms – can fail to minimize latency in the presence of delayed hits. We design a new, latency-optimal offline caching algorithm called BELATEDLY which reduces average latencies by up to 45% compared to the traditional, hit-rate optimal Belady’s algorithm. Using BELATEDLY as our guide, we show that incorporating an object’s ‘aggregate delay’ into online caching heuristics can improve latencies for practical caching systems by up to 40%. We implement a prototype, Minimum-AggregateDelay (MAD), within a CDN caching node. Using a CDN production trace and backends deployed in different geographic locations, we show that MAD can reduce latencies by 12-18% depending on the backend RTTs.
BibTeX
@inproceedings{atre-sigcomm20,
  author = {Atre, Nirav and Sherry, Justine and Wang, Weina and Berger, Daniel S.},
  title = {Caching with Delayed Hits},
  year = {2020},
  isbn = {9781450379557},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3387514.3405883},
  doi = {10.1145/3387514.3405883},
  booktitle = {Proceedings of the 2020 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM)},
  pages = {495–513},
  numpages = {19},
  keywords = {Caching, Belatedly, Delayed hits},
  location = {Virtual Event, USA},
  series = {SIGCOMM ’20}
}

Selected Posters

Abstract
Packet scheduling algorithms control the order in which a system serves network packets, which can have significant impact on system performance. Recent work shows that Weighted Shortest Job First (WSJF) -- scheduling packets by the ratio of job size to packet size -- significantly mitigates a system’s susceptibility to algorithmic complexity attacks (ACAs), a particularly dangerous class of Denial-of-Service (DoS) attacks. WSJF relies on knowledge of a packet’s job size, information that is not available a priori. In this work, we explore the theoretical implications of using heuristics for job size estimation. Further, we consider preemption as another technique that may help protect systems when job sizes are unknown. We find that heuristics with certain properties (estimated job-size-to-packet-size ratios increasing monotonically with the actual ratios, step function-based categorization) can protect systems against ACAs with varying guarantees, while preemption alone does not.
BibTeX
@inproceedings{chiang-sigcomm22-poster,
  title={Robust Heuristics: Attacks and Defenses on Job Size Estimation for WSJF Systems},
  author={Chiang, Erica and Atre, Nirav and Sadok, Hugo and Wang, Weina and Sherry, Justine},
  booktitle={Proceedings of the 2022 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM) (Poster Session)},
  year={2022}
}
Abstract
Traditional caching models emphasize hit rate as the principal measure of performance for cache replacement algorithms. However, hit rate alone can be misleading in the presence of a phenomenon known as a delayed hit. Delayed hits occur in high-throughput systems when multiple requests for an object accumulate before the object can be fetched from the backing store. Prior work by Atre et al. has explored the impact of delayed hits in simple caching scenarios, namely single-tier caches with uniform object sizes. In this work we seek to extend that investigation to consider multi-level caches, such as those that might be found in a modern CDN. Furthermore, we extend MAD, the delayed-hits-aware policy proposed by Atre et al., so that it can be deployed in a multi-tier caching system. We evaluate the performance of MAD using a multi-tier cache simulator and an empirical cache configuration based on modern CDNs. Our initial results lead us to believe that delayed hits can still be a prominent factor in the performance of multi-level caches, although their effect may be reduced in comparison to simpler cache configurations.
BibTeX
@inproceedings{carleton-sosp2021-poster,
  title={Delayed Hits in Multi-Level Caches},
  author={Carleton, Benjamin},
  booktitle={Symposium on Operating Systems Principles (Poster Session)},
  year={2021}
}

Research Talks

SurgeProtector: Mitigating Temporal Algorithmic Complexity Attacks (ACAs) using Adversarial Scheduling
Caching with Delayed Hits

Industry Experience

Summer, 2020 (4 months)
Summer, 2018 (4 months)
Sep, 2016 - Aug, 2017 (1 year)
Summer, 2016 (4 months)