
I am an assistant professor at CMU in the Machine Learning and the Computer Science departments. I work in the areas of machine learning, game theory and crowdsourcing, with a focus on learning from people with objectives of fairness, accuracy, and robustness.
Tutorial on problems in peer review
Blog on various aspects of academia, research, and peer review
Google Scholar page
nihars [at] cs.cmu.edu
Office: GHC 8211
My research interests lie in the areas of statistics, machine learning, information theory and game theory, with a focus on "learning from people": How to elicit high-quality data from people? How to draw inferences from such data? I am particularly interested in dealing with issues of fairness, accuracy, and robustness. This is an exciting and challenging area of research that has many important applications including hiring, admissions, crowdsourcing, A/B testing, online ratings and recommendations, peer grading, and peer review. My research aims to address these important challenges at scale, in a principled and pragmatic manner.
I am presently particularly excited about developing principled approaches towards improving peer review. Peer review is a microcosm of this general problem of learning from people, and addressing it is particularly urgent to allow for the research community to thrive. I look to formulate and solve problems from the perspective of peer review, and then generalize the inherent ideas to the various other applications of learning from people. Here are examples of a few recent works in this area:
- Bias: There is considerable debate in many research communities about single vs. double blind reviewing, with a FAQ "Where is the evidence of bias in reviewing in my community?" We design statistical tests for biases in peer review that accommodate the various idiosyncrasies of the peer review process (link). These tests are based on review scores; we also design algorithms to test for biases in review text (link).
- Miscalibration: Reviewers are often miscalibrated. For instance, one reviewer may be lenient and always provide scores greater than 5/10, while another may be strict and never provide more than 5/10. If these biases are a priori unknown, how can one calibrate the reviewers (from, say, just one review obtained per reviewer)? We design a novel randomized estimator that can handle arbitrary and even adversarial miscalibrations. This also leads to a surprising insight into the age-old debate between ratings and rankings (link).
- Reviewer assignment: Algorithms commonly employed today for assigning reviewers to papers can be unfair to certain papers (e.g., to interdisciplinary or novel papers). We design an algorithm to assign reviewers to papers which guarantees fairness of assignment to all papers. Simultaneously, the algorithm guarantees statistical accuracy of the review procedure. This algorithm was used in ICML 2020 and performed very well on common metrics of evaluation (link).
- Subjectivity: It is common to see a handful of reviewers reject a highly novel paper because they view extensive experiments as more important than novelty, whereas the community as a whole would have embraced the paper. We develop a novel method to mitigate such subjectivity by ensuring that every paper is judged by the same yardstick. We prove that, surprisingly, this is the only method which meets three natural requirements. We also provide an empirical analysis on IJCAI 2017 (link).
- Strategic behavior: Conference peer review incentivizes reviewers to influence the final outcome for their own papers by manipulating the reviews they provide. We present a framework for designing peer review systems that are insulated from strategic manipulations. We present positive results in terms of an algorithm and an analysis on ICLR 2017 data, as well as negative results which demonstrate the challenges in this problem (link). Additionally, since program chairs may want to first test if such strategic behavior exists at a large scale (before deploying any mitigation techniques), we also design a statistical test to check for large-scale strategic behavior (link).
- Malicious coalitions: By strategically manipulating their profile and bids, a reviewer can attain a fairly high chance of being assigned a friend's paper that they are targeting. If all or most of a paper's reviewers are friends of the authors, they can strongly vouch for the paper and get it accepted. (Moreover, cliques may operate across conferences: A accepts B's paper in one conference, and B accepts A's paper in another conference.) Multiple such instances have recently come to light. We design randomized assignment algorithms such that the program chairs can cap the probability of any reviewer being assigned to any paper. This reduces the chance of a "friend" reviewer being assigned to the paper. The algorithm provably guarantees that the assignment is optimal (in expectation) subject to these randomization constraints (link).
- Bidding: Many conferences ask reviewers to bid on papers they are interested in reviewing. Such bidding is known to be highly skewed, with a large number of papers getting zero or insufficient bids. We address this issue by exploiting primacy effects to design an algorithm that balances (i) the amount of skew in the bids and (ii) reviewer satisfaction (link).
- Private data release: A major impediment towards research on peer review is the dearth of data. There is indeed a challenge in releasing any data publicly due to requirements on the anonymity of reviewers for each paper. We establish a framework for releasing certain kinds of peer-review data while preserving privacy of reviewer assignments. (link)
- Feedback: To evaluate the quality of peer reviews, one may be tempted to ask authors for their feedback. However, author feedback is often biased: positive if their paper was accepted and negative if not. We design algorithms to de-bias such feedback (link).
- Empirical analysis: We analyze the review processes of NeurIPS 2016 (link) and ICML 2020 (link1, link2, link3); we also perform other empirical analyses in the aforementioned works.
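To make the idea of capping assignment probabilities (the "Malicious coalitions" item above) concrete, here is a minimal toy sketch in Python. It is not the algorithm from the linked paper, which treats all papers jointly under reviewer load constraints; it considers a single paper only. The function names, the similarity scores, and the sampling scheme (systematic sampling) are assumptions made purely for illustration.

```python
import math
import random


def capped_marginals(similarities, k, cap):
    """Toy single-paper version: choose probabilities p_r in [0, cap] with
    sum(p_r) == k that maximize the expected total similarity."""
    if cap > 1 or cap * len(similarities) < k:
        raise ValueError("need cap <= 1 and cap * (number of reviewers) >= k")
    probs = [0.0] * len(similarities)
    remaining = float(k)
    # With one paper and no load constraints, greedy is optimal: give the
    # best-matching reviewers as much probability as the cap allows.
    for r in sorted(range(len(similarities)), key=lambda i: -similarities[i]):
        probs[r] = min(cap, remaining)
        remaining -= probs[r]
        if remaining <= 0:
            break
    return probs


def sample_reviewers(probs, rng):
    """Systematic sampling (an illustrative choice): draws exactly sum(probs)
    reviewers, and reviewer r is included with probability probs[r]."""
    u = rng.random()
    chosen, cum = [], 0.0
    for r, p in enumerate(probs):
        lo, hi = cum, cum + p
        # Reviewer r owns [lo, hi); pick r if some point u + j (j an integer) lands there.
        if math.ceil(hi - u) > math.ceil(lo - u):
            chosen.append(r)
        cum = hi
    return chosen


# Hypothetical similarity scores of five reviewers for one paper.
similarities = [0.9, 0.8, 0.5, 0.4, 0.1]
probs = capped_marginals(similarities, k=2, cap=0.5)
reviewers = sample_reviewers(probs, random.Random(0))
```

With a cap of 0.5 and two review slots, the four best-matching reviewers each receive probability 0.5, so even a reviewer who has manipulated their way to the top similarity score is assigned to the targeted paper only half the time.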
RESEARCH THEMES:
Peer review (2018-). NSF CAREER Award, AAMAS 2019 Best Student Paper Award and Best Paper Nomination.
Crowdsourcing (2013-17). Awarded the David J. Sakrison Memorial Prize at UC Berkeley.
Distributed storage (2009-13). Awarded the IEEE Data Storage Best Paper and Best Student Paper Awards for the years 2011 & 2012.
- Uncovering Latent Biases in Text: Method and Application to Peer Review
Emaad Manzoor and Nihar B. Shah
AAAI 2021.
Code
- Mitigating Manipulation in Peer Review via Randomized Reviewer Assignments
Steven Jecmen, Hanrui Zhang, Ryan Liu, Nihar B. Shah, Vincent Conitzer, and Fei Fang
NeurIPS 2020.
Code
- Prior and Prejudice: The Novice Reviewers' Bias against Resubmissions in Conference Peer Review.
Ivan Stelmakh, Nihar B. Shah, Aarti Singh and Hal Daumé III
CSCW 2021.
- A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions.
Ivan Stelmakh, Charvi Rastogi, Nihar B. Shah, Aarti Singh and Hal Daumé III
arXiv 2020.
- A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers in Large Conferences.
Ivan Stelmakh, Nihar B. Shah, Aarti Singh and Hal Daumé III
AAAI 2021.
- Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment.
Ivan Stelmakh, Nihar B. Shah and Aarti Singh
AAAI 2021.
- Debiasing Evaluations that are Biased by Evaluations
Jingyan Wang, Ivan Stelmakh, Yuting Wei and Nihar B. Shah
AAAI 2021.
- On the Privacy-Utility Tradeoff in Peer-Review Data Analysis
Wenxin Ding, Nihar B. Shah, and Weina Wang
AAAI Privacy-Preserving Artificial Intelligence (PPAI-21) workshop 2021.
- On Testing for Biases in Peer Review
Ivan Stelmakh, Nihar B. Shah and Aarti Singh
NeurIPS 2019 (spotlight).
- A SUPER* Algorithm to Optimize Paper Bidding in Peer Review
Tanner Fiez, Nihar B. Shah and Lillian Ratliff
UAI 2020
Code
- PeerReview4All: Fair and Accurate Reviewer Assignment in Peer Review
Ivan Stelmakh, Nihar B. Shah and Aarti Singh
ALT 2019
Code for the PeerReview4All paper-reviewer assignment algorithm
Dataset
- Your 2 is My 1, Your 3 is My 9: Handling Arbitrary Miscalibrations in Ratings
Jingyan Wang and Nihar B. Shah
AAMAS 2019
My PhD student Jingyan Wang won the Best Student Paper Award at AAMAS 2019
Nominated for the Best Paper Award
- Loss Functions, Axioms, and Peer Review
Ritesh Noothigattu, Nihar B. Shah and Ariel Procaccia
ICML Workshop on Incentives in Machine Learning 2020
Code
- On Strategyproof Conference Review
Yichong Xu, Han Zhao, Xiaofei Shi and Nihar B. Shah
IJCAI 2019
Code and data
- Design and Analysis of the NIPS 2016 Review Process
Nihar B. Shah, Behzad Tabibian, Krikamol Muandet, Isabelle Guyon and Ulrike von Luxburg
Journal of Machine Learning Research 2018
PAST RESEARCH ON CROWDSOURCING
- Stretching the Effectiveness of MLE from Accuracy to Bias for Pairwise Comparisons
Jingyan Wang, Nihar B. Shah and R. Ravi
AISTATS 2020.
- Two-Sample Testing with Ranked Preference Data and the Role of Modeling Assumptions
Charvi Rastogi, Sivaraman Balakrishnan, Nihar B. Shah and Aarti Singh
ISIT 2020.
Video Slides
- Approval Voting and Incentives for Crowdsourcing
Nihar B. Shah, Dengyong Zhou and Yuval Peres
ACM Transactions on Economics and Computation (to appear; shorter version in ICML 2015).
Video Slides Dataset
- A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness
Nihar B. Shah, Sivaraman Balakrishnan and Martin J. Wainwright
IEEE Transactions on Information Theory (to appear)
Code for the WAN and the OBI-WAN estimators
Dataset
- Active Ranking from Pairwise Comparisons and when Parametric Assumptions Do Not Help
Reinhard Heckel, Nihar B. Shah, Kannan Ramchandran and Martin J. Wainwright
Annals of Statistics 2019
Code
- Low Permutation-rank Matrices: Structural Properties and Noisy Completion
Nihar B. Shah, Sivaraman Balakrishnan and Martin J. Wainwright
JMLR 2019
- Feeling the Bern: Adaptive Estimators for Bernoulli Probabilities of Pairwise Comparisons
Nihar B. Shah, Sivaraman Balakrishnan and Martin J. Wainwright
IEEE Transactions on Information Theory (Shorter version at ISIT 2016).
Code for the CRL estimator
- Simple, Robust and Optimal Ranking from Pairwise Comparisons
Nihar B. Shah and Martin J. Wainwright
Journal of Machine Learning Research, 2018.
Dataset
- Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues
Nihar B. Shah, Sivaraman Balakrishnan, Adityanand Guntuboyina and Martin J. Wainwright
IEEE Transactions on Information Theory 2017 (Shorter version at ICML 2016).
- No Oops, You Won't Do It Again: Mechanisms for Self-correction in Crowdsourcing
Nihar B. Shah and Dengyong Zhou
ICML 2016.
- Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence
Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, and Martin J. Wainwright
The Journal of Machine Learning Research, 2016.
Dataset for cardinal vs. ordinal
Dataset for pairwise comparison topologies
- Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing
Nihar B. Shah and Dengyong Zhou
Journal of Machine Learning Research 2016 (shorter version at NeurIPS 2015).
Dataset
- Parametric Prediction from Parametric Agents
Yuan Luo, Nihar B. Shah, Jianwei Huang, Jean Walrand
Operations Research, 2017.
- Truth Serums for Massively Crowdsourced Evaluation Tasks
Vijay Kamble, Nihar Shah, David Marn, Abhay Parekh, Kannan Ramchandran
SCUGC 2015: The 5th Workshop on Social Computing and User-Generated Content.
- On the Impossibility of Convex Inference in Human Computation
Nihar B. Shah and Dengyong Zhou
AAAI, Austin, Jan. 2015.
- A Case for Ordinal Peer-evaluation in MOOCs
Nihar B. Shah, Joseph Bradley, Abhay Parekh, Martin J. Wainwright, Kannan Ramchandran
Neural Information Processing Systems (NeurIPS): Workshop on Data Driven Education, Lake Tahoe, Dec. 2013.
- Regularized Minimax Conditional Entropy for Crowdsourcing
Dengyong Zhou, Qiang Liu, John Platt, Christopher Meek, and Nihar B. Shah
Dec. 2014.
PAST RESEARCH ON DISTRIBUTED STORAGE
(* indicates equal contribution)
- The MDS Queue: Analysing Latency Performance of Codes
Nihar B. Shah, Kangwook Lee and Kannan Ramchandran
IEEE Transactions on Information Theory, 2017.
- A Piggybacking Design Framework for Read- and Download-efficient Distributed Storage Codes
K. V. Rashmi, Nihar B. Shah and Kannan Ramchandran
IEEE Transactions on Information Theory, 2017.
Slides from conference (ISIT) presentation
- When Do Redundant Requests Reduce Latency?
Nihar B. Shah, Kangwook Lee and Kannan Ramchandran
IEEE Transactions on Communications, Feb. 2016.
Slides
- Distributed Storage Codes with Repair-by-Transfer and Non-achievability of Interior Points on the Storage-Bandwidth Tradeoff
Nihar B. Shah*, K. V. Rashmi*, P. Vijay Kumar and Kannan Ramchandran
IEEE Transactions on Information Theory, March 2012.
- Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction
K. V. Rashmi*, Nihar B. Shah* and P. Vijay Kumar
IEEE Transactions on Information Theory, August 2011.
IEEE Data Storage Best Paper and Best Student Paper Awards for the years 2011 & 2012.
- Interference Alignment in Regenerating Codes for Distributed Storage: Necessity and Code Constructions
Nihar B. Shah*, K. V. Rashmi*, P. Vijay Kumar and Kannan Ramchandran
IEEE Transactions on Information Theory, April 2012.
- On Minimizing Data-read and Download for Storage-Node Recovery
Nihar B. Shah
IEEE Communications Letters, 2013.
Second place in the first ACM University Student Research Competition, 2013.
- Having Your Cake and Eating It Too: Jointly Optimal Codes for I/O, Storage and Network-bandwidth In Distributed Storage Systems
K. V. Rashmi, Preetum Nakkiran, Jingyan Wang, Nihar B. Shah, and Kannan Ramchandran
USENIX FAST, Santa Clara, Feb. 2015.
Picked as the best paper of USENIX FAST 2015 by StorageMojo.
- Fundamental Limits on Communication for Oblivious Updates in Storage Networks
Preetum Nakkiran, Nihar B. Shah, K. V. Rashmi
IEEE GLOBECOM 2014, Dec. 2014.
- A "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers
K. V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, and Kannan Ramchandran
ACM SIGCOMM, Aug 2014.
- One Extra Bit of Download Ensures Perfectly Private Information Retrieval
Nihar B. Shah, K. V. Rashmi and Kannan Ramchandran
ISIT 2014.
- A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster
K. V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, and Kannan Ramchandran
USENIX HotStorage, San Jose, Jun. 2013.
- Secret Sharing Across a Network with Low Communication Cost: Distributed Algorithm and Bounds
Nihar B. Shah, K. V. Rashmi and Kannan Ramchandran
IEEE International Symposium on Information Theory (ISIT), Istanbul, Jul. 2013.
Slides Poster
- Regenerating Codes for Errors and Erasures in Distributed Storage
K. V. Rashmi*, Nihar B. Shah*, Kannan Ramchandran, and P. Vijay Kumar
IEEE International Symposium on Information Theory (ISIT), Cambridge, Jul. 2012.
Slides
- Information-theoretically Secure Regenerating Codes for Distributed Storage
Nihar B. Shah*, K. V. Rashmi*, and P. Vijay Kumar
Globecom 2011.
- Enabling Node Repair in Any Erasure Code for Distributed Storage
K. V. Rashmi*, Nihar B. Shah* and P. Vijay Kumar
IEEE International Symposium on Information Theory (ISIT), St. Petersburg, Jul. 2011.
- A Flexible Class of Regenerating Codes for Distributed Storage
Nihar B. Shah*, K. V. Rashmi*, and P. Vijay Kumar
IEEE International Symposium on Information Theory (ISIT), Austin, Jun. 2010.
- Explicit and Optimal Exact-Regenerating Codes for the Minimum-Bandwidth Point in Distributed Storage
K. V. Rashmi*, Nihar B. Shah*, P. Vijay Kumar, and Kannan Ramchandran
IEEE International Symposium on Information Theory (ISIT), Austin, Jun. 2010.
- Explicit Codes Minimizing Repair Bandwidth for Distributed Storage (the complete version on arXiv)
Nihar B. Shah*, K. V. Rashmi*, P. Vijay Kumar and Kannan Ramchandran
IEEE Information Theory Workshop (ITW), Cairo, Jan. 2010.
- Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage
K. V. Rashmi*, Nihar B. Shah*, P. Vijay Kumar and Kannan Ramchandran
Allerton Conference on Control, Computing and Communication, Urbana-Champaign, Sep. 2009.
- Regenerating Codes for Distributed Storage Networks (invited)
Nihar B. Shah*, K. V. Rashmi*, P. Vijay Kumar, and Kannan Ramchandran
International Workshop on the Arithmetic of Finite Fields (WAIFI), Istanbul, Jun. 2010.
- Network Coding
K. V. Rashmi*, Nihar B. Shah* and P. Vijay Kumar.
Resonance, vol. 15, no. 7, pp. 604-621, Jul. 2010.
(Resonance is a journal of science education published by the Indian Academy of Sciences)
- Distributed Storage System for Optimal Storage Space and Network Bandwidth Utilization and A Method Thereof
K. V. Rashmi*, Nihar B. Shah* and P. Vijay Kumar
US Patent, Nov 2011.
GROUP
PHD STUDENTS
Jingyan Wang
Robotics Institute, CMU
Ivan Stelmakh
Machine Learning Department, CMU
(advised jointly with Aarti Singh)
Charvi Rastogi
Machine Learning Department, CMU
(advised jointly with Ken Holstein)
Steven Jecmen
Computer Science Department, CMU
(advised jointly with Fei Fang)
UNDERGRADUATE STUDENTS
Wenxin Ding
Mathematics and Computer Science, CMU
(advised jointly with Weina Wang)
Komal Dhull
Computer Science, CMU
Ryan Liu
Computer Science, CMU
Carmel Baharav
Computer Science, CMU
FUNDING
We gratefully acknowledge support from NSF and the CMU Block Center!
TEACHING
- Fall 2020: 10-715 Advanced Introduction to Machine Learning
- Spring 2020: 15-780 Graduate Artificial Intelligence
- Fall 2019: 10-715 Advanced Introduction to Machine Learning
- Spring 2019: 15-780 Graduate Artificial Intelligence
- Fall 2017: 10-709 Fundamentals of Learning from the Crowd
CURRICULUM VITAE
EDUCATION
- UC Berkeley
PhD in Electrical Engineering and Computer Sciences
Advisors: Prof. Martin J. Wainwright and Prof. Kannan Ramchandran
Other members of thesis committee: Prof. Christos Papadimitriou and Prof. Tom Griffiths
- Indian Institute of Science (IISc), Bangalore
M.E. in Telecommunication
Thesis: Minimizing Repair Bandwidth in Distributed Storage Systems
Advisor: Prof. P. Vijay Kumar
- National Institute of Technology Karnataka, Surathkal
B. Tech. in Electronics and Communication
PUBLICATIONS
- Please visit the publications page.
HONORS
- NSF CAREER award 2020-2025
- Best paper nomination at AAMAS 2019
- Mentored and co-authored the paper that won the Best Student Paper Award at AAMAS 2019, awarded to my PhD student Jingyan Wang
- PhD thesis received David J. Sakrison Memorial Prize for a "truly outstanding piece of research" at EECS, UC Berkeley, May 2017
- Outstanding Graduate Student Instructor award at UC Berkeley, 2015-16
- Microsoft Research PhD Fellowship, 2014-2016.
- IEEE Data Storage Best Paper and Best Student Paper awards for years 2011 & 2012
- Second place in the first ACM University Student Research Competition, 2013.
- Berkeley Fellowship, 2011-13 (the most prestigious fellowship for incoming graduate students at UC Berkeley).
- Excellence Award for the academic year 2011-2012 at UC Berkeley.
- Prof. SVC Aiya Medal for the best master-of-engineering student in the ECE department at IISc, 2010.
INDUSTRY EXPERIENCE
- Intern, Microsoft Research Redmond, May 2013 to August 2013 and May 2014 to August 2014
- Crowdsourcing algorithms.
- Project Associate, IISc-Infosys collaborative project, Bangalore, July 2010 to June 2011
- Algorithms for robust and efficient media content distribution networks.
- Member of Technical Staff, Adobe Systems, Bangalore, July 2007 to July 2008.
- Worked on Adobe Captivate, an automated e-learning authoring tool.