Sainyam Galhotra

Assistant Professor

Cornell University

I am an assistant professor in Computer Science at Cornell University. Before that, I was a Computing Innovation Fellow pursuing postdoctoral research at the University of Chicago. The goal of my research is to develop data science tools for effective and responsible analytics.

My work has leveraged techniques from causal inference, data management, theoretical computer science, machine learning, and HCI to understand various aspects of trustworthy system design, including robustness, explainability, and fairness. I received my Ph.D. from the University of Massachusetts Amherst under the supervision of Barna Saha. I completed my undergraduate studies at the Indian Institute of Technology Delhi (IIT Delhi) in May 2014 under the guidance of Prof. Amitabha Bagchi. Prior to joining UMass, I worked as a budding scientist at Xerox Research Centre India, Bangalore, for a year.

I am actively looking for students to work with me. If you are interested, please complete an application here and email me at sg@cs.cornell.edu.

Interests
  • Databases
  • Data Management
  • Responsible Data Science
  • Causal Inference
  • Machine Learning
Education
  • Postdoc, University of Chicago
  • MS and PhD, University of Massachusetts Amherst
  • BTech, Indian Institute of Technology Delhi

Updates

  • Rising Star in Data Science, Data Science Institute, UChicago
  • DAAD AInet Fellow
  • ACM SIGMOD Entity Resolution Programming Contest – Top 5 finalist
  • Most Reproducible Paper Award at SIGMOD 2018 and 2019
  • First recipient of the Krithi Ramamritham Computer Science Scholarship
  • Best Paper Award at SIGSOFT FSE 2017

Recent Publications

(2024). Building Taxonomies with Triplet Queries. Proceedings of the 32nd Symposium of Advanced Database Systems, Villasimius, Italy, June 23-26, 2024.

(2024). Demonstration of Ver: View Discovery in the Wild. Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024.

(2024). Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search. Proc. VLDB Endow.

(2024). Faster Algorithms for Fair Max-Min Diversification in ℝ^d. Proc. ACM Manag. Data.

(2024). First Workshop on Governance, Understanding and Integration of Data for Effective and Responsible AI (GUIDE-AI). Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024.

(2024). Intervention and Conditioning in Causal Bayesian Networks. NeurIPS 2024.

(2024). Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data. 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024.

(2023). Consistent Range Approximation for Fair Predictive Modeling. PVLDB.

(2023). Ver: View Discovery in the Wild. 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023.

(2023). Community Recovery in the Geometric Block Model. J. Mach. Learn. Res.

(2023). Causal What-If and How-To Analysis Using HypeR. 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023.

(2023). Metam: Goal-Oriented Data Discovery. 39th IEEE International Conference on Data Engineering, ICDE 2023, Anaheim, CA, USA, April 3-7, 2023.

(2023). Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28 - September 1, 2023. CEUR-WS.org.

(2022). Causal Feature Selection for Algorithmic Fairness. SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12-17, 2022.

(2022). DataPrism: Exposing Disconnect between Data and Systems. SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12-17, 2022.

(2022). Explainable AI: Foundations, Applications, Opportunities for Data Management Research. 38th IEEE International Conference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022.

(2022). Explainable AI: Foundations, Applications, Opportunities for Data Management Research. SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12-17, 2022.

(2022). Fair k-Center Clustering in MapReduce and Streaming Settings. WWW ’22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25-29, 2022.

(2022). Hierarchical Entity Resolution using an Oracle. SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12-17, 2022.

(2022). HypeR: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach. SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12-17, 2022.

(2022). Revisiting Online Data Markets in 2022: A Seller and Buyer Perspective. SIGMOD Rec.

(2021). Semantic Concept Annotation for Tabular Data. CIKM ’21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1-5, 2021.

(2021). BEER: Blocking for Effective Entity Resolution. SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021.

(2021). Demonstration of Generating Explanations for Black-Box Algorithms Using Lewis. VLDB.

(2021). Adaptive Rule Discovery for Labeling Text Data. SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021.

(2021). Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals. SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021.

(2021). How to Design Robust Algorithms using Noisy Comparison Oracle. Proc. VLDB Endow.

(2021). Interventional Fairness with Indirect Knowledge of Unobserved Protected Attributes. Entropy.

(2021). Learning to Generate Fair Clusters from Demonstrations. AIES ’21: AAAI/ACM Conference on AI, Ethics, and Society, Virtual Event, USA, May 19-21, 2021.

(2021). Efficient and effective ER with progressive blocking. VLDB J.

(2020). Balancing the Tradeoff Between Clustering Value and Interpretability. AIES ’20: AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, February 7-8, 2020.

(2020). Reliable Clustering with Applications to Data Integration. Proceedings of the VLDB 2020 PhD Workshop co-located with the 46th International Conference on Very Large Databases (VLDB 2020), ONLINE, August 31 - September 4, 2020.

(2020). Semantic Search over Structured Data. CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020.

(2019). Automated Feature Enhancement for Predictive Modeling using External Knowledge. 2019 International Conference on Data Mining Workshops, ICDM Workshops 2019, Beijing, China, November 8-11, 2019.

(2019). Connectivity of Random Annulus Graphs and the Geometric Block Model. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2019, September 20-22, 2019, Massachusetts Institute of Technology, Cambridge, MA, USA.

(2019). Crowd-Sourced Entity Resolution with Control Queries. Proceedings of the 27th Italian Symposium on Advanced Database Systems, Castiglione della Pescaia (Grosseto), Italy, June 16-19, 2019.

(2019). Influence Maximization Revisited: The State of the Art and the Gaps that Remain. Advances in Database Technology - 22nd International Conference on Extending Database Technology, EDBT 2019, Lisbon, Portugal, March 26-29, 2019.

(2018). Robust Entity Resolution Using a CrowdOracle. IEEE Data Eng. Bull.

(2018). Robust Entity Resolution using Random Graphs. Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018.

(2018). The Geometric Block Model. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018.

(2018). The Geometric Block Model and Applications. 56th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2018, Monticello, IL, USA, October 2-5, 2018.

(2017). Debunking the Myths of Influence Maximization: An In-Depth Benchmarking Study. Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017.

(2017). Fairness testing: testing software for discrimination. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Paderborn, Germany, September 4-8, 2017.

(2016). Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models. Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016.

(2016). QA^RT: A System for Real-Time Holistic Quality Assurance for Contact Center Dialogues. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA.

(2015). ASIM: A Scalable Algorithm for Influence Maximization under the Independent Cascade Model. Proceedings of the 24th International Conference on World Wide Web Companion, WWW 2015, Florence, Italy, May 18-22, 2015 - Companion Volume.

(2015). Optimal Radius for Connectivity in Duty-Cycled Wireless Sensor Networks. ACM Trans. Sens. Networks.

(2015). STAR: Real-time Spatio-Temporal Analysis and Prediction of Traffic Insights using Social Media. Companion Volume to the Proceedings of the 2nd IKDD Conference on Data Sciences, CODS 2015 Companion Volume, Bangalore, India, March 20, 2015.

(2015). Tracking the Conductance of Rapidly Evolving Topic-Subgraphs. Proc. VLDB Endow.

(2014). Min-d-Occur: Ensuring Future Occurrences in Streaming Sets. Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, UAI 2014, Quebec City, Quebec, Canada, July 23-27, 2014.

(2013). Optimal radius for connectivity in duty-cycled wireless sensor networks. 16th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, MSWiM ’13, Barcelona, Spain, November 3-8, 2013.
