Openings in the Lab

Prospective Applicants: If you’re interested in working together, please go over the list of project directions below or my publications page to see what I work on. Find a paper that looks interesting and read it fully. If you’re still interested, please complete the corresponding form below:

  1. Prospective PhD applicants
  2. Prospective visiting students (undergraduate, master's, or PhD)
  3. Cornell master's and undergraduate students (any department)

Below is a list of topics that I am currently working on. I am always interested in expanding the list based on my students’ interests. Feel free to suggest a new topic along with a justification and relevant ideas.

Data Discovery and Marketplaces

  • Goal-oriented Data Discovery

    Data Preparation

    Causal Inference

    Data Debugging

    Responsible Data Science (Fairness, Explainability, Robustness)

    1. Which project are you most interested in, and why?
    2. What do you think are some shortcomings of the respective papers?
    3. Describe any relevant skills that would be helpful for this project.
    4. How much time per week are you willing to devote to a project?
    5. Until when can you work on a project?
    6. What are your long-term plans?
    7. Include “panda” at the end of your introductory email.

    If you have an idea for a new project that you want to work on with me, you can describe that too.

    Mention five project directions and group the rest together for my knowledge.

    Goal-oriented Data Science

    This project aims to build an open-source data discovery and preparation toolkit for a downstream task. Relevant papers: Metam
    • EDA
    • Data Prep

    Semantic Data Understanding

    Practical Causal Inference

    - Efficiency of causal inference using semi-rings
    - Data discovery for causal inference

    Hypothetical Reasoning

    Debugging Data Science Pipelines

    - Debugging datasets
    - Development phase
    - Deployment phase

    Human-in-the-loop Data Science

    Responsible Data Science

    Fairness

    - Classification
      -- noisy sensitive attribute
      -- missing sensitive attribute
    - Clustering

    Robustness

    - Clustering
    - Oracle-based data analysis

    Explainability

    Multi-modal Data Discovery

    - Extend Metam
    - Metam with transformations

    Prompt engineering for Data Preparation

    Fine-tuning LLMs



    Not urgent

    Multi-objective data science

    Extend goal-oriented data science to multiple objectives. Skyline operations.
    - Lexicographic preference

    Data Discovery and Sharing

    - How to build a marketplace
    - Federated settings?
  • Sainyam Galhotra
    Assistant Professor