Research
I am primarily interested in problems at the intersection of multimodality, commonsense abilities, and reasoning. My broad goal for the future is to work towards unified architectures that model efficient, semantically convergent representations of the different modalities corresponding to human sense-perception; to design and evaluate systems that enable human-like commonsense abilities to emerge in large models; and to enable sound, verifiable, and scalable reasoning in foundation models.
Ongoing Work
UG Thesis on Neurosymbolic Reasoning for VQA: I am extending the framework proposed in the NELLIE and TV-TREES line of papers to enable grounded and verifiable reasoning for Visual Question Answering (VQA). I am primarily working to (a) expand the scope to open-domain, commonsense-based VQA, (b) improve how visual information is integrated into the inference engine, and (c) use small, local models in the engine instead of large closed-source models.
Multimodal Representational Alignment: Extending the work on the Platonic Representation Hypothesis to Vision-Language Models (VLMs), I am working with a group at the Precog Lab, in collaboration with MSR India, to explore the implications of multimodal representational alignment. We aim to understand how alignment relates to performance on downstream vision-language tasks, investigate the mechanisms through which alignment emerges, and explore how these insights generalize across model families and downstream tasks.
Past Experience
ML-based Forecasting of Antiretroviral Therapy (ART) Drugs: I worked with Prof. Debayan Gupta, in collaboration with Prof. Steven Clipman of the Johns Hopkins Medicine Institute, to develop machine learning methods for improved forecasting of various antiretroviral therapy (ART) drug regimens for the National AIDS Control Authority of India (NACO), supported by the GKII Breakthrough Grant 2024. I led the development of the initial prototypes of our method, set up the data ingestion and preparation pipeline in coordination with various stakeholders at NACO, and presented initial results to NACO. We ultimately settled on an adaptive model combining ARIMA and TimesFM, which significantly outperformed the existing methods. Our poster on this work was also invited for presentation at the Johns Hopkins GKII India Tour in 2024.
Document Text Recognition for Indic Languages: As an ML Engineer Intern at Sarvam, I independently developed the first prototype of Sarvam's document text recognition pipeline from scratch, supporting over 10 Indic languages. My responsibilities included setting up the full pipeline for data curation, collection, and preparation. I experimented with various end-to-end multimodal architectures, exploring different encoder and decoder models to devise an appropriate modality fusion mechanism and fine-tuning recipe for the task. The entire system was trained end-to-end on large-scale data using a multi-node distributed training cluster framework. Additionally, I contributed to Sarvam's Parsing API through rigorous benchmarking and evaluations prior to its release.
Zero-Shot Coreset Selection: Under Prof. Raghavendra Singh, I worked on compute-efficient and easily scalable methods for coreset selection (dataset distillation) using an approach centered on graph and network analysis. I developed a novel zero-shot method for coreset selection on image datasets based on a simple PageRank-based approach. The key highlight of this technique is its ability to identify important examples in a given dataset without requiring any training on the target data. The method achieves performance close to current state-of-the-art (SOTA) methods while being computationally simpler and more efficient.
ML-based Cancer Metastasis Prognosis: During my first year, I contributed to the AISCan project, which focused on precision profiling of cancerous tumor cells for predictive analysis of cancer metastasis and progression. We developed a novel machine learning-based method of digital cytometry to classify and quantify gene expression data from human stem cells. This allows for highly accurate profiling of tumors to aid in the diagnosis and treatment of the disease. My specific responsibilities included collating and preparing the training data, running all ML-based experiments, and developing the primary codebase for a software package that integrates this functionality into a user-friendly tool. This work is currently under review (see the preprint below!).
All Publications
Application of Transformer-Based Language Models to Detect Hate Speech in Social Media
Journal of Computational and Cognitive Engineering (2021)