My research focuses on creating holistic artificial agents that truly comprehend human language. This entails investigating grounded language learning in real-world contexts that extend beyond textual modality to include multimodal aspects of human interaction. This presents several challenges, three of which I am currently focusing on:
Human language comprehension and generation are inherently multimodal, with multiple signal sources and contexts interacting with one another. My research focuses on developing computational foundations and building automated models for grounded language learning across multiple modalities. Contextual models that integrate complex information and discourse across multiple mediums, multimodal pragmatics in face-to-face communication, and visually grounded machine translation are all examples of my work.
Machine learning methods are extremely powerful tools for making predictions or decisions based on data. However, the performance of these methods is heavily dependent on the data representation (or features) used. This is extremely difficult when models are learning from multiple modalities. This is also difficult because human languages change over time as new words, morphological variations, grammatical structures, and pragmatic shifts occur. My research has looked into various methods for integrating modular representations from various modalities and creating domain-invariant representations.
Machine learning models are becoming more complex and opaque, making it difficult to understand how they work and whether they are reliable. To ensure that machine learning models are fair, effective, and reliable, they must be thoroughly evaluated. One approach is to use a quantifiable metric to measure and summarise. However, we must be careful not to overestimate the results of such metrics. My research has focused on comprehending and interpreting complex models, as well as how they perform over abstract but quantifiable metrics. My work has also contributed proposals for novel metrics that take into account multiple sources of information.