GitHub: roma-patel
How do humans interpret and understand the meaning of a sentence? Do we internally decompose it into abstract, logical forms (lambda calculus?), rely only on what we already know and have seen (consult knowledge bases?), or is it the result of some uninterpretable mental signalling process (neural nets?)? More importantly, how can we get machines to efficiently represent text structures in a way that allows such reasoning?
This is similar to learning meaningful and understandable representations of text, but the keyword is interpretable. In order to model interactions between two agents, and to let them reason pragmatically and semantically about the words they encounter and use, our representation of text has to be recoverable to some degree, so as to allow iterative interaction. This is arguably not necessary in most instances (e.g., a classification or mapping task), but it is extremely useful in systems that want to model and learn from interactions or generate text.
As humans, our reasoning and understanding capabilities are likely influenced by a number of factors; learning from multiple modalities comes very naturally to us. Is this a contributing factor to our higher reasoning abilities? More concretely, if we also engage machines in multimodal learning, does this improve their reasoning capabilities? Certain understanding tasks, such as entailment, semantic composition and pragmatic reasoning, fit into this framework well, especially in the case of vision, and this is what we want to use to further natural language understanding. Can we map sequences of text to the images they refer to, and use this to infer knowledge about the text and further language understanding?
Is word meaning goal-oriented? One person in two separate communication instances, each with a different shared “common ground”, can use the same word very differently. If we have different word representations in worlds with different communicative agents that are built from different knowledge sets, can we then compose them together to understand word meaning? Another line of thought -- certain words and structures can be interpreted in multiple ways. Are these merely a result of syntax and sentence structure, or are certain "concepts" more likely to lead to ambiguity? For example, given a sentence and the context around it, can our models determine the degree of ambiguity or clarity in that context? Can we also uncover ambiguous structures in free text?
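One way to make the "common ground" intuition concrete is the Rational Speech Acts style of recursive reasoning, where a pragmatic listener interprets a word relative to a context-specific prior over referents. The sketch below is only a toy illustration with a hypothetical lexicon and made-up priors, not a claim about any particular implementation.

```python
import numpy as np

def literal_listener(lexicon, prior):
    """L0: P(referent | word) proportional to literal truth * context prior."""
    scores = lexicon * prior                          # (words x referents)
    return scores / scores.sum(axis=1, keepdims=True)

def pragmatic_speaker(lexicon, prior, alpha=1.0):
    """S1: P(word | referent) proportional to exp(alpha * log L0)."""
    l0 = literal_listener(lexicon, prior)
    utility = np.exp(alpha * np.log(l0 + 1e-12)).T    # (referents x words)
    return utility / utility.sum(axis=1, keepdims=True)

def pragmatic_listener(lexicon, prior, alpha=1.0):
    """L1: P(referent | word) proportional to S1(word | referent) * prior."""
    s1 = pragmatic_speaker(lexicon, prior, alpha).T   # (words x referents)
    scores = s1 * prior
    return scores / scores.sum(axis=1, keepdims=True)

# Hypothetical toy world: the word "glasses" heard in two different common grounds.
# Referents: [drinking glass, pair of spectacles]; words: ["glasses", "spectacles"].
lexicon = np.array([[1.0, 1.0],    # "glasses" is literally true of both referents
                    [0.0, 1.0]])   # "spectacles" only fits the eyewear
kitchen_prior = np.array([0.9, 0.1])    # chatting in a kitchen
optician_prior = np.array([0.1, 0.9])   # chatting at an optician's

print(pragmatic_listener(lexicon, kitchen_prior)[0])   # "glasses" -> drinking glass
print(pragmatic_listener(lexicon, optician_prior)[0])  # "glasses" -> spectacles
```

The same word ends up with very different interpretations purely because the shared context (the prior over referents) differs, which is one concrete reading of the question above.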
How do humans represent new concepts? How do they connect these to existing concepts? Extensive literature in cognitive science and psychology has both commended and criticised standard prototype and exemplar models, but is it some combination of both psychological theories, interwoven with many-layered networks, that works best? How different are the representations formed by humans from the representations learned by the best models that attempt the same task?
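To make the distinction at stake concrete: a prototype model compares a new item to a single summary representation per category, while an exemplar model compares it to every stored instance. The sketch below uses made-up feature vectors and is only an illustration of the two theories, not an endorsement of either.

```python
import numpy as np

def similarity(x, y, c=1.0):
    """Exponentially decaying similarity in psychological space (Shepard, 1987)."""
    return np.exp(-c * np.linalg.norm(x - y))

def prototype_score(item, category_items):
    """Prototype model: similarity to the category's mean (prototype) representation."""
    prototype = np.mean(category_items, axis=0)
    return similarity(item, prototype)

def exemplar_score(item, category_items):
    """Exemplar model: summed similarity to every stored instance of the category."""
    return sum(similarity(item, stored) for stored in category_items)

# Hypothetical feature vectors for previously seen instances of two concepts.
birds = np.array([[1.0, 0.9], [0.9, 1.0], [1.0, 1.0]])
fish = np.array([[0.1, 0.2], [0.0, 0.1], [0.2, 0.0]])

new_item = np.array([0.8, 0.7])   # a newly encountered concept
for name, category in [("bird", birds), ("fish", fish)]:
    print(name, prototype_score(new_item, category), exemplar_score(new_item, category))
```

The interesting question above is whether human-like behaviour is better captured by one of these, by some mixture of the two, or by learned representations that do not cleanly correspond to either.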
The Wall Street Journal, PubMed etc. are hugely important resources that have their own (very impactful) uses. But there is so much that we can learn from literary text: text that is free and unstructured in every sense, and that represents, in so many ways, the complexities and ideologies of different humans in different contexts, locations and lifetimes. Can language processing models and tasks extend to such larger units of text (paragraphs, chapters, entire novels) to uncover meaning, entity interactions, important plot events or summary story-lines?