Model interpretability methods are often used to explain NLP model decisions on tasks such as text classification, where the output space is relatively small. However, when applied to language generation, where the output space often consists of tens of thousands of tokens, these methods are unable...
This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to...
We present an incremental syntactic representation that consists of assigning a single discrete label to each word in a sentence, where the label is predicted using strictly incremental processing of a prefix of the sentence, and the sequence of labels for a sentence fully determines a parse tree...
Lower-caste cleaners must wear GPS-enabled smartwatches, raising questions about their privacy and data protection
Explanation methods that help users determine whether to trust machine-learning model predictions can be less accurate for disadvantaged subgroups, a new study finds.
AI and machine learning are improving weather forecasts, but they won’t replace human experts
Evaluating bias, fairness, and social impact in monolingual language models is a difficult task. This challenge is further compounded when language modeling occurs in a multilingual context. Considering the implication of evaluation biases for large multilingual language models, we situate the...
This paper describes the motivation and development of speech synthesis systems for the purposes of language revitalization. By building speech synthesis systems for three Indigenous languages spoken in Canada, Kanien’kéha, Gitksan & SENĆOŦEN, we re-evaluate the question of how much data is required...
Lexical ambiguity poses one of the greatest challenges in the field of Machine Translation. Over the last few decades, multiple efforts have been undertaken to investigate incorrect translations caused by the polysemous nature of words. Within this body of research, some studies have posited that...
Assistant Professor Marzyeh Ghassemi explores how hidden biases in medical data could compromise artificial intelligence approaches.
Do AI systems really have their own secret language?
Although real-world data is scant, proponents say robotics and AI will soon revolutionize agriculture.
Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence to sequence generation, resulting in promising performances on various tasks. However, no standard...
A multi-language dictionary is a fundamental tool for language learning, allowing the learner to look up unfamiliar words. Searching an unrecognized word in the dictionary does not usually require deep knowledge of the target language. However, this is not true for sign language, where gestural...
Despite data’s crucial role in machine learning, most existing tools and research tend to focus on systems on top of existing data rather than how to interpret and manipulate data. In this paper, we propose DataLab, a unified data-oriented platform that not only allows users to interactively analyze...
Studying a powerful type of cyberattack, researchers identified a flaw in how it’s been analyzed before, then developed new techniques that stop it in its tracks.
It is challenging to translate names and technical terms across languages with different alphabets and sound inventories. These items are commonly transliterated, i.e., replaced with approximate phonetic equivalents. For example, computer in English comes out as コンピューター (konpyuutaa) in...
When algorithmic harms emerge, a reasonable response is to stop using the algorithm to resolve concerns related to fairness, accountability, transparency, and ethics (FATE). However, just because an algorithm is removed does not imply its FATE-related issues cease to exist. In this paper, we...
Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what...
A new technique in computer vision may enhance our three-dimensional understanding of two-dimensional images.