Interpreting Language Models with Contrastive Explanations

Model interpretability methods are often used to explain NLP model decisions on tasks such as text classification, where the output space is relatively small. However, when applied to language generation, where the output space often consists of tens of thousands of tokens, these methods are unable...

Midge: Generating Image Descriptions From Computer Vision Detections

This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to...

Learned Incremental Representations for Parsing

We present an incremental syntactic representation that consists of assigning a single discrete label to each word in a sentence, where the label is predicted using strictly incremental processing of a prefix of the sentence, and the sequence of labels for a sentence fully determines a parse tree...

In India, Digital Snooping on Sanitation Workers

Lower-caste cleaners must wear GPS-enabled smartwatches, raising questions about their privacy and data protection

In bias we trust?

Explanation methods that help users determine whether to trust machine-learning model predictions can be less accurate for disadvantaged subgroups, a new study finds.

AI and machine learning are improving weather forecasts, but they won’t replace human experts

You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings

Evaluating bias, fairness, and social impact in monolingual language models is a difficult task. This challenge is further compounded when language modeling occurs in a multilingual context. Considering the implication of evaluation biases for large multilingual language models, we situate the...

Requirements and Motivations of Low-Resource Speech Synthesis for Language Revitalization

This paper describes the motivation and development of speech synthesis systems for the purposes of language revitalization. By building speech synthesis systems for three Indigenous languages spoken in Canada, Kanien’kéha, Gitksan & SENĆOŦEN, we re-evaluate the question of how much data is required...

DiBiMT: A Novel Benchmark for Measuring Word Sense Disambiguation Biases in Machine Translation

Lexical ambiguity poses one of the greatest challenges in the field of Machine Translation. Over the last few decades, multiple efforts have been undertaken to investigate incorrect translations caused by the polysemous nature of words. Within this body of research, some studies have posited that...

The downside of machine learning in health care

Assistant Professor Marzyeh Ghassemi explores how hidden biases in medical data could compromise artificial intelligence approaches.

Do AI systems really have their own secret language?

In Farming, a Constant Drive For Technology

Although real-world data is scant, proponents say robotics and AI will soon revolutionize agriculture.

OpenPrompt: An Open-source Framework for Prompt-learning

Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence to sequence generation, resulting in promising performances on various tasks. However, no standard...

Automatic Gloss Dictionary for Sign Language Learners

A multi-language dictionary is a fundamental tool for language learning, allowing the learner to look up unfamiliar words. Searching an unrecognized word in the dictionary does not usually require deep knowledge of the target language. However, this is not true for sign language, where gestural...

DataLab: A Platform for Data Analysis and Intervention

Despite data’s crucial role in machine learning, most existing tools and research tend to focus on systems on top of existing data rather than how to interpret and manipulate data. In this paper, we propose DataLab, a unified data-oriented platform that not only allows users to interactively analyze...

Keeping web-browsing data safe from hackers

Studying a powerful type of cyberattack, researchers identified a flaw in how it’s been analyzed before, then developed new techniques that stop it in its tracks.

Machine Transliteration

It is challenging to translate names and technical terms across languages with different alphabets and sound inventories. These items are commonly transliterated, i.e., replaced with approximate phonetic equivalents. For example, computer in English comes out as コンピューター (konpyuutaa) in...

The Algorithmic Imprint

When algorithmic harms emerge, a reasonable response is to stop using the algorithm to resolve concerns related to fairness, accountability, transparency, and ethics (FATE). However, just because an algorithm is removed does not imply its FATE-related issues cease to exist. In this paper, we...

Understanding Dataset Difficulty with V-Usable Information

Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what...

Seeing the whole from some of the parts

A new technique in computer vision may enhance our three-dimensional understanding of two-dimensional images.
