Search

Solving a machine-learning mystery

A new study shows how large language models like GPT-3 can learn a new task from just a few examples, without the need for any new training data.

Building Scalable Video Understanding Benchmarks through Sports

Existing benchmarks for evaluating long video understanding falls short on multiple aspects, either lacking in scale or quality of annotations. These limitations arise from the difficulty in collecting dense annotations for long videos (e.g. actions, dialogues, etc.), which are often obtained by...

Relative representations enable zero-shot latent space communication

Neural networks embed the geometric structure of a data manifold lying in a high-dimensional space into latent representations. Ideally, the distribution of the data points in the latent space should depend only on the task, the data, the loss, and other architecture-specific constraints. However...

Progress measures for grokking via mechanistic interpretability

Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous \textit{progress measures} that underlie the seemingly discontinuous...

Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

In Composed Image Retrieval (CIR), a user combines a query image with text to describe their intended target. Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image. Labeling such triplets is expensive...

Putting clear bounds on uncertainty

Computer scientists want to know the exact limits in our ability to clean up, and reconstruct, partly blurred images.

Cross-Modal Fine-Tuning: Align then Refine

Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine...

Do Users Write More Insecure Code with AI Assistants?

We conduct the first large-scale user study examining how users interact with an AI Code assistant to solve a variety of security related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI's codex-davinci-002 model wrote...

Predictive Multiplicity in Probabilistic Classification

There may exist multiple models that perform almost equally well for any given prediction task. We examine how predictions change across these competing models. In particular, we study predictive multiplicity -- in probabilistic classification. We formally define measures for our setting and develop...

The fight against antibiotic resistance is growing more urgent, but artificial intelligence can help

Efficient technique improves machine-learning models’ reliability

The method enables a model to determine its confidence in a prediction, while using no additional data and far fewer computing resources than other methods.

Task Ambiguity in Humans and Language Models

Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how...

Effective Data Augmentation With Diffusion Models

Data augmentation is one of the most prevalent tools in deep learning, underpinning many recent advances, including those from classification, generative models, and representation learning. The standard approach to data augmentation combines simple transformations like rotations and flips to...

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

Africa is home to over 2000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research...

We pitted ChatGPT against tools for detecting AI-written text, and the results are troubling

Integrating humans with AI in structural design

A process that seeks feedback from human specialists proves more effective at optimization than automated systems working alone.

Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models

Despite increasingly fluent, relevant, and coherent language generation, major gaps remain between how humans and machines use language. We argue that a key dimension that is missing from our understanding of language models (LMs) is the model's ability to interpret and generate expressions of...

Language-Driven Representation Learning for Robotics

Recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks. Leveraging methods such as masked autoencoding and contrastive learning, these representations exhibit strong transfer to policy learning...

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Machine learning models have been found to learn shortcuts -- unintended decision rules that are unable to generalize -- undermining models' reliability. Previous works address this problem under the tenuous assumption that only a single shortcut exists in the training data. Real-world images are...

Our neurodata can reveal our most private selves. As brain implants become common, how will it be protected?

Stay in the loop

Subscribe to our newsletter for a weekly update on the latest podcast, news, events, and jobs postings.