A pose-mapping technique could remotely evaluate patients with cerebral palsy

The machine-learning method works on most mobile devices and could be expanded to assess other motor disorders outside of the doctor’s office.

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

This paper presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants. The research landscape encompasses five core topics...

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However...

Multi-view Fuzzy Representation Learning with Rules based Model

Unsupervised multi-view representation learning has been extensively studied for mining multi-view data. However, some critical challenges remain. On the one hand, the existing methods cannot explore multi-view data comprehensively since they usually learn a common representation between views...

World’s biggest bat colony gathers in Zambia every year: we used artificial intelligence to count them

Ethical Challenges in Gamified Education Research and Development: An Umbrella Review and Potential Directions

Gamification is a technological, economic, cultural, and societal development toward promoting a more game-like reality. As this emergent phenomenon has been gradually consolidated into our daily lives, especially in educational settings, many scholars and practitioners face a major challenge ahead...

Generating Visual Scenes from Touch

An emerging line of work has sought to generate plausible imagery from touch. Existing approaches, however, tackle only narrow aspects of the visuo-tactile synthesis problem, and lag significantly behind the quality of cross-modal synthesis methods in other domains. We draw on recent advances in...

The Surveillance AI Pipeline

A rapidly growing number of voices have argued that AI research, and computer vision in particular, is closely tied to mass surveillance. Yet the direct path from computer vision research to surveillance has remained obscured and difficult to assess. This study reveals the Surveillance AI pipeline...

Navigating the risks and benefits of AI: Lessons from nanotechnology on ensuring emerging technologies are safe as well as successful

Twenty years ago, nanotechnology was the artificial intelligence of its time. The specific details of these technologies are, of course, a world apart. But the challenges of ensuring each technology’s responsible and beneficial development are surprisingly alike. Nanotechnology, which is…

Is AI in the eye of the beholder?

Study shows users can be primed to believe certain things about an AI chatbot’s motives, which influences their interactions with the chatbot.

Who Audits the Auditors? Recommendations from a field scan of the algorithmic auditing ecosystem

AI audits are an increasingly popular mechanism for algorithmic accountability; however, they remain poorly defined. Without a clear understanding of audit practices, let alone widely used standards or regulatory guidance, claims that an AI product or system has been audited, whether by first-...

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet...

Finger-shaped sensor enables more dexterous robots

MIT engineers develop a long, curved touch sensor that could enable a robot to grasp and manipulate objects in multiple ways.

“I Don’t Know If We’re Doing Good. I Don’t Know If We’re Doing Bad”: Investigating How Practitioners Scope, Motivate, and Conduct Privacy Work When Developing AI Products

How do practitioners who develop consumer AI products scope, motivate, and conduct privacy work? Respecting privacy is a key principle for developing ethical, human-centered AI systems, but we cannot hope to better support practitioners without answers to that question. We interviewed 35 industry...

Who's Harry Potter? Approximate Unlearning in LLMs

Large language models (LLMs) are trained on massive internet corpora that often contain copyrighted content. This poses legal and ethical challenges for the developers and users of these models, as well as the original authors and publishers. In this paper, we propose a novel technique for...

To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing

NLP is in a period of disruptive change that is impacting our methodologies, funding sources, and public perception. In this work, we seek to understand how to shape our future by better understanding our past. We study factors that shape NLP as a field, including culture, incentives, and...

A method to interpret AI might not be so interpretable after all

Some researchers see formal specifications as a way for autonomous systems to "explain themselves" to humans. But a new study finds that we aren't understanding.

NZ police are using AI to catch criminals – but the law urgently needs to catch up too

Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we...

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Long-tailed object detection (LTOD) aims to handle the extreme data imbalance in real-world datasets, where many tail classes have scarce instances. One popular strategy is to explore extra data with image-level labels, yet it produces limited results due to (1) semantic ambiguity -- an image-level...
