I used facial recognition technology on birds
[Image: Do you know this downy woodpecker? Lewis Barnett, CC BY-SA]
Lewis Barnett, University of Richmond
As a birder, I had heard that if you paid careful attention to the head feathers on the downy woodpeckers that…
Yoshua Bengio, Université de Montréal
I have been doing research on intelligence for 30 years. Like most of my colleagues, I did not get involved in the field with the aim of producing technological objects, but because I have an interest in the abstract nature of the notion of intelligence. I...
We introduce the value iteration network (VIN): a fully differentiable neural network with a `planning module' embedded within. VINs can learn to plan and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a...
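To make the "planning module" concrete: a VIN embeds the classic value-iteration recurrence as a differentiable layer. Below is a minimal tabular sketch of that recurrence on a toy MDP; the transition tensor P, rewards R, and discount are illustrative placeholders, not the paper's setup.

```python
# Tabular value iteration, the recurrence a VIN embeds as a
# differentiable module. P: (A, S, S) transitions; R: (S,) rewards.
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=50):
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        # Q[a, s] = R[s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R[None, :] + gamma * P @ V
        V = Q.max(axis=0)          # greedy backup, the 'planning' step
    return V

# Tiny 2-state, 2-action example
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([0.0, 1.0])
print(value_iteration(P, R))
```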
Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving various non-convex algorithms converge...
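As a concrete instance of the "simple non-convex optimization algorithms" the abstract refers to, here is a minimal sketch of gradient descent on a low-rank factorization fit to observed entries only; the dimensions, rank, and step size are illustrative assumptions.

```python
# Gradient descent on M ~ U @ V.T, using only observed entries.
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 50, 40, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # ground truth
mask = rng.random((n, m)) < 0.3                                # observed entries

U = 0.1 * rng.standard_normal((n, r))
V = 0.1 * rng.standard_normal((m, r))
lr = 0.02
for _ in range(2000):
    resid = mask * (U @ V.T - M)        # error on observed entries only
    U, V = U - lr * resid @ V, V - lr * resid.T @ U
print("observed RMSE:", np.sqrt((resid**2).sum() / mask.sum()))
```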
Estimating distributions over large alphabets is a fundamental machine-learning task. Yet no method is known to estimate all distributions well. For example, add-constant estimators are nearly min-max optimal but often perform poorly in practice, and practical estimators such as absolute...
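For reference, the add-constant estimator mentioned above has a one-line form; the sample, alphabet, and constant below are illustrative.

```python
# Add-constant estimation: P_hat(x) = (count(x) + beta) / (n + beta * k).
from collections import Counter

def add_constant_estimate(sample, alphabet, beta=1.0):
    counts = Counter(sample)
    n, k = len(sample), len(alphabet)
    return {x: (counts[x] + beta) / (n + beta * k) for x in alphabet}

probs = add_constant_estimate("abracadabra", alphabet="abcdrz")
print(probs)   # the unseen symbol 'z' still gets mass from the constant
```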
We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games. When each player in a game uses an algorithm from our class, their individual...
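One well-known member of this family is optimistic Hedge, where "recency bias" means the most recent loss vector is counted twice in the exponential-weights update. A minimal sketch; the learning rate and loss sequence are illustrative.

```python
# Optimistic Hedge: play weights proportional to
# exp(-eta * (cumulative losses + last loss counted once more)).
import numpy as np

def optimistic_hedge(loss_seq, k, eta=0.1):
    cum = np.zeros(k)
    last = np.zeros(k)
    for loss in loss_seq:
        w = np.exp(-eta * (cum + last))   # the recency-biased prediction
        yield w / w.sum()
        cum += loss
        last = loss

losses = [np.array([0.2, 0.8]), np.array([0.9, 0.1]), np.array([0.3, 0.4])]
for p in optimistic_hedge(losses, k=2):
    print(p)
```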
We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this...
Fairness in machine learning has predominantly been studied in static classification settings, without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect. We...
A unified architecture for natural language processing: deep neural networks with multitask learning
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and...
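A minimal sketch of the shared-encoder, per-task-heads pattern the abstract describes, here in PyTorch; the vocabulary size, dimensions, and tag-set sizes are illustrative assumptions, not the paper's configuration.

```python
# One convolutional encoder shared by all tasks, one light head per task.
import torch
import torch.nn as nn

class MultitaskTagger(nn.Module):
    def __init__(self, vocab=10000, dim=50, n_pos=45, n_chunk=23, n_ner=9):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=5, padding=2)
        self.heads = nn.ModuleDict({
            "pos": nn.Linear(dim, n_pos),
            "chunk": nn.Linear(dim, n_chunk),
            "ner": nn.Linear(dim, n_ner),
        })

    def forward(self, tokens):                    # tokens: (batch, seq)
        h = self.embed(tokens).transpose(1, 2)    # (batch, dim, seq)
        h = torch.relu(self.conv(h)).transpose(1, 2)
        return {task: head(h) for task, head in self.heads.items()}

out = MultitaskTagger()(torch.randint(0, 10000, (2, 12)))
print({k: v.shape for k, v in out.items()})      # per-task tag scores
```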
Several recently proposed stochastic optimization methods that have been successfully used in training deep networks, such as RMSProp, Adam, Adadelta, and Nadam, are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g...
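The update family in question fits in a few lines; below is an Adam-style step with illustrative hyperparameters (bias correction omitted for brevity).

```python
# Step scaled by the square root of an EMA of squared gradients.
import numpy as np

def adam_style_step(theta, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # EMA of gradients
    v = b2 * v + (1 - b2) * grad**2       # EMA of squared gradients
    theta = theta - lr * m / (np.sqrt(v) + eps)
    return theta, m, v

# Minimize f(x) = x^2 from x = 5
theta, m, v = np.array([5.0]), 0.0, 0.0
for _ in range(5000):
    theta, m, v = adam_style_step(theta, 2 * theta, m, v)
print(theta)   # approaches 0
```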
Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and...
Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta...
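A minimal sketch of gradient-based meta-learning on a toy regression family, using a Reptile-style first-order approximation rather than the paper's exact method; the task distribution and step sizes are illustrative assumptions.

```python
# Learn an initialization that adapts quickly across a family of tasks.
import numpy as np
rng = np.random.default_rng(0)

def sample_task():                 # hypothetical family: y = a * x
    a = rng.uniform(0.5, 2.0)
    x = rng.standard_normal(20)
    return x, a * x

def inner_adapt(w, x, y, lr=0.1, steps=5):
    for _ in range(steps):
        w = w - lr * 2 * np.mean((w * x - y) * x)   # squared-loss gradient
    return w

w_meta = 0.0
for _ in range(500):
    x, y = sample_task()
    # First-order meta-update: nudge the initialization toward the
    # task-adapted parameters (Reptile-style approximation of MAML).
    w_meta += 0.05 * (inner_adapt(w_meta, x, y) - w_meta)
print(w_meta)
```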
A recent MIT symposium explores methods for making artificial intelligence systems more reliable, secure, and transparent.
We present the first provably sublinear time hashing algorithm for approximate Maximum Inner Product Search (MIPS). Searching with (un-normalized) inner product as the underlying similarity measure is a known difficult problem and finding hashing schemes for MIPS was considered hard. While the...
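One standard way to see how MIPS reduces to a well-understood search problem is the asymmetric augmentation below (a later simplification in this line of work, not necessarily the paper's exact scheme): padding every data vector to a common norm makes maximum inner product coincide with maximum cosine similarity.

```python
# Asymmetric transform: queries keep a zero last coordinate, data
# vectors get padded so all augmented rows share the same norm M.
import numpy as np

def augment_data(X):
    norms = np.linalg.norm(X, axis=1)
    M = norms.max()
    return np.hstack([X, np.sqrt(M**2 - norms**2)[:, None]]), M

def augment_query(q):
    return np.append(q, 0.0)   # inner products with data are unchanged

X = np.random.default_rng(1).standard_normal((1000, 8))
Xa, M = augment_data(X)
q = augment_query(np.ones(8))
# cosine NN on augmented vectors == max inner product on the originals
best = np.argmax(Xa @ q / np.linalg.norm(Xa, axis=1))
assert best == np.argmax(X @ np.ones(8))
```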
The problem of drawing samples from a discrete distribution can be converted into a discrete optimization problem. In this work, we show how sampling from a continuous distribution can be converted into an optimization problem over continuous space. Central to the method is a stochastic process...
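The discrete case the abstract starts from is captured by the Gumbel-max trick: sampling from a softmax becomes an argmax over noise-perturbed logits. A minimal sketch with illustrative logits.

```python
# Gumbel-max: argmax(logits + Gumbel noise) ~ softmax(logits).
import numpy as np
rng = np.random.default_rng(0)

logits = np.array([1.0, 0.0, -1.0])

def gumbel_max_sample(logits):
    g = rng.gumbel(size=logits.shape)       # i.i.d. Gumbel(0, 1) noise
    return np.argmax(logits + g)

samples = [gumbel_max_sample(logits) for _ in range(100000)]
print(np.bincount(samples) / len(samples))  # empirical frequencies
print(np.exp(logits) / np.exp(logits).sum())  # softmax probabilities
```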
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family or to the regularization techniques used during...
Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may, therefore, reveal sensitive information. To address...
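One widely used mitigation in this space is to clip each example's gradient and add Gaussian noise before averaging, the core step of DP-SGD; the sketch below is illustrative and not necessarily this paper's approach, with placeholder clip norm and noise scale.

```python
# Per-example clipping plus Gaussian noise bounds what any single
# training point can contribute to an update.
import numpy as np
rng = np.random.default_rng(0)

def private_mean_gradient(per_example_grads, clip=1.0, sigma=1.0):
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=sigma * clip, size=clipped[0].shape)
    return noisy_sum / len(per_example_grads)

grads = [rng.standard_normal(4) for _ in range(32)]
print(private_mean_gradient(grads))
```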
Empirically, neural networks that attempt to learn programs from data have exhibited poor generalizability. Moreover, it has traditionally been difficult to reason about the behavior of these models beyond a certain level of input complexity. In order to address these issues, we propose augmenting...
The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is...
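A minimal sketch of one such combination: seeding each UCT node's online estimate with the offline value function, treated as a few virtual visits that the sample-based search then refines. The environment interface and constants are illustrative assumptions.

```python
# UCT node whose value estimate starts at the offline value function.
import math

class Node:
    def __init__(self, state, v_offline):
        # offline value counted as 3 virtual visits, then refined online
        self.n, self.value = 3, v_offline(state)
        self.children = {}

    def ucb_select(self, c=1.4):
        return max(self.children.items(),
                   key=lambda kv: kv[1].value +
                       c * math.sqrt(math.log(self.n) / kv[1].n))[0]

    def update(self, ret):
        self.n += 1
        self.value += (ret - self.value) / self.n   # incremental mean

root = Node(state=0, v_offline=lambda s: 0.5)
root.children = {a: Node(a, lambda s: 0.5) for a in ("left", "right")}
root.children["right"].update(1.0)     # one simulated return
print(root.ucb_select())
```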
How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a...
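The classic influence-function quantity behind this approach measures how upweighting a training point z changes the loss at a test point, via the inverse Hessian of the training objective:

```latex
% Effect of upweighting training point z on the loss at z_test, with
% \hat\theta the trained parameters and H the empirical Hessian.
\[
  \mathcal{I}(z, z_{\text{test}})
    = -\nabla_\theta L(z_{\text{test}}, \hat\theta)^\top
      H_{\hat\theta}^{-1}\,
      \nabla_\theta L(z, \hat\theta),
  \qquad
  H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n}
      \nabla_\theta^2 L(z_i, \hat\theta).
\]
```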