More Efficient Estimation for Logistic Regression with Optimal Subsample.

Authors
HaiYing Wang

Facing large amounts of data, subsampling is a practical technique to extract useful information. For this purpose, Wang et al. (2017) developed an Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for logistic regression that samples more informative data points with higher probabilities. However, the original OSMAC estimator uses the inverse of the optimal subsampling probabilities as weights in the likelihood function. This reduces the contributions of more informative data points, and the resultant estimator may lose efficiency. In this paper, we propose a more efficient estimator based on the OSMAC subsample without weighting the likelihood function. Both asymptotic results and numerical results show that the new estimator is more efficient. In addition, our focus in this paper is inference for the true parameter, while Wang et al. (2017) focus on approximating the full data estimator. We also develop a new algorithm based on Poisson sampling, which does not require approximating the optimal subsampling probabilities all at once. This is computationally advantageous when the available random-access memory is not enough to hold the full data. Interestingly, the asymptotic distributions also show that Poisson sampling produces a more efficient estimator if the sampling rate, the ratio of the subsample size to the full data sample size, does not converge to zero. We also obtain the unconditional asymptotic distribution for the estimator based on Poisson sampling.
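To make the Poisson sampling idea concrete, here is a minimal sketch of how a subsample might be drawn one chunk at a time, so that the full data never has to sit in memory. It is an illustration under stated assumptions, not the paper's algorithm: the function name `poisson_subsample`, the score |y - p(x)| * ||x||, the zero pilot estimate, and the normalising constant taken from the first chunk are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def poisson_subsample(chunks, beta_pilot, expected_size, score_mean, n_full, rng):
    """Poisson-sample rows chunk by chunk.

    Each row is kept independently with its own inclusion probability, so
    no probability depends on rows outside the current chunk.
    ASSUMPTION: the score |y - p(x)| * ||x|| is an illustrative choice.
    """
    xs, ys, pis = [], [], []
    for X, y in chunks:
        p = sigmoid(X @ beta_pilot)
        score = np.abs(y - p) * np.linalg.norm(X, axis=1)
        # Scale so the expected subsample size is roughly expected_size,
        # capping inclusion probabilities at 1.
        pi = np.minimum(1.0, expected_size * score / (n_full * score_mean))
        keep = rng.random(len(y)) < pi
        xs.append(X[keep])
        ys.append(y[keep])
        pis.append(pi[keep])
    return np.vstack(xs), np.concatenate(ys), np.concatenate(pis)


# Simulated data standing in for a data set too large to load at once.
rng = np.random.default_rng(0)
n, d = 20000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([0.5, -0.5, 0.5])
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
chunks = [(X[i:i + 5000], y[i:i + 5000]) for i in range(0, n, 5000)]

# ASSUMPTION: a zero pilot estimate, with the normalising constant
# estimated from the first chunk only.
beta_pilot = np.zeros(d)
X0, y0 = chunks[0]
score_mean = np.mean(np.abs(y0 - sigmoid(X0 @ beta_pilot)) * np.linalg.norm(X0, axis=1))

X_sub, y_sub, pi_sub = poisson_subsample(chunks, beta_pilot, 1000, score_mean, n, rng)
print(len(y_sub), "rows kept out of", n)
```

Because each row is kept or dropped independently, the realised subsample size is random and only approximately equal to the target. The inclusion probabilities are returned alongside the rows so that either a weighted or an unweighted fit can be carried out afterwards.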
