Vol 27, No 1 (2018)
- Year: 2018
- Articles: 5
- URL: https://journal-vniispk.ru/1066-5307/issue/view/13897
Boundary Crossing Probabilities for General Exponential Families
Abstract
We consider parametric exponential families of dimension K on the real line. We study a variant of boundary crossing probabilities coming from the multi-armed bandit literature, in the case when the real-valued distributions form an exponential family of dimension K. Formally, our result is a concentration inequality that bounds the probability that Bψ(θ̂n, θ*) ≥ f(t/n)/n, where θ* is the parameter of an unknown target distribution, θ̂n is the empirical parameter estimate built from n observations, ψ is the log-partition function of the exponential family and Bψ is the corresponding Bregman divergence. From the perspective of stochastic multi-armed bandits, we pay special attention to the case when the boundary function f is logarithmic, as it enables analyzing the regret of the state-of-the-art KL-ucb and KL-ucb+ strategies, whose analysis was left open in such generality. Indeed, previous results only hold for the case when K = 1, while we provide results for arbitrary finite dimension K, thus considerably extending the existing results. Perhaps surprisingly, we highlight that the proof techniques to achieve these strong results already existed three decades ago in the work of T. L. Lai, and were apparently forgotten in the bandit community. We provide a modern rewriting of these beautiful techniques that we believe are useful beyond the application to stochastic multi-armed bandits.
1-31
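As an illustrative sketch (not taken from the paper): in a one-dimensional exponential family, the Bregman divergence Bψ of the log-partition function coincides with the Kullback–Leibler divergence between the corresponding distributions, with the arguments swapped: KL(p_θ1 ‖ p_θ2) = Bψ(θ2, θ1). The Bernoulli family below is a hypothetical minimal example (K = 1, natural parameterization); parameter values are arbitrary.

```python
import math

def psi(theta):
    # log-partition function of the Bernoulli family in natural parameters
    return math.log(1.0 + math.exp(theta))

def psi_prime(theta):
    # mean parameter = derivative of the log-partition (the sigmoid)
    return 1.0 / (1.0 + math.exp(-theta))

def bregman(theta_a, theta_b):
    # B_psi(a, b) = psi(a) - psi(b) - psi'(b) * (a - b)
    return psi(theta_a) - psi(theta_b) - psi_prime(theta_b) * (theta_a - theta_b)

def kl_bernoulli(mu1, mu2):
    # explicit KL divergence between Bernoulli(mu1) and Bernoulli(mu2)
    return mu1 * math.log(mu1 / mu2) + (1 - mu1) * math.log((1 - mu1) / (1 - mu2))

theta1, theta2 = 0.3, -1.2
mu1, mu2 = psi_prime(theta1), psi_prime(theta2)
# KL(p_theta1 || p_theta2) equals B_psi(theta2, theta1) (arguments swapped)
gap = abs(kl_bernoulli(mu1, mu2) - bregman(theta2, theta1))
```

In higher dimension K the same identity holds with ∇ψ in place of ψ′, which is what makes Bψ(θ̂n, θ*) a natural deviation measure for the empirical estimate.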
Asymptotic Analysis of the Jittering Kernel Density Estimator
Abstract
Jittering estimators are nonparametric function estimators for mixed data. They extend arbitrary estimators from the continuous setting by adding random noise to discrete variables. We give an in-depth analysis of the jittering kernel density estimator, which reveals several appealing properties. The estimator is strongly consistent, asymptotically normal, and unbiased for discrete variables. It converges at minimax-optimal rates, which are established as a by-product of our analysis. To understand the effect of adding noise, we further study its asymptotic efficiency and finite sample bias in the univariate discrete case. Simulations show that the estimator is competitive on finite samples. The analysis suggests that similar properties can be expected for other jittering estimators.
32-46
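A minimal sketch of the jittering idea (an illustration only, not the estimator or rates analyzed in the paper): add independent Uniform(−1/2, 1/2) noise to integer-valued observations, apply an ordinary Gaussian kernel density estimator to the jittered sample, and recover probability mass estimates by integrating the fitted density over unit-length cells. The Bernoulli data, bandwidth, and sample size below are arbitrary choices.

```python
import math
import random

def gaussian_kde(samples, h):
    # plain Gaussian kernel density estimator with bandwidth h
    n = len(samples)
    c = n * h * math.sqrt(2.0 * math.pi)
    def f(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / c
    return f

def pmf_estimate(f, k, m=200):
    # estimate P(X = k) by integrating f over (k - 1/2, k + 1/2] (midpoint rule)
    step = 1.0 / m
    return sum(f(k - 0.5 + (j + 0.5) * step) for j in range(m)) * step

random.seed(0)
# discrete data: Bernoulli(0.3) sample
data = [1 if random.random() < 0.3 else 0 for _ in range(5000)]
# jittering: add independent Uniform(-1/2, 1/2) noise to each discrete value
jittered = [x + random.random() - 0.5 for x in data]
f_hat = gaussian_kde(jittered, h=0.05)
p0 = pmf_estimate(f_hat, 0)  # should be near 0.7, up to smoothing/sampling error
p1 = pmf_estimate(f_hat, 1)  # should be near 0.3
```

The point of the construction is that any continuous-data estimator can be reused unchanged on the jittered sample; integrating over unit cells undoes the noise in expectation.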
Limit Theory of Bivariate Generalized Order Statistics with Random Sample Size
Abstract
The class of limit distribution functions (df’s) of the random bivariate extreme, central and intermediate generalized order statistics (gos) from independent and identically distributed random variables (rv’s) is fully characterized. Both the case when the random sample size is independent of the basic variables and the case when the interrelation between the random sample size and the basic variables is unrestricted are considered.
47-59
Entropic Moments and Domains of Attraction on Countable Alphabets
Abstract
Modern information theory is largely developed in connection with random elements residing in large, complex, and discrete data spaces, or alphabets. Lacking natural metrization and hence moments, the associated probability and statistics theory must rely on information measures in the form of various entropies, for example, Shannon’s entropy, mutual information and Kullback–Leibler divergence, which are functions of an entropic basis in the form of a sequence of entropic moments of varying order. The entropic moments collectively characterize the underlying probability distribution on the alphabet, and hence provide an opportunity to develop statistical procedures for their estimation. As such statistical development becomes an increasingly important line of research in modern data science, the relationship between the underlying distribution and the asymptotic behavior of the entropic moments, as the order increases, becomes a technical issue of fundamental importance. This paper offers a general methodology to capture the relationship between the rates of divergence of the entropic moments and the types of underlying distributions, for a special class of distributions. As an application of the established results, it is demonstrated that the asymptotic normality of the remarkable Turing’s formula for missing probabilities holds under distributions with much thinner tails than those previously known.
60-70
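As a toy illustration of Turing's formula mentioned in the abstract (a sketch, not the paper's construction): the total probability of letters not yet observed is estimated by the fraction of letters observed exactly once. The geometric distribution below is an arbitrary example of a thin-tailed distribution on a countable alphabet.

```python
import random
from collections import Counter

random.seed(1)

def draw():
    # one draw from a geometric distribution on {1, 2, ...} with p_k = 2^(-k)
    k = 1
    while random.random() >= 0.5:
        k += 1
    return k

n = 1000
sample = [draw() for _ in range(n)]
counts = Counter(sample)

# Turing's formula: missing mass estimate = (# letters seen exactly once) / n
n1 = sum(1 for c in counts.values() if c == 1)
turing = n1 / n

# true missing mass for comparison: sum of 2^(-k) over unseen letters k
true_missing = sum(0.5 ** k for k in range(1, 200) if k not in counts)
```

For this thin-tailed example both quantities are tiny; the asymptotic normality results in the paper concern exactly how such estimates fluctuate as n grows.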
On the Maximum Likelihood Estimation of a Covariance Matrix
Abstract
For a multivariate normal set-up, it is well known that the maximum likelihood estimator (MLE) of the covariance matrix is neither admissible nor minimax under the Stein loss function. In this paper, we reveal that the MLE based on the Iwasawa parameterization leads to minimaxity with respect to the Stein loss function. Furthermore, a novel class of loss functions is proposed so that the minimum risks of the MLEs are identical in different coordinate systems, Cholesky parameterization and full Iwasawa parameterization. In other words, the MLEs based on these two different parameterizations are characterized by the property of minimaxity, without a Stein paradox. The application of our novel method to the high-dimensional covariance matrix problem is also discussed.
71-82
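For reference, the Stein loss used in the abstract is L(Σ̂, Σ) = tr(Σ̂Σ⁻¹) − log det(Σ̂Σ⁻¹) − p, which is nonnegative and zero only at Σ̂ = Σ. The sketch below (an illustration under an assumed 2×2 identity-covariance setting, not the paper's estimator) evaluates it for the usual sample-covariance MLE.

```python
import math
import random

def stein_loss(S_hat, S):
    # Stein loss for 2x2 symmetric matrices given as [[a, b], [b, c]]:
    # L(S_hat, S) = tr(S_hat S^{-1}) - log det(S_hat S^{-1}) - p, with p = 2
    (a, b), (_, c) = S
    det_S = a * c - b * b
    inv_S = [[c / det_S, -b / det_S], [-b / det_S, a / det_S]]
    # trace of M = S_hat @ inv_S
    m00 = S_hat[0][0] * inv_S[0][0] + S_hat[0][1] * inv_S[1][0]
    m11 = S_hat[1][0] * inv_S[0][1] + S_hat[1][1] * inv_S[1][1]
    det_hat = S_hat[0][0] * S_hat[1][1] - S_hat[0][1] * S_hat[1][0]
    return (m00 + m11) - math.log(det_hat / det_S) - 2

random.seed(0)
n = 500
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
mx = sum(x for x, _ in xs) / n
my = sum(y for _, y in xs) / n
# sample covariance MLE (normalized by n)
sxx = sum((x - mx) ** 2 for x, _ in xs) / n
syy = sum((y - my) ** 2 for _, y in xs) / n
sxy = sum((x - mx) * (y - my) for x, y in xs) / n
loss = stein_loss([[sxx, sxy], [sxy, syy]], [[1.0, 0.0], [0.0, 1.0]])
```

The paper's contribution concerns how this risk changes when the MLE is computed in the Iwasawa or Cholesky coordinate system rather than directly on the covariance matrix.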
