A nearest neighbor estimate of the residual variance

Luc Devroye, L. Györfi, Gábor Lugosi, Harro Walk

Research output: Contribution to journal › Article



We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of Y on X ∈ ℝ^d. We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when X has an absolutely continuous distribution, without any condition on the density. We also compute the asymptotic variance explicitly and derive a non-asymptotic bound on the variance that does not depend on the dimension d. The asymptotic variance does not depend on the smoothness of the density of X or of the regression function. A non-asymptotic exponential concentration inequality is also proved. We illustrate the use of the new estimate through testing whether a component of the vector X carries information for predicting Y.
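
The equivalence stated in the abstract follows from the identity L* = E[(Y − m(X))²] = E[Y²] − E[m(X)²], where m(x) = E[Y | X = x] is the regression function: E[Y²] can be estimated by a sample mean, so only the second moment E[m(X)²] is hard. As a rough illustration, the sketch below averages the products Y_i · Y_nn(i), where nn(i) is the index of the nearest neighbor of X_i among the other sample points. This pairing is an assumption made for illustration and may differ from the estimate analyzed in the paper; the function names `nn_second_moment` and `residual_variance` and the simulated data are likewise hypothetical.

```python
import numpy as np


def nn_second_moment(X, Y):
    """Average of Y_i * Y_nn(i), with nn(i) the nearest neighbor of X_i
    among the other points (illustrative form, not taken from the paper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as n points in dimension 1
    # Brute-force squared Euclidean distances between all pairs of points.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nn = np.argmin(d2, axis=1)
    return float(np.mean(Y * Y[nn]))


def residual_variance(X, Y):
    """Plug-in estimate of L* = E[Y^2] - E[m(X)^2]."""
    Y = np.asarray(Y, dtype=float)
    return float(np.mean(Y ** 2)) - nn_second_moment(X, Y)


# Simulated data (assumed for the demo): m(x) = sin(2*pi*x_1) and
# noise with standard deviation 0.5, so the true L* is 0.25.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(size=(n, 2))
Y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * rng.normal(size=n)

estimate = residual_variance(X, Y)
print(f"estimated residual variance: {estimate:.3f} (true value 0.25)")
```

In the same spirit, one informal way to probe whether a component of X carries information for predicting Y is to compare `residual_variance(X, Y)` with the value obtained after dropping that column. The formal test described in the abstract relies on the limit law derived in the paper, which this sketch does not reproduce.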

Original language: English
Pages (from-to): 1752-1778
Number of pages: 27
Journal: Electronic Journal of Statistics
Issue number: 1
Publication status: Published - Jan 1 2018

Keywords



  • Asymptotic normality
  • Concentration inequalities
  • Dimension reduction
  • Nearest-neighbor-based estimate
  • Regression functional

ASJC Scopus subject areas

  • Statistics and Probability
