A nearest neighbor estimate of the residual variance

Luc Devroye, L. Györfi, Gábor Lugosi, Harro Walk

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of Y on X ∈ ℝ^d. We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when X has an absolutely continuous distribution, without any condition on the density. We also compute the asymptotic variance explicitly and derive a non-asymptotic bound on the variance that does not depend on the dimension d. The asymptotic variance does not depend on the smoothness of the density of X or of the regression function. A non-asymptotic exponential concentration inequality is also proved. We illustrate the use of the new estimate through testing whether a component of the vector X carries information for predicting Y.
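For readers who want to experiment, the following is a minimal sketch, in Python, of a first-nearest-neighbor residual-variance estimate in the spirit of the abstract: the residual variance E[(Y − m(X))²] equals E[Y²] − E[m(X)²], and the second moment E[m(X)²] is estimated by averaging Y_i · Y_{N(i)}, where N(i) indexes the nearest neighbor of X_i among the other sample points. The function name residual_variance_1nn and the use of scipy's cKDTree are illustrative choices, not the paper's implementation; the exact estimator, variance bound, and test statistic developed in the article may differ in detail.

```python
import numpy as np
from scipy.spatial import cKDTree


def residual_variance_1nn(X, Y):
    """First-nearest-neighbor estimate of the residual variance
    E[(Y - m(X))^2], where m(x) = E[Y | X = x].

    Uses E[(Y - m(X))^2] = E[Y^2] - E[m(X)^2] and estimates the
    second moment E[m(X)^2] by the average of Y_i * Y_{N(i)},
    N(i) being the index of the nearest neighbor of X_i.
    Assumes the X_i are almost surely distinct, as they are when
    X has an absolutely continuous distribution.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Y = np.asarray(Y, dtype=float)
    tree = cKDTree(X)
    # k=2: the first hit is the point itself, the second its nearest neighbor.
    _, idx = tree.query(X, k=2)
    nn = idx[:, 1]
    return np.mean(Y ** 2) - np.mean(Y * Y[nn])


# Illustration of the component-relevance idea from the abstract:
# Y depends on X[:, 0] only, so dropping X[:, 1] should leave the
# estimated residual variance essentially unchanged.
rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 2))
Y = np.sin(X[:, 0]) + 0.5 * rng.standard_normal(n)

full = residual_variance_1nn(X, Y)            # both components
reduced = residual_variance_1nn(X[:, :1], Y)  # second component dropped
print(full, reduced)  # both should be close to the true value 0.25
```

If the estimate changes little when a coordinate is dropped, that coordinate plausibly carries no extra predictive information; the paper supplies the normal limit law and concentration inequality needed to turn this comparison into a formal test.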

Original language: English
Pages (from-to): 1752-1778
Number of pages: 27
Journal: Electronic Journal of Statistics
Volume: 12
Issue number: 1
DOIs: 10.1214/18-EJS1438
Publication status: Published - Jan 1 2018

Keywords

  • Asymptotic normality
  • Concentration inequalities
  • Dimension reduction
  • Nearest-neighbor-based estimate
  • Regression functional

ASJC Scopus subject areas

  • Statistics and Probability

Cite this

A nearest neighbor estimate of the residual variance. / Devroye, Luc; Györfi, L.; Lugosi, Gábor; Walk, Harro.

In: Electronic Journal of Statistics, Vol. 12, No. 1, 01.01.2018, p. 1752-1778.

Research output: Contribution to journal › Article

Devroye, Luc ; Györfi, L. ; Lugosi, Gábor ; Walk, Harro. / A nearest neighbor estimate of the residual variance. In: Electronic Journal of Statistics. 2018 ; Vol. 12, No. 1. pp. 1752-1778.
@article{2a09c669c2834a6da760e13dace43d2a,
title = "A nearest neighbor estimate of the residual variance",
abstract = "We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of Y on X ∈ ℝ^d. We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when X has an absolutely continuous distribution, without any condition on the density. We also compute the asymptotic variance explicitly and derive a non-asymptotic bound on the variance that does not depend on the dimension d. The asymptotic variance does not depend on the smoothness of the density of X or of the regression function. A non-asymptotic exponential concentration inequality is also proved. We illustrate the use of the new estimate through testing whether a component of the vector X carries information for predicting Y.",
keywords = "Asymptotic normality, Concentration inequalities, Dimension reduction, Nearest-neighbor-based estimate, Regression functional",
author = "Luc Devroye and L. Gy{\"o}rfi and G{\'a}bor Lugosi and Harro Walk",
year = "2018",
month = "1",
day = "1",
doi = "10.1214/18-EJS1438",
language = "English",
volume = "12",
pages = "1752--1778",
journal = "Electronic Journal of Statistics",
issn = "1935-7524",
publisher = "Institute of Mathematical Statistics",
number = "1",

}

TY - JOUR

T1 - A nearest neighbor estimate of the residual variance

AU - Devroye, Luc

AU - Györfi, L.

AU - Lugosi, Gábor

AU - Walk, Harro

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of Y on X ∈ ℝ^d. We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when X has an absolutely continuous distribution, without any condition on the density. We also compute the asymptotic variance explicitly and derive a non-asymptotic bound on the variance that does not depend on the dimension d. The asymptotic variance does not depend on the smoothness of the density of X or of the regression function. A non-asymptotic exponential concentration inequality is also proved. We illustrate the use of the new estimate through testing whether a component of the vector X carries information for predicting Y.

AB - We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of Y on X ∈ ℝ^d. We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when X has an absolutely continuous distribution, without any condition on the density. We also compute the asymptotic variance explicitly and derive a non-asymptotic bound on the variance that does not depend on the dimension d. The asymptotic variance does not depend on the smoothness of the density of X or of the regression function. A non-asymptotic exponential concentration inequality is also proved. We illustrate the use of the new estimate through testing whether a component of the vector X carries information for predicting Y.

KW - Asymptotic normality

KW - Concentration inequalities

KW - Dimension reduction

KW - Nearest-neighbor-based estimate

KW - Regression functional

UR - http://www.scopus.com/inward/record.url?scp=85048490062&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048490062&partnerID=8YFLogxK

U2 - 10.1214/18-EJS1438

DO - 10.1214/18-EJS1438

M3 - Article

AN - SCOPUS:85048490062

VL - 12

SP - 1752

EP - 1778

JO - Electronic Journal of Statistics

JF - Electronic Journal of Statistics

SN - 1935-7524

IS - 1

ER -