Novel balanced feature representation for Wikipedia vandalism detection task: Lab report for PAN at CLEF 2010

István Hegedüs, Róbert Ormándi, Richárd Farkas, M. Jelasity

Research output: Conference contribution

Abstract

In online communities such as Wikipedia, where any visitor can edit content, users who deliberately make incorrect, vandalizing comments are sure to turn up. In this paper we propose a strong feature set and a method that can handle this problem and automatically decide whether an edit is a vandal contribution or not. We present a new feature set that is a balanced and extended version of the well-known Vector Space Model (VSM) and show that this representation outperforms both the original VSM and its attribute-selected version. Moreover, we describe the other features used in our vandalism detection system and a parameter estimation method for a weighted voting metaclassifier.

Original language: English
Title of host publication: CLEF 2010 - Working Notes for CLEF 2010 Conference
Publisher: CEUR-WS
Volume: 1176
Publication status: Published - 2010
Event: 2010 Cross Language Evaluation Forum Conference, CLEF 2010 - Padua, Italy
Duration: Sep 22 2010 to Sep 23 2010

Fingerprint

Vector spaces
Parameter estimation

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Hegedüs, I., Ormándi, R., Farkas, R., & Jelasity, M. (2010). Novel balanced feature representation for Wikipedia vandalism detection task: Lab report for PAN at CLEF 2010. In CLEF 2010 - Working Notes for CLEF 2010 Conference (Vol. 1176). CEUR-WS.

@inproceedings{2903bb168488485f80b9a08a18c5a640,
title = "Novel balanced feature representation for Wikipedia vandalism detection task: Lab report for PAN at CLEF 2010",
author = "Istv{\'a}n Heged{\"u}s and R{\'o}bert Orm{\'a}ndi and Rich{\'a}rd Farkas and M. Jelasity",
year = "2010",
language = "English",
volume = "1176",
booktitle = "CLEF 2010 - Working Notes for CLEF 2010 Conference",
publisher = "CEUR-WS",

}

TY - GEN

T1 - Novel balanced feature representation for Wikipedia vandalism detection task

T2 - Lab report for PAN at CLEF 2010

AU - Hegedüs, István

AU - Ormándi, Róbert

AU - Farkas, Richárd

AU - Jelasity, M.

PY - 2010

Y1 - 2010

UR - http://www.scopus.com/inward/record.url?scp=84922022148&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922022148&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84922022148

VL - 1176

BT - CLEF 2010 - Working Notes for CLEF 2010 Conference

PB - CEUR-WS

ER -