Novel balanced feature representation for Wikipedia vandalism detection task: Lab report for PAN at CLEF 2010

István Hegedüs, Róbert Ormándi, Richárd Farkas, Márk Jelasity

Research output: Contribution to journal, Conference article

Abstract

In online communities such as Wikipedia, where every visitor can edit content, users who deliberately make incorrect, vandal comments are sure to turn up. In this paper we propose a strong feature set and a method that can handle this problem and automatically decide whether an edit is a vandal contribution or not. We present a new feature set that is a balanced and extended version of the well-known Vector Space Model (VSM) and show that this representation outperforms both the original VSM and its attribute-selected version. Moreover, we describe the other features used in our vandalism detection system and a parameter estimation method for a weighted voting metaclassifier.

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1176
Publication status: Published - Jan 1 2010
Event: 2010 Cross Language Evaluation Forum Conference, CLEF 2010 - Padua, Italy
Duration: Sep 22 2010 to Sep 23 2010

ASJC Scopus subject areas

  • Computer Science (all)