3D shape estimation in video sequences provides high precision evaluation of facial expressions

László A. Jeni, András Lorincz, Tamás Nagy, Zsolt Palotai, Judit Sebok, Zoltán Szabó, Dániel Takács

Research output: Contribution to journal › Article

34 Citations (Scopus)


Person-independent and pose-invariant estimation of facial expressions and of action unit (AU) intensity is important for situation analysis and for automated video annotation. We evaluated the raw 2D shape data of the CK+ database, using Procrustes transformation and a multi-class SVM with the leave-one-out method for classification. Performance was close to 100%, demonstrating the relevance and strength of the details of the shape. Precise 3D shape information was computed by means of constrained local models (CLM) on video sequences. Such sequences make it possible to compute a time-averaged '3D personal mean shape' (PMS) from the estimated CLM shapes; subtracting the PMS from each shape gives rise to person-independent emotion estimation. On CK+ data, PMS showed significant improvements over AU0 normalization; performance reached and sometimes surpassed state-of-the-art results on emotion classification and on AU intensity estimation. 3D PMS from 3D CLM offers pose-invariant emotion estimation, which we studied by rendering a 3D emotional database for different poses and different subjects from the BU-4DFE database. Frontal shapes derived from CLM fits of the 3D shape were evaluated. The results demonstrate that shape estimation alone can be used for robust, high-quality, pose-invariant emotion classification and AU intensity estimation.
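The personal mean shape normalization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each frame's 3D landmarks are already available as (x, y, z) tuples and already rigidly aligned across frames, and the names `personal_mean_shape` and `subtract_pms` are hypothetical, introduced here only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Shape = List[Point]  # one 3D landmark set for a single video frame


def personal_mean_shape(frames: List[Shape]) -> Shape:
    """Average each landmark over all frames of one subject's sequence.

    Assumption: frames are already rigidly aligned, so averaging is meaningful.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    n_landmarks = len(frames[0])
    if any(len(shape) != n_landmarks for shape in frames):
        raise ValueError("all frames must have the same number of landmarks")
    mean: Shape = []
    for i in range(n_landmarks):
        # Time average of landmark i, one coordinate at a time.
        x, y, z = (sum(shape[i][d] for shape in frames) / n_frames for d in range(3))
        mean.append((x, y, z))
    return mean


def subtract_pms(shape: Shape, pms: Shape) -> Shape:
    """Remove the subject-specific mean, leaving the expression-related offset."""
    if len(shape) != len(pms):
        raise ValueError("shape and PMS must have the same number of landmarks")
    return [(p[0] - m[0], p[1] - m[1], p[2] - m[2]) for p, m in zip(shape, pms)]


if __name__ == "__main__":
    # Two toy frames with two landmarks each (made-up numbers, not real data).
    frames = [
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 2.0, 0.0), (2.0, 2.0, 0.0)],
    ]
    pms = personal_mean_shape(frames)
    print(subtract_pms(frames[0], pms))
```

The offsets returned by `subtract_pms` would then serve as the person-independent features passed to a classifier; the choice and configuration of that classifier are outside the scope of this sketch.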

Original language: English
Pages (from-to): 785-795
Number of pages: 11
Journal: Image and Vision Computing
Issue number: 10
Publication status: Published - Oct 2012


Keywords

  • Action unit recognition
  • BU-4DFE
  • Constrained Local Model
  • Emotion classification
  • Shape information

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
