Comparative study of several feature transformation and learning methods for phoneme classification

András Kocsor, László Tóth, András Kuba, K. Kovács, M. Jelasity, T. Gyimóthy, J. Csirik

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

This paper examines the applicability of some learning techniques for speech recognition, more precisely, for the classification of phonemes represented by a particular segment model. The methods compared were the IB1 algorithm (TiMBL), ID3 tree learning (C4.5), oblique tree learning (OC1), artificial neural nets (ANN), and Gaussian mixture modeling (GMM), and, as a reference, a hidden Markov model (HMM) recognizer was also trained on the same corpus. Before feeding them into the learners, the segmental features were additionally transformed using either linear discriminant analysis (LDA), principal component analysis (PCA), or independent component analysis (ICA). Each learner was tested with each transformation in order to find the best combination. Furthermore, we experimented with several feature sets, such as filter-bank energies, mel-frequency cepstral coefficients (MFCC), and gravity centers. We found LDA helped all the learners, in several cases quite considerably. PCA was beneficial only for some of the algorithms, and ICA improved the results quite rarely and was bad for certain learning methods. From the learning viewpoint, ANN was the most effective and attained the same results independently of the transformation applied. GMM behaved worse, which shows the advantages of discriminative over generative learning. TiMBL produced reasonable results; C4.5 and OC1 could not compete, no matter what transformation was tried.
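
To make the comparative setup concrete, the following Python sketch builds the kind of transformation × learner grid the abstract describes, using scikit-learn stand-ins. It is a rough illustration under stated assumptions, not the authors' implementation: TiMBL (IB1) is approximated by a 1-nearest-neighbour classifier, C4.5 by a generic decision tree, and the ANN by an MLP, while OC1, the GMM classifier, and the HMM baseline are omitted; the segmental feature vectors (filter-bank energies, MFCC, or gravity centers) are assumed to be precomputed in X.

# Hypothetical sketch of the experimental grid from the abstract: every
# feature transformation (none, LDA, PCA, ICA) is crossed with every learner
# and scored by cross-validation. scikit-learn stand-ins only -- NOT the
# authors' original tools (TiMBL, C4.5, OC1, their ANN/GMM code and the HMM
# reference recognizer are approximated or omitted).
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_transforms_and_learners(X, y, n_components=10, cv=5):
    """Return mean CV accuracy for each (transformation, learner) pair."""
    n_classes = len(set(y))
    transforms = {
        "none": None,
        "LDA": LinearDiscriminantAnalysis(
            n_components=min(n_components, n_classes - 1)),
        "PCA": PCA(n_components=n_components),
        "ICA": FastICA(n_components=n_components, max_iter=1000),
    }
    learners = {
        "IB1-like 1-NN": KNeighborsClassifier(n_neighbors=1),
        "C4.5-like tree": DecisionTreeClassifier(),
        "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(100,), max_iter=500),
    }
    results = {}
    for t_name, transform in transforms.items():
        for l_name, learner in learners.items():
            # Standardize, optionally transform, then classify; the whole
            # pipeline is refit inside each cross-validation fold.
            steps = [StandardScaler()]
            if transform is not None:
                steps.append(transform)
            steps.append(learner)
            pipeline = make_pipeline(*steps)
            results[(t_name, l_name)] = cross_val_score(
                pipeline, X, y, cv=cv).mean()
    return results


# Toy usage (illustrative only): X is an (n_segments, n_features) array of
# precomputed segmental feature vectors, y the corresponding phoneme labels.
# scores = compare_transforms_and_learners(X, y)
# for (transform, learner), acc in sorted(scores.items()):
#     print(f"{transform:>4} + {learner:<15}: {acc:.3f}")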

Original language: English
Pages (from-to): 263-276
Number of pages: 14
Journal: International Journal of Speech Technology
Volume: 3
Issue number: 3-4
DOI: 10.1023/A:1026554814106
Publication status: Published - Dec 2000

Fingerprint

  • Phoneme classification
  • Speech recognition
  • Learning methods
  • Comparative study
  • Linear discriminant analysis
  • Principal component analysis
  • Independent component analysis
  • Neural networks
  • Hidden Markov models
  • Filter-bank energies
  • Gravity centers

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Kocsor, A., Tóth, L., Kuba, A., Kovács, K., Jelasity, M., Gyimóthy, T., & Csirik, J. (2000). Comparative study of several feature transformation and learning methods for phoneme classification. International Journal of Speech Technology, 3(3-4), 263-276. https://doi.org/10.1023/A:1026554814106

@article{efb11c23b0a04642ab59868a831dd422,
title = "Comparative study of several feature transformation and learning methods for phoneme classification",
abstract = "This paper examines the applicability of some learning techniques for speech recognition, more precisely, for the classification of phonemes represented by a particular segment model. The methods compared were the IB1 algorithm (TiMBL), ID3 tree learning (C4.5), oblique tree learning (OC1), artificial neural nets (ANN), and Gaussian mixture modeling (GMM), and, as a reference, a hidden Markov model (HMM) recognizer was also trained on the same corpus. Before feeding them into the learners, the segmental features were additionally transformed using either linear discriminant analysis (LDA), principal component analysis (PCA), or independent component analysis (ICA). Each learner was tested with each transformation in order to find the best combination. Furthermore, we experimented with several feature sets, such as filter-bank energies, mel-frequency cepstral coefficients (MFCC), and gravity centers. We found LDA helped all the learners, in several cases quite considerably. PCA was beneficial only for some of the algorithms, and ICA improved the results quite rarely and was bad for certain learning methods. From the learning viewpoint, ANN was the most effective and attained the same results independently of the transformation applied. GMM behaved worse, which shows the advantages of discriminative over generative learning. TiMBL produced reasonable results; C4.5 and OC1 could not compete, no matter what transformation was tried.",
author = "Andr{\'a}s Kocsor and L{\'a}szl{\'o} T{\'o}th and Andr{\'a}s Kuba and K. Kov{\'a}cs and M. Jelasity and T. Gyim{\'o}thy and J. Csirik",
year = "2000",
month = "12",
doi = "10.1023/A:1026554814106",
language = "English",
volume = "3",
pages = "263--276",
journal = "International Journal of Speech Technology",
issn = "1381-2416",
publisher = "Springer Netherlands",
number = "3-4",

}
