Comparison of skewness-based salient event detector algorithms in speech

Annamaria Kovacs, Gabor Kiss, Klara Vicsi, I. Winkler, Martin Coath

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In this work, we compare two skewness-based salient event detector algorithms that detect transients in human speech signals. Speech transients are characterized by rapid changes in signal energy. The purpose of this study was to compare how two skewness-based methods identify transients, in order to develop a method for studying how the human brain processes speech transients. The first method, skewness in variable time (SKV), finds transients using a cochlear model; the skewness of the energy distribution over a variable time window is implemented with artificial neural networks. The second method, the automatic segmentation method for transient detection (RoT), is based more on speech segmentation and was developed to detect the transient-to-speech segment ratio in spoken recordings. In the current study, the test corpus included Hungarian and English speech recorded from different speakers (two male and two female speakers for each language). The outputs of the two methods were compared using the F-measure, the Jaccard similarity index, and the Hamming distance. Both algorithms were also tested against a corpus hand-labeled by linguistic experts, to obtain an absolute assessment of their performance. Transient detection was evaluated for onset events alone and, separately, for onset and offset events together. The results show that in most cases the RoT method performs better on the expert-labeled databases. Using the F-measure with a ±25 ms tolerance window, the following results were obtained when all types of transient events were evaluated: 0.664 on English and 0.834 on Hungarian. Otherwise, the two methods identify the same stimulus features as transients, and these also coincide with those hand-labeled by experts.
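The abstract scores the detectors with the F-measure at a ±25 ms tolerance, the Jaccard similarity index and the Hamming distance. The following is a minimal sketch of such scoring under stated assumptions, not the authors' evaluation code: events are onset/offset times in seconds, detections are greedily matched one-to-one to reference events within ±25 ms, and the Jaccard index and Hamming distance are computed on binary marks over a hypothetical 10 ms frame grid. The function names (`f_measure`, `to_frames`, `jaccard`, `hamming`) and the example times are made up for illustration.

```python
"""Sketch of event-detection scoring: F-measure with a +-25 ms tolerance,
plus Jaccard index and Hamming distance on frame-level binary marks.
The matching rule and the 10 ms frame size are assumptions of this sketch."""
from typing import List, Sequence, Tuple


def f_measure(reference: Sequence[float], detected: Sequence[float],
              tolerance: float = 0.025) -> Tuple[float, float, float]:
    """Return (precision, recall, F) for event times given in seconds."""
    ref = sorted(reference)
    used = [False] * len(ref)
    hits = 0
    for t in sorted(detected):
        # Greedily match each detection (in time order) to the closest
        # still-unmatched reference event within +-tolerance.
        best, best_dist = None, tolerance
        for i, r in enumerate(ref):
            dist = abs(t - r)
            if not used[i] and dist <= best_dist:
                best, best_dist = i, dist
        if best is not None:
            used[best] = True
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def to_frames(events: Sequence[float], duration: float,
              frame: float = 0.010) -> List[int]:
    """Mark each frame of a (hypothetical) fixed grid that contains an event."""
    n = int(round(duration / frame))
    marks = [0] * n
    for t in events:
        k = min(int(round(t / frame)), n - 1)  # nearest frame index
        if k >= 0:
            marks[k] = 1
    return marks


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """|A intersect B| / |A union B| over frames marked 1."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of frames where the two binary sequences disagree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    expert = [0.10, 0.50, 0.90]    # made-up hand-labelled event times (s)
    detector = [0.10, 0.52, 0.70]  # made-up detector output (s)
    p, r, f = f_measure(expert, detector)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
    a, b = to_frames(expert, 1.0), to_frames(detector, 1.0)
    print("Jaccard:", jaccard(a, b), "Hamming:", hamming(a, b))
```

With these made-up times, two of the three detections fall within ±25 ms, so precision, recall and F-measure are all 2/3. The frame-level Jaccard index (0.2) and Hamming distance (4) are stricter, because this sketch applies no tolerance on the frame grid and the 20 ms offset therefore counts as a miss.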

Original language: English
Title of host publication: 6th IEEE Conference on Cognitive Infocommunications, CogInfoCom 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 285-290
Number of pages: 6
ISBN (Print): 9781467381291
DOI: 10.1109/CogInfoCom.2015.7390605
Publication status: Published - Jan 25 2016
Event: 6th IEEE Conference on Cognitive Infocommunications, CogInfoCom 2015 - Gyor, Hungary
Duration: Oct 19 2015 → Oct 21 2015

Other

Conference: 6th IEEE Conference on Cognitive Infocommunications, CogInfoCom 2015
Country: Hungary
City: Gyor
Period: 10/19/15 → 10/21/15


Keywords

  • artificial neural networks
  • auditory events
  • auditory feature extraction
  • automatic speech segmentation
  • skewness
  • speech transients

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Human-Computer Interaction
  • Cognitive Neuroscience

Cite this

Kovacs, A., Kiss, G., Vicsi, K., Winkler, I., & Coath, M. (2016). Comparison of skewness-based salient event detector algorithms in speech. In 6th IEEE Conference on Cognitive Infocommunications, CogInfoCom 2015 - Proceedings (pp. 285-290). [7390605] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CogInfoCom.2015.7390605

