Clustering of Imperfect Transcripts Using a Novel Similarity Measure

Ibrahimov, Oktay / Sethi, Ishwar / Dimitrova, Nevenka

Abstract: There has been a surge of interest in the last several years in methods for automatically generating content indices for multimedia documents, particularly video and audio documents. As a result, there is much interest in methods for analyzing transcribed documents from audio and video broadcasts and from telephone conversations and messages. The present paper deals with such an analysis by presenting a clustering technique that partitions a set of transcribed documents into meaningful topics. Our method determines the intersection between matching transcripts, evaluates the information contribution of each transcript, assesses the information closeness of overlapping words, and calculates similarity based on the chi-square method. The main novelty of our method lies in the proposed similarity measure, which is designed to withstand the imperfections of transcribed documents. Experimental results using documents of varying transcription quality are presented to demonstrate the efficacy of the proposed methodology.
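The abstract names four steps (intersection, information contribution, closeness of overlapping words, chi-square similarity) without giving formulas. The following is a minimal Python sketch of how such a pipeline might be wired together; it is not the authors' measure. The function names, the coverage term, and the way the chi-square statistic is turned into a score in [0, 1] are all illustrative assumptions.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def chi_square_statistic(counts_a, counts_b, vocab):
    """Pearson chi-square over a 2 x |vocab| table of word counts."""
    total_a = sum(counts_a[w] for w in vocab)
    total_b = sum(counts_b[w] for w in vocab)
    grand = total_a + total_b
    stat = 0.0
    for w in vocab:
        column = counts_a[w] + counts_b[w]
        for observed, row in ((counts_a[w], total_a), (counts_b[w], total_b)):
            expected = row * column / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def transcript_similarity(text_a, text_b):
    """Hypothetical similarity in [0, 1]; an assumption, not the paper's formula."""
    counts_a, counts_b = word_counts(text_a), word_counts(text_b)
    # Step 1: intersection of the two transcripts' vocabularies.
    shared = sorted(set(counts_a) & set(counts_b))
    if not shared:
        return 0.0
    # Step 2: share of each transcript's tokens that falls in the intersection
    # (a stand-in for "information contribution").
    shared_a = sum(counts_a[w] for w in shared)
    shared_b = sum(counts_b[w] for w in shared)
    coverage = min(shared_a / sum(counts_a.values()),
                   shared_b / sum(counts_b.values()))
    # Steps 3 and 4: a low chi-square statistic means the shared words are
    # distributed alike in both transcripts; map it to a closeness in (0, 1].
    stat = chi_square_statistic(counts_a, counts_b, shared)
    closeness = 1.0 / (1.0 + stat / (shared_a + shared_b))
    return coverage * closeness


if __name__ == "__main__":
    print(transcript_similarity("the market fell sharply today",
                                "the market rose today after rain"))
```

Restricting the statistic to shared words is one plausible way to tolerate transcription errors, since misrecognized words rarely appear in both transcripts and so affect only the coverage term. A pairwise score like this could then feed any standard clustering routine.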

Document information

Title:

Clustering of Imperfect Transcripts Using a Novel Similarity Measure

Contributors:

Ibrahimov, Oktay (author) / Sethi, Ishwar (author) / Dimitrova, Nevenka (author)

Published in:

Information Retrieval Techniques for Speech Applications ; 23-35

Lecture Notes in Computer Science ; 2273

Publication date:

2002-01-01

Size:

13 pages

ISBN:

978-3-540-45637-7, 978-3-540-43156-5

DOI:

https://doi.org/10.1007/3-540-45637-6_3

Type of media:

Article/Chapter (Book)

Type of material:

Electronic Resource

Language:

English

Keywords:

Automatic Speech Recognition, Multimedia Document, Broadcast News, Content Index, Transcription Error, Computer Science, Information Storage and Retrieval, Language Translation and Linguistics

Similar titles

Clustering of Imperfect Transcripts Using a Novel Similarity Measure

Ibrahimov, O. / Sethi, I. / Dimitrova, N. | British Library Conference Proceedings | 2002

A novel similarity measure for pseudo-generalized fuzzy rough sets

Shi, Zhan-hong / Zhang, Ding-hai | British Library Online Contents | 2019

A novel method to measure the semantic similarity of HPO terms

Peng, Jiajie / Xue, Hansheng / Shao, Yukai et al. | British Library Online Contents | 2017

Comparing and Clustering Residential Layouts Using a Novel Measure of Grating Difference

Xiao, Ran | Springer Verlag | 2021

Fault detection and isolation of DURUMI-II using similarity measure

Park, W. / Lee, S.H. / Song, J. | British Library Online Contents | 2009