
Towards phonetic interpretability in deep learning applied to voice comparison

Abstract: A deep convolutional neural network was trained to classify 45 speakers based on spectrograms of their productions of the French vowel /ɑ̃/. Although the model achieved fairly high accuracy (over 85%), our primary focus here was phonetic interpretability rather than sheer performance. To better understand what kind of representations the model learned, i) several versions of the model were trained and tested on low-pass filtered spectrograms with varying cut-off frequencies, and ii) classification was also performed with masked frequency bands. The resulting decline in accuracy was used to identify frequencies relevant to speaker classification and voice comparison, and to produce phonetically interpretable visualizations.
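The two band-limiting probes described in the abstract can be sketched as simple spectrogram manipulations. The snippet below is a minimal illustration in NumPy, not the authors' actual pipeline: the function names are hypothetical, and the frequency-to-bin mapping assumes a linear-frequency magnitude spectrogram whose rows span 0 Hz to the Nyquist frequency (sr/2).

```python
import numpy as np

def low_pass_spectrogram(spec, sr, cutoff_hz):
    """Zero out all frequency bins above cutoff_hz.

    spec: (n_freq_bins, n_frames) magnitude spectrogram whose rows
          cover 0 .. sr/2 Hz linearly (e.g. the output of an STFT).
    """
    n_bins = spec.shape[0]
    bin_hz = (sr / 2) / (n_bins - 1)          # Hz covered by one bin
    cutoff_bin = int(cutoff_hz / bin_hz)       # last bin kept
    out = spec.copy()
    out[cutoff_bin + 1:, :] = 0.0              # silence everything above
    return out

def mask_frequency_band(spec, sr, lo_hz, hi_hz):
    """Zero out the bins between lo_hz and hi_hz (inclusive)."""
    n_bins = spec.shape[0]
    bin_hz = (sr / 2) / (n_bins - 1)
    lo_bin = int(np.ceil(lo_hz / bin_hz))
    hi_bin = int(np.floor(hi_hz / bin_hz))
    out = spec.copy()
    out[lo_bin:hi_bin + 1, :] = 0.0            # silence the band only
    return out
```

Feeding such modified spectrograms to a trained classifier and recording the drop in accuracy for each cut-off frequency or masked band is the logic behind the paper's method: a large drop indicates that the removed region carries speaker-discriminant information.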
Document type: Conference papers

Cited literature: 22 references
Contributor: Emmanuel Ferragne
Submitted on: Monday, December 16, 2019 - 10:58:07 AM
Last modification on: Monday, July 4, 2022 - 9:23:40 AM
Long-term archiving on: Tuesday, March 17, 2020 - 2:31:10 PM


Files produced by the author(s)


HAL Id: halshs-02412948, version 1


Emmanuel Ferragne, Cédric Gendrot, Thomas Pellegrini. Towards phonetic interpretability in deep learning applied to voice comparison. ICPhS, Aug 2019, Melbourne, Australia. ISBN 978-0-646-80069-1. ⟨halshs-02412948⟩


