M. Ajili, J. Bonastre, W. Ben Kheder, S. Rossato, and J. Kahn, Phonetic content impact on Forensic Voice Comparison, IEEE Spoken Language Technology Workshop (SLT), pp.210-217, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02065374

P. Boersma, Praat, a system for doing phonetics by computer, Glot International, vol.5, issue.9, pp.341-345, 2001.

S. Galliano, E. Geoffrois, D. Mostefa, J. Bonastre, and G. Gravier, ESTER Phase II Evaluation Campaign for the Rich Transcription of French Broadcast News, Proc. Interspeech Lisbon, pp.1149-1152, 2005.

C. Gendrot and M. Adda-Decker, Impact of duration on F1/F2 formant values of oral vowels: an automatic analysis of large broadcast news corpora in French and German, Proc. Interspeech Lisbon, pp.2453-2456, 2005.
URL : https://hal.archives-ouvertes.fr/halshs-00188096

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Adaptive Computation and Machine Learning, 2016.

S. Hershey, S. Chaudhuri, D. P. Ellis, J. F. Gemmeke, A. Jansen et al., CNN architectures for large-scale audio classification, Proc. ICASSP New Orleans, pp.131-135, 2017.

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.

D. P. Kingma and J. L. Ba, Adam: A Method for Stochastic Optimization, International Conference on Learning Representations (ICLR), arXiv.org, 2015.

W. Koenig, H. K. Dunn, and L. Y. Lacy, The Sound Spectrograph, The Journal of the Acoustical Society of America, vol.18, issue.1, pp.19-49, 1946.

A. Lozano-Diez, O. Plchot, P. Matějka, and J. Gonzalez-Rodriguez, DNN based embeddings for language recognition, Proc. ICASSP Calgary, pp.5184-5188, 2018.

P. Matějka, O. Glembek, O. Novotný, O. Plchot, F. Grézl et al., Analysis of DNN approaches to speaker identification, Proc. ICASSP, IEEE, pp.5100-5104, 2016.

G. S. Morrison and W. C. Thompson, Assessing the admissibility of a new generation of forensic voice comparison testimony, Columbia Science and Technology Law Review, vol.18, pp.326-434, 2017.

T. Nagamine, M. L. Seltzer, and N. Mesgarani, Exploring how deep neural networks form phonemic categories, Proc. Interspeech Dresden, pp.1912-1916, 2015.

A. Nagrani, J. S. Chung, and A. Zisserman, VoxCeleb: a large-scale speaker identification dataset, Proc. Interspeech, 2017.

T. Pellegrini, Densely connected CNNs for bird audio detection, Proc. EUSIPCO Kos, pp.1784-1788, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01913975

T. Pellegrini and S. Mouysset, Inferring phonemic classes from CNN activation maps using clustering techniques, Proc. Interspeech San Francisco, pp.1290-1294, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01474886

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, 2015.

J. O. Smith, Spectral Audio Signal Processing, 2011. Accessed 13/11/2018.

D. Snyder, D. Garcia-Romero, D. Povey, and S. Khudanpur, Deep neural network embeddings for text-independent speaker verification, Proc. Interspeech Stockholm, pp.999-1003, 2017.

S. V. Stevenage, Drawing a distinction between familiar and unfamiliar voice processing: A review of neuropsychological, clinical and empirical findings, Neuropsychologia, vol.116, pp.162-178, 2018.

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, Computer Vision - ECCV 2014, vol.8689, pp.818-833, 2014.

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Learning deep features for discriminative localization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Las Vegas, IEEE, pp.2921-2929, 2016.