Discussions and opinions
For discussions about current topics in reverse engineering and infant development, follow and comment on our Synthetic Learner Blog.
The team is also involved in teaching; see ITI-PSL Cognitive Engineering.
The bootphon team develops pipelines for data analysis, speech processing, and machine learning, and distributes them as open source in the bootphon repo on GitHub.
The ABXpy package
This package computes an ABX discrimination score on a large database of tokens encoded with domain-specific features. It assumes that the tokens have particular properties (listed in an item file), and that the features (listed in a feature file) can be compared using a particular distance metric.
The ABX task consists of presenting three tokens, A, B, and X, and deciding whether the distance between A and X is greater or smaller than the distance between B and X. The package systematically evaluates all of the ABX triplets that match particular constraints; if there are too many, it samples a smaller number of them.
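As a minimal sketch of the decision rule described above (not ABXpy's actual API), the following scores a list of triplets with a user-supplied distance and falls back to random sampling when there are too many triplets. The names `abx_score`, `max_triplets`, and `seed` are hypothetical, and both the convention that X belongs to the same category as A and the half credit for ties are assumptions made for illustration.

```python
import math
import random


def abx_score(triplets, distance, max_triplets=None, seed=0):
    """Return the fraction of triplets in which X is closer to A than to B.

    Each triplet is (a, b, x). Assumption: x belongs to the same
    category as a, so d(a, x) < d(b, x) counts as a correct answer.
    Ties count as half (also an assumption of this sketch).
    """
    triplets = list(triplets)
    if not triplets:
        raise ValueError("at least one triplet is required")

    # Sample a subset when the number of triplets is too large.
    if max_triplets is not None and len(triplets) > max_triplets:
        rng = random.Random(seed)
        triplets = rng.sample(triplets, max_triplets)

    total = 0.0
    for a, b, x in triplets:
        d_ax = distance(a, x)
        d_bx = distance(b, x)
        if d_ax < d_bx:
            total += 1.0
        elif d_ax == d_bx:
            total += 0.5
    return total / len(triplets)


# Toy fixed-length tokens compared with the Euclidean distance.
toy_triplets = [
    ((0.0, 0.0), (1.0, 1.0), (0.0, 0.1)),  # X is near A
    ((0.0, 0.0), (1.0, 1.0), (0.9, 1.0)),  # X is near B
]
toy_score = abx_score(toy_triplets, math.dist)
```

In this toy example X is closer to A in the first triplet and closer to B in the second, so `toy_score` is 0.5.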
For example, a triplet can test the /b/ vs /g/ contrast across talkers T1 and T2, in the context of the vowel /a/. The package will likewise find all of the triplets of the same general shape.
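The triplet shape described above (a phone contrast, across talkers, within one vowel context) can be sketched as a filter over all ordered triples of items. The item records and the function `enumerate_triplets` below are hypothetical and do not reflect the ABXpy item file format. The sketch assumes that A and X share the phone while B carries the other phone, that A and B share the talker while X comes from the other talker, and that all three tokens share the context.

```python
from itertools import product

# Hypothetical item records: (token id, phone, talker, context).
items = [
    ("t1", "b", "T1", "a"),
    ("t2", "g", "T1", "a"),
    ("t3", "b", "T2", "a"),
    ("t4", "g", "T2", "a"),
    ("t5", "b", "T1", "i"),
]


def enumerate_triplets(items):
    """List (A, B, X) token ids for a phone contrast across talkers."""
    triplets = []
    for a, b, x in product(items, repeat=3):
        same_context = a[3] == b[3] == x[3]           # all share the context
        on_phone = a[1] == x[1] and a[1] != b[1]      # A and X match, B differs
        across_talker = a[2] == b[2] and a[2] != x[2]  # X is the other talker
        if same_context and on_phone and across_talker:
            triplets.append((a[0], b[0], x[0]))
    return triplets


triplets = enumerate_triplets(items)
```

The brute-force loop over all triples is cubic in the number of items, so it is only meant to make the constraints explicit; a real database would need indexing or sampling.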
The code, written in Python, has been optimized for multicore processing and can compute the scores of around 1M ABX triplets of spoken-word speech features in about 45 min on a 10-core machine. The implemented distances are the Euclidean, cosine, and KL distances, plus dynamic time warping (DTW) to align tokens whose feature matrices vary in length, as in speech. See Schatz et al. (2013; 2014) for applications to the evaluation of speech features.
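To illustrate how DTW lets two tokens of different lengths be compared, here is a minimal pure-Python sketch that accumulates a frame-level cosine distance along the best alignment path. It is not the package's optimized implementation: the function names are hypothetical, and returning the unnormalized accumulated cost is a simplifying assumption (a length-normalized variant is equally plausible).

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two frames (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(seq_a, seq_b, frame_distance):
    """Accumulated frame distance along the best monotonic alignment.

    seq_a and seq_b are non-empty lists of frames (feature vectors)
    and may differ in length.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning the first i frames of
    # seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Two toy tokens with 2 and 3 frames; the second repeats its first frame.
token_a = [[1.0, 0.0], [0.0, 1.0]]
token_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
d_ab = dtw_distance(token_a, token_b, cosine_distance)
```

Because the second token only repeats a frame of the first, the alignment absorbs the repetition and `d_ab` is 0.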
This package can be downloaded from GitHub.
Articulation Index upgrade
... under construction ...
Buckeye corpus speech recognition layer
... under construction ...
Other zero resource tools
Here is a list of papers on unsupervised speech learning, together with their open source implementations. The list is provided for documentation purposes only, without any warranty that these implementations will actually work or do anything useful on a new corpus. However, we are very interested in large-scale testing and evaluation of these algorithms. Please report to us what you find.
Discovery of subword units or subword representations
- Discrete units, Bayesian approaches:
  - Lee, C., & Glass, J. (2012). A Nonparametric Bayesian Approach to Acoustic Model Discovery. ACL. [github]
  - Ondel, L., Burget, L., & Cernocky, J. (2016). Variational Inference for Acoustic Unit Discovery. Procedia Computer Science, 81, 80-86. [github]
- Continuous representations, posteriorgrams:
  - Chen, H., Leung, C. C., Xie, L., Ma, B., & Li, H. (2015). Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study. In Proceedings of Interspeech. [code]
  - Heck, M., Sakti, S., & Nakamura, S. (2016). Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario. Procedia Computer Science, 81, 73-79. [same code as the previous entry, plus Kaldi]
- Continuous representations, DNNs (this requires spoken term discovery):
  - Synnaeve, G., Schatz, T., & Dupoux, E. (2014, December). Phonetics embedding learning with side information. In Spoken Language Technology Workshop (SLT), 2014 IEEE (pp. 106-111). IEEE. [github]
  - Thiolliere, R., Dunbar, E., Synnaeve, G., Versteegh, M., & Dupoux, E. (2015). A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling. In Sixteenth Annual Conference of the International Speech Communication Association. [github]
Spoken Term Discovery
- MODIS: Catanese, L., Souviraa-Labastie, N., Qu, B., Campion, S., Gravier, G., Vincent, E., & Bimbot, F. (2013, August). MODIS: an audio motif discovery software. In Show & Tell-Interspeech 2013. [code]
- Jansen, A., & Van Durme, B. (2011, December). Efficient spoken term discovery using randomized algorithms. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on (pp. 401-406). IEEE. [github]
- Bayesian approaches (text based):
  - Adaptor Grammar: Johnson, M., Griffiths, T. L., & Goldwater, S. (2006). Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In Advances in neural information processing systems (pp. 641-648). [website]
- Bayesian approaches (signal based):
  - Lee, C., O'Donnell, T., & Glass, J. (2015). Unsupervised Lexicon Discovery from Acoustic Input. Transactions of the Association for Computational Linguistics (TACL). [github]