See the summary of the main results in:

Publications, modeling studies

  • pdf Ludusan, B., Mazuka, R., Bernard, M., Cristia, A. & Dupoux, E. (2017). The Role of Prosody and Speech Register in Word Segmentation: A Computational Modelling Perspective. In ACL 2017. [abstract] ABSTRACT = This study explores the role of speech register and prosody for the task of word segmentation. Since these two factors are thought to play an important role in early language acquisition, we aim to quantify their contribution for this task. We study a Japanese corpus containing both infant- and adult-directed speech and we apply four different word segmentation models, with and without knowledge of prosodic boundaries. The results showed that the difference between registers is smaller than previously reported and that prosodic boundary information helps more adult- than infant-directed speech.
  • pdf Le Godais, G., Linzen, T. & Dupoux, E. (2017). Comparing character-level neural language models using a lexical decision task. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics.

  • pdf Zeghidour, N., Synnaeve, G., Versteegh, M. & Dupoux, E. (2016). A Deep Scattering Spectrum - Deep Siamese Network Pipeline For Unsupervised Acoustic Modeling. In ICASSP-2016, (pp 4965-4969) . [abstract] ABSTRACT = Recent work has explored deep architectures for learning acoustic features in an unsupervised or weakly supervised way for phone recognition. Here we investigate the role of the input features, and in particular we test whether standard mel-scaled filterbanks could be replaced by inherently richer representations, such as those derived from an analytic scattering spectrum. We use a Siamese network with lexical side information, similar to a well-performing architecture used in the Zero Resource Speech Challenge (2015), and show a substantial improvement when the filterbanks are replaced by scattering features, even though these features yield similar performance when tested without training. This shows that unsupervised and weakly-supervised architectures can benefit from richer features than the traditional ones.
  • pdf Zeghidour, N., Synnaeve, G., Usunier, N. & Dupoux, E. (2016). Joint Learning of Speaker and Phonetic Similarities with Siamese Networks. In INTERSPEECH-2016, (pp 1295-1299) . [abstract] ABSTRACT = Recent work has demonstrated, on small datasets, the feasibility of jointly learning specialized speaker and phone embeddings, in a weakly supervised siamese DNN architecture using word and speaker identity as side information. Here, we scale up these architectures to the 360 hours of the Librispeech corpus by implementing a sampling method to efficiently select pairs of words from the dataset and improving the loss function. We also compare the standard siamese networks fed with same (AA) or different (AB) pairs, to a 'triamese' network fed with AAB triplets. We use ABX discrimination tasks to evaluate the discriminability and invariance properties of the obtained joint embeddings, and compare these results with mono-embeddings architectures. We find that the joint embeddings architectures succeed in effectively disentangling speaker from phoneme information, with around 10% errors for the matched tasks and embeddings (speaker task on speaker embeddings, and phone task on phone embeddings) and near chance for the mismatched task. Furthermore, the results carry over in out-of-domain datasets, even beating the best results obtained with similar weakly supervised techniques.
  • pdf Versteegh, M., Anguera, X., Jansen, A. & Dupoux, E. (2016). The Zero Resource Speech Challenge 2015: Proposed Approaches and Results. In SLTU-2016 Procedia Computer Science, 81, (pp 67-72) . [abstract] This paper reports on the results of the Zero Resource Speech Challenge 2015, the first unified benchmark for zero resource speech technology, which aims at the unsupervised discovery of subword and word units from raw speech. This paper discusses the motivation for the challenge, its data sets, tasks and baseline systems. We outline the ideas behind the systems that were submitted for the two challenge tracks: unsupervised subword unit modeling and spoken term discovery, and summarize their results. The results obtained by participating teams show great promise; many systems beat the provided baselines and some even perform better than comparable supervised systems.
  • pdf Synnaeve, G. & Dupoux, E. (2016). A temporal coherence loss function for learning unsupervised acoustic embeddings. In SLTU-2016 Procedia Computer Science, 81, (pp 95-100) . [abstract] ABSTRACT = We train Neural Networks of varying depth with a loss function which constrains the output representations to have a temporal profile similar to that of phonemes. We show that a simple loss function which maximizes the dissimilarity between near frames and long distance frames helps to construct a speech embedding that improves phoneme discriminability, both within and across speakers, even though the loss function only uses within speaker information. However, with too deep an architecture, this loss function yields overfitting, suggesting the need for more data and/or regularization. (An illustrative sketch of this loss appears after this list.)
  • pdf Ogawa, T., Mallidi, S.H., Dupoux, E., Cohen, J., Feldman, N. & Hermansky, H. (2016). A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation. In ICPR. [abstract] ABSTRACT = A new efficient measure for predicting estimation accuracy is proposed and successfully applied to multistream-based unsupervised adaptation of ASR systems to address data uncertainty when the ground-truth is unknown. The proposed measure is an extension of the M-measure, which predicts confidence in the output of a probability estimator by measuring the divergences of probability estimates spaced at specific time intervals. In this study, the M-measure was extended by considering the latent phoneme information, resulting in improved reliability. Experimental comparisons carried out in a multistream-based ASR paradigm demonstrated that the extended M-measure yields a significant improvement over the original M-measure, especially under narrow-band noise conditions.
  • pdf Ludusan, B., Cristia, A., Martin, A., Mazuka, R. & Dupoux, E. (2016). Learnability of prosodic boundaries: Is infant-directed speech easier? Journal of the Acoustical Society of America, 140(2), 1239-1250. [abstract] ABSTRACT = This study explores the long-standing hypothesis that the acoustic cues to prosodic boundaries in infant-directed speech (IDS) make those boundaries easier to learn than those in adult-directed speech (ADS). Three cues (pause duration, nucleus duration and pitch change) were investigated, by means of a systematic review of the literature, statistical analyses of a new corpus, and machine learning experiments. The review of previous work revealed that the effect of register on boundary cues is less well established than previously thought, and that results often vary across studies for certain cues. Statistical analyses run on a large database of mother-child and mother-interviewer interactions showed that the duration of a pause and the duration of the syllable nucleus preceding the boundary are two cues which are enhanced in IDS, while f0 change is actually degraded in IDS. Supervised and unsupervised machine learning techniques applied to these acoustic cues revealed that IDS boundaries were consistently better classified than ADS ones, regardless of the learning method used. The role of the cues examined in this study and the importance of these findings in the more general context of early linguistic structure acquisition are discussed.
  • pdf Ludusan, B. & Dupoux, E. (2016). The role of prosodic boundaries in word discovery: Evidence from a computational model. Journal of the Acoustical Society of America, 140(1), EL1. [abstract] ABSTRACT = This study aims to quantify the role of prosodic boundaries in early language acquisition using a computational modeling approach. A spoken term discovery system that models early word learning was used with and without a prosodic component on speech corpora of English, Spanish, and Japanese. The results showed that prosodic information induces a consistent improvement both in the alignment of the terms to actual word boundaries and in the phonemic homogeneity of the discovered clusters of terms. This benefit was found also when automatically discovered prosodic boundaries were used, boundaries which did not perfectly match the linguistically defined ones.
  • pdf Ludusan, B. & Dupoux, E. (2016). Automatic syllable segmentation using broad phonetic class information. In SLTU-2016 Procedia Computer Science, 81, (pp 101-106) . [abstract] ABSTRACT = We propose in this paper a language-independent method for syllable segmentation. The method is based on the Sonority Sequencing Principle, by which the sonority inside a syllable increases from its boundaries towards the syllabic nucleus. The sonority function employed was derived from the posterior probabilities of a broad phonetic class recognizer, trained with data coming from an open-source corpus of English stories. We tested our approach on English, Spanish and Catalan and compared the results obtained to those given by an energy-based system. The proposed method outperformed the energy-based system on all three languages, showing a good generalizability to the two unseen languages. We conclude with a discussion of the implications of this work for under-resourced languages. (An illustrative sketch of sonority-peak syllabification appears after this list.)
  • pdf Linzen, T., Dupoux, E. & Spector, B. (2016). Quantificational features in distributional word representations. In Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, (pp 1-11) .
  • pdf Linzen, T., Dupoux, E. & Goldberg, Y. (2016). Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4, 521-535.
  • pdf Fourtassi, A. & Dupoux, E. (2016). The role of word-word co-occurrence in word learning. In Proceedings of the 38th Annual Conference of the Cognitive Science Society, (pp 662-667) . [abstract] ABSTRACT = A growing body of research on early word learning suggests that learners gather word-object co-occurrence statistics across learning situations. Here we test a new mechanism whereby learners are also sensitive to word-word co-occurrence statistics. Indeed, we find that participants can infer the likely referent of a novel word based on its co-occurrence with other words, in a way that mimics a machine learning algorithm dubbed `zero-shot learning'. We suggest that the interaction between referential and distributional regularities can bring robustness to the process of word acquisition.
  • pdf Dunbar, E. & Dupoux, E. (2016). Geometric constraints on human speech sound inventories. Frontiers in Psychology, 7(1061). [abstract] We investigate the idea that the languages of the world have developed coherent sound systems in which having one sound increases or decreases the chances of having certain other sounds, depending on shared properties of those sounds. We investigate the geometries of sound systems that are defined by the inherent properties of sounds. We document three typological tendencies in sound system geometries: economy, a tendency for the differences between sounds in a system to be definable on a relatively small number of independent dimensions; local symmetry, a tendency for sound systems to have relatively large numbers of pairs of sounds that differ only on one dimension; and global symmetry, a tendency for sound systems to be relatively balanced. The finding of economy corroborates previous results; the two symmetry properties have not been previously documented. We also investigate the relation between the typology of inventory geometries and the typology of individual sounds, showing that the frequency distribution with which individual sounds occur across languages works in favour of both local and global symmetry.
  • pdf Carbajal, J., Fér, R. & Dupoux, E. (2016). Modeling language discrimination in infants using i-vector representations. In Proceedings of the 38th Annual Conference of the Cognitive Science Society, (pp 889-896) . [abstract] ABSTRACT = Experimental research suggests that at birth infants can discriminate two languages if they belong to different rhythmic classes, and by 4 months of age they can discriminate two languages within the same class provided they have been previously exposed to at least one of them. In this paper, we present a novel application of speech technology tools to model language discrimination, which may help to understand how infants achieve this task. By combining a Gaussian Mixture Model of the acoustic space and low-dimensional representations of novel utterances with a model of a habituation paradigm, we show that brief exposure to French does not allow the model to discriminate two previously unheard languages belonging to the same rhythmic class, but does allow it to discriminate two languages across rhythmic classes. The implications of these findings are discussed. (A sketch of the habituation comparison step appears after this list.)
  • pdf Carbajal, J., Dawud, A., Thiollière, R. & Dupoux, E. (2016). The 'Language Filter' Hypothesis: Modeling Language Separation in Infants using I-vectors. In EPIROB 2016, (pp 195-201) . [abstract] ABSTRACT = Experimental research suggests that at birth infants can discriminate two languages if they belong to different rhythmic classes, and by 4 months of age they can discriminate two languages within the same class provided they have been previously exposed to at least one of them. In this paper, we present a novel application of speech technology tools to model language discrimination, which may help to understand how infants achieve this task. By combining a Gaussian Mixture Model of the acoustic space and low-dimensional representations of novel utterances with a model of a habituation paradigm, we show that brief exposure to French does not allow the model to discriminate two previously unheard languages belonging to the same rhythmic class, but does allow it to discriminate two languages across rhythmic classes. The implications of these findings are discussed.
  • pdf Bergmann, C., Cristia, A. & Dupoux, E. (2016). Discriminability of sound contrasts in the face of speaker variation quantified. In Proceedings of the 38th Annual Conference of the Cognitive Science Society, (pp 1331-1336) . [abstract] ABSTRACT = How does a naive language learner deal with speaker variation that is irrelevant for distinguishing word meanings? Experimental data is conflicting and incompatible models have been proposed. In this paper we examine the basic assumptions of these models regarding the signal the learner deals with: Is speaker variability a hurdle in discriminating sounds or can it easily be abstracted over? To this end we summarize existing infant data and compare them to machine-based discriminability scores of sound pairs obtained without added language knowledge. Our results show consistently that speaker variability decreases sound contrast discriminability, and that some pairs are affected more than others. Further, chance performance is a rare exception; contrasts remain discriminable in the face of speaker variation. Our data offer a way to reunite seemingly conflicting findings in the infant literature and show a path forward in testing whether and how speaker variation plays a role for language acquisition.
  • pdf Versteegh, M., Thiollière, R., Schatz, T., Cao, X.N., Anguera, X., Jansen, A. & Dupoux, E. (2015). The Zero Resource Speech Challenge 2015. In INTERSPEECH-2015, (pp 3169-3173) . [abstract] ABSTRACT = The Interspeech 2015 Zero Resource Speech Challenge aims at discovering subword and word units from raw speech. The challenge provides the first unified and open source suite of evaluation metrics and data sets to compare and analyse the results of unsupervised linguistic unit discovery algorithms. It consists of two tracks. In the first, a psychophysically inspired evaluation task (minimal pair ABX discrimination) is used to assess how well speech feature representations discriminate between contrastive subword units. In the second, several metrics gauge the quality of discovered word-like patterns. Two data sets are provided, one for English, one for Xitsonga. Both data sets are provided without any annotation except for voice activity and talker identity. This paper introduces the evaluation metrics, presents the results of baseline systems and discusses some of the key issues in unsupervised unit discovery.
  • pdf Thiollière, R., Dunbar, E., Synnaeve, G., Versteegh, M. & Dupoux, E. (2015). A Hybrid Dynamic Time Warping-Deep Neural Network Architecture for Unsupervised Acoustic Modeling. In INTERSPEECH-2015, (pp 3179-3183) . [abstract] ABSTRACT = We report on an architecture for the unsupervised discovery of talker-invariant subword embeddings. It is made out of two components: a dynamic-time warping based spoken term discovery (STD) system and a Siamese deep neural network (DNN). The STD system clusters word-sized repeated fragments in the acoustic streams while the DNN is trained to minimize the distance between time aligned frames of tokens of the same cluster, and maximize the distance between tokens of different clusters. We use additional side information regarding the average duration of phonemic units, as well as talker identity tags. For evaluation we use the datasets and metrics of the Zero Resource Speech Challenge. The model shows improvement over the baseline in subword unit modeling.
  • pdf Synnaeve, G. & Dupoux, E. (2015). Weakly Supervised Multi-Embeddings Learning of Acoustic Models. In ICLR Workshop, (arXiv:1412.6645 [cs.SD]) . [abstract] ABSTRACT = We trained a Siamese network with multi-task same/different information on a speech dataset, and found that it was possible to share a network for both tasks without a loss in performance. The first task was to discriminate between two same or different words, and the second was to discriminate between two same or different talkers.
  • pdf Michon, E., Dupoux, E. & Cristia, A. (2015). Salient dimensions in implicit phonotactic learning. In INTERSPEECH-2015, (pp 2665-2669) . [abstract] ABSTRACT = Adults are able to learn sound co-occurrences without conscious knowledge after brief exposures. But which dimensions of sounds are most salient in this process? Using an artificial phonology paradigm, we explored potential learnability differences involving consonant-, speaker-, and tone-vowel co-occurrences. Results revealed that participants, whose native language was not tonal, implicitly encoded consonant-vowel patterns with a high level of accuracy; were above chance for tone-vowel co-occurrences; and were at chance for speaker-vowel co-occurrences. This pattern of results is exactly what would be expected if both language-specific experience and innate biases to encode potentially contrastive linguistic dimensions affect the salience of different dimensions during implicit learning of sound patterns.
  • pdf Martin, A., Schatz, T., Versteegh, M., Miyazawa, K., Mazuka, R., Dupoux, E. & Cristia, A. (2015). Mothers speak less clearly to infants: A comprehensive test of the hyperarticulation hypothesis. Psychological Science, 26(3), 341-347. [abstract] ABSTRACT = Infants learn language at an incredible speed, and one of the first steps in this voyage includes learning the basic sound units of their native language. It is widely thought that caregivers facilitate this task by hyperarticulating when speaking to their infants. Utilizing state-of-the-art speech technology, we address this key theoretical question: Are sound categories clearer in infant- than in adult-directed speech? A comprehensive examination of sound contrasts in a large corpus of spontaneous Japanese demonstrates that there is a small but significant tendency for contrasts in infant-directed speech to be less clear than those in adult-directed speech, contrary to the idea that caregivers actively enhance phonetic categories in infant-directed speech. These results suggest that the ability to learn from noisy data must be a crucial component of plausible theories of infant language acquisition.
  • pdf Ludusan, B., Synnaeve, G. & Dupoux, E. (2015). Prosodic boundary information helps unsupervised word segmentation. In NAACL HLT 2015, (pp 953-963) .
  • pdf Ludusan, B., Seidl, A., Dupoux, E. & Cristia, A. (2015). Motif discovery in infant- and adult-directed speech. In Proceedings of CogACLL2015, (pp 93-102) . [abstract] ABSTRACT = Infant-directed speech (IDS) is thought to play a key role in determining infant language acquisition. It is thus important to describe to what extent it differs from adult-directed speech (ADS) in dimensions that could affect learnability. In this paper, we explore how an acoustic motif discovery algorithm fares when presented with spontaneous speech from both registers. Results show small but significant differences in performance, with lower recall and higher fragmentation in IDS than ADS. Such a result is inconsistent with a view of IDS where clarity and ease of lexical recognition is a primary consideration. Additionally, it predicts that learners who extract acoustic word-forms should do worse with IDS than ADS. Similarities and differences with human infants' performance on word segmentation tasks are discussed.
  • pdf Ludusan, B., Origlia, A. & Dupoux, E. (2015). Rhythm-Based Syllabic Stress Learning without Labelled Data. In Proceedings of Statistical Language and Speech Processing -SLSP 2015, (pp 185-196) . [abstract] ABSTRACT = In this paper we propose a method for syllabic stress annotation which does not require manual labels for the learning process, but uses stress labels automatically generated from a multiscale model of rhythm perception. The model gives in its output a sequence of events, corresponding to the sequences of strong-weak syllables present in speech, based on which a stressed/unstressed decision is taken. We tested our approach on two languages, Catalan and Spanish, and we found that a supervised system employing the automatic labels for learning improves the performance over the baseline, for both languages. We also compared the results of this system with that of an identical learning algorithm, but which employs manual labels for stress, as well as to that of an unsupervised learning algorithm using the same features. This comparison showed that the system using automatic labels has a similar performance to the one using manual labels, with both supervised systems outperforming the clustering algorithm.
  • pdf Ludusan, B., Caranica, A., Cucu, H., Buzo, A., Burileanu, C. & Dupoux, E. (2015). Exploring multi-language resources for unsupervised spoken term discovery. In Speech Technology and Human-Computer Dialogue (SpeD), 2015 International Conference on, (pp 1-6) . [abstract] With information processing and retrieval of spoken documents becoming an important topic, there is a need for systems performing automatic segmentation of audio streams. Among such algorithms, spoken term discovery allows the extraction of word-like units (terms) directly from the continuous speech signal, in an unsupervised manner and without any knowledge of the language at hand. Since the performance of any downstream application depends on the goodness of the terms found, it is relevant to try to obtain higher quality automatic terms. In this paper we investigate whether the use of input features derived from multi-language resources helps the process of term discovery. For this, we employ an open-source phone recognizer to extract posterior probabilities and phone segment decisions, for several languages. We examine the features obtained from a single language and from combinations of languages based on the spoken term discovery results attained on two different datasets of English and Xitsonga. Furthermore, a comparison to the results obtained with standard spectral features is performed and the implications of the work discussed.
  • pdf Ludusan, B. & Dupoux, E. (2015). A multilingual study on intensity as a cue for marking prosodic boundaries. In ICPhS, (pp e982) . [abstract] ABSTRACT = Speech intensity is one of the main prosodic cues, playing a role in most of the suprasegmental phenomena. Despite this, its contribution to the signalling of prosodic hierarchy is still relatively under-studied, compared to the other cues, like duration or fundamental frequency. We present here an investigation on the role of intensity in prosodic boundary detection in four different languages, by testing several intensity measures. The statistical analysis performed showed significant correlates of prosodic boundaries, for most intensity measures employed and in all languages. Our findings were further validated with a classification experiment in which the boundary/non-boundary distinction was learned in an unsupervised manner, using only intensity cues. It showed that intensity range measures outperform absolute intensity measures, with the total intensity range being consistently the best feature.
  • pdf Johnson, M., Pater, J., Staub, R. & Dupoux, E. (2015). Sign constraints on feature weights improve a joint model of word segmentation and phonology. In NAACL HLT 2015, (pp 303-313) . [abstract] ABSTRACT = This paper describes a joint model of word segmentation and phonological alternations, which takes unsegmented utterances as input and infers word segmentations and underlying phonological representations. The model is a Maximum Entropy or log-linear model, which can express a probabilistic version of Optimality Theory (OT; Prince and Smolensky, 2004), a standard phonological framework. The features in our model are inspired by OT's Markedness and Faithfulness constraints. Following the OT principle that such features indicate ``violations'', we require their weights to be non-positive. We apply our model to a modified version of the Buckeye corpus (Pitt et al., 2007) in which the only phonological alternations are deletions of word-final /d/ and /t/ segments. The model sets a new state-of-the-art for this corpus for word segmentation, identification of underlying forms, and identification of /d/ and /t/ deletions. We also show that the OT-inspired sign constraints on feature weights are crucial for accurate identification of deleted /d/s; without them our model posits approximately 10 times more deleted underlying /d/s than appear in the manually annotated data.
  • pdf Hermansky, H., Burget, L., Cohen, J., Dupoux, E., Feldman, N., Godfrey, J., Khudanpur, S., Maciejewski, M., Mallidi, S.H., Menon, A., Ogawa, T., Peddinti, V., Rose, R., Stern, R., Wiesner, M. & Vesely, K. (2015). Towards machines that know when they do not know: Summary of work done at 2014 Frederick Jelinek memorial workshop in Prague. In ICASSP-2015 (IEEE International Conference on Acoustics Speech and Signal Processing), (pp 5009-5013) . [abstract] ABSTRACT = A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken by the group.
  • pdf Fourtassi, A. (2015). Acquiring phonemes with early semantics.
  • pdf Dunbar, E., Synnaeve, G. & Dupoux, E. (2015). Quantitative methods for comparing featural representations. In ICPhS, (pp paper number 1024) . [abstract] ABSTRACT = The basic representational hypothesis in phonology is that segments are coded using a universal set of discrete features. We propose a method for quantitatively measuring how well such features align with arbitrary segment representations. We assess articulatory, spectral, and phonotactic representations of English consonants. Our procedure constructs a concrete representation of a feature in terms of the pairs it distinguishes, and can be extended to any pair of representations to test the consistency of one with the individual dimensions of the other. We validate the method on our phonetic representations and then show that major natural classes are not well represented in the surface phonotactics.
  • pdf Synnaeve, G., Versteegh, M. & Dupoux, E. (2014). Learning words from images and speech. In NIPS Workshop on Learning Semantics.
  • pdf Synnaeve, G., Schatz, T. & Dupoux, E. (2014). Phonetics embedding learning with side information. In IEEE Spoken Language Technology Workshop, (pp 106 - 111) . [abstract] We show that it is possible to learn an efficient acoustic model using only a small amount of easily available word-level similarity annotations. In contrast to the detailed phonetic labeling required by classical speech recognition technologies, the only information our method requires are pairs of speech excerpts which are known to be similar (same word) and pairs of speech excerpts which are known to be different (different words). An acoustic model is obtained by training shallow and deep neural networks, using an architecture and a cost function well-adapted to the nature of the provided information. The resulting model is evaluated on an ABX minimal-pair discrimination task and is shown to perform much better (11.8% ABX error rate) than raw speech features (19.6%), not far from a fully supervised baseline (best neural network: 9.2%, HMM-GMM: 11%). (An illustrative sketch of such a same/different pair loss appears after this list.)
  • pdf Synnaeve, G., Dautriche, I., Boerschinger, B., Johnson, M. & Dupoux, E. (2014). Unsupervised word segmentation in context. In Proceedings of 25th International Conference on Computational Linguistics (CoLing), (pp 2326-2334) . [abstract] ABSTRACT = This paper extends existing word segmentation models to take non-linguistic context into account. It improves the token F-score of well-performing segmentation models by 2.5% on a 27k-utterance dataset. We posit that word segmentation is easier in context because the learner is not trying to access irrelevant lexical items. We use topics from Latent Dirichlet Allocation as a proxy for activity contexts, to label the Providence corpus. We present Adaptor Grammar models that use these context labels, and we study their performance with and without context annotations at test time.
  • pdf Schatz, T., Peddinti, V., Cao, X.N., Bach, F., Hermansky, H. & Dupoux, E. (2014). Evaluating speech features with the Minimal-Pair ABX task (II): Resistance to noise. In INTERSPEECH-2014, (pp 915-919) . [abstract] ABSTRACT = The Minimal-Pair ABX (MP-ABX) paradigm has been proposed as a method for evaluating speech features for zero-resource/unsupervised speech technologies. We apply it in a phoneme discrimination task on the Articulation Index corpus to evaluate the resistance to noise of various speech features. In Experiment 1, we evaluate the robustness to additive noise at different signal-to-noise ratios, using car and babble noise from the Aurora-4 database and white noise. In Experiment 2, we examine the robustness to different kinds of convolutional noise. In both experiments we consider two classes of techniques to induce noise resistance: smoothing of the time-frequency representation and short-term adaptation in the time-domain. We consider smoothing along the spectral axis (as in PLP) and along the time axis (as in FDLP). For short-term adaptation in the time-domain, we compare the use of a static compressive non-linearity followed by RASTA filtering to an adaptive compression scheme.
  • pdf Ludusan, B., Versteegh, M., Jansen, A., Gravier, G., Cao, X.N., Johnson, M. & Dupoux, E. (2014). Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems. In Proceedings of LREC 2014, (pp 560-567) . [abstract] ABSTRACT = The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or from raw speech has seen an increasing interest in the past years both from a theoretical and a practical standpoint. Yet, there exists no commonly accepted evaluation method for the systems performing term discovery. Here, we propose such an evaluation toolbox, drawing ideas from both speech technology and natural language processing. We first transform the speech-based output into a symbolic representation and compute five types of evaluation metrics on this representation: the quality of acoustic matching, the quality of the clusters found, and the quality of the alignment with real words (type, token, and boundary scores). We tested our approach on two term discovery systems taking speech as input, and one using symbolic input. The latter was run using both the gold transcription and a transcription obtained from an automatic speech recognizer, in order to simulate the case when only imperfect symbolic information is available. The results obtained are analysed through the use of the proposed evaluation metrics and the implications of these metrics are discussed.
  • pdf Ludusan, B., Gravier, G. & Dupoux, E. (2014). Incorporating Prosodic Boundaries in Unsupervised Term Discovery. In Proceedings of Speech Prosody, 7, (pp 939-943) . [abstract] We present a preliminary investigation on the usefulness of prosodic boundaries for unsupervised term discovery (UTD). Studies in language acquisition show that infants use prosodic boundaries to segment continuous speech into word-like units. We evaluate whether such a strategy could also help UTD algorithms. Running a previously published UTD algorithm (MODIS) on a corpus of prosodically annotated English broadcast news revealed that many discovered terms straddle prosodic boundaries. We then implemented two variants of this algorithm: one that discards straddling items and one that truncates them to the nearest boundary (either prosodic or pause marker). Both algorithms showed a better term matching F-score compared to the baseline and higher level prosodic boundaries were found to be better than lower level boundaries or pause markers. In addition, we observed that the truncation algorithm, but not the discard algorithm, increased word boundary F-score over the baseline.
  • pdf Ludusan, B. & Dupoux, E. (2014). Towards Low Resource Prosodic Boundary Detection. In Proceedings of International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU'14), (pp 231-237) . [abstract] ABSTRACT = In this study we propose a method of prosodic boundary detection based only on acoustic cues which are easily extractable from the speech signal and without any supervision. Drawing a parallel between the process of language acquisition in babies and the speech processing techniques for under-resourced languages, we take advantage of the findings of several psycholinguistic studies relative to the cues used by babies for the identification of prosodic boundaries. Several durational and pitch cues were investigated, alone or in combination, and relatively good performance was achieved. The best result obtained, a combination of all the cues, compares well against a previously proposed approach, without relying on any learning method or any lexical or syntactic cues.
  • pdf Johnson, M., Christophe, A., Demuth, K. & Dupoux, E. (2014). Modelling function words improves unsupervised word segmentation. In Proceedings of the 52nd Annual meeting of the ACL, (pp 282-292) . [abstract] ABSTRACT = Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar based Bayesian word segmentation model to allow it to learn sequences of monosyllabic "function words" at the beginnings and endings of collocations of (possibly multi-syllabic) words. This modification improves unsupervised word segmentation on the standard Bernstein-Ratner (1987) corpus of child-directed English by more than 4% token f-score compared to a model identical except that it does not special-case "function words", setting a new state-of-the-art of 92.4% token f-score. Our function word model assumes that function words appear at the left periphery, and while this is true of languages such as English, it is not true universally. We show that a learner can use Bayesian model selection to determine the location of function words in their language, even though the input to the model only consists of unsegmented sequences of phones. Thus our computational models support the hypothesis that function words play a special role in word learning.
  • pdf Johnson, M. & Börschinger, B. (2014). Exploring the Role of Stress in Bayesian Word Segmentation using Adaptor Grammars. Transactions of the Association for Computational Linguistics, 2, 93-104. [abstract] Stress has long been established as a major cue in word segmentation for English infants. We show that enabling a current state-of-the-art Bayesian word segmentation model to take advantage of stress cues noticeably improves its performance. We find that the improvements range from 10% to 4%, depending on both the use of phonotactic cues and, to a lesser extent, the amount of evidence available to the learner. We also find that in particular early on, stress cues are much more useful for our model than phonotactic cues by themselves, consistent with the finding that children do seem to use stress cues before they use phonotactic cues. Finally, we study how the model's knowledge about stress patterns evolves over time. We not only find that our model correctly acquires the most frequent patterns relatively quickly but also that the Unique Stress Constraint that is at the heart of a previously proposed model does not need to be built in but can be acquired jointly with word segmentation.
  • pdf Fourtassi, A., Schatz, T., Varadarajan, B. & Dupoux, E. (2014). Exploring the Relative Role of Bottom-up and Top-down Information in Phoneme Learning. In Proceedings of the 52nd Annual meeting of the ACL, 2, (pp 1-6) Association for Computational Linguistics. [abstract] We test both bottom-up and top-down approaches in learning the phonemic status of the sounds of English and Japanese. We used large corpora of spontaneous speech to provide the learner with an input that models both the linguistic properties and statistical regularities of each language. The top-down cues are based on the properties of the lexicon; we test their performance in a task that consists in discriminating within-category contrasts from between-category contrasts. We found both approaches to help discriminate between allophonic and phonemic contrasts with a high degree of accuracy, although top-down cues proved to be effective only on an interesting subset of the data. Finally, we discuss the role and scope of each approach in learning phonemes.
  • pdf Fourtassi, A., Dunbar, E. & Dupoux, E. (2014). Self Consistency as an Inductive Bias in Early Language Acquisition. In Proceedings of the 36th Annual Meeting of the Cognitive Science Society, (pp 469-474) . [abstract] ABSTRACT = In this paper we introduce an inductive bias for language acquisition. It is based on a holistic approach, whereby the levels of representations are not treated in isolation, but as different interacting parts. The best representation of the sound system is the one that leads to the best lexicon, defined as the one that sustains the most coherent semantics. We quantify this coherence through an intrinsic and unsupervised measure called "Self Consistency". We found this measure to be optimal under the true phonemic inventory and the correct word segmentation in English and Japanese.
  • pdf Fourtassi, A. & Dupoux, E. (2014). A Rudimentary Lexicon and Semantics Help Bootstrap Phoneme Acquisition. In Proceedings of the 18th Conference on Computational Natural Language Learning (CoNLL), (pp 191-200) Association for Computational Linguistics. [abstract] Infants spontaneously discover the relevant phonemes of their language without any direct supervision. This acquisition is puzzling because it seems to require the availability of high levels of linguistic structures (lexicon, semantics), which logically presuppose that infants already have a set of phonemes. We show how this circularity can be broken by testing, in real-size language corpora, a scenario whereby infants would learn approximate representations at all levels, and then refine them in a mutual constraining way. We start with corpora of spontaneous speech that have been encoded in a varying number of detailed context-dependent allophones. We derive an approximate lexicon and a rudimentary semantic representation. Despite the fact that all these representations are poor approximations of the ground truth, they help reorganize the fine grained categories into phoneme-like categories with a high degree of accuracy.
  • pdf Dupoux, E. (2014). Towards Quantitative Studies of Early Cognitive Development. Autonomous Mental Development Technical Committee Newsletter, 11(1), 10-11.
  • pdf Synnaeve, G. & Dupoux, E. (2013). In Depth Deep Belief Networks for Phone Recognition. Poster presented at NIPS-2013.
  • pdf Schatz, T., Peddinti, V., Bach, F., Jansen, A., Hermansky, H. & Dupoux, E. (2013). Evaluating speech features with the Minimal-Pair ABX task: Analysis of the classical MFC/PLP pipeline. In INTERSPEECH-2013, (pp 1781-1785) . [abstract] ABSTRACT = We present a new framework for the evaluation of speech representations in zero-resource settings, that extends and complements previous work by Carlin, Jansen and Hermansky [1]. In particular, we replace their Same/Different discrimination task by several Minimal-Pair ABX (MP-ABX) tasks. We explain the analytical advantages of this new framework and apply it to decompose the standard signal processing pipelines for computing PLP and MFC coefficients. This method enables us to confirm and quantify a variety of well-known and not-so-well-known results in a single framework. (A toy version of the ABX score appears after this list.)
  • Ontanon, S., Synnaeve, G., Uriarte, A., Richoux, F., Churchill, D. & Preuss, M. (2013). A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games, 5(4), 293-311.
  • pdf Martin, A., Peperkamp, S. & Dupoux, E. (2013). Learning Phonemes with a Proto-lexicon. Cognitive Science, 37, 103-124. [abstract] ABSTRACT = Before the end of the first year of life, infants begin to lose the ability to perceive distinctions between sounds that are not phonemic in their native language. It is typically assumed that this developmental change reflects the construction of language-specific phoneme categories, but how these categories are learned largely remains a mystery. Peperkamp, Le Calvez, Nadal, & Dupoux (2006) present an algorithm that can discover phonemes using the distributions of allophones as well as the phonetic properties of the allophones and their contexts. We show that a third type of information source, the occurrence of pairs of minimally-differing word forms in speech heard by the infant, is also useful for learning phonemic categories, and is in fact more reliable than purely distributional information in data containing a large number of allophones. In our model, learners build an approximation of the lexicon consisting of the high-frequency n-grams present in their speech input, allowing them to take advantage of top-down lexical information without needing to learn words. This may explain how infants have already begun to exhibit sensitivity to phonemic categories before they have a large receptive lexicon.
  • pdf Jansen, A., Dupoux, E., Goldwater, S., Johnson, M., Khudanpur, S., Church, K., Feldman, N., Hermansky, H., Metze, F., Rose, R., Seltzer, M., Clark, P., McGraw, I., Varadarajan, B., Bennett, E., Borschinger, B., Chiu, J., Dunbar, E., Fourtassi, A., Harwath, D., Lee, C.y., Levin, K., Norouzian, A., Peddinti, V., Richardson, R., Schatz, T. & Thomas, S. (2013). A summary of the 2012 JH CLSP Workshop on zero resource speech technologies and models of early language acquisition. In ICASSP-2013 (IEEE International Conference on Acoustics Speech and Signal Processing), (pp 8111-8115) . [abstract] ABSTRACT = We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
  • pdf Fourtassi, A., Boerschinger, B., Johnson, M. & Dupoux, E. (2013). WhyisEnglishsoeasytosegment. In Proceedings of the 4th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2013), (pp 1-10) . [abstract] ABSTRACT = Cross-linguistic studies on unsupervised word segmentation have consistently shown that English is easier to segment than other languages. In this paper, we propose an explanation based on the notion of segmentation ambiguity. We show that English has a very low segmentation ambiguity compared to Japanese and that this difference correlates with the segmentation performance in a unigram model. We suggest that segmentation ambiguity is linked to a trade-off between syllable structure complexity and word length distribution. (A toy illustration of segmentation ambiguity appears after this list.)
  • pdf Fourtassi, A. & Dupoux, E. (2013). A corpus-based evaluation method for Distributional Semantic Models. In Proceedings of ACL-SRW 2013, (pp 165-171) . [abstract] ABSTRACT = Evaluation methods for Distributional Semantic Models typically rely on behaviorally derived gold standards. These methods are difficult to deploy in languages with scarce linguistic/behavioral resources. We introduce a corpus-based measure that evaluates the stability of the lexical semantic similarity space using a pseudo-synonym same-different detection task and no external resources. We show that it predicts two behavior-based measures across a range of parameters in a Latent Semantic Analysis model. (A sketch of this same-different scoring appears after this list.)
  • pdf Dupoux, E., Beraud-Sudreau, G. & Sagayama, S. (2011). Templatic features for modeling phoneme acquisition. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Boston, Mass.. [abstract] We describe a model for the coding of speech sounds into a high dimensional space. This code is obtained by computing the similarity between speech sounds and stored syllable-sized templates. We show that this code yields a better linear separation of phonemes than the standard MFCC code. Additional experiments show that the code is tuned to a particular language, and is able to use temporal cues for the purpose of phoneme recognition. Optimal templates seem to correspond to chunks of speech of around 120ms containing transitions between phonemes or syllables.
  • pdf Boruta, L. (2011). Combining Indicators of Allophony. In Proceedings ACL-SRW, (pp 88-93) .
  • pdf Boruta, L., Peperkamp, S., Crabbé, B. & Dupoux, E. (2011). Testing the robustness of online word segmentation: effects of linguistic diversity and phonetic variation. In Proceedings of the 2011 Workshop on Cognitive Modeling and Computational Linguistics, ACL, 1-9, Portland, Oregon. [abstract] Models of the acquisition of word segmentation are typically evaluated using phonemically transcribed corpora. Accordingly, they implicitly assume that children know how to undo phonetic variation when they learn to extract words from speech. Moreover, whereas models of language acquisition should perform similarly across languages, evaluation is often limited to English samples. Using child-directed corpora of English, French and Japanese, we evaluate the performance of state-of-the-art statistical models given inputs where phonetic variation has not been reduced. To do so, we measure segmentation robustness across different levels of segmental variation, simulating systematic allophonic variation or errors in phoneme recognition. We show that these models do not resist an increase in such variations and do not generalize to typologically different languages. From the perspective of early language acquisition, the results strengthen the hypothesis according to which phonological knowledge is acquired in large part before the construction of a lexicon.
  • pdf Varadarajan, B., Khudanpur, S. & Dupoux, E. (2008). Unsupervised Learning of Acoustic Subword Units. In Proceedings of ACL-08: HLT, (pp 165-168) . [abstract] Accurate unsupervised learning of phonemes of a language directly from speech is demonstrated via an algorithm for joint unsupervised learning of the topology and parameters of a hidden Markov model (HMM); states and short state-sequences through this HMM correspond to the learnt sub-word units. The algorithm, originally proposed for unsupervised learning of allophonic variations within a given phoneme set, has been adapted to learn without any knowledge of the phonemes. An evaluation methodology is also proposed, whereby the state-sequence that aligns to a test utterance is transduced in an automatic manner to a phoneme-sequence and compared to its manual transcription. Over 85% phoneme recognition accuracy is demonstrated for speaker-dependent learning from fluent, large-vocabulary speech.
  • Peperkamp, S. & Dupoux, E. (2007). Learning the mapping from surface to underlying representations in an artificial language. In J. Cole & J. Hualde (eds) Laboratory Phonology, 9, Mouton de Gruyter. [abstract] ABSTRACT = When infants acquire their native language they not only extract language-specific segmental categories and the words of their language, they also learn the underlying form of these words. This is difficult because words can have multiple phonetic realizations, according to the phonological context. In a series of artificial language-learning experiments with a phrase-picture matching task, we consider the respective contributions of word meaning and distributional information for the acquisition of underlying representations in the presence of an allophonic rule. We show that on the basis of semantic information, French adults can learn to map voiced and voiceless stops or fricatives onto the same underlying phonemes, whereas in their native language voicing is phonemic in all obstruents. They do not extend this knowledge to novel stops or fricatives, though. In the presence of distributional cues only, learning is much reduced and limited to the words subjects are trained on. We also test if phonological naturalness plays a role in this type of learning, and find that if semantic information is present, French adults can learn to map different segments onto a single underlying phoneme even if the mappings are highly unnatural. We discuss our findings in light of current statistical learning approaches to language acquisition.
  • pdf Le Calvez, R., Peperkamp, S. & Dupoux, E. (2007). Bottom-up learning of phonemes: A computational study. In S. Vosniadou, D. Kayser & A. Protopapas (eds) Proceedings of the Second European Cognitive Science Conference, Taylor and Francis. (French translation in Mathematiques et Sciences Humaines 2007(4), 99-111). [abstract] We present a computational evaluation of a hypothesis according to which distributional information is sufficient to acquire allophonic rules (and hence phonemes) in a bottom-up fashion. The hypothesis was tested using a measure based on information theory that compares distributions. The test was conducted on several artificial language corpora and on two natural corpora containing transcriptions of speech directed to infants from two typologically distant languages (French and Japanese). The measure was complemented with three filters, one concerning the statistical reliability due to sample size and two concerning the following universal properties of allophonic rules: constituents of an allophonic rule should be phonetically similar, and allophonic rules should be assimilatory in nature. (A sketch of such a distributional measure appears after this list.)
  • pdf Peperkamp, S., Le Calvez, R., Nadal, J.P. & Dupoux, E. (2006). The acquisition of allophonic rules: Statistical learning with linguistic constraints. Cognition, 101(3), B31-B41. [abstract] Phonological rules relate surface phonetic word forms to abstract underlying forms that are stored in the lexicon. Infants must thus acquire these rules in order to infer the abstract representation of words. We implement a statistical learning algorithm for the acquisition of one type of rule, namely allophony, which introduces context-sensitive phonetic variants of phonemes. This algorithm is based on the observation that different realizations of a single phoneme typically do not appear in the same contexts (ideally, they have complementary distributions). In particular, it measures the discrepancies in context probabilities for each pair of phonetic segments. In Experiment 1, we test the algorithm's performances on a pseudo-language and show that it is robust to statistical noise due to sampling and coding errors, and to non-systematic rule application. In Experiment 2, we show that a natural corpus of semiphonetically transcribed child-directed speech in French presents a very large number of near-complementary distributions that do not correspond to existing allophonic rules. These spurious allophonic rules can be eliminated by a linguistically motivated filtering mechanism based on a phonetic representation of segments. We discuss the role of a priori linguistic knowledge in the statistical learning of phonology.
  • pdf Dupoux, E. (2004). The Acquisition of Discrete Segmental Categories: Data and Model. In Proceedings of the 18th International Congress of Acoustics, Kyoto. [abstract] The way in which we parse continuous speech into discrete phonemes is highly language-dependent. Here, we first report that this phenomenon not only depends on the inventory of phonetic distinctions in the language, but also on the inventory of syllabic types. This is illustrated by studies showing that Japanese listeners perceptually insert epenthetic vowels inside illegal consonant clusters in order to make them legal. We then argue that this raises a bootstrapping problem for language acquisition, as the learning of phonetic inventories and that of syllabic types depend on each other. We present an acquisition model based on the storing and analysis of phonetic syllabic templates. We argue that this model has the potential of solving the bootstrapping problem as well as a range of observations regarding the perceptual categorization of speech sounds.
  • pdf Peperkamp, S. & Dupoux, E. (2003). Reinterpreting loanword adaptations: The role of perception. In Proceedings of the 15th International Congress of Phonetic Sciences, (pp 367-370) . [abstract] Standard phonological accounts of loanword adaptations state that the inputs to the adaptations are constituted by the surface forms of the words in the source language and that the adaptations are computed by the phonological grammar of the borrowing language. In processing terms, this means that in perception, the phonetic form of the source words is faithfully copied onto an abstract underlying form, and that adaptations are produced by the standard phonological processes in production. We argue that this is at odds with speech perception models and propose that loanword adaptations take place in perception and are defined as phonetically minimal transformations.
  • pdf Peperkamp, S. & Dupoux, E. (2002). Coping with phonological variation in early lexical acquisition. In I. Lasser (ed) The Process of Language Acquisition, (pp 359-385) Berlin: Peter Lang Verlag. [abstract] Models of lexical acquisition assume that infants can somehow extract unique word forms out of the speech stream before they acquire the meaning of words (e.g. Siskind 1996). However, words often surface with different phonetic forms due to the application of postlexical phonological processes; that is, surface word forms exhibit what we call phonological variation. In this paper, we will examine if and how infants that do not have a semantic lexicon might undo phonological variation, i.e. deduce which phonological processes apply and infer unique underlying word forms that will constitute lexical entries. We will propose a learning mechanism that deduces which rule applies and infers underlying phonemes and word forms. This mechanism is based on an examination of the distribution of either surface segments or surface word forms. The distribution of segments will be shown to provide sufficient information in the case of allophonic rules, i.e. rules that involve segments that do not otherwise occur in the language; the distribution of segments that are introduced by this type of rule is complementary to that of segments that are the direct phonetic realization of certain phonemes. The distribution of word forms will be shown to be necessary in cases in which all surface segments have a phonemic status in the language. In particular, infants can make use of the fact that certain word forms - i.e. the ones that have undergone the rule - fail to occur at the left or right edge of certain phrasal constituents, where the context for application of the rule is never met. This proposal makes predictions regarding the order in which various types of phonological variation can be coped with by the infant.
  • pdf Dupoux, E. & Peperkamp, S. (2002). Fossil markers of language development: phonological deafnesses in adult speech processing. In B. Laks & J. Durand (eds) Phonetics, Phonology, and Cognition, (pp 168-190) Oxford: Oxford University Press. [abstract] The sound pattern of the language(s) we have heard as infants affects the way in which we perceive linguistic sounds as adults. Typically, some foreign sounds are very difficult to perceive accurately, even after extensive training. For instance, native speakers of French have trouble distinguishing foreign words that differ only in the position of main stress, French being a language in which stress is not contrastive. In this paper, we propose to explore the perception of foreign sounds cross-linguistically in order to understand the processes that govern early language acquisition. Specifically, we propose to test the hypothesis that early language acquisition begins by using only regularities that infants can observe in the surface speech stream (Bottom-Up Bootstrapping), and compare it with the hypothesis that they use all possible sources of information, including, for instance, word boundaries (Interactive Bootstrapping). We set up a research paradigm using the stress system, since it allows us to test the various options at hand within a single test procedure. We distinguish four types of regular stress systems, the acquisition of which requires different sources of information. We show that the two hypotheses make contrastive predictions as to the pattern of stress perception of adults in these four types of languages. We conclude that cross-linguistic research on adults' speech perception, when coupled with detailed linguistic analysis, can be brought to bear on important issues of language acquisition.
  • pdf Christophe, A., Guasti, T., Nespor, M., Dupoux, E. & Van Ooyen, B. (1997). Reflections on phonological bootstrapping: Its role for lexical and syntactic acquisition. Language and Cognitive Processes, 12(5-6), 585-612. [abstract] "Phonological bootstrapping" is the hypothesis that a purely phonological analysis of the speech signal may allow infants to start acquiring the lexicon and syntax of their native language (Morgan & Demuth, 1996a). To assess this hypothesis, a first step is to estimate how much information is provided by a phonological analysis of the speech input conducted in the absence of any prior (language-specific) knowledge in other domains such as syntax or semantics. We first review existing work on how babies may start acquiring a lexicon by relying on distributional regularities, phonotactics, typical word shape and prosodic boundary cues. Taken together, these sources of information may enable babies to learn the sound pattern of a reasonable number of the words in their native language. We then focus on syntax acquisition and discuss how babies may set one of the major structural syntactic parameters, the head direction parameter, by listening to prominence within phonological phrases and before they possess any words. Next, we discuss how babies may hope to acquire function words early, and how this knowledge would help lexical segmentation and acquisition, as well as syntactic analysis and acquisition. We then present a model of phonological bootstrapping of the lexicon and syntax that helps us to illustrate the congruence between problems. Some sources of information appear to be useful for more than one purpose; for example, phonological phrases and function words may help lexical segmentation as well as segmentation into syntactic phrases and labelling (NP, VP, etc.). Although our model derives directly from our reflection on acquisition, we argue that it may also be adequate as a model of adult speech processing. Since adults allow a greater variety of experimental paradigms, an advantage of our approach is that specific hypotheses can be tested on both populations. We illustrate this aspect in the final section of the paper, where we present the results of an adult experiment which indicates that prosodic boundaries and function words play an important role in continuous speech processing.
  • pdf Christophe, A. & Dupoux, E. (1996). Bootstrapping lexical acquisition: The role of prosodic structure. Linguistic Review, 13(3-4), 383-412.
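
    A minimal sketch of the distributional statistic behind the two allophony papers above (Le Calvez et al. 2007; Peperkamp et al. 2006): for every pair of segments, compare their context distributions and flag near-complementary pairs as allophone candidates. Everything here is an illustrative assumption (toy corpus, right-hand contexts only, symmetrized Kullback-Leibler divergence with additive smoothing), not the published implementation, which additionally filters candidate pairs for sample-size reliability, phonetic similarity, and assimilatory direction.

        from collections import Counter, defaultdict
        from math import log

        def context_distributions(corpus):
            """Count the right-hand context of each segment in a list of
            transcribed utterances (each a list of segments; '#' marks
            the utterance boundary)."""
            contexts = defaultdict(Counter)
            for utterance in corpus:
                for seg, right in zip(utterance, utterance[1:] + ["#"]):
                    contexts[seg][right] += 1
            return contexts

        def divergence(p, q, alpha=1e-6):
            """Symmetrized KL divergence between two smoothed context
            distributions: high values mean near-complementary contexts."""
            support = set(p) | set(q)
            tp = sum(p.values()) + alpha * len(support)
            tq = sum(q.values()) + alpha * len(support)
            d = 0.0
            for c in support:
                pp = (p[c] + alpha) / tp
                qq = (q[c] + alpha) / tq
                d += pp * log(pp / qq) + qq * log(qq / pp)
            return d

        # Toy corpus: [d] only ever precedes front vowels and [t] back
        # vowels, so the pair (d, t) should rank near the top.
        corpus = [list("dip"), list("tup"), list("dep"), list("top")]
        ctx = context_distributions(corpus)
        ranking = sorted(((divergence(ctx[a], ctx[b]), a, b)
                          for a in ctx for b in ctx if a < b), reverse=True)
        print(ranking[:3])  # most complementary segment pairs first
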
Databases and software

    Schatz, T., Thiolliere, R., Synnaeve, G. & Dupoux, E. (2016). ABXpy: ABXpy v0.2. Zenodo. [doi:10.5281/zenodo.45268] (download:zenodo.org/record/45268#)

    Schatz, T., Xuan-Nga, C., Kolesnikova, A., Bergvelt, T., Wright, J., & Dupoux, E. (2015). Articulation Index LSCP LDC2015S12. Web Download. Philadelphia: Linguistic Data Consortium. (download:catalog.ldc.upenn.edu/LDC2015S12)
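
    The ABX discrimination task that ABXpy implements at scale can be summarized in a few lines: given tokens A and X from one category and B from another, an error is counted whenever X is closer to B than to A. The sketch below is a toy illustration, not ABXpy's API; the fixed-size random embeddings and plain Euclidean distance are assumptions (real pipelines typically compare variable-length frame sequences with dynamic time warping).

        import itertools
        import numpy as np

        def abx_error(cat_a, cat_b, dist=lambda x, y: np.linalg.norm(x - y)):
            """Fraction of (A, B, X) triples, with A and X distinct tokens
            of category a and B a token of category b, for which X is
            closer to B than to A (0.5 = chance, 0.0 = perfect)."""
            errors, total = 0.0, 0
            for a, x in itertools.permutations(cat_a, 2):
                for b in cat_b:
                    d_ax, d_bx = dist(a, x), dist(b, x)
                    errors += 1.0 if d_ax > d_bx else 0.5 if d_ax == d_bx else 0.0
                    total += 1
            return errors / total

        rng = np.random.default_rng(0)
        tokens_a = [rng.normal(0.0, 1.0, 8) for _ in range(5)]  # e.g. /i/ tokens
        tokens_b = [rng.normal(2.0, 1.0, 8) for _ in range(5)]  # e.g. /u/ tokens
        print(abx_error(tokens_a, tokens_b))  # well below 0.5 for separated categories
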

Publications, experimental studies

  • pdf Gvozdic, K., Moutier, S., Dupoux, E. & Buon, M. (2016). Priming Children's Use of Intentions in Moral Judgement with Metacognitive Training. Frontiers in Language Sciences, 7(190).
  • pdf Dupoux, E. (2015). Category Learning in Songbirds: top-down effects are not unique to humans. Current Biology, 25(16), R718-R720. [abstract] ABSTRACT = Human infants use higher order patterns (words) to learn the sound category of their language. A new study using artificial patterns made up of naturally occurring vocalizations shows that a similar mechanism may also exist in songbirds.
  • pdf Cristia, A., Minagawa-Kawai, Y., Vendelin, I., Cabrol, D. & Dupoux, E. (2014). Responses to vocalizations and auditory controls in the human newborn brain. Plos One, 9(12), e115162. [abstract] The functional organization of the human adult brain allows selective activation of specific regions in response to stimuli. In the adult, linguistic processing has been associated with left-dominant activations in perisylvian regions, whereas emotional vocalizations can give rise to right-dominant activation in posterior temporal cortices. Near Infrared Spectroscopy was used to register the response of 40 newborns' temporal regions when stimulated with speech, human and macaque emotional vocalizations, and auditory controls where the formant structure was destroyed but the long-term spectrum was retained. Speech elicited left-dominant activation in one channel in left posterior temporal cortices, as well as in more anterior, deeper tissue with no clear lateralization. Emotional vocalizations induced large, left-dominant activations in more anterior regions. Finally, activation elicited by the control stimuli was right-dominant, and more variable across infants. Overall, these results suggest that left-dominance for speech processing in newborns may be partially modulated by the presence of formant structure, which is shared between speech and non-linguistic vocalizations. Moreover, they indicate that development plays an important role in shaping the cortical networks involved in the processing of emotional vocalizations.
  • pdf Cristia, A., Minagawa-Kawai, Y., Egorova, N., Gervain, J., Filippin, L., Cabrol, D. & Dupoux, E. (2014). Neural correlates of infant dialect discrimination: A fNIRS study. Developmental Science, 17(4), 628-635. [abstract] ABSTRACT = The present study investigated the neural correlates of infant discrimination of very similar linguistic varieties (Quebecois and Parisian French) using functional Near InfraRed Spectroscopy. In line with previous behavioral and electrophysiological data, there was no evidence that 3-month-olds discriminated the two regional accents, whereas 5-month-olds did, with the locus of discrimination in left anterior perisylvian regions. These neuroimaging results suggest that a developing language network relying crucially on left perisylvian cortices sustains infants' discrimination of similar linguistic varieties within this early period of infancy.
  • pdf Ngon, C., Martin, A., Dupoux, E., Cabrol, D. & Peperkamp, S. (2013). Nonwords, nonwords, nonwords: Evidence for a proto-lexicon during the first year of life. Developmental Science, 16(1), 24-34. [abstract] ABSTRACT = Previous research with artificial language learning paradigms has shown that infants are sensitive to statistical cues to word boundaries (Saffran, Aslin & Newport, 1996) and that they can use these cues to extract word-like units (Saffran, 2001). However, it is unknown whether infants use statistical information to construct a recognition lexicon when acquiring their native language. In order to investigate this issue, we rely on the fact that besides real words a statistical algorithm extracts sound sequences that are highly frequent in infant-directed speech but constitute nonwords. In two experiments, we use a preferential listening paradigm to test French-learning 11-month-old infants' recognition of highly frequent disyllabic sequences from their native language. In Experiment 1, we use nonword stimuli and find that infants listen longer to high-frequency than to low-frequency sequences. In Experiment 2, we compare high-frequency nonwords to real words in the same frequency range, and find that infants show no preference. Thus, at 11 months, French-learning infants recognize highly frequent sound sequences from their native language and fail to differentiate between words and nonwords among these sequences. These results are evidence that they have used statistical information to extract word candidates from their input and store them in a "proto-lexicon", containing both words and nonwords. (A toy sketch of this frequency-based extraction appears after this list.)
  • pdf Minagawa-Kawai, Y., Cristia, A., Long, B., Vendelin, I., Hakuno, Y., Dutat, M., Filippin, L., Cabrol, D. & Dupoux, E. (2013). Insights on NIRS sensitivity from a cross-linguistic study on the emergence of phonological grammar. Frontiers in Language Sciences, 4(170), 10.3389/fpsyg.2013.00170. [abstract] ABSTRACT = Each language has a unique set of phonemic categories and phonotactic rules which determine permissible sound sequences in that language. Behavioral research demonstrates that one's native language shapes the perception of both sound categories and sound sequences in adults, and neuroimaging results further indicate that the processing of native phonemes and phonotactics involves a left-dominant perisylvian brain network. Recent work using a novel technique, functional Near InfraRed Spectroscopy (NIRS), has suggested that a left-dominant network becomes evident toward the end of the first year of life as infants process phonemic contrasts. The present research project attempted to assess whether the same pattern would be seen for native phonotactics. We measured brain responses in Japanese- and French-learning infants to two contrasts: Abuna vs. Abna (a phonotactic contrast that is native in French, but not in Japanese) and Abuna vs. Abuuna (a vowel length contrast that is native in Japanese, but not in French). Results did not show a significant response to either contrast in either group, unlike both previous behavioral research on phonotactic processing and NIRS work on phonemic processing. To understand these null results, we performed similar NIRS experiments with Japanese adult participants. These data suggest that the infant null results arise from an interaction of multiple factors, involving the suitability of the experimental paradigm for NIRS measurements and stimulus perceptibility. We discuss the challenges facing this novel technique, particularly focusing on the optimal stimulus presentation which could yield strong enough hemodynamic responses when using the change detection paradigm.
  • pdf Ramus, F., Peperkamp, S., Christophe, A., Jacquemot, C., Kouider, S. & Dupoux, E. (2011). A psycholinguistic perspective on the acquisition of phonology. In C. Fougeron, B. Kühnert, d'Imperio M. & Vallée N. (eds) Laboratory Phonology, 10, Berlin: Mouton de Gruyter. [abstract] This paper discusses the target articles by Fikkert, Vihman, and Goldrick & Larson, which address diverse aspects of the acquisition of phonology. These topics are examined using a wide range of tasks and experimental paradigms across different ages. Various levels of processing and representation are thus involved. The main point of the present paper is that such data can be coherently interpreted only within a particular information-processing model that specifies in sufficient detail the different levels of processing and representation. In this paper, we first present the basic architecture of a model of speech perception and production, justifying it with psycholinguistic and neuropsychological data. We then use this model to interpret data from the target articles relative to the acquisition of phonology.
  • pdf Minagawa-Kawai, Y., van der Lely, H., Ramus, F., Sato, Y., Mazuka, R. & Dupoux, E. (2011). Optical Brain Imaging Reveals General Auditory and Language-Specific Processing in Early Infant Development. Cerebral Cortex, 21(2), 254-261. [abstract] This study uses near-infrared spectroscopy in young infants in order to elucidate the nature of functional cerebral processing for speech. Previous imaging studies of infants' speech perception revealed left-lateralized responses to native language. However, it is unclear if these activations were due to language per se rather than to some low-level acoustic correlate of spoken language. Here we compare native (L1) and non-native (L2) languages with 3 different nonspeech conditions including emotional voices, monkey calls, and phase scrambled sounds that provide more stringent controls. Hemodynamic responses to these stimuli were measured in the temporal areas of Japanese 4-month-olds. The results show clear left-lateralized responses to speech, prominently to L1, as opposed to various activation patterns in the nonspeech conditions. Furthermore, implementing a new analysis method designed for infants, we discovered a slower hemodynamic time course in awake infants. Our results are largely explained by signal-driven auditory processing. However, stronger activations to L1 than to L2 indicate a language-specific neural factor that modulates these responses. This study is the first to discover a significantly higher sensitivity to L1 in 4-month-olds and reveals a neural precursor of the functional specialization for the higher cognitive network.
  • pdf Minagawa-Kawai, Y., Cristia, A., Vendelin, I., Cabrol, D. & Dupoux, E. (2011). Assessing signal-driven mechanisms in neonates: Brain responses to temporally and spectrally different sounds. Frontiers in Language Sciences, 2(135). [abstract] ABSTRACT = Past studies have found that, in adults, the acoustic properties of sound signals (such as fast vs. slow temporal features) differentially activate the left and right hemispheres, and some have hypothesized that left-lateralization for speech processing may follow from left-lateralization to rapidly changing signals. Here, we tested whether newborns' brains show some evidence of signal-specific lateralization responses using near-infrared spectroscopy (NIRS) and auditory stimuli that elicit lateralized responses in adults, composed of segments that vary in duration and spectral diversity. We found significantly greater bilateral responses of oxygenated hemoglobin (oxy-Hb) in the temporal areas for stimuli with a minimum segment duration of 21 ms than for stimuli with a minimum segment duration of 667 ms. However, we found no evidence for hemispheric asymmetries dependent on the stimulus characteristics. We hypothesize that acoustic-based functional brain asymmetries may develop throughout early infancy, and discuss their possible relationship with brain asymmetries for language.
  • pdf Minagawa-Kawai, Y., Cristià, A. & Dupoux, E. (2011). Cerebral lateralization and early speech acquisition: A developmental scenario. Developmental Cognitive Neuroscience, 1(3), 217-232. [abstract] During the past ten years, research using Near-InfraRed Spectroscopy (NIRS) to study the developing brain has provided groundbreaking evidence of brain functions in infants. We review three competing classes of hypotheses (signal-driven, domain-driven, and learning-biases hypotheses) regarding the causes of hemispheric specialization for speech processing. We assess the fit between each of these hypotheses and neuroimaging evidence in speech perception and show that none of the three hypotheses can account for the entire set of observations on its own. However, we argue that they provide a good fit when combined within a developmental perspective. According to our proposed scenario, lateralization for language emerges out of the interaction between pre-existing left-right biases in generic auditory processing (signal-driven hypothesis), and a left-hemisphere predominance of particular learning mechanisms (learning-biases hypothesis). As a result of this completed developmental process, the native language is represented in the left hemisphere predominantly. The integrated scenario makes it possible to link infant and adult data, and points to many empirical avenues that need to be explored more systematically.
  • pdf Mazuka, R., Cao, Y., Dupoux, E. & Christophe, A. (2011). The development of a phonological illusion: A cross-linguistic study with Japanese and French infants. Developmental Science, 14(4), 693-699. [abstract] ABSTRACT = In adults, the native language phonology has strong perceptual effects. Previous work showed that Japanese speakers, unlike French speakers, break up illegal sequences of consonants with illusory vowels: they report hearing abna as abuna. To study the development of the phonological grammar, we compared Japanese and French infants in a discrimination task. In Experiment 1, we observed that 14-month-old Japanese infants, in contrast with French infants, failed to discriminate phonetically varied sets of abna-type and abuna-type stimuli. In Experiment 2, 8-month-old French and Japanese infants did not differ significantly from each other. In Experiment 3, we found that, like adults, Japanese infants can discriminate abna from abuna when phonetic variability is reduced (single item). These results show that the phonologically-induced /u/ illusion is already experienced by Japanese infants at the age of 14 months. Hence, before having acquired many words of their language, they have grasped enough of their native phonological grammar to constrain their perception of speech sound sequences.
  • pdf Dupoux, E., Peperkamp, S. & Sebastian-Galles, N. (2010). Limits on bilingualism revisited: Stress "deafness" in simultaneous French-Spanish bilinguals. Cognition, 114(2), 266-275. [abstract] We probed simultaneous French-Spanish bilinguals for the perception of Spanish lexical stress using three tasks, two short-term memory encoding tasks and a speeded lexical decision. In all three tasks, the performance of the group of simultaneous bilinguals was intermediate between that of native speakers of Spanish on the one hand and French late learners of Spanish on the other hand. Using a composite stress "deafness" index measure computed over the results of the three tasks, we found that the performance of the simultaneous bilinguals is best fitted by a bimodal distribution that corresponds to a mixture of the performance distributions of the two control groups. Correlation analyses showed that the variables explaining language dominance are linked to early language exposure. These findings are discussed in light of theories of language processing in bilinguals.
  • pdf Skoruppa, K., Pons, F., Christophe, A., Bosch, L., Dupoux, E., Sebastian-Galles, N., Limissuri, R.A. & Peperkamp, S. (2009). Language-specific stress perception by 9-month-old French and Spanish infants. Developmental Science, 12(6), 914-919. [abstract] During the first year of life, infants begin to have difficulties perceiving non-native vowel and consonant contrasts, thus adapting their perception to the phonetic categories of the target language. In this paper, we examine the perception of a non-segmental feature, i.e. stress. Previous research with adults has shown that speakers of French (a language with fixed stress) have great difficulties in perceiving stress contrasts (Dupoux, Pallier, Sebastian & Mehler, 1997), whereas speakers of Spanish (a language with lexically contrastive stress) perceive these contrasts as accurately as segmental contrasts. We show that language-specific differences in the perception of stress likewise arise during the first year of life. Specifically, 9-month-old Spanish infants successfully distinguish between stress-initial and stress-final pseudo-words, while French infants of this age show no sign of discrimination. In a second experiment using multiple tokens of a single pseudo-word, French infants of the same age successfully discriminate between the two stress patterns, showing that they are able to perceive the acoustic correlates of stress. Their failure to discriminate stress patterns in the first experiment thus reflects an inability to process stress at an abstract, phonological level.
  • pdf Darcy, I., Ramus, F., Christophe, A., Kinzler, K.D. & Dupoux, E. (2009). Phonological knowledge in compensation for native and non-native assimilation. In F. Kügler, C. Féry & R. van de Vijver (eds) Variation and Gradience in Phonetics and Phonology, (pp 265-309) Berlin: Mouton De Gruyter. [abstract] We investigated whether compensation for phonological assimilation depends on language-universal or language-specific processes. To this end, we tested two different assimilation rules, one that exists in English and involves place of articulation, and another that exists in French and involves voicing. Both contrasts were tested on speakers of French, British English and American English. In three experiments using a word detection task, we observed that monolingual participants showed a significantly higher degree of compensation for phonological changes that correspond to rules existing in their language than to rules that do not exist in their language (even though they are phonologically possible since they exist in another language). Thus, French participants compensated more for voicing than place assimilation, while British and American English participants compensated more for place than voicing assimilation. In all three experiments, we also found that the non-native rule induced a very small but significant compensation effect, suggesting that both a language-specific and a language-universal mechanism are at play. In Experiment 4, we studied native speakers of British English who were late learners of French: they showed the British pattern of results even when listening to French stimuli, confirming that compensation for assimilation is induced by language-specific phonological processes rather than specific phonetic cues. The results are discussed in light of current models of lexical access and phonological processing.
  • pdf Minagawa-Kawai, Y., Mori, K., Hebden, J.C. & Dupoux, E. (2008). Optical Imaging of infants' neurocognitive development: Recent advances and perspectives. Developmental Neurobiology, 68(6), 712-728. [abstract] Near-infrared spectroscopy (NIRS) provides a unique method of monitoring infant brain function by measuring the changes in the concentrations of oxygenated and deoxygenated hemoglobin. During the past 10 years, NIRS measurement of the developing brain has rapidly expanded. In this article, a brief discussion of the general principles of NIRS, including its technical advantages and limitations, is followed by a detailed review of the role played so far by NIRS in the study of infant perception and cognition, including language, and visual and auditory functions. Results have highlighted, in particular, the developmental changes of cerebral asymmetry associated with speech acquisition. Finally, suggestions for future studies of neurocognitive development using NIRS are presented. Although NIRS studies of the infant brain have yet to fulfill their potential, a review of the work done so far indicates that NIRS is likely to provide many unique insights in the field of developmental neuroscience.
  • pdf Dupoux, E., Sebastian-Galles, N., Navarrete, E. & Peperkamp, S. (2008). Persistent stress "deafness": The case of French learners of Spanish. Cognition, 106(2), 682-706. [abstract] Previous research by Dupoux et al. [Dupoux, E., Pallier, C., Sebastian, N., & Mehler, J. (1997). A destressing "deafness" in French? Journal of Memory and Language, 36, 406-421; Dupoux, E., Peperkamp, S., & Sebastian-Galles (2001). A robust method to study stress "deafness". Journal of the Acoustical Society of America, 110, 1608-1618.] found that French speakers, as opposed to Spanish ones, are impaired in discrimination tasks with stimuli that vary only in the position of stress. However, what was called "stress deafness" was only found in tasks that used high phonetic variability and memory load, not in cognitively less demanding tasks such as single token AX discrimination. This raised the possibility that instead of a perceptual problem, monolingual French speakers might simply lack a metalinguistic representation of contrastive stress, which would impair them in memory tasks. We examined a sample of 39 native speakers of French who underwent formal teaching of Spanish after age 10, and varied in degree of practice in this language. Using a sequence recall task, we observed in all our groups of late learners of Spanish the same impairment in short-term memory encoding of stress contrasts that was previously found in French monolinguals. Furthermore, using a speeded lexical decision task with word-nonword minimal pairs that differ only in the position of stress, we found that all late learners had much difficulty in the use of stress to access the lexicon. Our results show that "stress deafness" is better interpreted as a lasting processing problem resulting from the impossibility for French speakers to encode contrastive stress in their phonological representations. This affects their memory encoding as well as their lexical access in on-line tasks. The generality of such a persistent suprasegmental "deafness" is discussed in relation to current findings and models on the perception of non-native phonological contrasts.
  • pdf Minagawa-Kawai, Y., Naoi, N., Nishijima, N., Kojima, S. & Dupoux, E. (2007). Developmental changes in cerebral responses to native and non-native vowels: a NIRS study, (pp 1877-1880). [abstract] ABSTRACT = While newborn infants discriminate speech sounds from languages that they have never heard, 6-month-olds demonstrate the beginnings of vowel classification specific to their native language. The neuronal correlates involved in such a dramatic perceptual reorganization process, however, are not well understood. Using near-infrared spectroscopy (NIRS), this study compares the neural responses of Japanese infants at 3-4 months and 7-8 months of age as well as of adults to native ([i] vs. [w]) and non-native vowel contrasts ([w] vs. [u]) within pseudo-word contexts. The findings demonstrated longitudinal developmental changes in functional temporal cortex asymmetries associated with exposure to the native language.
  • pdf Peperkamp, S., Skoruppa, K. & Dupoux, E. (2006). The role of phonetic naturalness in phonological rule acquisition. In D. Bamman, T. Magnitskaia & C. Zaller (eds) Proceedings of the 30th Annual Boston University Conference on Language Development, Vols 1 and 2, (pp 464-475). [abstract] The role of naturalness constraints in phonological learning is of considerable theoretical importance for linguistically motivated models of language acquisition. However, the existence of naturalness effects still does not rest on firm empirical grounds. P&D (in press) exposed French subjects to an artificial language consisting of determiner + noun phrases which obey either a natural allophonic rule that voices a subclass of obstruents intervocalically, or an unnatural one that defines arbitrary relationships among certain obstruents intervocalically. After exposure, a phrase-picture matching task was used to assess whether subjects had learned the allophonic distributions and hence distinguished between phonemic and allophonic contrasts among obstruents for the purposes of word identification. Surprisingly, P&D (in press) found that natural assimilatory rules and unnatural arbitrary rules were learned with equal ease. In the present study, we use exactly the same exposure phase, but change the test phase: here, subjects have to produce a noun phrase upon the presentation of a picture, both for nouns that they have been trained on during the exposure phase, and for novel nouns. We find that with this more ecologically valid, but also more demanding task, a naturalness effect emerges: subjects learned the rule on old items and extended it to novel items, but only for the natural assimilatory rules, not for the unnatural arbitrary rules. We discuss these findings in relation to existing studies of the acquisition of phonological rules. We distinguish at least three constraints that characterize rule naturalness, and discuss the role of task demands and response strategies in relation to the emergence of naturalness effects in learning studies using artificial languages.
  • pdf Peperkamp, S., Pettinato, M. & Dupoux, E. (2003). Allophonic variation and the acquisition of phoneme categories. In B. Beachley, A. Brown & F. Conlin (eds) BUCLD 27: Annual Boston University Conference on Language Development, Vols 1 and 2, Proceedings, (pp 650-661) .
  • pdf Pallier, C., Dehaene, S., Poline, J., LeBihan, D., Argenti, A., Dupoux, E. & Mehler, J. (2003). Brain imaging of language plasticity in adopted adults: Can a second language replace the first? Cerebral Cortex, 13(2), 155-161. [abstract] Do the neural circuits that subserve language acquisition lose plasticity as they become tuned to the maternal language? We tested adult subjects born in Korea and adopted by French families in childhood; they have become fluent in their second language and report no conscious recollection of their native language. In behavioral tests assessing their memory for Korean, we found that they do not perform better than a control group of native French subjects who have never been exposed to Korean. We also used event-related functional magnetic resonance imaging to monitor cortical activations while the Korean adoptees and native French listened to sentences spoken in Korean, French and other, unknown, foreign languages. The adopted subjects did not show any specific activations to Korean stimuli relative to unknown languages. The areas activated more by French stimuli than by foreign stimuli were similar in the Korean adoptees and in the French native subjects, but with relatively larger extents of activation in the latter group. We discuss these data in light of the critical period hypothesis for language acquisition.
  • pdf Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S. & Dupoux, E. (2003). Phonological grammar shapes the auditory cortex: A functional magnetic resonance imaging study. Journal of Neuroscience, 23(29), 9541-9546. [abstract] Languages differ depending on the set of basic sounds they use (the inventory of consonants and vowels) and on the way in which these sounds can be combined to make up words and phrases (phonological grammar). Previous research has shown that our inventory of consonants and vowels affects the way in which our brains decode foreign sounds (Goto, 1971; Naatanen et al., 1997; Kuhl, 2000). Here, we show that phonological grammar has an equally potent effect. We build on previous research, which shows that stimuli that are phonologically ungrammatical are assimilated to the closest grammatical form in the language (Dupoux et al., 1999). In a cross-linguistic design using French and Japanese participants and a fast event-related functional magnetic resonance imaging (fMRI) paradigm, we show that phonological grammar involves the left superior temporal and the left anterior supramarginal gyri, two regions previously associated with the processing of human vocal sounds.
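
    A toy sketch of the frequency-based extraction behind the Ngon et al. (2013) entry above: a learner that merely counts syllable bigrams in its input will rank both real words and frequent nonwords at the top of the list. The syllabified utterances and the use of raw bigram frequency (rather than, say, transitional probabilities) are illustrative assumptions.

        from collections import Counter

        # Hypothetical syllabified child-directed input.
        utterances = [
            ["le", "ba", "teau", "et", "la", "ba", "nane"],
            ["un", "ba", "teau", "rouge"],
            ["la", "ba", "nane", "est", "bonne"],
        ]

        bigrams = Counter()
        for syllables in utterances:
            bigrams.update(zip(syllables, syllables[1:]))

        # The top candidates mix real words ("ba"+"teau", "ba"+"nane")
        # with spurious frequent sequences (e.g. "la"+"ba"), the kind of
        # nonword the study's infants treated like familiar forms.
        for (s1, s2), n in bigrams.most_common(4):
            print(s1 + s2, n)
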
Workshops and Symposia

    The team organized two workshops on zero-resource speech technologies and computational modeling of early language acquisition.

    The team also organized: