Emmanuel Dupoux
Email: emmanuel.dupoux at gmail dot com

Directeur d'Études
École des Hautes Études en Sciences Sociales

Laboratoire de Sciences Cognitives et Psycholinguistique.
29 rue d'Ulm, 75005 Paris, France.
tel: (+33 1) 44 32 26 16, fax: (+33 1) 44 32 26 30.

My research focuses on the mechanisms and representations specific to the human brain that allow the human baby to acquire one or several languages and become cognitively functional in his or her culture. This investigation is conducted using behavioral methods in adults and infants, brain imaging, and computational modeling with machine learning techniques.


Updated: Dec 2018


Born November 30th 1964 in Paris.

2002- Co-creator and director of the Cognitive Science Master program (see the CogMaster site).
1998-2009 Director of Laboratoire de Sciences Cognitives et Psycholinguistique (LSCP).
1992 Diploma in Telecom Engineering at Télécom Paris.
1989-1990 Post-doc at the Cognitive Science Program, Univ. of Arizona.
1989 PhD in Cognitive Psychology, EHESS, Paris.
1984-1988 Student at École Normale Supérieure

Complete Vitae

Research topics

In my research, I have been focusing on the early acquisition of linguistic and social skills in infants and their more or less reversible consequences in adults, in terms of a cognitive specialization for a particular language or culture. My approach is to run comparative studies in adults and infants, and to test theoretical models that take into account both types of studies. More recently, I have been exploring how machine learning and artificial intelligence can provide quantitative models of processing and learning in infants. For more details on this current activity, see the Cognitive Machine Learning (CoML) INRIA team website. Below are three of my major research interests:



Note. For some papers, you can read an abstract and/or download a ready-to-print PDF file. To print or preview a PDF file, use Acrobat Reader.

Peer Reviewed Articles

  • pdf Zeghidour, N., Usunier, N., Synnaeve, G., Collobert, R. & Dupoux, E. (2018). End-to-End Speech Recognition from the raw waveform. In Interspeech-2018. [abstract] ABSTRACT = State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first one is inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015), and the second one by the scattering transform (Zeghidour et al., 2017). We propose two modifications to these architectures and systematically compare them to mel-filterbanks, on the Wall Street Journal dataset. The first modification is the addition of an instance normalization layer, which greatly improves on the gammatone-based trainable filterbanks and speeds up the training of the scattering-based filterbanks. The second one relates to the low-pass filter used in these approaches. These modifications consistently improve performance for both approaches, and remove the need for a careful initialization in scattering-based trainable filterbanks. In particular, we show a consistent improvement in word error rate of the trainable filterbanks relative to comparable mel-filterbanks. It is the first time end-to-end models trained from the raw signal significantly outperform mel-filterbanks on a large vocabulary task under clean recording conditions.
  • pdf Zeghidour, N., Usunier, N., Kokkinos, I., Schatz, T., Synnaeve, G. & Dupoux, E. (2018). Learning filterbanks from raw speech for phoneme recognition. In ICASSP-2018. [abstract] ABSTRACT = In this work we train a bank of complex filters that operates at the level of the raw speech signal and feeds into a convolutional neural network for phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of MFSC, and then fine-tuned jointly with the remaining convolutional network. We perform phone recognition experiments on TIMIT and show that for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable MFSC. We get our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response while preserving some analyticity.
  • pdf Thual, A., Dancette, C., Karadayi, J., Benjumea, J. & Dupoux, E. (2018). A K-nearest neighbours approach to unsupervised spoken term discovery. In IEEE SLT-2018. [abstract] ABSTRACT = Unsupervised spoken term discovery is the task of finding recurrent acoustic patterns in speech without any annotations. Current approaches consist of two steps: (1) discovering similar patterns in speech, and (2) partitioning those pairs of acoustic tokens using graph clustering methods. We propose a new approach for the first step. Previous systems used various approximation algorithms to make the search tractable on large amounts of data. Our approach is based on an optimized k-nearest neighbours (KNN) search coupled with a fixed word embedding algorithm. The results show that the KNN algorithm is robust across languages, consistently outperforms the DTW-based baseline, and is competitive with current state-of-the-art spoken term discovery systems.
  • pdf Schatz, T., Bach, F. & Dupoux, E. (2018). Evaluating automatic speech recognition systems as quantitative models of cross-lingual phonetic category perception. Journal of the Acoustical Society of America: Express Letters. [abstract] ABSTRACT = Existing theories of cross-linguistic phonetic category perception agree that listeners perceive foreign sounds by mapping them onto their native phonetic categories. Yet, none of the available theories specify a way to compute this mapping. As a result, they cannot provide systematic quantitative predictions and remain mainly descriptive. In this paper, Automatic Speech Recognition (ASR) systems are used to provide a fully specified mapping between foreign and native sounds. This is shown to provide a quantitative model that can account for several empirically attested effects in human cross-linguistic phonetic category perception.
  • pdf Scharenborg, O., Besacier, L., Black, A., Hasegawa-Johnson, M., Metze, F., Neubig, G., Stüker, S., Godard, P., Müller, M., Ondel, L., Palaskar, S., Arthur, P., Ciannella, F., Du, M., Larsen, E., Merkx, D., Riad, R., Wang, L. & Dupoux, E. (2018). Linguistic unit discovery from multimodal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop. In ICASSP-2018. [abstract] ABSTRACT = We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.
  • pdf Riad, R., Dancette, C., Karadayi, J., Zeghidour, N., Schatz, T. & Dupoux, E. (2018). Sampling strategies in Siamese Networks for unsupervised speech representation learning. In Interspeech-2018. [abstract] ABSTRACT = Recent studies have investigated siamese network architectures for learning invariant speech representations using same-different side information at the word level. Here we investigate systematically an often ignored component of siamese networks: the sampling procedure (how pairs of same vs. different tokens are selected). We show that sampling strategies taking into account Zipf's Law, the distribution of speakers and the proportions of same and different pairs of words significantly impact the performance of the network. In particular, we show that word frequency compression improves learning across a large range of variations in number of training pairs. This effect does not apply to the same extent to the fully unsupervised setting, where the pairs of same-different words are obtained by spoken term discovery. We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement on state-of-the-art in unsupervised representation learning using siamese networks.
  • pdf Ondel, L., Godard, P., Besacier, L., Larsen, E., Hasegawa-Johnson, M., Scharenborg, O., Dupoux, E., Burget, L., Yvon, F. & Khudanpur, S. (2018). Bayesian models for unit discovery on a very low resource language. In ICASSP-2018. [abstract] ABSTRACT = Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown some promising results on artificial examples but still lack in situ experiments. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show that Bayesian models can naturally integrate information from other resourceful languages by means of an informative prior, leading to more consistent discovered units. Finally, discovered acoustic units are used, either as the 1-best sequence or as a lattice, to perform word segmentation. Word segmentation results show that this Bayesian approach clearly outperforms a Segmental-DTW baseline on the same corpus.
  • pdf Holzenberger, N., Du, M., Karadayi, J., Riad, R. & Dupoux, E. (2018). Learning word embeddings: unsupervised methods for fixed-size representations of variable-length speech segments. In Interspeech-2018. [abstract] ABSTRACT = Fixed-length embeddings of words are very useful for a variety of tasks in speech and language processing. Here we systematically explore two methods of computing fixed-length embeddings for variable-length sequences. We evaluate their susceptibility to phonetic and speaker-specific variability on English, a high resource language, and Xitsonga, a low resource language, using two evaluation metrics: ABX word discrimination and ROC-AUC on same-different phoneme n-grams. We show that a simple downsampling method supplemented with length information can be competitive with the variable-length input feature representation on both evaluations. Recurrent autoencoders trained without supervision can yield even better results at the expense of increased computational complexity.
  • pdf Guevara-Rukoz, A., Cristia, A., Ludusan, B., Thiollière, R., Martin, A., Mazuka, R. & Dupoux, E. (2018). Are words easier to learn from infant- than adult- directed speech? A quantitative corpus-based investigation. Cognitive Science. [abstract] ABSTRACT = We investigate whether infant-directed speech (IDS) facilitates lexical learning when compared to adult-directed speech (ADS). To study this, we compare the distinctiveness of the lexicon at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS. At the phonological level, we find that despite a slight increase in the number of phonological neighbors, the IDS lexicon contains more distinctive words (such as onomatopoeias). Combining the acoustic and phonological metrics together in a global discrimination score, the two effects cancel each other out and the IDS lexicon winds up being as discriminable as its ADS counterpart. We discuss the implication of these findings for the view of IDS as hyperspeech, i.e., a register whose purpose is to facilitate language acquisition.
  • pdf Dupoux, E. (2018). Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner. Cognition, 173, 34-59. [abstract] ABSTRACT = Spectacular progress in the information processing sciences (machine learning, wearable sensors) promises to revolutionize the study of cognitive development. Here, we analyse the conditions under which 'reverse engineering' language development, i.e., building an effective system that mimics infants' achievements, can contribute to our scientific understanding of early language development. We argue that, on the computational side, it is important to move from toy problems to the full complexity of the learning situation, and take as input as faithful reconstructions of the sensory signals available to infants as possible. On the data side, accessible but privacy-preserving repositories of home data have to be set up. On the psycholinguistic side, specific tests have to be constructed to benchmark humans and machines at different linguistic levels. We discuss the feasibility of this approach and present an overview of current results.
  • pdf Cao, X.N., Dakhlia, C., del Carmen, P., Jaouani, M.A., Ould-Arbi, M. & Dupoux, E. (2018). Baby Cloud, a technological platform for parents and researchers. In Nicoletta Calzolari (Conference chair), Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis & Takenobu Tokunaga (eds) Proceedings of LREC 2018, European Language Resources Association (ELRA). [abstract] ABSTRACT = In this paper, we present BabyCloud, a platform for capturing, storing and analyzing daylong audio recordings and photographs of children's linguistic environments, for the purpose of studying infants' cognitive and linguistic development and interactions with the environment. The proposed platform connects two communities of users: families and academics, with strong innovation potential for each type of user. For families, the platform offers a novel functionality: the ability for parents to follow the development of their child on a daily basis through language and cognitive metrics (growth curves in number of words, verbal complexity, social skills, etc). For academic research, the platform provides a novel means for studying language and cognitive development at an unprecedented scale and level of detail. Researchers will submit algorithms to the secure server, which will only output anonymized aggregate statistics. Ultimately, BabyCloud aims at creating an ecosystem of third parties (public and private research labs...) gravitating around developmental data, entirely controlled by the party from whom the data originate, i.e., families.

  • pdf Tsuji, S., Fikkert, P., Minagawa-Kawai, Y., Dupoux, E., Filippin, L., Versteegh, M., Hagoort, P. & Cristia, A. (2017). The more, the better? Behavioral and neural correlates of frequent and infrequent vowel exposure. Developmental Psychobiology, 59, 603-612. [abstract] ABSTRACT = A central assumption in the perceptual attunement literature holds that exposure to a speech sound contrast leads to improvement in native speech sound processing. However, whether the amount of exposure matters for this process has not been put to a direct test. We elucidated indicators of frequency-dependent perceptual attunement by comparing 5–8-month-old Dutch infants' discrimination of tokens containing a highly frequent [hɪt-he:t] and a highly infrequent [hYt-hø:t] native vowel contrast as well as a non-native [hɛt-hæt] vowel contrast in a behavioral visual habituation paradigm (Experiment 1). Infants discriminated both native contrasts similarly well, but did not discriminate the non-native contrast. We sought further evidence for subtle differences in the processing of the two native contrasts using near-infrared spectroscopy and a within-participant design (Experiment 2). The neuroimaging data did not provide additional evidence that responses to native contrasts are modulated by frequency of exposure. These results suggest that even large differences in exposure to a native contrast may not directly translate to behavioral and neural indicators of perceptual attunement, raising the possibility that frequency of exposure does not influence improvements in discriminating native contrasts.
  • pdf Schatz, T., Turnbull, R., Bach, F. & Dupoux, E. (2017). A Quantitative Measure of the Impact of Coarticulation on Phone Discriminability. In INTERSPEECH-2017. [abstract] ABSTRACT = Acoustic realizations of a given phonetic segment are typically affected by coarticulation with the preceding and following phonetic context. While coarticulation has been extensively studied using descriptive phonetic measurements, little is known about the functional impact of coarticulation for speech processing. Here, we use DTW-based similarity defined on raw acoustic features and ABX scores to derive a measure of the effect of coarticulation on phonetic discriminability. This measure does not rely on defining segment-specific phonetic cues (formants, duration, etc.) and can be applied systematically and automatically to any segment in large scale corpora. We illustrate our method using stimuli in English and Japanese. We replicate some well-known results, i.e., stronger anticipatory than perseveratory coarticulation and stronger coarticulation for lax/short vowels than for tense/long vowels. We then quantify for the first time the impact of coarticulation across different segment types (like vowels and consonants). We discuss how our metric and its possible extensions can help address current challenges in the systematic study of coarticulation.
  • pdf Michel, P., Räsänen, O., Thiollière, R. & Dupoux, E. (2017). Blind phoneme segmentation with temporal prediction errors. In Proceedings of ACL: Student Research Workshop, 62-68. [abstract] Phonemic segmentation of speech is a critical step of speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists in analyzing the error profile of a model trained to predict speech features frame-by-frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries from local maxima in the prediction error. We evaluate our system on the TIMIT dataset, with improvements over similar methods.
  • pdf Ludusan, B., Mazuka, R., Bernard, M., Cristia, A. & Dupoux, E. (2017). The Role of Prosody and Speech Register in Word Segmentation: A Computational Modelling Perspective. In ACL 2017, 2, (pp 178-183). [abstract] ABSTRACT = This study explores the role of speech register and prosody for the task of word segmentation. Since these two factors are thought to play an important role in early language acquisition, we aim to quantify their contribution for this task. We study a Japanese corpus containing both infant- and adult-directed speech and we apply four different word segmentation models, with and without knowledge of prosodic boundaries. The results showed that the difference between registers is smaller than previously reported and that prosodic boundary information helps adult-directed speech more than infant-directed speech.
  • pdf Le Godais, G., Linzen, T. & Dupoux, E. (2017). Comparing character-level neural language models using a lexical decision task. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, 2, (pp 125-130). [abstract] ABSTRACT = What is the information captured by neural network models of language? We address this question in the case of character-level recurrent neural language models. These models do not have explicit word representations; do they acquire implicit ones? We assess the lexical capacity of a network using the lexical decision task common in psycholinguistics: the system is required to decide whether or not a string of characters forms a word. We explore how accuracy on this task is affected by the architecture of the network, focusing on cell type (LSTM vs. SRN), depth and width. We also compare these architectural properties to a simple count of the parameters of the network. The overall number of parameters in the network turns out to be the most important predictor of accuracy; in particular, there is little evidence that deeper networks are beneficial for this task.
  • pdf Larsen, E., Cristia, A. & Dupoux, E. (2017). Relating unsupervised word segmentation to reported vocabulary acquisition. In INTERSPEECH-2017. [abstract] ABSTRACT = A range of computational approaches have been used to model the discovery of word forms from continuous speech by infants. Typically, these algorithms are evaluated with respect to the ideal 'gold standard' word segmentation and lexicon. These metrics assess how well an algorithm matches the adult state, but may not reflect the intermediate states of the child's lexical development. We set up a new evaluation method based on the correlation between word frequency counts derived from the application of an algorithm onto a corpus of child-directed speech, and the proportion of infants knowing the words according to parental reports. We evaluate a representative set of 4 algorithms, applied to transcriptions of the Brent corpus, which have been phonologized using either phonemes or syllables as basic units. Results show remarkable variation in the extent to which these 8 algorithm-unit combinations predicted infant vocabulary, with some of these predictions surpassing those derived from the adult gold standard segmentation. We argue that infant vocabulary prediction provides a useful complement to traditional evaluation; for example, the best predictor model was also one of the worst in terms of segmentation score, and there was no clear relationship between token or boundary F-score and vocabulary prediction.
  • pdf Guevara-Rukoz, A., Parlato-Oliveira, E., Yu, S., Hirose, Y., Peperkamp, S. & Dupoux, E. (2017). Predicting epenthetic vowel quality from acoustics. In INTERSPEECH-2017. [abstract] ABSTRACT = Past research has shown that sound sequences not permitted in our native language may be distorted by our perceptual system. A well documented example is vowel epenthesis, a phenomenon in which non-existent vowels are hallucinated by listeners, in order to repair illegal consonantal sequences. As reported in previous work, this occurs in Japanese (JP) and Brazilian Portuguese (BP), languages for which the 'default' epenthetic vowels are /u/ and /i/, respectively. In a perceptual experiment, we corroborate the finding that the quality of this illusory vowel is language-dependent, but also that this default choice can be overridden by coarticulatory information present on the consonant cluster. In a second step, we analyse recordings of JP and BP speakers producing 'epenthesized' versions of stimuli from the perceptual task. Results reveal that the default vowel corresponds to the vowel with the most reduced acoustic characteristics, also the one for which formants are acoustically closest to formant transitions present in consonantal clusters. Lastly, we model behavioural responses from the perceptual experiment with an exemplar model using dynamic time warping (DTW)-based similarity measures on MFCCs.
  • pdf Guevara-Rukoz, A., Lin, I., Morii, M., Minagawa, Y., Dupoux, E. & Peperkamp, S. (2017). Which epenthetic vowel? Phonetic categories versus acoustic detail in perceptual vowel epenthesis. Journal of the Acoustical Society of America: Express Letters, 142(2), EL211-2017. [abstract] ABSTRACT = This study aims to quantify the relative contributions of phonetic categories and acoustic detail on phonotactically induced perceptual vowel epenthesis in Japanese listeners. A vowel identification task tested whether a vowel was perceived within illegal consonant clusters and, if so, which vowel was heard. Cross-spliced stimuli were used in which vowel coarticulation present in the cluster did not match the quality of the flanking vowel. Two clusters were used, /hp/ and /kp/, the former containing larger amounts of resonances of the preceding vowel. While both flanking vowel and coarticulation influenced vowel quality, the influence of coarticulation was larger, especially for /hp/.
  • pdf Dunbar, E., Xuan-Nga, C., Benjumea, J., Karadayi, J., Bernard, M., Besacier, L., Anguera, X. & Dupoux, E. (2017). The Zero Resource Speech Challenge 2017. In ASRU-2017. [abstract] ABSTRACT = We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the follow-up to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed.
  • pdf Cristia, A., Dupoux, E., Gurven, M. & Stieglitz, J. (2017). Child-directed speech is infrequent in a forager-farmer population: a time allocation study. Child Development. [abstract] This article provides an estimation of how frequently, and from whom, children aged 0-11 years (Ns between 9 and 24) receive one-on-one verbal input among Tsimane forager-horticulturalists of lowland Bolivia. Analyses of systematic daytime behavioral observations reveal < 1 min per daylight hour is spent talking to children younger than 4 years of age, which is 4 times less than estimates for others present at the same time and place. Adults provide a majority of the input at 0-3 years of age but not afterward. When integrated with previous work, these results reveal large cross-cultural variation in the linguistic experiences provided to young children. Consideration of more diverse human populations is necessary to build generalizable theories of language acquisition.
  • pdf Chaabouni, R., Dunbar, E., Zeghidour, N. & Dupoux, E. (2017). Learning weakly supervised multimodal phoneme embeddings. In INTERSPEECH-2017. [abstract] ABSTRACT = Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing the lips' movements, in a weakly-supervised way using Siamese networks and lexical same-different side information. In particular, we ask whether one modality can benefit from the other to provide a richer representation for phone recognition in a weakly supervised setting. We introduce mono-task and multi-task methods for merging speech and visual modalities for phone recognition. The mono-task learning consists in applying a Siamese network on the concatenation of the two modalities, while the multi-task learning receives several different combinations of modalities at train time. We show that multi-task learning enhances discriminability for visual and multimodal inputs while minimally impacting auditory inputs. Furthermore, we present a qualitative analysis of the obtained phone embeddings, and show that cross-modal visual input can improve the discriminability of phonetic features which are visually discernable (rounding, open/close, labial place of articulation), resulting in representations that are closer to abstract linguistic features than those based on audio only.
  • pdf de Diego-Balaguer, R., Schramm, C., Rebeix, I., Dupoux, E., Durr, A., Brice, A., Charles, P., Cleret de Langavant, L., Youssov, K., Verny, C., Damotte, V., Azulay, J.P., Goizet, C., Simonin, C., Tranchant, C., Maison, P., Rialland, A., Schmitz, D., Jacquemot, C., Fontaine, B. & Bachoud-Lévi, A.C. (2016). COMT Val158Met Polymorphism Modulates Huntington's Disease Progression. Plos One, 11(9), e0161106.
  • pdf Zeghidour, N., Synnaeve, G., Versteegh, M. & Dupoux, E. (2016). A Deep Scattering Spectrum - Deep Siamese Network Pipeline For Unsupervised Acoustic Modeling. In ICASSP-2016, (pp 4965-4969) . [abstract] ABSTRACT = Recent work has explored deep architectures for learning acoustic features in an unsupervised or weakly supervised way for phone recognition. Here we investigate the role of the input features, and in particular we test whether standard mel-scaled filterbanks could be replaced by inherently richer representations, such as derived from an analytic scattering spectrum. We use a Siamese network using lexical side information similar to a well performing architecture used in the Zero Resource Speech Challenge (2015), and show a substantial improvement when the filterbanks are replaced by scattering features, even though these features yield similar performance when tested without training. This shows that unsupervised and weakly-supervised architectures can benefit from richer features than the traditional ones.
  • pdf Zeghidour, N., Synnaeve, G., Usunier, N. & Dupoux, E. (2016). Joint Learning of Speaker and Phonetic Similarities with Siamese Networks. In INTERSPEECH-2016, (pp 1295-1299) . [abstract] ABSTRACT = Recent work has demonstrated, on small datasets, the feasibility of jointly learning specialized speaker and phone embeddings, in a weakly supervised siamese DNN architecture using word and speaker identity as side information. Here, we scale up these architectures to the 360 hours of the Librispeech corpus by implementing a sampling method to efficiently select pairs of words from the dataset and improving the loss function. We also compare the standard siamese networks fed with same (AA) or different (AB) pairs, to a 'triamese' network fed with AAB triplets. We use ABX discrimination tasks to evaluate the discriminability and invariance properties of the obtained joined embeddings, and compare these results with mono-embeddings architectures. We find that the joined embeddings architectures succeed in effectively disentangling speaker from phoneme information, with around 10% errors for the matching tasks and embeddings (speaker task on speaker embeddings, and phone task on phone embedding) and near chance for the mismatched task. Furthermore, the results carry over in out-of-domain datasets, even beating the best results obtained with similar weakly supervised techniques.
  • pdf Versteegh, M., Anguera, X., Jansen, A. & Dupoux, E. (2016). The Zero Resource Speech Challenge 2015: Proposed Approaches and Results. In SLTU-2016 Procedia Computer Science, 81, (pp 67-72). [abstract] This paper reports on the results of the Zero Resource Speech Challenge 2015, the first unified benchmark for zero resource speech technology, which aims at the unsupervised discovery of subword and word units from raw speech. This paper discusses the motivation for the challenge, its data sets, tasks and baseline systems. We outline the ideas behind the systems that were submitted for the two challenge tracks: unsupervised subword unit modeling and spoken term discovery, and summarize their results. The results obtained by participating teams show great promise; many systems beat the provided baselines and some even perform better than comparable supervised systems.
  • pdf Synnaeve, G. & Dupoux, E. (2016). A temporal coherence loss function for learning unsupervised acoustic embeddings. In SLTU-2016 Procedia Computer Science, 81, (pp 95-100). [abstract] ABSTRACT = We train Neural Networks of varying depth with a loss function which constrains the output representations to have a temporal profile which looks like that of phonemes. We show that a simple loss function which maximizes the dissimilarity between near frames and long distance frames helps to construct a speech embedding that improves phoneme discriminability, both within and across speakers, even though the loss function only uses within speaker information. However, with too deep an architecture, this loss function yields overfitting, suggesting the need for more data and/or regularization.
  • pdf Ogawa, T., Mallidi, S.H., Dupoux, E., Cohen, J., Feldman, N. & Hermansky, H. (2016). A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation. In ICPR. [abstract] ABSTRACT = A new efficient measure for predicting estimation accuracy is proposed and successfully applied to multistream-based unsupervised adaptation of ASR systems to address data uncertainty when the ground-truth is unknown. The proposed measure is an extension of the M-measure, which predicts confidence in the output of a probability estimator by measuring the divergences of probability estimates spaced at specific time intervals. In this study, the M-measure was extended by considering the latent phoneme information, resulting in an improved reliability. Experimental comparisons carried out in a multistream-based ASR paradigm demonstrated that the extended M-measure yields a significant improvement over the original M-measure, especially under narrow-band noise conditions.
  • pdf Ludusan, B., Cristia, A., Martin, A., Mazuka, R. & Dupoux, E. (2016). Learnability of prosodic boundaries: Is infant-directed speech easier? Journal of the Acoustical Society of America, 140(2), 1239-1250. [abstract] ABSTRACT = This study explores the long-standing hypothesis that the acoustic cues to prosodic boundaries in infant-directed speech (IDS) make those boundaries easier to learn than those in adult-directed speech (ADS). Three cues (pause duration, nucleus duration and pitch change) were investigated, by means of a systematic review of the literature, statistical analyses of a new corpus, and machine learning experiments. The review of previous work revealed that the effect of register on boundary cues is less well established than previously thought, and that results often vary across studies for certain cues. Statistical analyses run on a large database of mother-child and mother-interviewer interactions showed that the duration of a pause and the duration of the syllable nucleus preceding the boundary are two cues which are enhanced in IDS, while f0 change is actually degraded in IDS. Supervised and unsupervised machine learning techniques applied to these acoustic cues revealed that IDS boundaries were consistently better classified than ADS ones, regardless of the learning method used. The role of the cues examined in this study and the importance of these findings in the more general context of early linguistic structure acquisition is discussed.
  • pdf Ludusan, B. & Dupoux, E. (2016). The role of prosodic boundaries in word discovery: Evidence from a computational model. Journal of the Acoustical Society of America, 140(1), EL1. [abstract] ABSTRACT = This study aims to quantify the role of prosodic boundaries in early language acquisition using a computational modeling approach. A spoken term discovery system that models early word learning was used with and without a prosodic component on speech corpora of English, Spanish, and Japanese. The results showed that prosodic information induces a consistent improvement both in the alignment of the terms to actual word boundaries and in the phonemic homogeneity of the discovered clusters of terms. This benefit was found also when automatically discovered prosodic boundaries were used, boundaries which did not perfectly match the linguistically defined ones.
  • pdf Ludusan, B. & Dupoux, E. (2016). Automatic syllable segmentation using broad phonetic class information. In SLTU-2016 Procedia Computer Science, 81, (pp 101-106) . [abstract] ABSTRACT = We propose in this paper a language-independent method for syllable segmentation. The method is based on the Sonority Sequencing Principle, by which the sonority inside a syllable increases from its boundaries towards the syllabic nucleus. The sonority function employed was derived from the posterior probabilities of a broad phonetic class recognizer, trained with data coming from an open-source corpus of English stories. We tested our approach on English, Spanish and Catalan and compared the results obtained to those given by an energy-based system. The proposed method outperformed the energy-based system on all three languages, showing a good generalizability to the two unseen languages. We conclude with a discussion of the implications of this work for under-resourced languages.
  • pdf Linzen, T., Dupoux, E. & Spector, B. (2016). Quantificational features in distributional word representations. In Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, (pp 1-11) .
  • pdf Linzen, T., Dupoux, E. & Goldberg, Y. (2016). Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4, 521-535.
  • pdf Gvozdic, K., Moutier, S., Dupoux, E. & Buon, M. (2016). Priming Children's Use of Intentions in Moral Judgement with Metacognitive Training. Frontiers in Language Sciences, 7(190).
  • pdf Fourtassi, A. & Dupoux, E. (2016). The role of word-word co-occurrence in word learning. In Proceedings of the 38th Annual Conference of the Cognitive Science Society, (pp 662-667) . [abstract] ABSTRACT = A growing body of research on early word learning suggests that learners gather word-object co-occurrence statistics across learning situations. Here we test a new mechanism whereby learners are also sensitive to word-word co-occurrence statistics. Indeed, we find that participants can infer the likely referent of a novel word based on its co-occurrence with other words, in a way that mimics a machine learning algorithm dubbed "zero-shot learning". We suggest that the interaction between referential and distributional regularities can bring robustness to the process of word acquisition.
  • pdf Dunbar, E. & Dupoux, E. (2016). Geometric constraints on human speech sound inventories. Frontiers in Psychology, 7(1061). [abstract] We investigate the idea that the languages of the world have developed coherent sound systems in which having one sound increases or decreases the chances of having certain other sounds, depending on shared properties of those sounds. We investigate the geometries of sound systems that are defined by the inherent properties of sounds. We document three typological tendencies in sound system geometries: economy, a tendency for the differences between sounds in a system to be definable on a relatively small number of independent dimensions; local symmetry, a tendency for sound systems to have relatively large numbers of pairs of sounds that differ only on one dimension; and global symmetry, a tendency for sound systems to be relatively balanced. The finding of economy corroborates previous results; the two symmetry properties have not been previously documented. We also investigate the relation between the typology of inventory geometries and the typology of individual sounds, showing that the frequency distribution with which individual sounds occur across languages works in favour of both local and global symmetry.
  • pdf Carbajal, J., Fér, R. & Dupoux, E. (2016). Modeling language discrimination in infants using i-vector representations. In Proceedings of the 38th Annual Conference of the Cognitive Science Society, (pp 889-896) . [abstract] ABSTRACT = Experimental research suggests that at birth infants can discriminate two languages if they belong to different rhythmic classes, and by 4 months of age they can discriminate two languages within the same class provided they have been previously exposed to at least one of them. In this paper, we present a novel application of speech technology tools to model language discrimination, which may help to understand how infants achieve this task. By combining a Gaussian Mixture Model of the acoustic space and low-dimensional representations of novel utterances with a model of a habituation paradigm, we show that brief exposure to French does not allow to discriminate between two previously unheard languages belonging to the same rhythmic class, but allows to discriminate two languages across rhythmic class. The implications of these findings are discussed.
  • pdf Carbajal, J., Dawud, A., Thiollière, R. & Dupoux, E. (2016). The 'Language Filter' Hypothesis: Modeling Language Separation in Infants using I-vectors. In EPIROB 2016, (pp 195-201) .
  • pdf Bergmann, C., Cristia, A. & Dupoux, E. (2016). Discriminability of sound contrasts in the face of speaker variation quantified. In Proceedings of the 38th Annual Conference of the Cognitive Science Society, (pp 1331-1336) . [abstract] ABSTRACT = How does a naive language learner deal with speaker variation irrelevant to distinguish word meanings? Experimental data is conflicting and incompatible models have been proposed. In this paper we examine the basic assumptions of these models regarding the signal the learner deals with: Is speaker variability a hurdle in discriminating sounds or can it easily be abstracted over? To this end we summarize existing infant data and compare them to machine-based discriminability scores of sound pairs obtained without added language knowledge. Our results show consistently that speaker variability decreases sound contrast discriminability, and that some pairs are affected more than others. Further, chance performance is a rare exception; contrasts remain discriminable in the face of speaker variation. Our data offer a way to reunite seemingly conflicting findings in the infant literature and show a path forward in testing whether and how speaker variation plays a role for language acquisition.
  • pdf Versteegh, M., Thiollière, R., Schatz, T., Cao, X.N., Anguera, X., Jansen, A. & Dupoux, E. (2015). The Zero Resource Speech Challenge 2015. In INTERSPEECH-2015, (pp 3169-3173) . [abstract] ABSTRACT = The Interspeech 2015 Zero Resource Speech Challenge aims at discovering subword and word units from raw speech. The challenge provides the first unified and open source suite of evaluation metrics and data sets to compare and analyse the results of unsupervised linguistic unit discovery algorithms. It consists of two tracks. In the first, a psychophysically inspired evaluation task (minimal pair ABX discrimination) is used to assess how well speech feature representations discriminate between contrastive subword units. In the second, several metrics gauge the quality of discovered word-like patterns. Two data sets are provided, one for English, one for Xitsonga. Both data sets are provided without any annotation except for voice activity and talker identity. This paper introduces the evaluation metrics, presents the results of baseline systems and discusses some of the key issues in unsupervised unit discovery.
  • pdf Thiollière, R., Dunbar, E., Synnaeve, G., Versteegh, M. & Dupoux, E. (2015). A Hybrid Dynamic Time Warping-Deep Neural Network Architecture for Unsupervised Acoustic Modeling. In INTERSPEECH-2015, (pp 3179-3183) . [abstract] ABSTRACT = We report on an architecture for the unsupervised discovery of talker-invariant subword embeddings. It is made out of two components: a dynamic-time warping based spoken term discovery (STD) system and a Siamese deep neural network (DNN). The STD system clusters word-sized repeated fragments in the acoustic streams while the DNN is trained to minimize the distance between time aligned frames of tokens of the same cluster, and maximize the distance between tokens of different clusters. We use additional side information regarding the average duration of phonemic units, as well as talker identity tags. For evaluation we use the datasets and metrics of the Zero Resource Speech Challenge. The model shows improvement over the baseline in subword unit modeling.
  • pdf Michon, E., Dupoux, E. & Cristia, A. (2015). Salient dimensions in implicit phonotactic learning. In INTERSPEECH-2015, (pp 2665-2669) . [abstract] ABSTRACT = Adults are able to learn sound co-occurrences without conscious knowledge after brief exposures. But which dimensions of sounds are most salient in this process? Using an artificial phonology paradigm, we explored potential learnability differences involving consonant-, speaker-, and tone-vowel co-occurrences. Results revealed that participants, whose native language was not tonal, implicitly encoded consonant-vowel patterns with a high level of accuracy; were above chance for tone-vowel co-occurrences; and were at chance for speaker-vowel co-occurrences. This pattern of results is exactly what would be expected if both language-specific experience and innate biases to encode potentially contrastive linguistic dimensions affect the salience of different dimensions during implicit learning of sound patterns.
  • pdf Martin, A., Schatz, T., Versteegh, M., Miyazawa, K., Mazuka, R., Dupoux, E. & Cristia, A. (2015). Mothers speak less clearly to infants: A comprehensive test of the hyperarticulation hypothesis. Psychological Science, 26(3), 341-347. [abstract] ABSTRACT = Infants learn language at an incredible speed, and one of the first steps in this voyage includes learning the basic sound units of their native language. It is widely thought that caregivers facilitate this task by hyperarticulating when speaking to their infants. Utilizing state-of-the-art speech technology, we address this key theoretical question: Are sound categories clearer in infant- than in adult-directed speech? A comprehensive examination of sound contrasts in a large corpus of spontaneous Japanese demonstrates that there is a small but significant tendency for contrasts in infant-directed speech to be less clear than those in adult-directed speech, contrary to the idea that caregivers actively enhance phonetic categories in infant-directed speech. These results suggest that the ability to learn from noisy data must be a crucial component of plausible theories of infant language acquisition.
  • pdf Ludusan, B., Synnaeve, G. & Dupoux, E. (2015). Prosodic boundary information helps unsupervised word segmentation. In NAACL HLT 2015, (pp 953-963) .
  • pdf Ludusan, B., Seidl, A., Dupoux, E. & Cristia, A. (2015). Motif discovery in infant- and adult-directed speech. In Proceedings of CogACLL2015, (pp 93-102) . [abstract] ABSTRACT = Infant-directed speech (IDS) is thought to play a key role in determining infant language acquisition. It is thus important to describe to what extent it differs from adult-directed speech (ADS) in dimensions that could affect learnability. In this paper, we explore how an acoustic motif discovery algorithm fares when presented with spontaneous speech from both registers. Results show small but significant differences in performance, with lower recall and higher fragmentation in IDS than ADS. Such a result is inconsistent with a view of IDS where clarity and ease of lexical recognition is a primary consideration. Additionally, it predicts that learners who extract acoustic word-forms should do worse with IDS than ADS. Similarities and differences with human infants' performance on word segmentation tasks are discussed.
  • pdf Ludusan, B., Origlia, A. & Dupoux, E. (2015). Rhythm-Based Syllabic Stress Learning without Labelled Data. In Proceedings of Statistical Language and Speech Processing -SLSP 2015, (pp 185-196) . [abstract] ABSTRACT = In this paper we propose a method for syllabic stress annotation which does not require manual labels for the learning process, but uses stress labels automatically generated from a multiscale model of rhythm perception. The model gives in its output a sequence of events, corresponding to the sequences of strong-weak syllables present in speech, based on which a stressed/unstressed decision is taken. We tested our approach on two languages, Catalan and Spanish, and we found that a supervised system employing the automatic labels for learning improves the performance over the baseline, for both languages. We also compared the results of this system with that of an identical learning algorithm, but which employs manual labels for stress, as well as to that of an unsupervised learning algorithm using the same features. It showed that the system using automatic labels has a similar performance to the one using manual labels, with both supervised systems outperforming the clustering algorithm.
  • pdf Ludusan, B., Caranica, A., Cucu, H., Buzo, A., Burileanu, C. & Dupoux, E. (2015). Exploring multi-language resources for unsupervised spoken term discovery. In Speech Technology and Human-Computer Dialogue (SpeD), 2015 International Conference on, (pp 1-6) . [abstract] With information processing and retrieval of spoken documents becoming an important topic, there is a need for systems performing automatic segmentation of audio streams. Among such algorithms, spoken term discovery allows the extraction of word-like units (terms) directly from the continuous speech signal, in an unsupervised manner and without any knowledge of the language at hand. Since the performance of any downstream application depends on the goodness of the terms found, it is relevant to try to obtain higher quality automatic terms. In this paper we investigate whether the use of input features derived from multi-language resources helps the process of term discovery. For this, we employ an open-source phone recognizer to extract posterior probabilities and phone segment decisions, for several languages. We examine the features obtained from a single language and from combinations of languages based on the spoken term discovery results attained on two different datasets of English and Xitsonga. Furthermore, a comparison to the results obtained with standard spectral features is performed and the implications of the work discussed.
  • pdf Ludusan, B. & Dupoux, E. (2015). A multilingual study on intensity as a cue for marking prosodic boundaries. In ICPhS, (pp e982) . [abstract] ABSTRACT = Speech intensity is one of the main prosodic cues, playing a role in most of the suprasegmental phenomena. Despite this, its contribution to the signalling of prosodic hierarchy is still relatively under-studied, compared to the other cues, like duration or fundamental frequency. We present here an investigation on the role of intensity in prosodic boundary detection in four different languages, by testing several intensity measures. The statistical analysis performed showed significant correlates of prosodic boundaries, for most intensity measures employed and in all languages. Our findings were further validated with a classification experiment in which the boundary/non-boundary distinction was learned in unsupervised manner, using only intensity cues. It showed that intensity range measures outperform absolute intensity measures, with the total intensity range being consistently the best feature.
  • pdf Johnson, M., Pater, J., Staub, R. & Dupoux, E. (2015). Sign constraints on feature weights improve a joint model of word segmentation and phonology. In NAACL HLT 2015, (pp 303-313) . [abstract] ABSTRACT = This paper describes a joint model of word segmentation and phonological alternations, which takes unsegmented utterances as input and infers word segmentations and underlying phonological representations. The model is a Maximum Entropy or log-linear model, which can express a probabilistic version of Optimality Theory (OT; Prince and Smolensky, 2004), a standard phonological framework. The features in our model are inspired by OT's Markedness and Faithfulness constraints. Following the OT principle that such features indicate "violations", we require their weights to be non-positive. We apply our model to a modified version of the Buckeye corpus (Pitt et al., 2007) in which the only phonological alternations are deletions of word-final /d/ and /t/ segments. The model sets a new state-of-the-art for this corpus for word segmentation, identification of underlying forms, and identification of /d/ and /t/ deletions. We also show that the OT-inspired sign constraints on feature weights are crucial for accurate identification of deleted /d/s; without them our model posits approximately 10 times more deleted underlying /d/s than appear in the manually annotated data.
  • pdf Hermansky, H., Burget, L., Cohen, J., Dupoux, E., Feldman, N., Godfrey, J., Khudanpur, S., Maciejewski, M., Mallidi, S.H., Menon, A., Ogawa, T., Peddinti, V., Rose, R., Stern, R., Wiesner, M. & Vesely, K. (2015). Towards machines that know when they do not know: Summary of work done at 2014 Frederick Jelinek memorial workshop in Prague. In ICASSP-2015 (IEEE International Conference on Acoustics Speech and Signal Processing), (pp 5009-5013) . [abstract] ABSTRACT = A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken by the group.
  • pdf Dunbar, E., Synnaeve, G. & Dupoux, E. (2015). Quantitative methods for comparing featural representations. In ICPhS, (pp paper number 1024) . [abstract] ABSTRACT = The basic representational hypothesis in phonology is that segments are coded using a universal set of discrete features. We propose a method for quantitatively measuring how well such features align with arbitrary segment representations. We assess articulatory, spectral, and phonotactic representations of English consonants. Our procedure constructs a concrete representation of a feature in terms of the pairs it distinguishes, and can be extended to any pair of representations to test the consistency of one with the individual dimensions of the other. We validate the method on our phonetic representations and then show that major natural classes are not well represented in the surface phonotactics.
  • pdf Synnaeve, G., Versteegh, M. & Dupoux, E. (2014). Learning words from images and speech. In NIPS Workshop on Learning Semantics.
  • pdf Synnaeve, G., Schatz, T. & Dupoux, E. (2014). Phonetics embedding learning with side information. In IEEE Spoken Language Technology Workshop, (pp 106 - 111) . [abstract] We show that it is possible to learn an efficient acoustic model using only a small amount of easily available word-level similarity annotations. In contrast to the detailed phonetic labeling required by classical speech recognition technologies, the only information our method requires are pairs of speech excerpts which are known to be similar (same word) and pairs of speech excerpts which are known to be different (different words). An acoustic model is obtained by training shallow and deep neural networks, using an architecture and a cost function well-adapted to the nature of the provided information. The resulting model is evaluated on an ABX minimal-pair discrimination task and is shown to perform much better (11.8% ABX error rate) than raw speech features (19.6%), not far from a fully supervised baseline (best neural network: 9.2%, HMM-GMM: 11%).
  • pdf Synnaeve, G., Dautriche, I., Boerschinger, B., Johnson, M. & Dupoux, E. (2014). Unsupervised word segmentation in context. In Proceedings of 25th International Conference on Computational Linguistics (CoLing), (pp 2326-2334) . [abstract] ABSTRACT = This paper extends existing word segmentation models to take non-linguistic context into account. It improves the token F-score of well-performing segmentation models by 2.5% on a 27k utterances dataset. We posit that word segmentation is easier in-context because the learner is not trying to access irrelevant lexical items. We use topics from Latent Dirichlet Allocation as a proxy for activities context, to label the Providence corpus. We present Adaptor Grammar models that use these context labels, and we study their performance with and without context annotations at test time.
  • pdf Schatz, T., Peddinti, V., Cao, X.N., Bach, F., Hermansky, H. & Dupoux, E. (2014). Evaluating speech features with the Minimal-Pair ABX task (II): Resistance to noise. In INTERSPEECH-2014, (pp 915-919) . [abstract] ABSTRACT = The Minimal-Pair ABX (MP-ABX) paradigm has been proposed as a method for evaluating speech features for zero-resource/unsupervised speech technologies. We apply it in a phoneme discrimination task on the Articulation Index corpus to evaluate the resistance to noise of various speech features. In Experiment 1, we evaluate the robustness to additive noise at different signal-to-noise ratios, using car and babble noise from the Aurora-4 database and white noise. In Experiment 2, we examine the robustness to different kinds of convolutional noise. In both experiments we consider two classes of techniques to induce noise resistance: smoothing of the time-frequency representation and short-term adaptation in the time-domain. We consider smoothing along the spectral axis (as in PLP) and along the time axis (as in FDLP). For short-term adaptation in the time-domain, we compare the use of a static compressive non-linearity followed by RASTA filtering to an adaptive compression scheme.
  • pdf Ludusan, B., Versteegh, M., Jansen, A., Gravier, G., Cao, X.N., Johnson, M. & Dupoux, E. (2014). Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems. In Proceedings of LREC 2014, (pp 560-567) . [abstract] ABSTRACT = The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or from raw speech has seen an increasing interest in the past years both from a theoretical and a practical standpoint. Yet, there exists no common accepted evaluation method for the systems performing term discovery. Here, we propose such an evaluation toolbox, drawing ideas from both speech technology and natural language processing. We first transform the speech-based output into a symbolic representation and compute five types of evaluation metrics on this representation: the quality of acoustic matching, the quality of the clusters found, and the quality of the alignment with real words (type, token, and boundary scores). We tested our approach on two term discovery systems taking speech as input, and one using symbolic input. The latter was run using both the gold transcription and a transcription obtained from an automatic speech recognizer, in order to simulate the case when only imperfect symbolic information is available. The results obtained are analysed through the use of the proposed evaluation metrics and the implications of these metrics are discussed.
  • pdf Ludusan, B., Gravier, G. & Dupoux, E. (2014). Incorporating Prosodic Boundaries in Unsupervised Term Discovery. In Proceedings of Speech Prosody, 7, (pp 939-943) . [abstract] We present a preliminary investigation on the usefulness of prosodic boundaries for unsupervised term discovery (UTD). Studies in language acquisition show that infants use prosodic boundaries to segment continuous speech into word-like units. We evaluate whether such a strategy could also help UTD algorithms. Running a previously published UTD algorithm (MODIS) on a corpus of prosodically annotated English broadcast news revealed that many discovered terms straddle prosodic boundaries. We then implemented two variants of this algorithm: one that discards straddling items and one that truncates them to the nearest boundary (either prosodic or pause marker). Both algorithms showed a better term matching F-score compared to the baseline and higher level prosodic boundaries were found to be better than lower level boundaries or pause markers. In addition, we observed that the truncation algorithm, but not the discard algorithm, increased word boundary F-score over the baseline.
  • pdf Ludusan, B. & Dupoux, E. (2014). Towards Low Resource Prosodic Boundary Detection. In Proceedings of International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU'14), (pp 231-237) . [abstract] ABSTRACT = In this study we propose a method of prosodic boundary detection based only on acoustic cues which are easily extractable from the speech signal and without any supervision. Drawing a parallel between the process of language acquisition in babies and the speech processing techniques for under-resourced languages, we take advantage of the findings of several psycholinguistic studies relative to the cues used by babies for the identification of prosodic boundaries. Several durational and pitch cues were investigated, by themselves or in a combination, and relatively good performances were achieved. The best result obtained, a combination of all the cues, compares well against a previously proposed approach, without relying on any learning method or any lexical or syntactic cues.
  • pdf Johnson, M., Christophe, A., Demuth, K. & Dupoux, E. (2014). Modelling function words improves unsupervised word segmentation. In Proceedings of the 52nd Annual meeting of the ACL, (pp 282-292) . [abstract] ABSTRACT = Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar based Bayesian word segmentation model to allow it to learn sequences of monosyllabic "function words" at the beginnings and endings of collocations of (possibly multi-syllabic) words. This modification improves unsupervised word segmentation on the standard Bernstein-Ratner (1987) corpus of child-directed English by more than 4% token f-score compared to a model identical except that it does not special-case "function words", setting a new state-of-the-art of 92.4% token f-score. Our function word model assumes that function words appear at the left periphery, and while this is true of languages such as English, it is not true universally. We show that a learner can use Bayesian model selection to determine the location of function words in their language, even though the input to the model only consists of unsegmented sequences of phones. Thus our computational models support the hypothesis that function words play a special role in word learning.
  • pdf Fourtassi, A., Schatz, T., Varadarajan, B. & Dupoux, E. (2014). Exploring the Relative Role of Bottom-up and Top-down Information in Phoneme Learning. In Proceedings of the 52nd Annual meeting of the ACL, 2, (pp 1-6) Association for Computational Linguistics. [abstract] We test both bottom-up and top-down approaches in learning the phonemic status of the sounds of English and Japanese. We used large corpora of spontaneous speech to provide the learner with an input that models both the linguistic properties and statistical regularities of each language. We found both approaches to help discriminate between allophonic and phonemic contrasts with a high degree of accuracy, although top-down cues proved to be effective only on an interesting subset of the data. We test their performance in a task that consists of discriminating within-category contrasts from between-category contrasts. Finally we discuss the role and scope of each approach in learning phonemes.
  • pdf Fourtassi, A., Dunbar, E. & Dupoux, E. (2014). Self Consistency as an Inductive Bias in Early Language Acquisition. In Proceedings of the 36th Annual Meeting of the Cognitive Science Society, (pp 469-474) . [abstract] ABSTRACT = In this paper we introduce an inductive bias for language acquisition. It is based on a holistic approach, whereby the levels of representations are not treated in isolation, but as different interacting parts. The best representation of the sound system is the one that leads to the best lexicon, defined as the one that sustains the most coherent semantics. We quantify this coherence through an intrinsic and unsupervised measure called "Self Consistency". We found this measure to be optimal under the true phonemic inventory and the correct word segmentation in English and Japanese.
  • pdf Fourtassi, A. & Dupoux, E. (2014). A Rudimentary Lexicon and Semantics Help Bootstrap Phoneme Acquisition. In Proceedings of the 18th Conference on Computational Natural Language Learning (CoNLL), (pp 191-200) Association for Computational Linguistics. [abstract] Infants spontaneously discover the relevant phonemes of their language without any direct supervision. This acquisition is puzzling because it seems to require the availability of high levels of linguistic structures (lexicon, semantics), that logically suppose the infants having a set of phonemes already. We show how this circularity can be broken by testing, in real-size language corpora, a scenario whereby infants would learn approximate representations at all levels, and then refine them in a mutual constraining way. We start with corpora of spontaneous speech that have been encoded in a varying number of detailed context-dependent allophones. We derive an approximate lexicon and a rudimentary semantic representation. Despite the fact that all these representations are poor approximations of the ground truth, they help reorganize the fine grained categories into phoneme-like categories with a high degree of accuracy.
  • pdf Cristia, A., Minagawa-Kawai, Y., Vendelin, I., Cabrol, D. & Dupoux, E. (2014). Responses to vocalizations and auditory controls in the human newborn brain. PLoS One, 9(12), e115162. [abstract] The functional organization of the human adult brain allows selective activation of specific regions in response to stimuli. In the adult, linguistic processing has been associated with left-dominant activations in perisylvian regions, whereas emotional vocalizations can give place to right-dominant activation in posterior temporal cortices. Near Infrared Spectroscopy was used to register the response of 40 newborns' temporal regions when stimulated with speech, human and macaque emotional vocalizations, and auditory controls where the formant structure was destroyed but the long-term spectrum was retained. Speech elicited left-dominant activation in one channel in left posterior temporal cortices, as well as in more anterior, deeper tissue with no clear lateralization. Emotional vocalizations induced left-dominant, large activations in more anterior regions. Finally, activation elicited by the control stimuli was right-dominant, and more variable across infants. Overall, these results suggest that left-dominance for speech processing in newborns may be partially modulated by the presence of formant structure, which is shared between speech and non-linguistic vocalizations. Moreover, they indicate that development plays an important role in shaping the cortical networks involved in the processing of emotional vocalizations.
  • pdf Cristia, A., Minagawa-Kawai, Y., Egorova, N., Gervain, J., Filippin, L., Cabrol, D. & Dupoux, E. (2014). Neural correlates of infant dialect discrimination: A fNIRS study. Developmental Science, 17(4), 628-635. [abstract] ABSTRACT = The present study investigated the neural correlates of infant discrimination of very similar linguistic varieties (Quebecois and Parisian French) using functional Near InfraRed Spectroscopy. In line with previous behavioral and electrophysiological data, there was no evidence that 3-month-olds discriminated the two regional accents, whereas 5-month-olds did, with the locus of discrimination in left anterior perisylvian regions. These neuroimaging results suggest that a developing language network relying crucially on left perisylvian cortices sustains infants' discrimination of similar linguistic varieties within this early period of infancy.
  • pdf Buon, M., Jacob, P., Margules, S., Brunet, I., Dutat, M., Cabrol, D. & Dupoux, E. (2014). Friend or foe? Early social evaluation of human interactions. PLoS One, 9(2), e88612. [abstract] ABSTRACT = We report evidence that 29-month-old toddlers and preverbal 10-month-old human infants discriminate between two agents, a pro-social agent, who performs a positive action on a human patient and a negative action on an inanimate object, and an anti-social agent, who does the opposite. Furthermore the evidence shows that they prefer the former to the latter even though the agents perform the same bodily movements. Given that humans can be threats to their conspecifics, we discuss this finding in light of the likely adaptive value of the ability to detect harmful human agents.
  • pdf Schatz, T., Peddinti, V., Bach, F., Jansen, A., Hermansky, H. & Dupoux, E. (2013). Evaluating speech features with the Minimal-Pair ABX task: Analysis of the classical MFC/PLP pipeline. In INTERSPEECH-2013, (pp 1781-1785) . [abstract] ABSTRACT = We present a new framework for the evaluation of speech representations in zero-resource settings, that extends and complements previous work by Carlin, Jansen and Hermansky [1]. In particular, we replace their Same/Different discrimination task by several Minimal-Pair ABX (MP-ABX) tasks. We explain the analytical advantages of this new framework and apply it to decompose the standard signal processing pipelines for computing PLP and MFC coefficients. This method enables us to confirm and quantify a variety of well-known and not-so-well-known results in a single framework.
  • pdf Ngon, C., Martin, A., Dupoux, E., Cabrol, D. & Peperkamp, S. (2013). Nonwords, nonwords, nonwords: Evidence for a proto-lexicon during the first year of life. Developmental Science, 16(1), 24-34. [abstract] ABSTRACT = Previous research with artificial language learning paradigms has shown that infants are sensitive to statistical cues to word boundaries (Saffran, Aslin & Newport, 1996) and that they can use these cues to extract word-like units (Saffran, 2001). However, it is unknown whether infants use statistical information to construct a recognition lexicon when acquiring their native language. In order to investigate this issue, we rely on the fact that besides real words a statistical algorithm extracts sound sequences that are highly frequent in infant-directed speech but constitute nonwords. In two experiments, we use a preferential listening paradigm to test French-learning 11-month-old infants' recognition of highly frequent disyllabic sequences from their native language. In Experiment 1, we use nonword stimuli and find that infants listen longer to high-frequency than to low-frequency sequences. In Experiment 2, we compare high-frequency nonwords to real words in the same frequency range, and find that infants show no preference. Thus, at 11 months, French-learning infants recognize highly frequent sound sequences from their native language and fail to differentiate between words and nonwords among these sequences. These results are evidence that they have used statistical information to extract word candidates from their input and store them in a "proto-lexicon", containing both words and nonwords.
  • pdf Minagawa-Kawai, Y., Cristia, A., Long, B., Vendelin, I., Hakuno, Y., Dutat, M., Filippin, L., Cabrol, D. & Dupoux, E. (2013). Insights on NIRS sensitivity from a cross-linguistic study on the emergence of phonological grammar. Frontiers in Language Sciences, 4(170), 10.3389/fpsyg.2013.00170. [abstract] ABSTRACT = Each language has a unique set of phonemic categories and phonotactic rules which determine permissible sound sequences in that language. Behavioral research demonstrates that one's native language shapes the perception of both sound categories and sound sequences in adults, and neuroimaging results further indicate that the processing of native phonemes and phonotactics involves a left-dominant perisylvian brain network. Recent work using a novel technique, functional Near InfraRed Spectroscopy (NIRS), has suggested that a left-dominant network becomes evident toward the end of the first year of life as infants process phonemic contrasts. The present research project attempted to assess whether the same pattern would be seen for native phonotactics. We measured brain responses in Japanese- and French-learning infants to two contrasts: Abuna vs. Abna (a phonotactic contrast that is native in French, but not in Japanese) and Abuna vs. Abuuna (a vowel length contrast that is native in Japanese, but not in French). Results did not show a significant response to either contrast in either group, unlike both previous behavioral research on phonotactic processing and NIRS work on phonemic processing. To understand these null results, we performed similar NIRS experiments with Japanese adult participants. These data suggest that the infant null results arise from an interaction of multiple factors, involving the suitability of the experimental paradigm for NIRS measurements and stimulus perceptibility. We discuss the challenges facing this novel technique, particularly focusing on the optimal stimulus presentation which could yield strong enough hemodynamic responses when using the change detection paradigm.
  • pdf Martin, A., Peperkamp, S. & Dupoux, E. (2013). Learning Phonemes with a Proto-lexicon. Cognitive Science, 37, 103-124. [abstract] ABSTRACT = Before the end of the first year of life, infants begin to lose the ability to perceive distinctions between sounds that are not phonemic in their native language. It is typically assumed that this developmental change reflects the construction of language-specific phoneme categories, but how these categories are learned largely remains a mystery. Peperkamp, Le Calvez, Nadal, & Dupoux (2006) present an algorithm that can discover phonemes using the distributions of allophones as well as the phonetic properties of the allophones and their contexts. We show that a third type of information source, the occurrence of pairs of minimally-differing word forms in speech heard by the infant, is also useful for learning phonemic categories, and is in fact more reliable than purely distributional information in data containing a large number of allophones. In our model, learners build an approximation of the lexicon consisting of the high-frequency n-grams present in their speech input, allowing them to take advantage of top-down lexical information without needing to learn words. This may explain how infants have already begun to exhibit sensitivity to phonemic categories before they have a large receptive lexicon.
  • pdf Jansen, A., Dupoux, E., Goldwater, S., Johnson, M., Khudanpur, S., Church, K., Feldman, N., Hermansky, H., Metze, F., Rose, R., Seltzer, M., Clark, P., McGraw, I., Varadarajan, B., Bennett, E., Boerschinger, B., Chiu, J., Dunbar, E., Fourtassi, A., Harwath, D., Lee, C.y., Levin, K., Norouzian, A., Peddinti, V., Richardson, R., Schatz, T. & Thomas, S. (2013). A summary of the 2012 JH CLSP Workshop on zero resource speech technologies and models of early language acquisition. In ICASSP-2013 (IEEE International Conference on Acoustics Speech and Signal Processing), (pp 8111-8115) . [abstract] ABSTRACT = We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
  • pdf Fourtassi, A., Boerschinger, B., Johnson, M. & Dupoux, E. (2013). WhyisEnglishsoeasytosegment. In Proceedings of the 4th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2013), (pp 1-10) . [abstract] ABSTRACT = Cross-linguistic studies on unsupervised word segmentation have consistently shown that English is easier to segment than other languages. In this paper, we propose an explanation based on the notion of segmentation ambiguity. We show that English has a very low segmentation ambiguity compared to Japanese and that this difference correlates with the segmentation performance in a unigram model. We suggest that segmentation ambiguity is linked to a trade-off between syllable structure complexity and word length distribution.
  • pdf Fourtassi, A. & Dupoux, E. (2013). A corpus-based evaluation method for Distributional Semantic Models. In Proceedings of ACL-SRW 2013, (pp 165-171) . [abstract] ABSTRACT = Evaluation methods for Distributional Semantic Models typically rely on behaviorally derived gold standards. These methods are difficult to deploy in languages with scarce linguistic/behavioral resources. We introduce a corpus-based measure that evaluates the stability of the lexical semantic similarity space using a pseudo-synonym same-different detection task and no external resources. We show that it enables to predict two behavior-based measures across a range of parameters in a Latent Semantic Analysis model.
  • pdf Cristia, A., Dupoux, E., Hakuno, Y., Lloyd-Fox, S., Schuetze, M., Kivits, J., Bergvelt, T., van Gelder, M., Filippin, L., Charron, S. & Minagawa-Kawai, Y. (2013). An online database of infant functional Near InfraRed Spectroscopy studies: A community-augmented systematic review. PLoS One, 8(3), e58906.
  • pdf Buon, M., Jacob, P., Loissel, E. & Dupoux, E. (2013). A non-mentalistic cause-based heuristic in human social evaluations. Cognition, 126(2), 149-155.
  • pdf Buon, M., Dupoux, E., Jacob, P., Chaste, P., Leboyer, M. & Zalla, T. (2013). The role of causal and intentional reasoning in moral judgment in individuals with High Functioning Autism. Journal of Autism and Developmental Disorders, 43(2), 458-70.
  • pdf Kinzler, K.D., Dupoux, E. & Spelke, E.S. (2012). "Native" objects and collaborators: Infants' object choices and acts of giving reflect favor for native over foreign speakers. Journal of Cognition and Development, 13(1), 1-15. [abstract] Infants learn from adults readily and cooperate with them spontaneously, but how do they select culturally appropriate teachers and collaborators? Building on evidence that children demonstrate social preferences for speakers of their native language, Experiment 1 presented 10-month-old infants with videotaped events in which a native and a foreign speaker introduced two different toys. When given a chance to choose between real exemplars of the objects, infants preferentially chose the toy modeled by the native speaker. In Experiment 2, 2.5-year-old children were presented with the same videotaped native and foreign speakers, and played a game in which they could offer an object to one of two individuals. Children reliably gave to the native speaker. Together, the results suggest that infants and young children are selective social learners and cooperators, and that language provides one basis for this selectivity.
  • pdf Jacquemot, C., Dupoux, E., Robotham, L. & Bachoud-Lévi, A.C. (2012). Specificity in rehabilitation of word production: a meta-analysis and a case study. Behavioural Neurology, 25(2), 73-101. [abstract] ABSTRACT = Speech production impairment is a frequent deficit observed in aphasic patients and rehabilitation programs have been extensively developed. Nevertheless, there is still no agreement on the type of rehabilitation that yields the most successful outcomes. Here, we ran a detailed meta-analysis of 39 studies of word production rehabilitation involving 124 patients. We used a model-driven approach for analyzing each rehabilitation task by identifying which levels of our model each task tapped into. We found that (1) all rehabilitation tasks are not equally efficient and the most efficient ones involved the activation of the two levels of the word production system: the phonological output lexicon and the phonological output, and (2) the activation of the speech perception system as it occurs in many tasks used in rehabilitation is not successful in rehabilitating word production. In this meta-analysis, the effect of the activation of the phonological output lexicon and the phonological output cannot be assessed separately. We further conducted a rehabilitation study with DPI, a patient who suffers from a damage of the phonological output lexicon. Our results confirm that rehabilitation is more efficient, in terms of time and performance, when specifically addressing the impaired level of word production.
  • pdf Cova, F., Dupoux, E. & Jacob, P. (2012). On doing things intentionally. Mind and Language, 27(4), 378-409. [abstract] ABSTRACT = Recent empirical and conceptual research has shown that moral considerations have an influence on the way we use the adverb "intentionally". Here we propose our own account of these phenomena according to which they arise from the fact that the adverb "intentionally" has three different meanings that are differently selected by contextual factors, including normative expectations. We argue that our hypotheses can account for most available data and present some new results which support this. We end by discussing the implications of our account for folk psychology.
  • pdf Cleret de Langavant, L., Trinkler, I., Remy, P., Thirioux, B., McIntyre, J., Berthoz, A., Dupoux, E. & Bachoud-Lévi, A.C. (2012). Viewing another person's body as a target object: a behavioural and PET study of pointing. Neuropsychologia, 50(8), 1801-13.
  • pdf Minagawa-Kawai, Y., van der Lely, H., Ramus, F., Sato, Y., Mazuka, R. & Dupoux, E. (2011). Optical Brain Imaging Reveals General Auditory and Language-Specific Processing in Early Infant Development. Cerebral Cortex, 21(2), 254-261. [abstract] This study uses near-infrared spectroscopy in young infants in order to elucidate the nature of functional cerebral processing for speech. Previous imaging studies of infants' speech perception revealed left-lateralized responses to native language. However, it is unclear if these activations were due to language per se rather than to some low-level acoustic correlate of spoken language. Here we compare native (L1) and non-native (L2) languages with 3 different nonspeech conditions including emotional voices, monkey calls, and phase scrambled sounds that provide more stringent controls. Hemodynamic responses to these stimuli were measured in the temporal areas of Japanese 4 month-olds. The results show clear left-lateralized responses to speech, prominently to L1, as opposed to various activation patterns in the nonspeech conditions. Furthermore, implementing a new analysis method designed for infants, we discovered a slower hemodynamic time course in awake infants. Our results are largely explained by signal-driven auditory processing. However, stronger activations to L1 than to L2 indicate a language-specific neural factor that modulates these responses. This study is the first to discover a significantly higher sensitivity to L1 in 4 month-olds and reveals a neural precursor of the functional specialization for the higher cognitive network.
  • pdf Minagawa-Kawai, Y., Cristia, A., Vendelin, I., Cabrol, D. & Dupoux, E. (2011). Assessing signal-driven mechanisms in neonates: Brain responses to temporally and spectrally different sounds. Frontiers in Language Sciences, 2(135). [abstract] ABSTRACT = Past studies have found that, in adults, the acoustic properties of sound signals (such as fast vs. slow temporal features) differentially activate the left and right hemispheres, and some have hypothesized that left-lateralization for speech processing may follow from left-lateralization to rapidly changing signals. Here, we tested whether newborns' brains show some evidence of signal-specific lateralization responses using near-infrared spectroscopy (NIRS) and auditory stimuli that elicits lateralized responses in adults, composed of segments that vary in duration and spectral diversity. We found significantly greater bilateral responses of oxygenated hemoglobin (oxy-Hb) in the temporal areas for stimuli with a minimum segment duration of 21 ms, than stimuli with a minimum segment duration of 667 ms. However, we found no evidence for hemispheric asymmetries dependent on the stimulus characteristics. We hypothesize that acoustic-based functional brain asymmetries may develop throughout early infancy, and discuss their possible relationship with brain asymmetries for language.
  • pdf Minagawa-Kawai, Y., Cristià, A. & Dupoux, E. (2011). Cerebral lateralization and early speech acquisition: A developmental scenario. Developmental Cognitive Neuroscience, 1(3), 217-232. [abstract] During the past ten years, research using Near-InfraRed Spectroscopy (NIRS) to study the developing brain has provided groundbreaking evidence of brain functions in infants. We review three competing classes of hypotheses (signal-driven, domain-driven, and learning-biases hypotheses) regarding the causes of hemispheric specialization for speech processing. We assess the fit between each of these hypotheses and neuroimaging evidence in speech perception and show that none of the three hypotheses can account for the entire set of observations on its own. However, we argue that they provide a good fit when combined within a developmental perspective. According to our proposed scenario, lateralization for language emerges out of the interaction between pre-existing left-right biases in generic auditory processing (signal-driven hypothesis), and a left-hemisphere predominance of particular learning mechanisms (learning-biases hypothesis). As a result of this completed developmental process, the native language is represented in the left hemisphere predominantly. The integrated scenario enables to link infant and adult data, and points to many empirical avenues that need to be explored more systematically.
  • pdf Mazuka, R., Cao, Y., Dupoux, E. & Christophe, A. (2011). The development of a phonological illusion: A cross-linguistic study with Japanese and French infants. Developmental Science, 14(4), 693-699. [abstract] ABSTRACT = In adults, the native language phonology has strong perceptual effects. Previous work showed that Japanese speakers, unlike French speakers, break up illegal sequences of consonants with illusory vowels: they report hearing abna as abuna. To study the development of the phonological grammar, we compared Japanese and French infants in a discrimination task. In Experiment 1, we observed that 14-month-old Japanese infants, in contrast with French infants, failed to discriminate phonetically varied sets of abna-type and abuna-type stimuli. In Experiment 2, 8-month-old French and Japanese did not differ significantly from each other. In Experiment 3, we found that, like adults, Japanese infants can discriminate abna from abuna when phonetic variability is reduced (single item). These results show that the phonologically-induced /u/ illusion is already experienced by Japanese infants at the age of 14 months. Hence, before having acquired many words of their language, they have grasped enough of their native phonological grammar to constrain their perception of speech sound sequences.
  • pdf Jacquemot, C., Dupoux, E. & Bachoud-Lévi, A.C. (2011). Is the word-length effect linked to subvocal rehearsal? Cortex, 47(4), 484-493. [abstract] Models of phonological short-term memory (pSTM) generally distinguish between two components: a phonological buffer and a subvocal rehearsal. Evidence for these two components comes, respectively, from the phonological similarity effect and the word-length effect which disappears under articulatory suppression. But alternative theories posit that subvocal rehearsal is only an optional component of the pSTM. According to them, the depletion of the length effect under articulatory suppression results from the interference of the self-produced speech rather than the disruption of subvocal rehearsal. In order to disentangle these two theories, we tested two patients with a short-term memory deficit. FA, who presents a pseudoword repetition deficit, and FL, who does not. FA's deficit allowed for the observance of an ecological case of subvocal rehearsal disruption without any articulatory suppression task. FA's performance in pSTM tasks reveals as controls a phonological similarity effect, and contrary to controls no word-length effect. In contrast, the second patient, FL, exhibits the same effects as control subjects. This result is in accordance with models of pSTM in which the word-length effect emerges from subvocal rehearsal and disappears when this latter is disrupted.
  • pdf Hannagan, T., Dupoux, E. & Christophe, A. (2011). Holographic String Encoding. Cognitive Science, 35(1), 79-118. [abstract] In this article, we apply a special case of holographic representations to letter position coding. We translate different well-known schemes into this format, which uses distributed representations and supports constituent structure. We show that in addition to these brain-like characteristics, performances on a standard benchmark of behavioral effects are improved in the holographic format relative to the standard localist one. This notably occurs because of emerging properties in holographic codes, like transposition and edge effects, for which we give formal demonstrations. Finally, we outline the limits of the approach as well as its possible future extensions.
  • pdf Dupoux, E., Parlato, E., Frota, S., Hirose, Y. & Peperkamp, S. (2011). Where do illusory vowels come from? Journal of Memory and Language, 64(3), 199-210. [abstract] Listeners of various languages tend to perceive an illusory vowel inside consonant clusters that are illegal in their native language. Here, we test whether this phenomenon arises after phoneme categorization or rather interacts with it. We assess the perception of illegal consonant clusters in native speakers of Japanese, Brazilian Portuguese, and European Portuguese, three languages that have similar phonological properties, but that differ with respect to both segmental categories and segmental transition probabilities. We manipulate the coarticulatory information present in the consonant clusters, and use a forced choice vowel labeling task (Experiment 1) and an ABX discrimination task (Experiment 2). We find that only Japanese and Brazilian Portuguese listeners show a perceptual epenthesis effect, and, furthermore, that within these participant groups the nature of the perceived epenthetic vowel varies according to the coarticulation cues. These results are consistent with models that integrate phonotactic probabilities within perceptual categorization, and are problematic for two-step models in which the repair of illegal sequences follows that of categorization.
  • pdf Dupoux, E., Beraud-Sudreau, G. & Sagayama, S. (2011). Templatic features for modeling phoneme acquisition. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Boston, Mass. [abstract] We describe a model for the coding of speech sounds into a high dimensional space. This code is obtained by computing the similarity between speech sounds and stored syllable-sized templates. We show that this code yields a better linear separation of phonemes than the standard MFCC code. Additional experiments show that the code is tuned to a particular language, and is able to use temporal cues for the purpose of phoneme recognition. Optimal templates seem to correspond to chunks of speech of around 120ms containing transitions between phonemes or syllables.
  • pdf Cleret de Langavant, L., Remy, P., Trinkler, I., McIntyre, J., Dupoux, E., Berthoz, A. & Bachoud-Lévi, A.C. (2011). Behavioral and Neural Correlates of Communication via Pointing. PLoS One, 6(3), e17719. [abstract] Communicative pointing is a human specific gesture which allows sharing information about a visual item with another person. It sets up a three-way relationship between a subject who points, an addressee and an object. Yet psychophysical and neuroimaging studies have focused on non-communicative pointing, which implies a two-way relationship between a subject and an object without the involvement of an addressee, and makes such gesture comparable to touching or grasping. Thus, experimental data on the communicating function of pointing remain scarce. Here, we examine whether the communicative value of pointing modifies both its behavioral and neural correlates by comparing pointing with or without communication. We found that when healthy participants pointed repeatedly at the same object, the communicative interaction with an addressee induced a spatial reshaping of both the pointing trajectories and the endpoint variability. Our finding supports the hypothesis that a change in reference frame occurs when pointing conveys a communicative intention. In addition, measurement of regional cerebral blood flow using H2O15 PET-scan showed that pointing when communicating with an addressee activated the right posterior superior temporal sulcus and the right medial prefrontal cortex, in contrast to pointing without communication. Such a right hemisphere network suggests that the communicative value of pointing is related to processes involved in taking another person's perspective. This study brings to light the need for future studies on communicative pointing and its neural correlates by unraveling the three-way relationship between subject, object and an addressee.
  • pdf Boruta, L., Peperkamp, S., Crabbé, B. & Dupoux, E. (2011). Testing the robustness of online word segmentation: effects of linguistic diversity and phonetic variation. In Proceedings of the 2011 Workshop on Cognitive Modeling and Computational Linguistics, ACL, 1-9, Portland, Oregon. [abstract] Models of the acquisition of word segmentation are typically evaluated using phonemically transcribed corpora. Accordingly, they implicitly assume that children know how to undo phonetic variation when they learn to extract words from speech. Moreover, whereas models of language acquisition should perform similarly across languages, evaluation is often limited to English samples. Using child-directed corpora of English, French and Japanese, we evaluate the performance of state-of-the-art statistical models given inputs where phonetic variation has not been reduced. To do so, we measure segmentation robustness across different levels of segmental variation, simulating systematic allophonic variation or errors in phoneme recognition. We show that these models do not resist an increase in such variations and do not generalize to typologically different languages. From the perspective of early language acquisition, the results strengthen the hypothesis according to which phonological knowledge is acquired in large part before the construction of a lexicon.
  • pdf Peperkamp, S., Vendelin, I. & Dupoux, E. (2010). Perception of predictable stress: A cross-linguistic investigation. Journal of Phonetics, 38(3), 422-430. [abstract] Previous studies have documented that speakers of French, a language with predictable stress, have difficulty distinguishing nonsense words that vary in stress position solely (stress "deafness"). In a sequence recall task with adult speakers of five languages with predictable stress (Standard French, Southeastern French, Finnish, Hungarian and Polish) and one language with non-predictable stress (Spanish), it was found that speakers of all languages with predictable stress except Polish exhibited a strong stress "deafness", while Spanish speakers exhibited no such "deafness". Polish speakers yielded an intermediate pattern of results: they exhibited a weak stress "deafness". These findings are discussed in light of current theoretical models of speech perception.
  • pdf Parlato-Oliveira, E., Christophe, A., Hirose, Y. & Dupoux, E. (2010). Plasticity of illusory vowel perception in Brazilian-Japanese bilinguals. Journal of the Acoustical Society of America, 127(6), 3738-3748. [abstract] Previous research shows that monolingual Japanese and Brazilian Portuguese listeners perceive illusory vowels (/u/ and /i/, respectively) within illegal sequences of consonants. Here, several populations of Japanese-Brazilian bilinguals are tested, using an explicit vowel identification task (experiment 1), and an implicit categorization and sequence recall task (experiment 2). Overall, second-generation immigrants, who first acquired Japanese at home and Brazilian during childhood (after age 4) showed a typical Brazilian pattern of results (and so did simultaneous bilinguals, who were exposed to both languages from birth on). In contrast, late bilinguals, who acquired their second language in adulthood, exhibited a pattern corresponding to their native language. In addition, an influence of the second language was observed in the explicit task of Exp. 1, but not in the implicit task used in Exp. 2, suggesting that second language experience affects mostly explicit or metalinguistic skills. These results are compared to other studies of phonological representations in adopted children or immigrants, and discussed in relation to the role of age of acquisition and sociolinguistic factors. © 2010 Acoustical Society of America. [DOI: 10.1121/1.3327792]
  • pdf Kouider, S., de Gardelle, V., Sackur, J. & Dupoux, E. (2010). How rich is consciousness? The partial awareness hypothesis. Trends in Cognitive Sciences, 14(7), 301-307. [abstract] Current theories of consciousness posit a dissociation between 'phenomenal' consciousness (rich) and 'access' consciousness (limited). Here, we argue that the empirical evidence for phenomenal consciousness without access is equivocal, resulting either from a confusion between phenomenal and unconscious contents, or from an impression of phenomenally rich experiences arising from illusory contents. We propose a refined account of access that relies on a hierarchy of representational levels and on the notion of partial awareness, whereby lower and higher levels are accessed independently. Reframing of the issue of dissociable forms of consciousness into dissociable levels of access provides a more parsimonious account of the existing evidence. In addition, the rich phenomenology illusion can be studied and described in terms of testable cognitive mechanisms.
  • pdf Kouider, S., de Gardelle, V., Dehaene, S., Dupoux, E. & Pallier, C. (2010). Cerebral bases of subliminal speech priming. Neuroimage, 49(1), 922-929. [abstract] While the neural correlates of unconscious perception and subliminal priming have been largely studied for visual stimuli, little is known about their counterparts in the auditory modality. Here we used a subliminal speech priming method in combination with fMRI to investigate which regions of the cerebral network for language can respond in the absence of awareness. Participants performed a lexical decision task on target items preceded by subliminal primes, which were either phonetically identical or different from the target. Moreover, the prime and target could be spoken by the same speaker or by two different speakers. Word repetition reduced the activity in the insula and in the left superior temporal gyrus. Although the priming effect on reaction times was independent of voice manipulation, neural repetition suppression was modulated by speaker change in the superior temporal gyrus while the insula showed voice-independent priming. These results provide neuroimaging evidence of subliminal priming for spoken words and inform us on the first, unconscious stages of speech perception.
  • pdf Dupoux, E., Peperkamp, S. & Sebastian-Galles, N. (2010). Limits on bilingualism revisited: Stress "deafness" in simultaneous French-Spanish bilinguals. Cognition, 114(2), 266-275. [abstract] We probed simultaneous French-Spanish bilinguals for the perception of Spanish lexical stress using three tasks, two short-term memory encoding tasks and a speeded lexical decision. In all three tasks, the performance of the group of simultaneous bilinguals was intermediate between that of native speakers of Spanish on the one hand and French late learners of Spanish on the other hand. Using a composite stress "deafness" index measure computed over the results of the three tasks, we found that the performance of the simultaneous bilinguals is best fitted by a bimodal distribution that corresponds to a mixture of the performance distributions of the two control groups. Correlation analyses showed that the variables explaining language dominance are linked to early language exposure. These findings are discussed in light of theories of language processing in bilinguals.
  • pdf Teichmann, M., Darcy, I., Bachoud-Lévi, A.C. & Dupoux, E. (2009). The role of the striatum in phonological processing. Evidence from early stages of Huntington's disease. Cortex, 45(7), 839-849. [abstract] The linguistic role of subcortical structures such as the striatum is still controversial. According to the claim that language processing is subdivided into a lexical memory store and a computational rule system (Pinker, 1999) several studies on word morphology (e.g., Ullman et al., 1997) and on syntax (e.g., Teichmann et al., 2005) have suggested that the striatum is specifically dedicated to the latter component. However, little is known about whether the striatum is involved in phonological operations and whether its role in linguistic rule application generalizes to phonological processing. We investigated this issue by assessing perceptual compensation for assimilation rules in a model of striatal disorders, namely in the early stages of Huntington's disease (HD). In Experiment 1 we used a same-different task with isolated words to evaluate whether phoneme perception is intact in HD. In Experiment 2 a word detection task in phrasal contexts allowed for assessing both phoneme perception and perceptual compensation for the French regressive assimilation rule. Results showed that HD patients have normal performance with both phoneme perception in isolated words and regressive assimilation rules. However, in phrasal contexts they display reduced abilities of phoneme discrimination. These findings challenge the striatum-rule claim and suggest a more fine-grained function of striatal structures in linguistic rule processing. Alternative explanatory frameworks of the striatum-language link are discussed.
  • pdf Skoruppa, K., Pons, F., Christophe, A., Bosch, L., Dupoux, E., Sebastian-Galles, N., Limissuri, R.A. & Peperkamp, S. (2009). Language-specific stress perception by 9-month-old French and Spanish infants. Developmental Science, 12(6), 914-919. [abstract] During the first year of life, infants begin to have difficulties perceiving non-native vowel and consonant contrasts, thus adapting their perception to the phonetic categories of the target language. In this paper, we examine the perception of a non-segmental feature, i.e. stress. Previous research with adults has shown that speakers of French (a language with fixed stress) have great difficulties in perceiving stress contrasts (Dupoux, Pallier, Sebastian & Mehler, 1997), whereas speakers of Spanish (a language with lexically contrastive stress) perceive these contrasts as accurately as segmental contrasts. We show that language-specific differences in the perception of stress likewise arise during the first year of life. Specifically, 9-month-old Spanish infants successfully distinguish between stress-initial and stress-final pseudo-words, while French infants of this age show no sign of discrimination. In a second experiment using multiple tokens of a single pseudo-word, French infants of the same age successfully discriminate between the two stress patterns, showing that they are able to perceive the acoustic correlates of stress. Their failure to discriminate stress patterns in the first experiment thus reflects an inability to process stress at an abstract, phonological level.
  • pdf Kouider, S. & Dupoux, E. (2009). Episodic accessibility and morphological processing: Evidence from long-term auditory priming. Acta Psychologica, 130(1), 38-47. [abstract] Long-term priming studies of lexical processing have yielded conflicting claims as to whether abstract versus episodic representations are involved during word recognition. A critical piece of evidence that could separate the two accounts rests on the existence of full morphological priming, where morphologically related words yield the same amount of priming as repeated words. In this study, participants performed speeded lexical decision on lists of auditory words and non-words, which contained repeated, morphologically related, semantically related and phonologically related pairs of items. In order to minimize the involvement of episodic factors, we increased the prime-target interval and decreased their physical similarity by introducing a change in speaker's voice. We show that under conditions that minimize access to episodic features, the magnitude of repetition priming decreased to attain that of morphological priming. Importantly, morphological and repetition priming for words were always observed in the absence of any semantic and phonological priming, suggesting that they cannot be reduced to formal or meaning overlap. Our results support the view that long-term priming taps both abstract lexical codes with a morphological format and episodic memory components. Further, they show that episodic influences on priming can be modulated by prime-target interval and physical similarity.
  • pdf Varadarajan, B., Khudanpur, S. & Dupoux, E. (2008). Unsupervised Learning of Acoustic Subword Units. In Proceedings of ACL-08: HLT, (pp 165-168). [abstract] Accurate unsupervised learning of phonemes of a language directly from speech is demonstrated via an algorithm for joint unsupervised learning of the topology and parameters of a hidden Markov model (HMM); states and short state-sequences through this HMM correspond to the learnt sub-word units. The algorithm, originally proposed for unsupervised learning of allophonic variations within a given phoneme set, has been adapted to learn without any knowledge of the phonemes. An evaluation methodology is also proposed, whereby the state-sequence that aligns to a test utterance is transduced in an automatic manner to a phoneme-sequence and compared to its manual transcription. Over 85% phoneme recognition accuracy is demonstrated for speaker-dependent learning from fluent, large-vocabulary speech.
  • pdf Teichmann, M., Dupoux, E., Cesaro, P. & Bachoud-Lévi, A.C. (2008). The role of the striatum in sentence processing: Evidence from a priming study in early stages of Huntington's disease. Neuropsychologia, 46(1), 174-185. [abstract] The role of sub-cortical structures such as the striatum in language remains a controversial issue. Based on linguistic claims that language processing implies both recovery of lexical information and application of combinatorial rules it has been shown that striatal damaged patients have difficulties applying conjugation rules while lexical recovery of irregular forms is broadly spared (e.g., Ullman, M. T., Corkin, S., Coppola, M., Hickok, G., Growdon, J. H., Koroshetz, W. J., et al. (1997). A neural dissociation within language: Evidence that the mental dictionary is part of declarative memory, and that grammatical rules are processed by the procedural system. Journal of Cognitive Neuroscience, 9(2), 266-276). Here we bolstered the striatum-rule hypothesis by investigating lexical abilities and rule application at the phrasal level. Both processing aspects were assessed in a model of striatal dysfunction, namely Huntington's disease (HD). Using a semantic priming task we compared idiomatic prime sentences involving lexical access to whole phrases (e.g., "Paul has kicked the bucket") with idiom-derived sentences that contained passivation changes involving syntactic movement rules (e.g., "Paul was kicked by the bucket"), word changes (e.g., "Paul has crushed the bucket") or either. Target words were either idiom-related (e.g., "death") reflecting lexical access to idiom meanings, word-related (e.g., "bail") reflecting lexical access to single words, or unrelated (e.g., "table"). HD patients displayed selective abnormalities with passivated sentences whereas priming was normal with idioms and sentences containing only word changes. We argue that the role of the striatum in sentence processing specifically pertains to the application of syntactic movement rules whereas it is not involved in canonical rules required for active structures or in lexical processing aspects. Our findings support the striatum-rule hypothesis but suggest that it should be refined by tracking the particular kind of language rules depending on striatal computations.
  • pdf Minagawa-Kawai, Y., Mori, K., Hebden, J.C. & Dupoux, E. (2008). Optical Imaging of infants' neurocognitive development: Recent advances and perspectives. Developmental Neurobiology, 68(6), 712-728. [abstract] Near-infrared spectroscopy (NIRS) provides a unique method of monitoring infant brain function by measuring the changes in the concentrations of oxygenated and deoxygenated hemoglobin. During the past 10 years, NIRS measurement of the developing brain has rapidly expanded. In this article, a brief discussion of the general principles of NIRS, including its technical advantages and limitations, is followed by a detailed review of the role played so far by NIRS in the study of infant perception and cognition, including language, and visual and auditory functions. Results have highlighted, in particular, the developmental changes of cerebral asymmetry associated with speech acquisition. Finally, suggestions for future studies of neurocognitive development using NIRS are presented. Although NIRS studies of the infant brain have yet to fulfill their potential, a review of the work done so far indicates that NIRS is likely to provide many unique insights in the field of developmental neuroscience.
  • pdf Dupoux, E., de Gardelle, V. & Kouider, S. (2008). Subliminal speech perception and auditory streaming. Cognition, 109(2), 267-273. [abstract] Current theories of consciousness assume a qualitative dissociation between conscious and unconscious processing: while subliminal stimuli only elicit a transient activity, supraliminal stimuli have long-lasting influences. Nevertheless, the existence of this qualitative distinction remains controversial, as past studies confounded awareness and stimulus strength (energy, duration). Here, we used a masked speech priming method in conjunction with a submillisecond interaural delay manipulation to contrast subliminal and supraliminal processing at constant prime, mask and target strength. This delay induced a perceptual streaming effect, with the prime popping out in the supraliminal condition. By manipulating the prime-target interval (ISI), we show a qualitatively distinct profile of priming longevity as a function of prime awareness. While subliminal priming disappeared after half a second, supraliminal priming was independent of ISI. This shows that the distinction between conscious and unconscious processing depends on high-level perceptual streaming factors rather than low-level features (energy, duration).
  • pdf Dupoux, E., Sebastian-Galles, N., Navarrete, E. & Peperkamp, S. (2008). Persistent stress "deafness": The case of French learners of Spanish. Cognition, 106(2), 682-706. [abstract] Previous research by Dupoux et al. [Dupoux, E., Pallier, C., Sebastian, N., & Mehler, J. (1997). A destressing "deafness" in French? Journal of Memory and Language 36, 406-421; Dupoux, E., Peperkamp, S., & Sebastian-Galles (2001). A robust method to study stress "deafness". Journal of the Acoustical Society of America 110, 1608-1618.] found that French speakers, as opposed to Spanish ones, are impaired in discrimination tasks with stimuli that vary only in the position of stress. However, what was called stress "deafness" was only found in tasks that used high phonetic variability and memory load, not in cognitively less demanding tasks such as single token AX discrimination. This raised the possibility that instead of a perceptual problem, monolingual French speakers might simply lack a metalinguistic representation of contrastive stress, which would impair them in memory tasks. We examined a sample of 39 native speakers of French who underwent formal teaching of Spanish after age 10, and varied in degree of practice in this language. Using a sequence recall task, we observed in all our groups of late learners of Spanish the same impairment in short-term memory encoding of stress contrasts that was previously found in French monolinguals. Furthermore, using a speeded lexical decision task with word-nonword minimal pairs that differ only in the position of stress, we found that all late learners had much difficulty in the use of stress to access the lexicon. Our results show that stress "deafness" is better interpreted as a lasting processing problem resulting from the impossibility for French speakers to encode contrastive stress in their phonological representations. This affects their memory encoding as well as their lexical access in on-line tasks. The generality of such a persistent suprasegmental "deafness" is discussed in relation to current findings and models on the perception of non-native phonological contrasts.
  • Peperkamp, S. & Dupoux, E. (2007). Learning the mapping from surface to underlying representations in an artificial language. In J. Cole & J. Hualde (eds) Laboratory Phonology, 9, Mouton de Gruyter. [abstract] When infants acquire their native language they not only extract language-specific segmental categories and the words of their language, they also learn the underlying form of these words. This is difficult because words can have multiple phonetic realizations, according to the phonological context. In a series of artificial language-learning experiments with a phrase-picture matching task, we consider the respective contributions of word meaning and distributional information for the acquisition of underlying representations in the presence of an allophonic rule. We show that on the basis of semantic information, French adults can learn to map voiced and voiceless stops or fricatives onto the same underlying phonemes, whereas in their native language voicing is phonemic in all obstruents. They do not extend this knowledge to novel stops or fricatives, though. In the presence of distributional cues only, learning is much reduced and limited to the words subjects are trained on. We also test if phonological naturalness plays a role in this type of learning, and find that if semantic information is present, French adults can learn to map different segments onto a single underlying phoneme even if the mappings are highly unnatural. We discuss our findings in light of current statistical learning approaches to language acquisition.
  • pdf Minagawa-Kawai, Y., Naoi, N., Nishijima, N., Kojima, S. & Dupoux, E. (2007). Developmental changes in cerebral responses to native and non-native vowels: a NIRS study. In Proceedings of the International Conference of Phonetic Sciences XVI, (pp 1877-1880) Saarbrucken. [abstract] While newborn infants discriminate speech sounds from languages that they have never heard, 6-month-olds demonstrate the beginnings of vowel classification specific to their native-language. The neuronal correlates involved in such a dramatic perceptual reorganization process, however, are not well understood. Using near-infrared spectroscopy (NIRS), this study compares the neural responses of Japanese infants at 3-4 months and 7-8 months of age as well as of adults to native ([i] vs. [w]) and non-native vowel contrasts ([w] vs. [u]) within pseudo-word contexts. The findings demonstrated longitudinal developmental changes of functional temporal cortex asymmetries associated with the exposure of the native language.
  • pdf Kinzler, K.D., Dupoux, E. & Spelke, E.S. (2007). The native language of social cognition. Proceedings of the National Academy of Sciences of the United States of America, 104(30), 12577-12580. [abstract] What leads humans to divide the social world into groups, preferring their own group and disfavoring others? Experiments with infants and young children suggest these tendencies are based on predispositions that emerge early in life and depend, in part, on natural language. Young infants prefer to look at a person who previously spoke their native language. Older infants preferentially accept toys from native-language speakers, and preschool children preferentially select native-language speakers as friends. Variations in accent are sufficient to evoke these social preferences, which are observed in infants before they produce or comprehend speech and are exhibited by children even when they comprehend the foreign-accented speech. Early-developing preferences for native-language speakers may serve as a foundation for later-developing preferences and conflicts among social groups.
  • pdf Jacquemot, C., Dupoux, E. & Bachoud-Lévi, A.C. (2007). Breaking the mirror: Asymmetrical disconnection between the phonological input and output codes. Cognitive Neuropsychology, 24(1), 3-22. [abstract] In this paper, we study the link between the processing systems that sustain speech perception and production in a patient (F.A.) with conduction aphasia. Her pattern of performance in repetition task - quantitative but also qualitative striking difference in errors with pseudowords versus words - cannot be properly accounted for either by a perception deficit or by a production deficit. We discuss this finding according to theoretical models of phonological processing and show that it is best explained by an impaired ability to transfer phonological information from the perception to the production system. We also probed for a phonological link in the opposite direction, from the production to the perception system. F. A.'s results show that this link was not impaired. Overall, our results suggest that (a) the phonological codes in perception and in production are separate but connected by two conversion mechanisms and that (b) these two mechanisms can be disrupted independently.
  • pdf Dupoux, E. & Jacob, P. (2007). Universal moral grammar: a critical appraisal. Trends in Cognitive Sciences, 11(9), 373-378. [abstract] A new framework for the study of the human moral faculty is currently receiving much attention: the so-called 'universal moral grammar' framework. It is based on an intriguing analogy, first pointed out by Rawls, between the study of the human moral sense and Chomsky's research program into the human language faculty. To assess UMG, we ask: is moral competence modular? Does it have an underlying hierarchical grammatical structure? Does moral diversity rest on culture-dependent parameters? We review the evidence and argue that formal grammatical concepts are of limited value for the study of moral judgments, moral development and moral diversity.
  • pdf Darcy, I., Peperkamp, S. & Dupoux, E. (2007). Bilinguals play by the rules. Perceptual compensation for assimilation in late L2-learners. In J. Cole & J. Hualde (eds) Laboratory Phonology, 9, (pp 411-442) Mouton de Gruyter. [abstract] Phonological rules introduce variation in word forms that listeners have to compensate for. We previously showed (Darcy 2002, Darcy et al., under review) that compensation for phonological variation in perception is driven by language-specific mechanisms. In particular, English speakers compensate more for place assimilation than for voicing assimilation, whereas the reverse holds for French speakers. English indeed has a rule of place assimilation, whereas French has a rule of voicing assimilation. In the present study, we explore the patterns of compensation for assimilation in English learners of French and in French learners of English. We use the same design and stimuli as Darcy 2002, Darcy et al. (under review); in this design, listeners are engaged in a word detection task on sentences containing occurrences of both place assimilation and voicing assimilation. We test British English and American English learners of French as well as French learners of American English on both their native language (L1) and their second language (L2). The results show that beginners interpret their L2 in exactly the same way as their L1: they apply the native compensation pattern to both languages. Advanced learners, by contrast, succeed in compensating for the non-native assimilation rule in their L2, while keeping the native compensation pattern for L1; as little or no interference from L2 on L1 is observed for these learners, we conclude that two separate systems of compensation for phonological processes can co-exist.
  • pdf Teichmann, M., Dupoux, E., Kouider, S. & Bachoud-Lévi, A.C. (2006). The role of the striatum in processing language rules: Evidence from word perception in Huntington's disease. Journal of Cognitive Neuroscience, 18(9), 1555-1569. [abstract] On the assumption that linguistic faculties reflect both lexical storage in the temporal cortex and combinatorial rules in the striatal circuits, several authors have shown that striatal-damaged patients are impaired with conjugation rules while retaining lexical knowledge of irregular verbs [Teichmann, M., Dupoux, E., Kouider, S., Brugieres, P., Boisse, M. F., Baudic, S., Cesaro, P., Peschanski, M., & Bachoud-Lévi, A. C. (2005). The role of the striatum in rule application. The model of Huntington's disease at early stage. Brain, 128, 1155-1167; Ullman, M. T., Corkin, S., Coppola, M., Hickok, G., Growdon, J. H., Koroshetz, W. J., & Pinker, S. (1997). A neural dissociation within language: Evidence that the mental dictionary is part of declarative memory, and that grammatical rules are processed by the procedural system. Journal of Cognitive Neuroscience, 9, 266-276]. Yet, such impairment was documented only with explicit conjugation tasks in the production domain. Little is known about whether it generalizes to other language modalities such as perception and whether it refers to implicit language processing or rather to intentional rule operations through executive functions. We investigated these issues by assessing perceptive processing of conjugated verb forms in a model of striatal dysfunction, namely, in Huntington's Disease (HD) at early stages. Rule application and lexical processes were evaluated in an explicit task (acceptability judgments on verb and nonword forms) and in an implicit task (lexical decision on frequency-manipulated verb forms). HD patients were also assessed in executive functions, and striatal atrophy was evaluated with magnetic resonance imaging (bicaudate ratio). Results from both tasks showed that HD patients were selectively impaired for rule application but lexical abilities were spared. Bicaudate ratios correlated with rule scores on both tasks, whereas executive parameters only correlated with scores on the explicit task. We argue that the striatum has a core function in linguistic rule application generalizing to perceptive aspects of morphological operations and pertaining to implicit language processes. In addition, we suggest that the striatum may enclose computational circuits that underpin explicit manipulation of regularities.
  • pdf Peperkamp, S., Le Calvez, R., Nadal, J.P. & Dupoux, E. (2006). The acquisition of allophonic rules: Statistical learning with linguistic constraints. Cognition, 101(3), B31-B41. [abstract] Phonological rules relate surface phonetic word forms to abstract underlying forms that are stored in the lexicon. Infants must thus acquire these rules in order to infer the abstract representation of words. We implement a statistical learning algorithm for the acquisition of one type of rule, namely allophony, which introduces context-sensitive phonetic variants of phonemes. This algorithm is based on the observation that different realizations of a single phoneme typically do not appear in the same contexts (ideally, they have complementary distributions). In particular, it measures the discrepancies in context probabilities for each pair of phonetic segments. In Experiment 1, we test the algorithm's performances on a pseudo-language and show that it is robust to statistical noise due to sampling and coding errors, and to non-systematic rule application. In Experiment 2, we show that a natural corpus of semiphonetically transcribed child-directed speech in French presents a very large number of near-complementary distributions that do not correspond to existing allophonic rules. These spurious allophonic rules can be eliminated by a linguistically motivated filtering mechanism based on a phonetic representation of segments. We discuss the role of a priori linguistic knowledge in the statistical learning of phonology.
  • pdf Jacquemot, C., Dupoux, E., Decouche, O. & Bachoud-Lévi, A.C. (2006). Misperception in sentences but not in words: Speech perception and the phonological buffer. Cognitive Neuropsychology, 23(6), 949-971. [abstract] We report two case studies of aphasic patients with a working-memory impairment due to reduced storage in the phonological buffer. The two patients display excellent performance in phonological discrimination tasks as long as the tasks do not involve a memory load. We then show that their performance drops when they have to maintain fine-grained phonological information for sentence comprehension: They are impaired at mispronunciation detection and at comprehending sentences involving minimal word pairs. We argue that the phonological buffer plays a role in sentence perception during the phonological analysis of the speech stream: It sustains the temporary storage of phonological input in order to check and resolve phonological ambiguities, and it also allows reexamination of the phonological input if necessary.
  • pdf Teichmann, M., Dupoux, E., Kouider, S., Brugières, P., Boisse, M., Baudic, S., Cesaro, P., Peschanski, M. & Bachoud-Lévi, A.C. (2005). The role of the striatum in rule application: the model of Huntington's disease at early stage. Brain, 128(5), 1155-1167. [abstract] The role of the basal ganglia, and more specifically of the striatum, in language is still debated. Recent studies have proposed that linguistic abilities involve two distinct types of processes: the retrieving of stored information, implicating temporal lobe areas, and the application of combinatorial rules, implicating fronto-striatal circuits. Studies of patients with focal lesions and neurodegenerative diseases have suggested a role for the striatum in morphological rule application, but functional imaging studies found that the left caudate was involved in syntactic processing and not morphological processing. In the present study, we tested the view that the basal ganglia are involved in rule application and not in lexical retrieving in a model of striatal dysfunction, namely Huntington's disease at early stages. We assessed the rule-lexicon dichotomy in the linguistic domain with morphology (conjugation of non-verbs and verbs) and syntax (sentence comprehension) and in a non-linguistic domain with arithmetic operations (subtraction and multiplication). Thirty Huntington's disease patients (15 at stage I and 15 at stage II) and 20 controls matched for their age and cultural level were included in this study. Huntington's disease patients were also assessed using the Unified Huntington's Disease Rating Scale (UHDRS) and MRI. We found that early Huntington's disease patients were impaired in rule application in the linguistic and non-linguistic domains (morphology, syntax and subtraction), whereas they were broadly spared with lexical processing. The pattern of performance was similar in patients at stage I and stage II, except that stage II patients were more impaired in all tasks assessing rules and had in addition a very slight impairment in the lexical condition of conjugation. Finally, syntactic rule abilities correlated with all markers of the disease evolution including bicaudate ratio and performance in executive function, whereas there was no correlation with arithmetic and morphological abilities. Together, this suggests that the striatum is involved in rule processing more than in lexical processing and that it extends to linguistic and non-linguistic domains. These results are discussed in terms of domain-specific versus domain-general processes of rule application.
  • pdf Kouider, S. & Dupoux, E. (2005). Subliminal speech priming. Psychological Science, 16(8), 617-625. [abstract] We present a novel subliminal priming technique that operates in the auditory modality. Masking is achieved by hiding a spoken word within a stream of time-compressed speechlike sounds with similar spectral characteristics. Participants were unable to consciously identify the hidden words, yet reliable repetition priming was found. This effect was unaffected by a change in the speaker's voice and remained restricted to lexical processing. The results show that the speech modality, like the written modality, involves the automatic extraction of abstract word-form representations that do not include nonlinguistic details. In both cases, priming operates at the level of discrete and abstract lexical entries and is little influenced by overlap in form or semantics.
  • pdf Kouider, S. & Dupoux, E. (2004). Partial awareness creates the "illusion" of subliminal semantic priming. Psychological Science, 15(2), 75-81. [abstract] We argue that the lack of consensus regarding the existence of subliminal semantic processing arises from not taking into account the fact that linguistic stimuli are represented across several processing levels (features, letters, word form) that can independently reach or not reach awareness. Using masked words, we constructed conditions in which participants were aware of some letters or fragments of a word, while remaining unaware of the whole word. Three experiments using the Stroop priming paradigm show that when the stimulus set is reduced and participants are encouraged to guess the identity of the prime, such partially perceived stimuli can nonetheless give rise to "semantic" processing. We provide evidence that this effect is due to illusory reconstruction of the incompletely perceived stimulus, followed by usual semantic processing of the result. We conclude that previously reported unconscious Stroop priming is in fact a conscious effect, but applied to a perceptual illusion.
  • pdf Pallier, C., Dehaene, S., Poline, J., LeBihan, D., Argenti, A., Dupoux, E. & Mehler, J. (2003). Brain imaging of language plasticity in adopted adults: Can a second language replace the first? Cerebral Cortex, 13(2), 155-161. [abstract] Do the neural circuits that subserve language acquisition lose plasticity as they become tuned to the maternal language? We tested adult subjects born in Korea and adopted by French families in childhood; they have become fluent in their second language and report no conscious recollection of their native language. In behavioral tests assessing their memory for Korean, we found that they do not perform better than a control group of native French subjects who have never been exposed to Korean. We also used event-related functional magnetic resonance imaging to monitor cortical activations while the Korean adoptees and native French listened to sentences spoken in Korean, French and other, unknown, foreign languages. The adopted subjects did not show any specific activations to Korean stimuli relative to unknown languages. The areas activated more by French stimuli than by foreign stimuli were similar in the Korean adoptees and in the French native subjects, but with relatively larger extents of activation in the latter group. We discuss these data in light of the critical period hypothesis for language acquisition.
  • pdf Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S. & Dupoux, E. (2003). Phonological grammar shapes the auditory cortex: A functional magnetic resonance imaging study. Journal of Neuroscience, 23(29), 9541-9546. [abstract] Languages differ depending on the set of basic sounds they use (the inventory of consonants and vowels) and on the way in which these sounds can be combined to make up words and phrases (phonological grammar). Previous research has shown that our inventory of consonants and vowels affects the way in which our brains decode foreign sounds (Goto, 1971; Naatanen et al., 1997; Kuhl, 2000). Here, we show that phonological grammar has an equally potent effect. We build on previous research, which shows that stimuli that are phonologically ungrammatical are assimilated to the closest grammatical form in the language (Dupoux et al., 1999). In a cross-linguistic design using French and Japanese participants and a fast event-related functional magnetic resonance imaging (fMRI) paradigm, we show that phonological grammar involves the left superior temporal and the left anterior supramarginal gyri, two regions previously associated with the processing of human vocal sounds.
  • pdf Dupoux, E., Kouider, S. & Mehler, J. (2003). Lexical access without attention? Explorations using dichotic priming. Journal of Experimental Psychology: Human Perception and Performance, 29(1), 172-184. [abstract] The authors used lexical decision in a dichotic listening situation and measured identity priming across channels to explore whether unattended stimuli can be processed lexically. In 6 experiments, temporal synchronization of prime and target words was manipulated, and acoustic saliency of the unattended prime was varied by embedding it in a carrier sentence or in babble speech. When the prime was acoustically salient, a cross-channel priming effect emerged, and participants were aware of the prime. When the prime was less salient, no identity priming was found, and participants failed to notice the prime. Saliency was manipulated in ways that did not degrade the prime. Results are inconsistent with models of late filtering, which predict equal priming irrespective of prime saliency.
  • pdf Bachoud-Lévi, A.C. & Dupoux, E. (2003). An influence of syntactic and semantic variables on word form retrieval. Cognitive Neuropsychology, 20(2), 163-188. [abstract] We report the case of DPI, an aphasic patient who shows a phonological impairment in production that spares certain syntactic and semantic categories. On a picture naming task, he produces mostly phonological paraphasias, and the probability of producing a correct response depends on the frequency and length of the target word. This deficit occurs in the presence of spared ability to find the grammatical gender of the items that he cannot name, intact conceptual knowledge, and very good reading and word repetition. Therefore, we conclude that DPI's deficit is restricted to the phonological retrieval of a correctly selected lexical entry. However, production errors are not uniform across semantic and syntactic domains. Numerals and names of days and months are totally spared compared to matched controls. In addition, abstract nouns and verbs are significantly less affected than concrete nouns, even when variables affecting phonological retrieval (frequency, length, syllabic structure) are controlled for. This suggests that a functional organisation in terms of semantic and syntactic variables exists at the level of phonological retrieval. We discuss these findings in light of current models of speech production.
  • pdf Peperkamp, S. & Dupoux, E. (2002). A typological study of stress "deafness". In C. Gussenhoven & N. Warner (eds) Laboratory Phonology 7, 4-1, (pp. 203-240). [abstract] Previous research has shown that native speakers of French, as opposed to those of Spanish, exhibit stress "deafness", i.e. have difficulties distinguishing stress contrasts. In French, stress is non-contrastive, while in Spanish, stress is used to make lexical distinctions. We examine three other languages with non-contrastive stress, Finnish, Hungarian and Polish. In two experiments with a short-term memory sequence repetition task, we find that speakers of Finnish and Hungarian are like French speakers (i.e. exhibit stress "deafness"), but not those of Polish. We interpret these findings in the light of an acquisition framework that states that infants decide whether or not to keep stress in their phonological representation during the first two years of life, based on information extractable from utterance edges. In particular, we argue that Polish infants, unlike French, Finnish and Hungarian ones, cannot extract the stress regularity of their language on the basis of what they have already learned. As a consequence, they keep stress in their phonological representation, and as adults, they do not have difficulties in distinguishing stress contrasts.
  • pdf Jacquemot, C., Dupoux, E., Pallier, C. & Bachoud-Lévi, A.C. (2002). Comprehending spoken words without hearing phonemes: A case study. Cortex, 38, 869-873. [abstract] In this paper, we describe a patient who presents a strong dissociation between performance on sublexical and lexical tasks in the unexpected direction. While he was extremely poor in a sublexical discrimination task, he was only mildly impaired in lexical tasks. The patient had a global aphasia resulting from a left parieto-temporal ischemia. Tested with the Boston Diagnostic Aphasia Examination, he showed impairment in oral comprehension, and strong deficits in naming and repetition. Here, we focus on his speech comprehension deficit and in particular on the relatively spared lexical level compared to the drastic impairment of the sublexical level.
  • pdf Gout, A., Christophe, A. & Dupoux, E. (2002). Testing Infants' Discrimination With the Orientation Latency Procedure. Infancy, 3(2), 249-259. [abstract] A new discrimination procedure based on the measurement of visual orientation latency to speech stimuli is introduced. Each participant listens to a series of short familiarization test trials. In each trial, 5 to 7 centrally-presented familiarization stimuli are followed by laterally-presented test stimuli. Infants were found to orient faster to different-category than to same-category test stimuli. This result was found despite a high degree of prosodic variability in the familiarization and test stimuli introduced by changes in talker and speaking rate. The combination of a multitrial design with use of acoustic and prosodic variability seems suitable for studying the representation of phonological categories.
  • pdf Kouider, S. & Dupoux, E. (2001). A functional disconnection between spoken and visual word recognition: evidence from unconscious priming. Cognition, 82(1), B35-B49. [abstract] The goal of the present study is to assess whether there is an automatic and obligatory activation of the phonological lexicon upon the presentation of a written word under unconscious processing conditions. We use a cross-modal version of the masked repetition priming procedure introduced by Forster and Davis (Journal of Experimental Psychology: Learning, Memory, and Cognition 10 (1984) 680) which consists of priming a spoken word by its written equivalent under masked conditions. These trials are randomly mixed with within-modal (visual-visual) repetition priming control trials. Our results show that cross-modal priming effects are absent unless primes are consciously perceived, as assessed by d′ scores obtained with a letter/pseudo discrimination task. In contrast, priming effects within the written modality are observed under conscious as well as unconscious processing conditions. We conclude that the systems underlying written and spoken word processing are, respectively, autonomous and connected only under conscious conditions.
  • pdf Dupoux, E., Peperkamp, S. & Sebastian-Galles, N. (2001). A robust method to study stress "deafness". Journal of the Acoustical Society of America, 110(3), 1606-1618. [abstract] Previous research by Dupoux et al. [J. Memory Lang. 36, 406-421 (1997)] has shown that French participants, as opposed to Spanish participants, have difficulties in distinguishing nonwords that differ only in the location of stress. Contrary to Spanish, French does not have contrastive stress, and French participants are "deaf" to stress contrasts. The experimental paradigm used by Dupoux et al. (speeded ABX) yielded significant group differences, but did not allow for a sorting of individuals according to their stress "deafness." Individual assessment is crucial to study special populations, such as bilinguals or trained monolinguals. In this paper, a more robust paradigm based on a short-term memory sequence repetition task is proposed. In five French-Spanish cross-linguistic experiments, stress "deafness" is shown to crucially depend upon a combination of memory load and phonetic variability in F0. In experiments 3 and 4, nonoverlapping distribution of individual results for French and Spanish participants is observed. The paradigm is thus appropriate for assessing stress deafness in individual participants.
  • pdf Dupoux, E., Pallier, C., Kakehi, K. & Mehler, J. (2001). New evidence for prelexical phonological processing in word recognition. Language and Cognitive Processes, 16(5-6), 491-505. [abstract] When presented with stimuli that contain illegal consonant clusters, Japanese listeners tend to hear an illusory vowel that makes their perception conform to the phonotactics of their language. In a previous paper, we suggested that this effect arises from language-specific prelexical processes. The present paper assesses the alternative hypothesis that this illusion is due to a "top-down" lexical effect. We manipulate the lexical neighbourhood of nonwords that contain illegal consonant clusters and show that perception of the illusory vowel is not due to lexical influences. This demonstrates that phonotactic knowledge influences speech processing at an early stage.
  • pdf Bachoud-Lévi, A.C., Dupoux, E. & Degos, J.D. (2001). Syntactic and semantic organization in word form retrieval? Cortex, 37(5), 693-695. [abstract] Many studies have reported that naming disorders may affect selectively certain semantic categories (animals vs. vegetables or artifacts, see Caramazza and Shelton, 1998, for a review) or syntactic categories (open vs. closed class items, Friederici and Schoenle, 1980, nouns vs. verbs, Baxter and Warrington, 1985; Caramazza and Hillis, 1991; Daniele et al., 1994; McCarthy and Warrington, 1985; Miceli et al., 1988) suggesting that the conceptual system and the output lexicon are organized along both syntactic and semantic dimensions. Most current models of speech production distinguish two components in the output lexicon: lexical selection and word form retrieval. Lexical selection consists in comparing the conceptual representation of the object to be named to the lexical entries, and selecting the best match. Conceivably, this level should be both sensitive to syntactic and semantic parameters. Word form retrieval involves recovering the phonological information associated to the selected entry which is then used to construct a phonological plan to be executed by the articulatory system. Prima facie, word form retrieval should not be influenced by syntactic, and even more, semantic variables. However, Cohen et al. (1997) reported the case of a patient impaired in word form retrieval, as evidenced by a predominance of phonological paraphasias in naming and reading tasks, which totally spared names for numbers. The authors speculated that the topographical segregation of numbers in the conceptual system propagates along the speech production pathway, even down to word form retrieval. In this paper, we report the case of another aphasic patient who shows a word form retrieval impairment in production which surprisingly spares certain syntactic and semantic categories.
  • pdf Sebastian-Galles, N., Dupoux, E., Costa, A. & Mehler, J. (2000). Adaptation to time-compressed speech: Phonological determinants. Perception & Psychophysics, 62(4), 834-842. [abstract] Perceptual adaptation to time-compressed speech was analyzed in two experiments. Previous research has suggested that this adaptation phenomenon is language specific and takes place at the phonological level. Moreover, it has been proposed that adaptation should only be observed for languages that are rhythmically similar. This assumption was explored by studying adaptation to different time-compressed languages in Spanish speakers. In Experiment 1, the performances of Spanish-speaking subjects who adapted to Spanish, Italian, French, English, and Japanese were compared. In Experiment 2, subjects from the same population were tested with Greek sentences compressed to two different rates. The results showed adaptation for Spanish, Italian, and Greek and no adaptation for English and Japanese, with French being an intermediate case. To account for the data, we propose that variables other than just the rhythmic properties of the languages, such as the vowel system and/or the lexical stress pattern, must be considered. The Greek data also support the view that phonological, rather than lexical, information is a determining factor in adaptation to compressed speech.
  • pdf Le Clec'H, G., Dehaene, S., Cohen, L., Mehler, J., Dupoux, E., Poline, J., Lehericy, S., van de Moortele, P. & Le Bihan, D. (2000). Distinct cortical areas for names of numbers and body parts independent of language and input modality. Neuroimage, 12(4), 381-391. [abstract] Some models of word comprehension postulate that the processing of words presented in different modalities and languages ultimately converges toward common cerebral systems associated with semantic-level processing and that the localization of these systems may vary with the category of semantic knowledge being accessed. We used functional magnetic resonance imaging to investigate this hypothesis with two categories of words, numerals, and body parts, for which the existence of distinct category-specific areas is debated in neuropsychology. Across two experiments, one with a blocked design and the other with an event-related design, a reproducible set of left-hemispheric parietal and prefrontal areas showed greater activation during the manipulation of topographical knowledge about body parts and a right-hemispheric parietal network during the manipulation of numerical quantities. These results complement the existing neuropsychological and brain-imaging literature by suggesting that within the extensive network of bilateral parietal regions active during both number and body-part processing, a subset shows category-specific responses independent of the language and modality of presentation.
  • pdf Dehaene-Lambertz, G., Dupoux, E. & Gout, A. (2000). Electrophysiological correlates of phonological processing: A cross-linguistic study. Journal of Cognitive Neuroscience, 12(4), 635-647. [abstract] It is well known that speech perception is deeply affected by the phoneme categories of the native language. Recent studies have found that phonotactics, i.e., constraints on the cooccurrence of phonemes within words, also have a considerable impact on speech perception routines. For example, Japanese does not allow (nonnasal) coda consonants. When presented with stimuli that violate this constraint, as in /ebzo/, Japanese adults report that they hear a /u/ between consonants, i.e., /ebuzo/. We examine this phenomenon using event-related potentials (ERPs) on French and Japanese participants in order to study how and when the phonotactic properties of the native language affect speech perception routines. Trials using four similar precursor stimuli were presented followed by a test stimulus that was either identical or different depending on the presence or absence of an epenthetic vowel /u/ between two consonants (e.g., "ebuzo ebuzo ebuzo-ebzo"). Behavioral results confirm that Japanese, unlike French participants, are not able to discriminate between identical and deviant trials. In ERPs, three mismatch responses were recorded in French participants. These responses were either absent or significantly weaker for Japanese. In particular, a component similar in latency and topography to the mismatch negativity (MMN) was recorded for French, but not for Japanese participants. Our results suggest that the impact of phonotactics takes place early in speech processing and support models of speech perception, which postulate that the input signal is directly parsed into the native language phonological format. We speculate that such a fast computation of a phonological representation should facilitate lexical access, especially in degraded conditions.
  • pdf Dupoux, E., Kakehi, K., Hirose, Y., Pallier, C. & Mehler, J. (1999). Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance, 25(6), 1568-1578. [abstract] In 4 cross-linguistic experiments comparing French and Japanese listeners, we found that the phonotactic properties of Japanese (a reduced set of syllable types) induce Japanese listeners to perceive "illusory" vowels inside consonant clusters in vowel-consonant-consonant-vowel (VCCV) stimuli. In Experiments 1 and 2, a continuum of stimuli ranging from no vowel (e.g., ebzo) to a full vowel between the consonants (e.g., ebuzo) was used. Japanese, but not French participants, reported the presence of a vowel [u] between consonants, even in stimuli with no vowel. A speeded ABX discrimination paradigm was used in Experiments 3 and 4 and revealed that Japanese participants had trouble discriminating between VCCV and VCuCV stimuli. French participants, in contrast, had problems discriminating items that differed in vowel length (ebuzo vs. ebunzo), a distinctive contrast in Japanese but not in French. It is concluded that models of speech perception have to be revised to account for phonotactically based assimilations.
  • pdf Perani, D., Paulesu, E., Galles, N., Dupoux, E., Dehaene, S., Bettinardi, V., Cappa, S., Fazio, F. & Mehler, J. (1998). The bilingual brain - Proficiency and age of acquisition of the second language. Brain, 121(10), 1841-1852. [abstract] Functional imaging methods show differences in the pattern of cerebral activation associated with the subject's native language (L1) compared with a second language (L2). In a recent PET investigation on bilingualism we showed that auditory processing of stories in L1 (Italian) engages the temporal lobes and temporoparietal cortex more extensively than L2 (English). However, in that study the Italian subjects learned L2 late and attained a fair, but not an excellent command of this language (low proficiency, late acquisition bilinguals). Thus, the different patterns of activation could be ascribed either to age of acquisition or to proficiency level. In the current study we use a similar paradigm to evaluate the effect of early and late acquisition of L2 in highly proficient bilinguals. We studied a group of Italian-English bilinguals who acquired L2 after the age of 10 years (high proficiency, late acquisition bilinguals) and a group of Spanish-Catalan bilinguals who acquired L2 before the age of 4 years (high proficiency, early acquisition bilinguals). The differing cortical responses we had observed when low proficiency volunteers listened to stories in L1 and L2 were not found in either of the high proficiency groups in this study. Several brain areas, similar to those observed for L1 in low proficiency bilinguals, were activated by L2. These findings suggest that, at least for pairs of L1 and L2 languages that are fairly close, attained proficiency is more important than age of acquisition as a determinant of the cortical representation of L2.
  • pdf Pallier, C., Sebastian-Galles, N., Dupoux, E., Christophe, A. & Mehler, J. (1998). Perceptual adjustment to time-compressed speech: A cross-linguistic study. Memory & Cognition, 26(4), 844-851. [abstract] Previous research has shown that, when hearers listen to artificially speeded speech, their performance improves over the course of 10-15 sentences, as if their perceptual system was "adapting" to these fast rates of speech. In this paper, we further investigate the mechanisms that are responsible for such effects. In Experiment 1, we report that, for bilingual speakers of Catalan and Spanish, exposure to compressed sentences in either language improves performance on sentences in the other language. Experiment 2 reports that Catalan/Spanish transfer of performance occurs even in monolingual speakers of Spanish who do not understand Catalan. In Experiment 3, we study another pair of languages, namely English and French, and report no transfer of adaptation between these two languages for English-French bilinguals. Experiment 4, with monolingual English speakers, assesses transfer of adaptation from French, Dutch, and English toward English. Here we find that there is no adaptation from French and intermediate adaptation from Dutch. We discuss the locus of the adaptation to compressed speech and relate our findings to other cross-linguistic studies in speech perception.
  • pdf Bachoud-Lévi, A.C., Dupoux, E., Cohen, L. & Mehler, J. (1998). Where is the length effect? A cross-linguistic study of speech production. Journal of Memory and Language, 39(3), 331-346. [abstract] Many models of speech production assume that one cannot begin to articulate a word before all its segmental units are inserted into the articulatory plan. Moreover, some of these models assume that segments are serially inserted from left to right. As a consequence, latencies to name words should increase with word length. In a series of five experiments, however, we showed that the time to name a picture or retrieve a word associated with a symbol is not affected by the length of the word. Experiments 1 and 2 used French materials and participants, while Experiments 3, 4, and 5 were conducted with English materials and participants. These results are discussed in relation to current models of speech production and previous reports of length effects are reevaluated in light of these findings. We conclude that if words are encoded serially, then articulation can start before an entire phonological word has been encoded.
  • pdf Pallier, C., Dupoux, E. & Jeannin, X. (1997). EXPE: An expandable programming language for on-line psychological experiments. Behavior Research Methods, Instruments, & Computers, 29(3), 322-327. [abstract] EXPE is a DOS program for the design and running of experiments that involve the presentation of audio or visual stimuli and the collection of on-line or off-line behavioral responses. Its flexibility also makes it a useful tool for the rapid design of protocols for testing neuropsychological patients. EXPE provides a powerful scripting language that allows the user to specify all the components of an experiment in a human readable file. Subjects' responses are saved in a user-specified format as well as in readable ASCII files. The user can easily add new commands to the language: All the instructions are calls to functions written in independent Borland Pascal units. Thus, users can link their own Pascal procedures to EXPE to meet virtually any special need. This makes it possible, for example, to adapt EXPE to new hardware, such as new sound or video boards.
  • pdf Dupoux, E., Pallier, C., Sebastian, N. & Mehler, J. (1997). A destressing "deafness" in French? Journal of Memory and Language, 36(3), 406-421. [abstract] Spanish but not French uses accent to distinguish between words (e.g., tópo vs topó). Two populations of subjects were tested on the same materials to determine whether this difference has an impact on the perceptual capacities of listeners. In Experiment 1, using an ABX paradigm, we found that French subjects had significantly more difficulties than Spanish subjects in performing an ABX classification task based on accent. In Experiment 2, we found that Spanish subjects were unable to ignore irrelevant differences in accent in a phoneme-based ABX task, whereas French subjects had no difficulty at all. In Experiment 3, we replicated the basic French finding and found that Spanish subjects benefited from redundant accent information even when phonemic information alone was sufficient to perform the task. In our final experiment, we showed that French subjects can be made to respond to the acoustic correlates of accent; therefore their difficulty in Experiment 1 seems to be located at the level of short-term memory. The implications of these findings for language-specific processing and acquisition are discussed.
  • pdf Dupoux, E. & Green, K. (1997). Perceptual adjustment to highly compressed speech: Effects of talker and rate changes. Journal of Experimental Psychology: Human Perception and Performance, 23(3), 914-927. [abstract] This study investigated the perceptual adjustments that occur when listeners recognize highly compressed speech. In Experiment 1, adjustment was examined as a function of the amount of exposure to compressed speech by use of 2 different speakers and compression rates. The results demonstrated that adjustment takes place over a number of sentences, depending on the compression rate. Lower compression rates required less experience before full adjustment occurred. In Experiment 2, the impact of an abrupt change in talker characteristics was investigated; in Experiment 3, the impact of an abrupt change in compression rate was studied. The results of these 2 experiments indicated that sudden changes in talker characteristics or compression rate had little impact on the adjustment process. The findings are discussed with respect to the level of speech processing at which such adjustment might occur.
  • pdf Dehaene, S., Dupoux, E., Mehler, J., Cohen, L., Paulesu, E., Perani, D., van de Moortele, P., Lehericy, S. & LeBihan, D. (1997). Anatomical variability in the cortical representation of first and second language. Neuroreport, 8(17), 3809-3815. [abstract] Functional magnetic resonance imaging was used to assess inter-subject variability in the cortical representation of language comprehension processes. Moderately fluent French-English bilinguals were scanned while they listened to stories in their first language (L1 = French) or in a second language (L2 = English) acquired at school after the age of seven. In all subjects, listening to L1 always activated a similar set of areas in the left temporal lobe, clustered along the left superior temporal sulcus. Listening to L2, however, activated a highly variable network of left and right temporal and frontal areas, sometimes restricted only to right-hemispheric regions. These results support the hypothesis that first language acquisition relies on a dedicated left-hemispheric cerebral network, while late second language acquisition is not necessarily associated with a reproducible biological substrate. The postulated contribution of the right hemisphere to L2 comprehension is found to hold only on average, individual subjects varying from complete right lateralization to standard left lateralization for L2.
  • pdf Christophe, A., Guasti, T., Nespor, M., Dupoux, E. & Van Ooyen, B. (1997). Reflections on phonological bootstrapping: Its role for lexical and syntactic acquisition. Language and Cognitive Processes, 12(5-6), 585-612. [abstract] "Phonological bootstrapping" is the hypothesis that a purely phonological analysis of the speech signal may allow infants to start acquiring the lexicon and syntax of their native language (Morgan & Demuth, 1996a). To assess this hypothesis, a first step is to estimate how much information is provided by a phonological analysis of the speech input conducted in the absence of any prior (language-specific) knowledge in other domains such as syntax or semantics. We first review existing work on how babies may start acquiring a lexicon by relying on distributional regularities, phonotactics, typical word shape and prosodic boundary cues. Taken together, these sources of information may enable babies to learn the sound pattern of a reasonable number of the words in their native language. We then focus on syntax acquisition and discuss how babies may set one of the major structural syntactic parameters, the head direction parameter, by listening to prominence within phonological phrases and before they possess any words. Next, we discuss how babies may hope to acquire function words early, and how this knowledge would help lexical segmentation and acquisition, as well as syntactic analysis and acquisition. We then present a model of phonological bootstrapping of the lexicon and syntax that helps us to illustrate the congruence between problems. Some sources of information appear to be useful for more than one purpose; for example, phonological phrases and function words may help lexical segmentation as well as segmentation into syntactic phrases and labelling (NP, VP, etc.). Although our model derives directly from our reflection on acquisition, we argue that it may also be adequate as a model of adult speech processing. Since adults allow a greater variety of experimental paradigms, an advantage of our approach is that specific hypotheses can be tested on both populations. We illustrate this aspect in the final section of the paper, where we present the results of an adult experiment which indicates that prosodic boundaries and function words play an important role in continuous speech processing.
  • Perani, D., Dehaene, S., Grassi, F., Cohen, L., Cappa, S., Paulesu, E., Dupoux, E., Fazio, F. & Mehler, J. (1996). A PET study of native and foreign language processing. Brain and Language, 55(1), 99-101. [abstract] We used positron emission tomography to study brain activity in adults while they were listening to stories in their native language, in a second language acquired after the age of seven and in a third unknown language. Several areas, similar to those previously observed in monolinguals, were activated by the native but not by the second language. Both the second and the unknown language yielded distinct left-hemispheric activations in areas specialized for phonological processing, which were not engaged in a backward speech control task. These results indicate that some brain areas are shaped by early exposure to the maternal language, and are not necessarily activated by the processing of a second language to which they have been exposed for a limited time later in life.
  • pdf Perani, D., Dehaene, S., Grassi, F., Cohen, L., Cappa, S., Dupoux, E., Fazio, F. & Mehler, J. (1996). Brain processing of native and foreign languages. Neuroreport, 7(15-17), 2439-2444. [abstract] We used positron emission tomography to study brain activity in adults while they were listening to stories in their native language, in a second language acquired after the age of seven, and in a third unknown language. Several areas, similar to those previously observed in monolinguals, were activated by the native but not by the second language. Both the second and the unknown language yielded distinct left-hemispheric activations in areas specialized for phonological processing, which were not engaged by a backward speech control task. These results indicate that some brain areas are shaped by early exposure to the maternal language, and are not necessarily activated by the processing of a second language to which they have been exposed for a limited time later in life.
  • pdf Christophe, A. & Dupoux, E. (1996). Bootstrapping lexical acquisition: The role of prosodic structure. Linguistic Review, 13(3-4), 383-412.
  • pdf Mehler, J., Dupoux, E., Pallier, C. & Dehaene-Lambertz, G. (1994). Cross-linguistic approaches to speech processing. Current Opinion in Neurobiology, 4(2), 171-176. [abstract] Recent advances in the field of speech processing indicate that speakers of differing languages process speech relying on units that are appropriate to the rhythmical properties of their maternal tongue. Studies with young infants suggest that the acquisition of these processing routines takes place before the end of the first year of life. Further evidence shows that the left hemisphere initially processes any language and gradually becomes specialized for the maternal language.
  • Mehler, J., Bertoncini, J., Dupoux, E. & Pallier, C. (1994). The role of suprasegmentals in speech perception and acquisition. Dokkyo International Review, 7, 343-376.
  • pdf Christophe, A., Dupoux, E., Bertoncini, J. & Mehler, J. (1994). Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of America, 95(3), 1570-1580. [abstract] Babies, like adults, hear mostly continuous speech. Unlike adults, however, they are not acquainted with the words that constitute the utterances; yet in order to construct representations for words, they have to retrieve them from the speech wave. Given the apparent lack of obvious cues to word boundaries (such as pauses between words), this is not a trivial problem. Among the several mechanisms that could be explored to solve this bootstrapping problem for lexical acquisition, a tentative but reasonable one posits the existence of some cues (other than silence) that signal word boundaries. In order to test this hypothesis, infants were used as informants in our experiments. It was hypothesized that if word boundary cues exist, and if infants are to use them in the course of language acquisition, then they should at least perceive these cues. As a consequence, infants should be able to discriminate sequences that contain a word boundary from those that do not. A number of bisyllabic stimuli were extracted either from within French words (e.g., mati in mathematicien), or from between words (e.g., mati in panorama typique). Three-day-old infants were tested with a non-nutritive sucking paradigm, and the results of two experiments suggest that infants can discriminate between items that contain a word boundary and items that do not. It is therefore conceivable that newborns are already sensitive to cues that correlate with word boundaries. This result lends plausibility to the hypothesis that infants might use word boundary cues during lexical acquisition.
  • pdf Mehler, J., Sebastian-Galles, N., Altmann, G., Dupoux, E., Christophe, A. & Pallier, C. (1993). Understanding compressed sentences - The role of rhythm and meaning. Annals of the New York Academy of Sciences, 682, 272-282.
  • pdf Sebastian-Galles, N., Dupoux, E., Segui, J. & Mehler, J. (1992). Contrasting syllabic effects in Catalan and Spanish. Journal of Memory and Language, 31(1), 18-32. [abstract] The role of syllabic structure and stress assignment in the perceptual segmentation of Catalan and Spanish words is studied. Previous research suggested that the syllable is the segmentation unit for languages with clear syllabic structure. In Experiment 1, we found that syllabification effects are found in Catalan but only in unstressed first syllable word-targets. No syllabification is obtained when the first syllable is stressed. In Experiment 2, we failed to find any syllabification effect in Spanish, regardless of stress in word-targets. Nonetheless, Experiment 3 shows that syllabification effects emerge in Spanish when subjects are made to respond 250 ms slower than in Experiment 2. On the basis of these results, a modified version of the original syllabic hypothesis is proposed. We propose that both task demands and language specific parameters play a role in the presence or absence of syllabification effects in segment detection.
  • pdf Dupoux, E. & Mehler, J. (1990). Monitoring the lexicon with normal and compressed speech - Frequency effects and the prelexical code. Journal of Memory and Language, 29(3), 316-335. [abstract] Previous reports suggest that initial phonemes are monitored on the basis of lexical information in monosyllabic words and on the basis of acoustic/phonetic information in multisyllabic words (Cutler, Mehler, Norris, & Segui, 1987). In Experiment 1, a frequency effect was found with item-initial phoneme monitoring for monosyllabic but not for bisyllabic words. In Experiments 2 and 3, we used speech time-compressed at a rate of 50% and failed to find a frequency effect for bisyllabic words, even though they were shorter than uncompressed monosyllables. In Experiment 4, we used a lexical decision task on the same items and found a frequency effect for both mono- and bisyllabic words. Results are interpreted on the basis of the dual code hypothesis. Implications for the nature of the prelexical code are discussed.
  • pdf Dehaene, S., Dupoux, E. & Mehler, J. (1990). Is numerical comparison digital? Analogical and symbolic effects in two digit number comparison. Journal of Experimental Psychology: Human Perception and Performance, 16(3), 626-641. [abstract] Do Ss compare multidigit numbers digit by digit (symbolic model) or do they compute the whole magnitude of the numbers before comparing them (holistic model)? In 4 experiments of timed 2-digit number comparisons with a fixed standard, the findings of Hinrichs, Yurko, and Hu (1981) were extended with French Ss. Reaction times (RTs) decreased with target-standard distance, with discontinuities at the boundaries of the standard's decade appearing only with standards 55 and 66 but not with 65. The data are compatible with the holistic model. A symbolic interference model that posits the simultaneous comparison of decades and units can also account for the results. To separate the 2 models, the decades and units digits of target numbers were presented asynchronously in Experiment 4. Contrary to the prediction of the interference model, presenting the units before the decades did not change the influence of units on RTs. Pros and cons of the holistic model are discussed.

Books

  • google_book Dupoux, E. (2001). Language, Brain and Cognitive Development: Essays in Honor of Jacques Mehler. Cambridge, Mass: MIT Press (translated into French: (2002). Les langages du cerveau, Paris: O. Jacob.). [abstract] In the early 1960s, cognition was known only to a group of avant-garde scientists. The bold project of this field of research was to subject the human mind to a rational examination grounded in philosophy, linguistics, computer science, and psychology. Forty years later, the cognitive sciences have flourished. What has the real progress been? What have we learned about language, cognition, and the brain? What have been the failures and the successes? What are the most promising avenues for the future?
  • google_book Mehler, J. & Dupoux, E. (1990). Naître Humain. Paris: Odile Jacob. Translated and published in English (Blackwell), Chinese (Yuan-Liou Publishers), Greek (Alexandria), Italian (Mondadori), Japanese (Fujiwara-Shoten), Portuguese (Piaget), & Spanish (Alianza).

Chapters, commentaries, etc.

  • pdf Synnaeve, G. & Dupoux, E. (2015). Weakly Supervised Multi-Embeddings Learning of Acoustic Models. In ICLR Workshop (arXiv:1412.6645 [cs.SD]). [abstract] We trained a Siamese network with multi-task same/different information on a speech dataset, and found that it was possible to share a network for both tasks without a loss in performance. The first task was to discriminate between two same or different words, and the second was to discriminate between two same or different talkers.
  • pdf Dupoux, E. (2015). Category Learning in Songbirds: top-down effects are not unique to humans. Current Biology, 25(16), R718-R720. [abstract] Human infants use higher order patterns (words) to learn the sound category of their language. A new study using artificial patterns made up of naturally occurring vocalizations shows that a similar mechanism may also exist in songbirds.
  • pdf Dupoux, E. (2014). Towards Quantitative Studies of Early Cognitive Development. Autonomous Mental Development Technical Committee Newsletter, 11(1), 10-11. [abstract] We present a new framework for the evaluation of speech representations in zero-resource settings, that extends and complements previous work by Carlin, Jansen and Hermansky [1]. In particular, we replace their Same/Different discrimination task by several Minimal-Pair ABX (MP-ABX) tasks. We explain the analytical advantages of this new framework and apply it to decompose the standard signal processing pipelines for computing PLP and MFC coefficients. This method enables us to confirm and quantify a variety of well-known and not-so-well-known results in a single framework.
  • pdf Synnaeve, G. & Dupoux, E. (2013). In Depth Deep Beliefs Networks for Phone Recognition. Poster presented at NIPS-2013.
  • pdf Cleret de Langavant, L., Jacquemot, C., Bachoud-Lévi, A.C. & Dupoux, E. (2013). The second person in 'I'-'you'-'it' triadic interactions. Behavioral and Brain Sciences, 36, 416-417.
  • pdf Ramus, F., Peperkamp, S., Christophe, A., Jacquemot, C., Kouider, S. & Dupoux, E. (2011). A psycholinguistic perspective on the acquisition of phonology. In C. Fougeron, B. Kühnert, M. d'Imperio & N. Vallée (eds) Laboratory Phonology, 10, Berlin: Mouton de Gruyter. [abstract] This paper discusses the target articles by Fikkert, Vihman, and Goldrick & Larson, which address diverse aspects of the acquisition of phonology. These topics are examined using a wide range of tasks and experimental paradigms across different ages. Various levels of processing and representation are thus involved. The main point of the present paper is that such data can be coherently interpreted only within a particular information-processing model that specifies in sufficient detail the different levels of processing and representation. In this paper, we first present the basic architecture of a model of speech perception and production, justifying it with psycholinguistic and neuropsychological data. We then use this model to interpret data from the target articles relative to the acquisition of phonology.
  • pdf Cova, F., Dupoux, E. & Jacob, P. (2010). Moral evaluation shapes linguistic reports of others' psychological states, not theory-of-mind judgments. Behavioral and Brain Sciences, 33(4), 334-335. [abstract] We use psychological concepts (e.g., intention and desire) when we ascribe psychological states to others for purposes of describing, explaining, and predicting their actions. Does the evidence reported by Knobe show, as he thinks, that moral evaluation shapes our mastery of psychological concepts? We argue that the evidence so far shows instead that moral evaluation shapes the way we report, not the way we think about, others' psychological states.
  • pdf Smolensky, P. & Dupoux, E. (2009). Universals in cognitive theories of language. Behavioral and Brain Sciences, 32(5), 468-469. [abstract] Generative linguistics' search for linguistic universals (1) is not comparable to the vague explanatory suggestions of the article; (2) clearly merits a more central place than linguistic typology in cognitive science; (3) is fundamentally untouched by the article's empirical arguments; (4) best explains the important facts of linguistic diversity; and (5) illuminates the dominant component of language's "biocultural" nature: biology.
  • pdf Darcy, I., Ramus, F., Christophe, A., Kinzler, K.D. & Dupoux, E. (2009). Phonological knowledge in compensation for native and non-native assimilation. In F. Kügler, C. Féry & R. van de Vijver (eds) Variation and Gradience in Phonetics and Phonology, (pp 265-309) Berlin: Mouton De Gruyter. [abstract] We investigated whether compensation for phonological assimilation depends on language-universal or language-specific processes. To this end, we tested two different assimilation rules, one that exists in English and involves place of articulation, and another that exists in French and involves voicing. Both contrasts were tested on speakers of French, British English and American English. In three experiments using a word detection task, we observed that monolingual participants showed a significantly higher degree of compensation for phonological changes that correspond to rules existing in their language than to rules that do not exist in their language (even though they are phonologically possible since they exist in another language). Thus, French participants compensated more for voicing than place assimilation, while British and American English participants compensated more for place than voicing assimilation. In all three experiments, we also found that the non-native rule induced a very small but significant compensation effect, suggesting that both a language-specific and a language-universal mechanism are at play. In Experiment 4, we studied native speakers of British English who were late learners of French: they showed the British pattern of results even when listening to French stimuli, confirming that compensation for assimilation is induced by language-specific phonological processes rather than specific phonetic cues. The results are discussed in light of current models of lexical access and phonological processing.
  • pdf Jacob, P. & Dupoux, E. (2008). A precursor of moral judgment in human infants? Current Biology, 18(5), R216-R218. [abstract] Human infants evaluate social interactions well before they can speak, and show a preference for characters that help others over characters that are not cooperative or are hindering.
  • pdf Dupoux, E. & Jacob, P. (2008). Response to Dwyer and Hauser: Sounding the retreat? Trends in Cognitive Sciences, 12(1), 2-3.
  • pdf Le Calvez, R., Peperkamp, S. & Dupoux, E. (2007). Bottom-up learning of phonemes: A computational study. In S. Vosniadou, D. Kayser & A. Protopapas (eds) Proceedings of the Second European Cognitive Science Conference, Taylor and Francis. (French translation in Mathematiques et Sciences Humaines 2007(4), 99-111). [abstract] We present a computational evaluation of a hypothesis according to which distributional information is sufficient to acquire allophonic rules (and hence phonemes) in a bottom-up fashion. The hypothesis was tested using a measure based on information theory that compares distributions. The test was conducted on several artificial language corpora and on two natural corpora containing transcriptions of speech directed to infants from two typologically distant languages (French and Japanese). The measure was complemented with three filters, one concerning the statistical reliability due to sample size and two concerning the following universal properties of allophonic rules: constituents of an allophonic rule should be phonetically similar, and allophonic rules should be assimilatory in nature.
  • pdf Kouider, S., de Gardelle, V. & Dupoux, E. (2007). Partial awareness and the illusion of phenomenal consciousness (Comment on Block, 2007). Behavioral and Brain Sciences, 30(5-6), 510-511. [abstract] The dissociation Block provides between phenomenal and access consciousness (P-consciousness and A-consciousness) captures much of our intuition about conscious experience. However, it raises a major methodological puzzle, and is not uniquely supported by the empirical evidence. We provide an alternative interpretation based on the notion of levels of representation and partial awareness.
  • pdf Kouider, S. & Dupoux, E. (2007). How "semantic" is response priming restricted to practiced items? A reply to Abrams & Grinspan (2007). Consciousness and Cognition, 16(4), 954-956.
  • pdf Peperkamp, S., Skoruppa, K. & Dupoux, E. (2006). The role of phonetic naturalness in phonological rule acquisition. In D. Bamman, T. Magnitskaia & C. Zaller (eds) Proceedings of the 30th Annual Boston University Conference on Language Development, Vols 1 and 2, (pp 464-475). [abstract] The role of naturalness constraints in phonological learning is of considerable theoretical importance for linguistically motivated models of language acquisition. However, the existence of naturalness effects is still not resting on firm empirical grounds. P&D (in press) exposed French subjects to an artificial language consisting of determiner + noun phrases which obey either a natural allophonic rule that voices a subclass of obstruents intervocalically, or an unnatural one that defines arbitrary relationships among certain obstruents intervocalically. After exposure, a phrase-picture matching task was used to assess whether subjects had learned the allophonic distributions and hence distinguished between phonemic and allophonic contrasts among obstruents for the purposes of word identification. Surprisingly, P&D (in press) found that natural assimilatory rules and unnatural arbitrary rules were learned with equal ease. In the present study, we use exactly the same exposure phase, but change the test phase: here, subjects have to produce a noun phrase upon the presentation of a picture, both for nouns that they have been trained on during the exposure phase, and for novel nouns. We find that with this more ecologically valid, but also more demanding task, a naturalness effect emerges: subjects learned the rule on old items and extended it to novel items, but only for the natural assimilatory rules, not for the nonnatural arbitrary rules. We discuss these findings in relation to existing studies of the acquisition of phonological rules. We distinguish at least three constraints that characterize rule naturalness, and discuss the role of task demands and response strategies in relation to the emergence of naturalness effects in learning studies using artificial languages.
  • pdf Dupoux, E. (2004). The Acquisition of Discrete Segmental Categories: Data and Model. In Proceedings of the 18th International Congress of Acoustics, Kyoto. [abstract] The way in which we parse continuous speech into discrete phonemes is highly language-dependent. Here, we first report that this phenomenon not only depends on the inventory of phonetic distinctions in the language, but also on the inventory of syllabic types. This is illustrated by studies showing that Japanese listeners perceptually insert epenthetic vowels inside illegal consonant clusters in order to make them legal. We then argue that this raises a bootstrapping problem for language acquisition, as the learning of phonetic inventories and syllabic types depend on each other. We present an acquisition model based on the storing and analysis of phonetic syllabic templates. We argue that this model has the potential of solving the bootstrapping problem as well as a range of observations regarding perceptual categorization for speech sounds.
  • pdf Peperkamp, S., Pettinato, M. & Dupoux, E. (2003). Allophonic variation and the acquisition of phoneme categories. In B. Beachley, A. Brown & F. Conlin (eds) BUCLD 27: Annual Boston University Conference on Language Development, Vols 1 and 2, Proceedings, (pp 650-661).
  • pdf Peperkamp, S. & Dupoux, E. (2003). Reinterpreting loanword adaptations: The role of perception. In Proceedings of the 15th International Congress of Phonetic Sciences, (pp 367-370). [abstract] Standard phonological accounts of loanword adaptations state that the inputs to the adaptations are constituted by the surface forms of the words in the source language and that the adaptations are computed by the phonological grammar of the borrowing language. In processing terms, this means that in perception, the phonetic form of the source words is faithfully copied onto an abstract underlying form, and that adaptations are produced by the standard phonological processes in production. We argue that this is at odds with speech perception models and propose that loanword adaptations take place in perception and are defined as phonetically minimal transformations.
  • pdf Peperkamp, S. & Dupoux, E. (2002). Coping with phonological variation in early lexical acquisition. In I. Lasser (ed) The Process of Language Acquisition, (pp 359-385) Berlin: Peter Lang Verlag. [abstract] Models of lexical acquisition assume that infants can somehow extract unique word forms out of the speech stream before they acquire the meaning of words (e.g. Siskind 1996). However, words often surface with different phonetic forms due to the application of postlexical phonological processes; that is, surface word forms exhibit what we call phonological variation. In this paper, we will examine if and how infants that do not have a semantic lexicon might undo phonological variation, i.e. deduce which phonological processes apply and infer unique underlying word forms that will constitute lexical entries. We will propose a learning mechanism that deduces which rule applies and infers underlying phonemes and word forms. This mechanism is based on an examination of the distribution of either surface segments or surface word forms. The distribution of segments will be shown to provide sufficient information in the case of allophonic rules, i.e. rules that involve segments that do not otherwise occur in the language; the distribution of segments that are introduced by this type of rule is complementary to that of segments that are the direct phonetic realization of certain phonemes. The distribution of word forms will be shown to be necessary in cases in which all surface segments have a phonemic status in the language. In particular, infants can make use of the fact that certain word forms - i.e. the ones that have undergone the rule - fail to occur at the left or right edge of certain phrasal constituents, where the context for application of the rule is never met. This proposal makes predictions regarding the order in which various types of phonological variations can be coped with in the infant.
  • pdf Dupoux, E. & Peperkamp, S. (2002). Fossil markers of language development: phonological deafnesses in adult speech processing. In B. Laks & J. Durand (eds) Phonetics, Phonology, and Cognition, (pp 168-190) Oxford: Oxford University Press. [abstract] The sound pattern of the language(s) we have heard as infants affects the way in which we perceive linguistic sounds as adults. Typically, some foreign sounds are very difficult to perceive accurately, even after extensive training. For instance, native speakers of French have trouble distinguishing foreign words that differ only in the position of main stress, French being a language in which stress is not contrastive. In this paper, we propose to explore the perception of foreign sounds cross-linguistically in order to understand the processes that govern early language acquisition. Specifically, we propose to test the hypothesis that early language acquisition begins by using only regularities that infants can observe in the surface speech stream (Bottom-Up Bootstrapping), and compare it with the hypothesis that they use all possible sources of information, including, for instance, word boundaries (Interactive Bootstrapping). We set up a research paradigm using the stress system, since it allows us to test the various options at hand within a single test procedure. We distinguish four types of regular stress systems the acquisition of which requires different sources of information. We show that the two hypotheses make contrastive predictions as to the pattern of stress perception of adults in these four types of languages. We conclude that cross-linguistic research on adult speech perception, when coupled with detailed linguistic analysis, can be brought to bear on important issues of language acquisition.
  • Bachoud-Lévi, A.C. & Dupoux, E. (2001). L'effet de longueur et la production des mots parlés. Psychologie française, 46, 65-76.
  • pdf Peperkamp, S., Dupoux, E. & Sebastián-Gallés, N. (1999). Perception of stress by French, Spanish, and bilingual subjects. In Proceedings of Eurospeech '99, 6, (pp 2683-2686). [abstract] Previous research has shown that French subjects, as opposed to Spanish subjects, have difficulties in distinguishing two words that differ only as far as the location of stress is concerned. In French, stress is not contrastive, and French subjects are 'deaf' to stress contrasts. In Experiment 1, we replicate this finding with a new and more powerful paradigm for assessing the perception of stress. With this new method, we obtain a complete separation of the two subject populations. In Experiment 2, we test highly proficient French-Spanish bilinguals with the same paradigm. Our findings are that the performance of individual bilinguals is either French-like or Spanish-like. The factor that best predicts the bilingual's performance is the country in which the subject is born. Consequences for models of bilingualism are discussed.
  • pdf Dupoux, E., Fushimi, T., Kakehi, K. & Mehler, J. (1999). Prelexical locus of an illusory vowel effect in Japanese. In Eurospeech '99 Proceedings; ESCA 7th European Conference on Speech Communication and Technology. [abstract] Studies in vision have demonstrated that the visual system can induce the perception of illusory contours. In this study we document a similar phenomenon in the auditory mode: Japanese speakers report perceiving vowels that are absent in the acoustic signal. Such an illusion is due to the fact that in Japanese, successions of consonants are not allowed. Hence the linguistic system inserts an illusory vowel between adjacent consonants in order to conform to the expected pattern in this language. Here, we manipulate the lexical neighborhood of nonwords that contain illegal consonant clusters and show that this illusion is not due to lexical influence. Rather, it arises before lexical knowledge is activated, suggesting that phonotactics impact perception routines at a very early processing stage.
  • Dupoux, E. & Mehler, J. (1999). Non-Developmental studies of Development: examples from newborn research, bilingualism, and brain imaging. In C. Rovee-Collier, L. Lipsitt & H. Hayne (eds) Advances in infancy research, 12, (pp 375-406) Stamford, Connecticut: Ablex Publishing Corporation.
  • Mehler, J., Dupoux, E., Nazzi, T. & Dehaene-Lambertz, G. (1996). Coping with linguistic diversity: The infant's viewpoint. In J. Morgan & K.D. Demuth (eds) From Signal to Syntax: Bootstrapping from speech to grammar in early acquisition, (pp 101-116) Hillsdale, NJ: Erlbaum.
  • Hammond, M. & Dupoux, E. (1996). Psychophonology. In J. Durand & B. Laks (eds) Current Trends in Phonology: Models and Methods, (pp 281-304).
  • Dupoux, E. (1993). The time course of prelexical processing: The syllabic hypothesis revisited. In G. Altmann & R. Shillcock (eds) Cognitive Models of Speech Processing, (pp 81-114) Hillsdale, NJ: Erlbaum.
  • pdf Dupoux, E. & Mehler, J. (1992). Unifying awareness and on line studies of speech: A tentative framework. In J. Alegria, D. Holender, J. Morais & M. Radeau (eds) Analytic approaches to human cognition, (pp 59-75) The Netherlands: Elsevier. [abstract] Generally, studies of speech recognition are related to theories of performance while studies of awareness are thought to bear upon language competence. In our conception, both areas of research contribute to our understanding of processing and of the representations that the subjects use when listening to speech. We present a unitary framework within which it becomes possible to incorporate the results from on-line speech recognition studies and from studies of the awareness that the language user has of speech segments. In particular, we argue that it is necessary to include a description of the manner in which acoustic-phonetic information is transduced and represented in order for us to understand how subjects come to decide to respond or not in a psycholinguistic experiment. Particular attention is given to the data from on-line chunk detection experiments and to the potential role of orthographic representation.
  • Dupoux, E. & Mehler, J. (1992). La segmentation de la parole. Courrier du CNRS.
  • Christophe, A., Dupoux, E. & Mehler, J. (1992). How do infants extract words from the speech stream? A discussion of the bootstrapping problem for lexical acquisition. In Proceedings of Child Language Research Forum, Stanford, CA.
  • pdf Segui, J., Dupoux, E. & Mehler, J. (1990). The role of the syllable in speech segmentation, phoneme identification and lexical access. In G. Altmann (ed) Cognitive Models of Speech Processing, (pp 263-280) Cambridge Mass: MIT Press.
  • pdf Mehler, J., Dupoux, E. & Segui, J. (1990). Constraining models of lexical access: The onset of word recognition. In G. Altmann (ed) Cognitive Models of Speech Processing, (pp 236-262) Cambridge Mass: MIT Press.
  • Mehler, J. & Dupoux, E. (1987). De la psychologie à la science cognitive. Le Débat, 47, 65-87.
Unpublished manuscripts

    Attention: these manuscripts are either unpublished, or in revision. If you want to quote one of them, please send me an email.

  • pdf Pallier, C., Dupoux, E. & Jeannin, X. (1997). EXPE6 Reference manual. [abstract] Expe is an experiment generator for PC computers: it allows one to run cognitive psychology experiments that involve the presentation of audio or visual stimuli and the collection of on-line or off-line behavioral responses (e.g. discrimination tasks, auditory target detection tasks, lexical decision and picture naming experiments). Its flexibility also makes it a very useful tool for the rapid design of protocols for testing neuropsychological patients. Expe provides a powerful scripting language which allows the user to specify, with human-readable commands, all the components of an experiment (materials, stimulus presentation, training, instructions, etc.). Subjects' responses are saved in readable ASCII files, in a user-specified format. Expe is an open system: the commands of the language are calls to functions written in independent Borland Pascal units. The power user can thus easily add new commands to the language by linking their own Pascal procedures to meet any special need. This makes it possible, for example, to adapt Expe to new hardware, such as new sound or video boards, ERP collecting devices, etc.
  • pdf Dupoux, E. (1994). A Syllabic Bottleneck in Prelexical Processing? A Phoneme Monitoring Investigation. LSCP Tech Report, 94(2), 1-14. [abstract] Previous research has found that phoneme detection latencies depend on the complexity of the syllable that bears the target phoneme. CV syllables give rise to faster latencies than CVC, which are faster than CCV (Treiman et al., 1982, Cutler et al., 1987). In Experiment 1, we replicate this result and extend it to a fourth structure: CCVC. In Experiment 2, we report a similar effect in first syllables of disyllabic items, showing that complexity effects cannot be reduced to stimulus duration effects. We argue that the complexity effect is inconsistent with the view that phonemes are the only units involved in speech perception, but supports models which stipulate larger sized units like syllables (Mehler, 1981; Segui, Dupoux & Mehler, 1990). In a series of post-hoc analyses, however, we show that the complexity effect is not uniform across subjects. Although both the complexity of onsets and codas of syllables influence phoneme detection latencies for slow subjects, fast subjects are only influenced by the nature of the onset. The interaction of speed of response with complexity effects is confirmed in Experiment 3, where it is found that when subjects are urged to respond as fast as possible, CVC items no longer show a complexity effect nor a lexical superiority effect. Implications for the existence of a syllabic bottleneck and the time course of prelexical processing are discussed.
  • pdf Dupoux, E., Christophe, A. & Mehler, J. (1994). Lexical effects in phoneme monitoring: Time-course versus attentional accounts. LSCP Tech Report, 94(1), 1-12. [abstract] Under what conditions do lexical factors influence phoneme detection times? Experiment 1 measured subjects' latencies to detect initial phonemes in monosyllabic and disyllabic words that were preceded by a semantically related or unrelated word. One group of subjects was instructed to pay attention to the semantic relations between words, and a second group was asked to focus on acoustic-phonetic information. A significant priming effect was found, only for monosyllabic words, and only in the first group. In Experiment 2, previously observed frequency effects (Dupoux and Mehler, 1990) disappeared when the detection task was biased towards acoustic-phonetic information. In Experiment 3, two student populations were tested with exactly the same instruction set and showed markedly different results: One group showed a consistent lexical superiority effect on monosyllabic items while the other group showed no such effect. Taken together, these results suggest that the presence or absence of lexical effects is extremely sensitive to attentional parameters that can be affected by explicit biasing instructions and/or individual differences. Importantly, these effects cannot be accounted for in terms of mean reaction time differences (where slow reaction times would be expected to lead to stronger lexical influences than fast ones). The results reported here are consistent with the view that phoneme detection can be carried out using either of two quite different routes. Implications for current models of lexical and prelexical processing are discussed.
  • pdf Dupoux, E. & Hammond, M. (1994). The role of stress in English: A fragment detection study. Unpublished Manuscript. [abstract] Previous investigations have claimed that speech perception uses language specific strategies and that, in particular, English does not use a strategy based on syllables (Cutler, Mehler, Norris, and Segui, 1983, 1986). This conclusion is based on a failure to replicate, with English subjects and materials, the interaction between target type and word type in fragment monitoring experiments that was originally found in French (Mehler et al., 1981). Here, we explore the possibility that this might be due to one of three related hypotheses: i. syllable boundaries in English depend on the stress value of the following syllable; ii. English listeners use the foot instead of the syllable in speech perception; iii. English subjects posit a perceptual boundary before an unreduced (or 'strong') syllable but not before a reduced, or 'weak', one. These three hypotheses can all be tested on the basis of the same contrasts and so we group them together under the rubric of 'stress-sensitive strategies' (SSSs). In Experiment 1, we find some incomplete support for an SSS, but the effect is not replicated in subsequent Experiments 2 (with slowed-down subjects) and 3 (with a different set of materials). An associated off-line task (Experiment 4) reveals that, according to subjects' intuitions, syllables have a rather different structure than we assumed at the outset. We conclude by rejecting these SSSs as the major source of the difference between French and English. In the final section, we discuss the possibility that the English perceptual system might still be based on the syllable, but not a stress-sensitive one.
  • More papers are available on the CogPrints server.
