Workshop on Computational Models of Early Language Acquisition and Zero Resource Speech Technologies

Monday July 29 - Friday August 2, 2013

29 rue d'Ulm, 75005 Paris

(access through 24 rue Lhomond)

room Paul Langevin (1st floor)

The unsupervised (zero resource) discovery of linguistic structure from speech is generating a lot of interest in two largely disjoint communities. The cognitive science community (psycholinguists, linguists, neurolinguists) wants to understand the mechanisms by which infants spontaneously discover linguistic structure. The machine learning community is more and more interested in deploying language/speech technologies in a variety of languages/dialects with limited or no linguistic resources. The aim of this workshop is to bring together researchers and graduate students from these two communities to engage in mutual presentations and discussion of current and future issues.

Specifically, this workshop has two aims: 1) identifying key problems of interest to both communities, and 2) setting up standardized, common resources for comparing the different approaches to solving these problems (databases, evaluation criteria, software, etc.). In this workshop, we will focus mainly but not exclusively on the discovery of two levels of linguistic structure: phonetic units (speech coding) and word-like units (higher-order units). We are well aware that the definition of these levels, as well as their segregation from the rest of the linguistic system, is itself a matter of debate, and we welcome discussions of these issues as well.

The workshop will start with two days of open symposium with formal presentations (no registration required), followed by three days of hands-on workshop (registration required; write to syntheticlearner@gmail.com).

This workshop is supported by the Labex IEC program 'Frontiers in Cognition' and the ERC program "Bootphon". It is the second event in a series that started as a mini workshop at the Center for Language and Speech Processing at Johns Hopkins University. See Zero Resource workshop #1

Monday July 29th: Symposium, Day 1

Each presentation will last 35 minutes at most, followed by 20 minutes of discussion.

Morning 9:00a-12:30p

Lunch Break

Afternoon 2:30-5:30p

Tuesday July 30th: Symposium, Day 2

Lunch Break

Wednesday, July 31 - Friday, Aug 2: Hands-on workshop

The aim of the hands-on workshop is to advance some of the issues raised in the discussions and make some progress towards addressing them, either at the conceptual level, or at the level of performing pilot experiments.

Report

The following topics were discussed:

The Ideal Child-Directed Database

Participants: Goldwater, Dupoux, Versteegh

Noting that there is a lack of good-quality, well-annotated child-directed corpora recorded in naturalistic situations, the group discussed what an ideal database would look like.

Setup:

Analysis:

Note: the event and context coding could be done, at least in part, through verbal descriptions collected from naive subjects (through Mechanical Turk, for instance).

Action:

What is semantics good for?

Participants: Swingley, Bordes, Fourtassi, Synnaeve, Johnson

The group discussed the possibility of evaluating the role of semantics in various learning tasks before having access to an ideal database. The idea is to generate synthetic data based on what we currently know about the availability and reliability of semantic cues in the infant's input (regarding context, objects and events) and plug these cues, in a probabilistic fashion, into existing unannotated child databases.

For instance, work by Swingley et al. and others suggests that in about 60% of the cases in which a concrete noun is mentioned in a sentence, its referent is present in the scene and/or is the focus of the child's attention. The group discussed the importance of consolidating this sort of data, both across learning contexts and cultures.
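As a rough illustration, a cue of this kind could be attached to noun tokens in an unannotated corpus by flipping a biased coin per token. The sketch below assumes the ~60% availability figure; the function name and data format are invented for illustration and are not the group's actual pipeline:

```python
import random

def annotate_semantic_cues(utterances, concrete_nouns, p_present=0.6, seed=0):
    """Attach a synthetic 'referent present in scene' flag to each concrete
    noun token, with probability p_present (0.6 is an assumption, in the
    spirit of the availability estimates discussed above)."""
    rng = random.Random(seed)
    annotated = []
    for utt in utterances:
        cues = []
        for word in utt.split():
            if word in concrete_nouns:
                cues.append((word, rng.random() < p_present))
        annotated.append((utt, cues))
    return annotated

# toy input over a tiny vocabulary
nouns = {"dog", "ball", "cup"}
utts = ["the dog", "a ball", "see cup"] * 300
out = annotate_semantic_cues(utts, nouns)
present = sum(c for _, cues in out for _, c in cues)
total = sum(len(cues) for _, cues in out)
print(total, present / total)  # proportion should hover around 0.6
```

Consolidated estimates from different learning contexts and cultures would simply change `p_present`, possibly per noun class.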

Later in the workshop, the semantics group applied latent Dirichlet allocation (LDA) to a section of the Providence corpus in order to derive another proxy for semantic representations, and used it in unsupervised segmentation with Adaptor Grammars.
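One way to obtain such a proxy is to fit LDA over bag-of-words counts and read off per-utterance topic proportions. A minimal sketch with scikit-learn, on toy utterances rather than the Providence corpus (the group's actual setup is not documented here):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# toy stand-in for child-directed utterances (not the Providence corpus)
docs = ["dog bone bark dog", "milk cup drink milk",
        "dog bark bone", "cup milk drink"]
X = CountVectorizer().fit_transform(docs)            # utterance-by-word counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)                         # per-utterance topic mix
print(theta.shape)  # (4, 2); each row sums to 1
```

The rows of `theta` can then serve as a coarse "semantic context" feature for each utterance, e.g. as side information for a segmentation model.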

Islands of reliability

Participants: Boerschinger, Schatz, Moulin-Frier

The group discussed the possibility that the infant might use a strategy of data selection whereby not all input is analyzed, but only the fragment of the input that the child has reason to believe is reliable. The difficulty in implementing this idea is to find a way to evaluate reliability while learning is still incomplete.

The group implemented a version of this idea in the segmentation task using Adaptor Grammars, by relaxing the constraint that the entirety of a sentence should be parsed.
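A crude version of the selection step can be sketched independently of the grammar machinery: keep only utterances whose tokens are mostly covered by the learner's current lexicon, using coverage as a stand-in for "reliability". Names and the threshold below are illustrative assumptions, not the group's implementation:

```python
def select_reliable(utterances, known_words, threshold=0.5):
    """Data-selection sketch: keep only utterances in which at least
    `threshold` of the word tokens are already in the learner's lexicon.
    Coverage is a crude proxy for reliability (an assumption here)."""
    selected = []
    for utt in utterances:
        words = utt.split()
        coverage = sum(w in known_words for w in words) / len(words)
        if coverage >= threshold:
            selected.append(utt)
    return selected

lexicon = {"you", "see", "the"}
utts = ["you see the doggy", "anthropomorphic considerations notwithstanding"]
print(select_reliable(utts, lexicon))  # only the first utterance passes
```

In a full learner this filter would be re-applied as the lexicon grows, so the "island" of analyzed input expands over time.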

The role of phonological features

Participants: Dunbar, Feldman

The group explored the idea that discovering linguistic features simultaneously with constructing phonetic categories should actually be easier than constructing phonetic categories by clustering alone. The idea is to use an Indian buffet process, and the group started to evaluate it on synthetic data.
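The generative side of the Indian buffet process is easy to sketch: object i takes each existing feature k with probability m_k/i (where m_k is how often feature k has been taken so far), then adds Poisson(alpha/i) new features. The sampler below draws from this prior only; the inference machinery a feature-discovery model would actually need is not shown, and the helper names are illustrative:

```python
import math
import random

def sample_ibp(n_objects, alpha, seed=0):
    """Draw a binary object-by-feature matrix from the Indian buffet
    process prior (a sketch, not the group's evaluation code)."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's method; fine for small lambda
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    Z = []          # rows: objects; columns: features
    counts = []     # m_k for each feature seen so far
    for i in range(1, n_objects + 1):
        row = [int(rng.random() < m / i) for m in counts]
        for j, z in enumerate(row):
            counts[j] += z
        new = poisson(alpha / i)     # brand-new features for this object
        row += [1] * new
        counts += [1] * new
        Z.append(row)
    width = len(counts)
    return [r + [0] * (width - len(r)) for r in Z]  # pad to full width

Z = sample_ibp(5, alpha=2.0)
print(len(Z), len(Z[0]))
```

The appeal for phonology is that, unlike a clustering prior, the IBP lets a segment carry several binary features at once, which is closer to how phonological features behave.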

The role of stress

Participants: Johnson, Demuth, Dupoux, Boerschinger

This group explored the idea that stress information could be usefully incorporated into word segmentation algorithms. It discussed a current proposal to add stress information to a word segmentation algorithm based on Adaptor Grammars. The discussion centered on whether the system should try to learn the entire stress system of the language in a parametric fashion, or simply learn probabilistic word templates focusing on the word edges.
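The edge-template option can be made concrete with a toy estimator: given stress-annotated words, estimate how often the first and last syllables bear stress. The representation (one 0/1 stress mark per syllable) and the function name are assumptions for illustration:

```python
def edge_stress_template(words):
    """Estimate a simple word-edge stress template from stress-annotated
    words (each word is a list of 0/1 stress marks, one per syllable).
    Returns the probabilities that the first and the last syllable
    bear stress. A toy version of the edge-template idea."""
    initial = sum(w[0] for w in words) / len(words)
    final = sum(w[-1] for w in words) / len(words)
    return initial, final

# toy English-like data: mostly word-initial stress
data = [[1, 0], [1, 0], [0, 1, 0], [1], [1, 0, 0]]
print(edge_stress_template(data))  # → (0.8, 0.2)
```

A segmentation model could then score candidate word boundaries by how well the adjacent syllables match these edge probabilities, without modeling the full stress system.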

The group also discussed ways in which continuous stress information could be plugged into the existing algorithms, as opposed to discrete dictionary citation-form annotations. The feasibility of extending the AG framework to incorporate continuous input (Kalman filters?) was also discussed.

Finally, the group discussed problems related to cross-linguistic variation in stress cues, and more generally how suprasegmental acoustic cues get mapped onto linguistic structures. Johnson, Demuth and Dupoux proposed to co-fund a PhD project on this topic.

Bridging the gap between spoken term discovery and word segmentation

Participants: Dupoux, Ludusan, Versteegh

The speech and NLP communities have each developed their own algorithms for discovering linguistic fragments in continuous speech. The former (spoken term discovery) take continuous speech features as input; the latter (word segmentation algorithms) take symbolic input. There are, however, a number of intermediate algorithms in the making (e.g., symbolic systems that incorporate phonetic variation, or continuous systems that use high-level features). It is therefore important to be able to compare these two sets of models using the same evaluation metrics.

The group started to decompose the models into separate components (fragment alignment, lexicon construction, utterance segmentation) and reviewed the different metrics used to evaluate each of them.
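For the utterance-segmentation component, one of the standard metrics is boundary precision/recall/F-score. A minimal implementation (segmentations given as word lists over the same character sequence; utterance-final boundaries excluded, as is conventional) might look like this:

```python
def boundary_f_score(gold, predicted):
    """Boundary precision, recall and F-score for word segmentation.
    Both arguments are lists of word strings spelling out the same
    character sequence."""
    def boundaries(words):
        pos, cuts = 0, set()
        for w in words[:-1]:   # skip the utterance-final boundary
            pos += len(w)
            cuts.add(pos)
        return cuts

    g, p = boundaries(gold), boundaries(predicted)
    tp = len(g & p)
    prec = tp / len(p) if p else 1.0
    rec = tp / len(g) if g else 1.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

gold = ["the", "dog", "barks"]
pred = ["the", "dogbarks"]           # one missed boundary
print(boundary_f_score(gold, pred))  # → (1.0, 0.5, 0.666...)
```

Because spoken term discovery systems output fragment alignments rather than exhaustive segmentations, metrics like this one have to be complemented by alignment- and lexicon-level scores, which is precisely the decomposition the group pursued.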

The group also explored the possibility of building a hybrid continuous/symbolic segmentation system.

The future

The participants of the workshop discussed the objectives for the next year, as well as the scope and organization of the next workshop.