Wouter’s reflection

Perhaps the best way to illustrate the value of this course is to take its final lecture, by Richard Morey. In one fell swoop he demonstrates the inadequacy of the frequentist approach to statistical testing, and yet the core message of the course survives intact.

This core message is that the test is only a tool for extracting what we really care about: knowledge. A well-crafted experiment teaches us about the world, about people, and about the systems we build. By selecting appropriate experimental designs and the right tests to judge our data by, we tease out this knowledge.

Although the course itself was fairly heavy on specific tests for specific types of data, the big picture shines through: how can I get my data to tell me what the world is like? And although the focus was mostly on testing data we already have, the process of collecting that data is subject to the same concept. This leads to a cycle of capturing what the world is like in data and distilling from that data an ever more refined picture of the world: the empirical cycle.

Adding semantic similarity features to coreference resolution

Instead of a proposed project idea, I've included below the abstract of my bachelor's project paper.

In automatic coreference resolution the goal is to identify when two noun phrases refer to the same entity in the world. In this paper we use the Dutch-language Knack-2002 coreference-annotated corpus and example-based supervised machine learning to experiment with adding semantic similarity features to the standard set of linguistic features used in previous work. We use the FROG language parser to extract standard features and the Cornetto database to add features for WordNet semantic classes and three semantic similarity metrics based on work by Lin (1998), Jiang and Conrath (1997) and Resnik (1995), respectively. Performance is tested using TiMBL's k nearest neighbors algorithm on the data split into sets with common noun, proper noun and pronoun anaphors; we find the F-score improves from 0.299 to 0.325 on the common noun data but discern no difference in the other conditions.