Département d'ingénierie informatique
The purpose of Word Sense Disambiguation (WSD) is to determine which sense of an ambiguous word is intended in a particular instance, given its context of use. In principle, disambiguation is useful in any linguistic application where word sense matters, such as machine translation, text categorization, and speech understanding. WSD techniques fall into three broad categories: supervised, dictionary-based (or thesaurus-based), and unsupervised. Supervised techniques require a semantically tagged training corpus in which each instance of an ambiguous word is correctly labeled with its sense. Dictionary-based techniques work much like supervised ones but use a raw (i.e. untagged) corpus, with a dictionary or thesaurus serving as an additional knowledge source that defines the senses.
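As a rough illustration of the dictionary-based idea, the sketch below picks the sense whose dictionary definition shares the most words with the context of the ambiguous word. This is a simplified gloss-overlap heuristic (in the spirit of the Lesk algorithm), chosen here only for illustration; it is not necessarily the method used in this work. The mini-dictionary `TOY_DICTIONARY`, the stopword list, and the function names are all hypothetical.

```python
import re

# Hypothetical two-sense mini-dictionary for the ambiguous word "bank".
TOY_DICTIONARY = {
    "bank/finance": "an institution that accepts deposits of money and makes loans",
    "bank/river": "the sloping land along the edge of a river or stream",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "to", "in", "on",
             "at", "she", "he", "we", "her", "his"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context, dictionary):
    """Return the sense whose definition overlaps most with the context.

    Ties are resolved in favor of the sense listed first in the dictionary.
    """
    context_words = tokens(context)
    return max(
        dictionary,
        key=lambda sense: len(context_words & tokens(dictionary[sense])),
    )


print(disambiguate("She deposited the money at the bank", TOY_DICTIONARY))
# bank/finance
print(disambiguate("We walked along the river to the bank", TOY_DICTIONARY))
# bank/river
```

Note that no sense-tagged training data is involved: the only knowledge source beyond the raw context is the dictionary itself, which is what distinguishes this family of techniques from supervised ones.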
Our main focus is on word sense discrimination, a fully unsupervised disambiguation problem. Here, the objective is to determine automatically which instances can be clustered together as sharing the same sense; the sense labels themselves are arbitrary. This work includes