|
Sense tagging is the process of assigning senses to the words in a corpus. Here, a sense is the semantic value (content) of a word relative to other words, that is, when the word is part of a group or set of related words.
The Urdu Sense Tagging Utility is a software tool that provides an easy interface for sense tagging, helping to keep tagging consistent and to speed up annotation. It enables manual disambiguation of large volumes of text.
The tool takes POS-tagged files as input and generates sense-tagged files as output, in which each word is tagged with a synset ID drawn from the Urdu WordNet. The user interface offers three views: the selection view, the WordNet view and the corpus view. The selection view displays a list of high-frequency words, taken from the Urdu Wordlist, and lets the annotator select a target word. The tool then matches these words with those in the corpus and in WordNet. The WordNet view displays the linguistic information available in WordNet for the selected lexical item. The corpus view displays the corpus with all occurrences of the target word. Because a morphological analyzer is integrated into the tool, the corpus view shows not only every occurrence of the selected word but also every occurrence of its morphological forms found in the corpus. |
|