Computational Linguistics
Course Code: CSE 606 Course Objectives: The course aims to develop an understanding of linguistics and how it is modeled and processed. The course will focus on modeling words and phrases, with some reference to higher structures, including meaning and discourse. Challenges associated with corpus and multilingual text processing, and their solutions, will also be addressed.
Course Description:
Corpora: Scripts, Unicode Encoding and Processing, Normalization and Collation, Tokenization; Words: Word Segmentation, Spell Checking, Morphology and Finite State Transducers, N-Grams; Phrases: Word Classes and POS Tagging, Chunking, CFG and Language Grammars, Rule-Based and Probabilistic Parsing of CFGs, Features and Unification, Annotated Grammars and Lexical Functional Grammar; Semantics: Lexical Semantics, Compositional Semantics, Word Sense Disambiguation
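As a small illustration of the Corpora topics above (Unicode normalization and tokenization), the following is a minimal sketch using only the Python standard library. The function names `normalize` and `tokenize` are hypothetical, and the regular expression is a deliberately simple assumption, not the tokenizer used in the course.

```python
import re
import unicodedata

def normalize(text):
    """Return the NFC (canonically composed) form of the text."""
    return unicodedata.normalize("NFC", text)

def tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

# 'e' followed by a combining acute accent: two code points for one letter
decomposed = "cafe\u0301"
composed = normalize(decomposed)
print(len(decomposed), len(composed))  # code point counts before and after NFC

print(tokenize("Corpora, scripts and tokens."))
```

Normalizing before tokenizing keeps visually identical strings from being counted as different words. A pattern this simple does not handle scripts that need dedicated word segmentation.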
Text Book and Material:
Speech and Language Processing, Second Edition, by Daniel Jurafsky and James Martin
Additional reading material will be provided for sections in italics.
Course Code: CSE 723 Course Prerequisite: CSE 610 Course Objectives: The course focuses on three areas of speech processing: the speech signal, its processing, and applications. A third of the course introduces the acoustics of speech, to develop an understanding of the nature of the signal being processed. The course also introduces both time-domain and frequency-domain analysis of the speech signal to extract relevant higher-level speech information from the digital signal. Finally, the course introduces the fundamentals of speech synthesis and recognition. The course covers both the theory and the practical implementation of these techniques.
Course Description:
Background: Periodic vs. aperiodic waves, resonance, standing waves, complex waves, spectrum; Speech signal: Source-Filter Theory of Speech Production, glottal waveform, acoustic properties of vowels and consonants; Acquisition of speech signal: A/D conversion including quantization and sampling; Filtering and amplification; Time-domain speech analysis: Framing, Zero-crossing rate, Short-term energy, Speech segmentation; Frequency domain representation: Windowing, Fourier Transforms; Parameterization of Speech: Autocorrelation, Linear Prediction including Autocorrelation method, Covariance method; Applications of LPC including Vocal tract area estimation, Pitch calculation, Formant estimation; Cepstral Analysis and applications including Pitch extraction; Speech recognition; Speech synthesis
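The time-domain analysis topics listed above (framing, short-term energy, zero-crossing rate) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: the function names, the synthetic 100 Hz sine test signal, the 8 kHz sampling rate, and the frame sizes are all hypothetical choices, not values prescribed by the course.

```python
import math

def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Sum of squared sample values within one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Assumed test signal: 100 Hz sine, sampled at 8 kHz, 0.1 s long
fs = 8000
signal = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]

for frame in frames(signal, frame_len=200, hop=100)[:3]:
    print(round(short_term_energy(frame), 2), round(zero_crossing_rate(frame), 4))
```

In speech segmentation these two measures are often used together: voiced speech tends to show high energy and a low zero-crossing rate, while unvoiced fricatives tend to show the opposite.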
Text Book and Material:
The Acoustics of Speech Communication by Pickett
Principles of Computer Speech by Witten
Digital Speech Processing by Rabiner and Schafer
Speech and Language Processing by Jurafsky and Martin
An Introduction to Text to Speech Synthesis by Dutoit
Fundamentals of Speech Recognition by Rabiner and Juang
Course Code: CS 722 Course Prerequisite: CSE 606 Course Objectives: This course will look at existing grammars of Urdu to develop a holistic understanding of Urdu morphology and syntax. The course will also look at mechanisms to model Urdu morphology and grammar using finite state methods and annotated context-free grammars, respectively. Students will also be expected to identify and analyze some relevant open issues and to implement solutions to them.
Course Description:
POS Tagset, Inflectional Morphology of Urdu; Derivational and Non-Concatenative Morphology; Reduplication; Noun Phrase; Nouns and Pronouns, Adjectives, Determiners in Noun Phrase; Quantifiers, Ordinals, Cardinals, Genitives in Noun Phrase; Agreement, Structure and Order; Case; Case Markers, Case Phrase, Case and Grammatical Roles; Postpositional Phrase; Postpositions, Sub-categorization; Verbs; Sub-categorization and Agreement (gender, number, respect, person, form); Tense, Aspect and Mood; Complex Predicates; Adverbs, Negation, Verb Adjuncts, Wala; Coordinate and Subordinate Conjunctions; Relative Clause; Interrogative and Imperative Sentences; Issues with Parsing the Corpus; Open Issues in Urdu Grammar and its Implementation
Text Book and Material:
نئی اردو قوائد, Ismat Javed
قوائد اردو, Maulvi Abdul Haq
اردو صرف و نحو, Maulvi Abdul Haq
قوائد اردو, Abu Allais Siddiqui
Urdu: An Essential Grammar, Ruth Laila Schmidt
A Grammar of the Hindustani or Urdu Language, John T. Platts
The Structure of Complex Predicates in Urdu, Miriam Butt
Lexical Functional Grammar, Mary Dalrymple
Grammar Writer’s Cookbook, Miriam Butt, Tracy Holloway King
Speech and Language Processing, Jurafsky and Martin
Course Code: CS 721 Course Prerequisite: CSE 606 Course Objectives: Due to the complexity of natural languages, rule-based approaches have limited capacity for modeling. Statistical approaches provide feasible alternatives for modeling some of these challenges. This course presents the data-centric approach to modeling language, building on the rule-based approaches discussed in the first course on Computational Linguistics. The course will look at advanced topics in computational linguistics and explore statistical solutions to the associated problems.
Course Description:
Foundations, N-Grams and Smoothing, HMMs, POS Tagging, Chunking, Probabilistic CFG and Parsing, Lexical Acquisition, Word Sense Disambiguation, Text Alignment and Machine Translation
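To give a flavor of the "N-Grams and Smoothing" topic, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. The function names, the sentence boundary markers `<s>` and `</s>`, and the two-sentence toy corpus are all assumptions made for illustration. Add-one is chosen only because it is the simplest smoothing scheme, not because the course prescribes it.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # words that can be predicted; <s> is only ever a context
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, vocab

def prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

# Hypothetical toy corpus
corpus = ["the cat sat", "the dog sat"]
contexts, bigrams, vocab = train_bigrams(corpus)

print(prob("the", "cat", contexts, bigrams, vocab))  # seen bigram
print(prob("cat", "dog", contexts, bigrams, vocab))  # unseen bigram, still non-zero
```

Adding one to every count gives unseen bigrams a small non-zero probability while keeping the distribution over the vocabulary normalized for each context.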