Center for Language Engineering





  Courses Offered  

Computational Linguistics

Course Code:
CSE 606
Course Objectives: The course aims to develop an understanding of linguistics and of how language is modeled and processed.  The course will focus on modeling words and phrases, with some reference to higher structures, including meaning and discourse.  Challenges associated with corpus and multilingual text processing, and their solutions, will also be addressed.

Course Description:
Corpora: Scripts, Unicode Encoding and Processing, Normalization and Collation, Tokenization; Words: Word Segmentation, Spell Checking, Morphology and Finite State Transducers, N-Grams; Phrases: Word Classes and POS Tagging, Chunking, CFG and Language Grammars, Rule-Based and Probabilistic Parsing of CFGs, Features and Unifications, Annotated Grammars and Lexical Functional Grammar; Semantics: Lexical Semantics, Compositional Semantics, Word Sense Disambiguation

Text Book and Material:
Speech and Language Processing, Second Edition, by Daniel Jurafsky and James Martin
Additional reading material will be provided for sections in italics.



Speech Processing

Course Code: CSE 723
Course Prerequisite: CSE 610
Course Objectives: The course covers three areas of speech processing: the speech signal, its processing, and applications.  A third of the course introduces the acoustics of speech, to develop an understanding of the nature of the signal being processed.  The course also introduces both time-domain and frequency-domain analysis of the speech signal, used to extract relevant higher-level speech information from the digital signal.  Finally, the course introduces the fundamentals of speech synthesis and recognition.  The course covers both the theory and the practical implementation of these techniques.

Course Description: Background: Periodic vs. aperiodic waves, resonance, standing waves, complex waves, spectrum; Speech signal: Source-Filter Theory of Speech Production, glottal waveform, acoustic properties of vowels and consonants; Acquisition of speech signal: A/D conversion including quantization and sampling; Filtering and amplification; Time-domain speech analysis: Framing, Zero-crossing rate, Short-term energy, Speech segmentation; Frequency domain representation: Windowing, Fourier Transforms; Parameterization of Speech: Autocorrelation, Linear Prediction including Autocorrelation method, Covariance method; Applications of LPC including Vocal tract area estimation, Pitch calculation, Formant estimation; Cepstral Analysis and applications including Pitch extraction; Speech recognition; Speech synthesis

Text Book and Material:

The Acoustics of Speech Communication by Pickett
Principles of Computer Speech by Witten
Digital Speech Processing by Rabiner and Schafer
Speech and Language Processing by Jurafsky and Martin
An Introduction to Text to Speech Synthesis by Dutoit
Fundamentals of Speech Recognition by Rabiner and Juang



Seminar in Urdu Computational Grammar

Course Code: CS 722
Course Prerequisite: CSE 606
Course Objectives: This course will look at existing grammars of Urdu to develop a holistic understanding of Urdu morphology and syntax.  The course will also look at mechanisms to model Urdu morphology and grammar using finite state methods and annotated context-free grammars, respectively.  Students will also be expected to identify and analyze some relevant open issues and to implement solutions to them.

Course Description:
POS Tagset, Inflectional Morphology of Urdu; Derivational and Non-Concatenative Morphology; Reduplication; Noun Phrase; Nouns and Pronouns, Adjectives, Determiners in Noun Phrase; Quantifiers, Ordinals, Cardinals, Genitives in Noun Phrase; Agreement, Structure and Order; Case; Case Markers, Case Phrase, Case and Grammatical Roles; Postpositional Phrase; Postpositions, Sub-categorization; Verbs; Sub-categorization and Agreement (gender, number, respect, person, form); Tense, Aspect and Mood; Complex Predicates; Adverbs, Negation, Verb Adjuncts, Wala; Coordinate and Subordinate Conjunctions; Relative Clause; Interrogative and Imperative Sentences; Issues with Parsing the Corpus; Open Issues in Urdu Grammar and its Implementation

Text Book and Material:
نئی اردو قواعد , Ismat Javed
قواعد اردو , Maulvi Abdul Haq
اردو صرف و نحو , Maulvi Abdul Haq
قواعد اردو , Abu Allais Siddiqui
Urdu: An Essential Grammar, Ruth Laila Schmidt
A Grammar of the Hindustani or Urdu Language, John T. Platts
The Structure of Complex Predicates in Urdu, Miriam Butt
Lexical Functional Grammar, Mary Dalrymple
Grammar Writer’s Cookbook, Miriam Butt, Tracy Holloway King
Speech and Language Processing, Jurafsky and Martin



Seminar in Statistical Language Processing

Course Code: CS 721
Course Prerequisite: CSE 606
Course Objectives: Due to the complexity of natural languages, rule-based approaches have limited capacity for modeling.  Statistical approaches provide feasible alternatives for modeling some of these challenges.  This course presents the data-centric approach to modeling language, building on the rule-based approaches discussed in the first course on Computational Linguistics.  The course will look at advanced topics in computational linguistics and explore statistical solutions to the associated problems.

Course Description:
Foundations, N-Grams and Smoothing, HMMs, POS Tagging, Chunking, Probabilistic CFG and Parsing, Lexical Acquisition, Word Sense Disambiguation, Text Alignment and Machine Translation

Text Book and Material:
Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, Cambridge, MA, May 1999.