Center for Language Engineering

 
 



 

 

KICS
KICS-UET


 
 


 
   
 

CLE makes these linguistic resources available free of cost to support academic, non-commercial research. The processing fees charged are used to maintain these resources. Please contact CLE directly about discounts (applicable only to selected public organizations in Pakistan) or commercial licensing options.

 
     
  CLE Pakistan District Names Speech Corpus - Urdu Speakers
   
 
CLE Catalog #: CLE16S008
Release Date: 12 July 2016
First Language of Speakers: Urdu
Duration: 52 minutes
Number of Utterances: 3424
Distribution: 1 DVD, Web Download
Processing Fee (Pakistan): 15000 PKR
Processing Fee (International): 250 USD
License: Yes
   
  Introduction
  This package is a collection of speech data consisting of the names of Pakistan's districts, recorded from Urdu speakers. The corpus comprises 139 single-word vocabulary items. The data was recorded over a mobile channel at a sampling rate of 8 kHz with 16-bit resolution. The gender and district of origin of each speaker are also provided with the corpus. Speakers' ages range from 18 to 50 years. The data was collected in outdoor and office environments. The corpus has been cleaned and verified by expert linguists. The data is annotated at the word level using CI SAMPA, which is mapped to Urdu IPA symbols.
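  As a rough sketch, the audio format described above can be checked with Python's standard `wave` module. The function name `check_wav_format` is hypothetical, and the mono (single-channel) expectation is an assumption, since the description states only the sampling rate and the bit depth.

```python
import wave

# Format stated in the corpus description: 8 kHz sampling, 16-bit samples.
EXPECTED_RATE_HZ = 8000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample
# Assumption: recordings are mono; the description does not say so explicitly.
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Hypothetical helper: return (matches_format, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    matches = (
        rate == EXPECTED_RATE_HZ
        and width == EXPECTED_SAMPLE_WIDTH_BYTES
        and channels == EXPECTED_CHANNELS
    )
    # Duration in seconds is the number of frames divided by frames per second.
    return matches, frames / rate
```

  Called on a file from the package, the function reports whether the header matches the stated format and how long the clip is. Assuming the folders hold only the corpus recordings, summing the durations over all files should come out close to the stated 52 minutes.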
   
  Data Source
  The data was collected from students and employees of various universities and research institutes, largely from Lahore, Quetta, Bahawalnagar, Peshawar, Gujranwala, Gujrat, Karachi, Faisalabad and Rawalpindi.
   
  Data
  A list of the vocabulary items covered in the corpus is available here. The package contains three folders, described below:
  • male: This folder contains audio files from male speakers in wav format.
  • female: This folder contains audio files from female speakers in wav format.
  • info: This folder contains information about the corpus.
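  Assuming the layout described above, with `.wav` files inside the `male` and `female` folders (possibly nested in subfolders, which is an assumption), a short Python sketch can count the recordings per gender. The function name `count_wavs_by_gender` is hypothetical, and the root path is whatever directory the DVD or download was copied to.

```python
from pathlib import Path


def count_wavs_by_gender(root):
    """Hypothetical helper: count .wav files under the male/ and female/ folders."""
    counts = {}
    for gender in ("male", "female"):
        folder = Path(root) / gender
        if not folder.is_dir():
            counts[gender] = 0
            continue
        # rglob also searches subfolders, in case files are grouped per speaker
        # (an assumption; the description does not specify nesting).
        counts[gender] = sum(
            1
            for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() == ".wav"
        )
    return counts
```

  If each utterance is stored as a separate file (not stated in the description), the two counts should add up to the stated 3424 utterances.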
   
  Sample
  Download Sample
   
 
 
 

webmaster@cle.org.pk