Urdu 5000 Most Frequently Used Words

Release Notes

The word list was extracted from a corpus of 19.3 million words gathered from a wide range of domains, listed in the following table, which were chosen with the end user's perspective in mind.

| Domains | Subdomains |
|---|---|
| C1. Sports/Games | C1.1. Sports (special events) |
| C2. News | C2.1. Local and international affairs |
| | C2.2. Editorials and opinions |
| C3. Finance | C3.1. Business, domestic and foreign market |
| C4. Culture/Entertainment | C4.1. Music, theatre, exhibitions, review articles on literature |
| | C4.2. Travel / tourism |
| C5. Consumer Information | C5.1. Health |
| | C5.2. Popular science |
| | C5.3. Consumer technology |
| C6. Personal communications | C6.1. Emails, online, discussions, editorials, e-zines |

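As a rough illustration of how a frequency list of this kind can be derived from a corpus, the following minimal Python sketch counts whitespace-separated tokens and keeps the most frequent ones. It is not the procedure used to build this list: the helper name `top_words`, the whitespace tokenization, and the sample sentences are all assumptions made for illustration, and real Urdu text needs proper word segmentation and normalization.

```python
from collections import Counter


def top_words(lines, n=5000):
    """Return the n most frequent whitespace-separated tokens with counts.

    Hypothetical helper for illustration only. Splitting on whitespace is
    an assumption and a crude stand-in for Urdu word segmentation.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts.most_common(n)


# Tiny made-up sample standing in for a corpus file.
sample = [
    "یہ ایک کتاب ہے",
    "یہ ایک قلم ہے",
    "یہ کتاب اچھی ہے",
]

top = top_words(sample, n=3)
for word, count in top:
    print(word, count)
```

For a corpus stored on disk, the same function can be fed an open file object (opened with `encoding="utf-8"`), since it only iterates over lines.
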
The domain-wise distribution of corpus size is given in the following table.

| Domains | Raw corpora: size | Raw corpora: distinct words |
|---|---|---|
| C1. Sports/Games | 1666304 | 23118 |
| C2. News | 8957259 | 67365 |
| C3. Finance | 1162019 | 17024 |
| C4. Culture/Entertainment | 3845117 | 59214 |
| C5. Consumer Information | 1980723 | 34151 |
| C6. Personal communications | 1685424 | 30469 |
| Total | 19296846 | 104341 |

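To make the distribution easier to read, the short Python sketch below recomputes the total from the per-domain sizes and derives each domain's share of the corpus. The figures are copied from the table; the variable names are illustrative. The distinct-word counts are not expected to add up to the total of 104341, since the same word can occur in more than one domain.

```python
# Per-domain raw corpus sizes, copied from the table above.
sizes = {
    "C1. Sports/Games": 1666304,
    "C2. News": 8957259,
    "C3. Finance": 1162019,
    "C4. Culture/Entertainment": 3845117,
    "C5. Consumer Information": 1980723,
    "C6. Personal communications": 1685424,
}

# The sum should match the "Total" row of the table.
total = sum(sizes.values())

# Share of each domain as a percentage of the whole corpus.
shares = {domain: 100 * size / total for domain, size in sizes.items()}

for domain, share in shares.items():
    print(f"{domain}: {share:.1f}%")
```

By this calculation News is the largest domain, at a little under half of the corpus, and Finance is the smallest.
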
Download
|
|
Urdu 5000 Most Frequently Used Words List |
License |
|
|
|
|