Normalization is the process of converting
multiple equivalent representations of data into a
consistent underlying normal form. Normalized data may
take one of two forms: composed or decomposed. Composition
combines characters wherever possible. For example,
U+0627 (ا) followed by U+0653 (ٓ) is composed into
U+0622 (آ).
Decomposition is the opposite process: it breaks
pre-composed characters back into their constituents.
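
The composition and decomposition of the example above can be
sketched with Python's standard unicodedata module. This is only
an illustration of the Unicode behaviour, not the utility's own
code; the variable names are chosen for this example.

```python
import unicodedata

# ALEF (U+0627) followed by MADDAH ABOVE (U+0653)
decomposed = "\u0627\u0653"

# Canonical composition (NFC) merges the pair into one character.
composed = unicodedata.normalize("NFC", decomposed)

# Canonical decomposition (NFD) splits it back into its constituents.
roundtrip = unicodedata.normalize("NFD", composed)

print([f"U+{ord(c):04X}" for c in composed])   # ['U+0622']
print([f"U+{ord(c):04X}" for c in roundtrip])  # ['U+0627', 'U+0653']
```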
The Unicode normalization standard defines two kinds of
equivalence between characters: canonical equivalence
and compatibility equivalence. It also defines four
normalization forms:
1. Normalization form D (NFD) or canonical decomposition
2. Normalization form C (NFC) or canonical decomposition
followed by canonical composition
3. Normalization form KD (NFKD) or compatibility
decomposition
4. Normalization form KC (NFKC) or compatibility
decomposition followed by canonical composition
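
As a rough comparison of the four forms, the sketch below applies
each of them to one short sample using Python's standard
unicodedata module. The sample string and the code_points helper
are assumptions made for this illustration; the sample combines a
presentation-form ligature (U+FEFB, which has only a compatibility
decomposition) with U+0622 (which has a canonical decomposition).

```python
import unicodedata

def code_points(text):
    """Render a string as a space-separated list of code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

# LAM WITH ALEF ligature, isolated form, then ALEF WITH MADDA ABOVE
sample = "\uFEFB\u0622"

forms = {
    form: code_points(unicodedata.normalize(form, sample))
    for form in ("NFD", "NFC", "NFKD", "NFKC")
}

for form, points in forms.items():
    print(f"{form:5} {points}")
```

The canonical forms (NFD, NFC) leave the ligature U+FEFB untouched,
while the compatibility forms (NFKD, NFKC) replace it with the
plain letters U+0644 and U+0627.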
Urdu Normalization Utility v1.0 supports three
normalization forms: Normalization form D (NFD),
Normalization form C (NFC), and Normalization form KD (NFKD).
The normalization form KD (NFKD) provided by the utility
is a non-reversible process: the result may not be
convertible back to its original form.
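
Why NFKD is not reversible can be seen in a minimal sketch, again
using Python's standard unicodedata module as a stand-in rather
than the utility itself. The choice of U+FEFB as input is an
assumption made for this illustration.

```python
import unicodedata

# LAM WITH ALEF ligature, isolated presentation form
original = "\uFEFB"

# Compatibility decomposition replaces the ligature with plain letters.
decomposed = unicodedata.normalize("NFKD", original)

# Recomposing does not restore the ligature: the information that the
# text was a presentation form is no longer present.
recomposed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0644', 'U+0627']
print(recomposed == original)                   # False
```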