WackoWiki: Grapheme

https://wackowiki.org/doc     Version: 12.03.2021 11:24
यूनिकोड

file:/書記素.png?right

The Hindi word for Unicode "यूनिकोड" is composed of seven Unicode characters from the Devanagari script.

Most readers would see this word as four units of text. Each of the first three graphemes consists of two code points: a consonant letter followed by a dependent vowel sign that modifies it. So the word contains seven Unicode characters but only four graphemes:


Word:         यूनिकोड
Graphemes:    यू | नि | को | ड
Code points:  य ू | न ि | क ो | ड
              U+092F U+0942 | U+0928 U+093F | U+0915 U+094B | U+0921
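You can check the counts in the table with PCRE alone, which every PHP build includes. `\X` matches one extended grapheme cluster, and `.` with the `u` modifier matches one code point. This is a minimal sketch. The helper names `graphemes()`, `code_points()` and `utf8_codepoint()` are hypothetical and not part of WackoWiki. Where the intl extension is loaded, `grapheme_strlen()` should give the same grapheme count.

```php
<?php
// Sketch only: hypothetical helpers, not WackoWiki code.
// Uses only PCRE, so no intl or mbstring extension is required.

// Decode one UTF-8 encoded character into its code point.
function utf8_codepoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch)); // bytes as integers, 0-indexed
    $n = count($b);
    if ($n === 1) {
        return $b[0];
    }
    if ($n === 2) {
        return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
    }
    if ($n === 3) {
        return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
    }
    return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
        | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
}

// Split a string into extended grapheme clusters (\X).
function graphemes(string $s): array
{
    preg_match_all('/\X/u', $s, $m);
    return $m[0];
}

// Split a string into single code points ("." with /s also matches newlines).
function code_points(string $s): array
{
    preg_match_all('/./su', $s, $m);
    return $m[0];
}

$word = 'यूनिकोड';

// One line per grapheme, followed by the code points it is made of.
foreach (graphemes($word) as $g) {
    $hex = array_map(
        fn (string $c): string => sprintf('U+%04X', utf8_codepoint($c)),
        code_points($g)
    );
    echo $g, "\t", implode(' ', $hex), "\n";
}

// prints: 7 code points, 4 graphemes
printf("%d code points, %d graphemes\n", count(code_points($word)), count(graphemes($word)));
```

Each printed line should match one column of the table: the first three graphemes pair a consonant with its vowel sign, and ड stands alone.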


!/यूनिकोड ← with graphemes (forms a link with the current regex; possible, but not desirable)
!/यनकड ← normalized via $tag = normalizer_normalize($tag, Normalizer::FORM_C); in run()
यूनिकोड (normalized and unnormalized forms combined; the title is also in unnormalized form)
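Unicode normalization by itself does not remove the vowel signs. NFC leaves यूनिकोड unchanged, because none of its vowel signs combine with a consonant into a precomposed character. The shortened form यनकड is exactly what remains when every combining mark (`\p{M}`) is dropped. The sketch below shows this with hypothetical patterns. They are assumptions for illustration, not the regex WackoWiki actually applies to tags.

```php
<?php
// Hypothetical sketch: how a tag filter could turn यूनिकोड into यनकड.
// These patterns are illustrations, not WackoWiki's actual tag regex.

$tag = 'यूनिकोड';

// A letters-only pattern rejects the tag: the vowel signs are \p{M}, not \p{L}.
$lettersOnly = preg_match('/^\p{L}+$/u', $tag);                // 0

// Letting each letter carry combining marks accepts whole graphemes.
$lettersWithMarks = preg_match('/^(?:\p{L}\p{M}*)+$/u', $tag); // 1

// Stripping every combining mark leaves only the four consonants.
$stripped = preg_replace('/\p{M}/u', '', $tag);                // "यनकड"

echo $lettersOnly, ' ', $lettersWithMarks, ' ', $stripped, "\n";

// NFC alone keeps the marks; checked only if the intl extension is loaded.
if (class_exists('Normalizer')) {
    var_dump(Normalizer::normalize($tag, Normalizer::FORM_C) === $tag); // bool(true)
}
```

A pattern like the second one would keep the whole grapheme in the tag. Stripping marks produces a link target that no longer spells the word.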