𝔘𝔫𝔦𝔠𝔬𝔡𝔢 Resources
- The original paper from Bell Labs on UTF-8[link1]
- A tutorial on character code issues[link2] - a MUST READ
- Mojibake[link3]
- Character Sets / Character Encoding Issues[link4]
- Handling UTF-8 with PHP[link5]
- Migrating to Unicode[link6]
- http://en.wikipedia.org/wiki/UTF-8
- Unicode block[link7]
- Unicode Cheat Sheet[link8]
- http://htmlpurifier.org/docs/enduser-utf8.html
- What every programmer absolutely, positively needs to know about encodings and character sets to work with text[link9]
- https://www.sitepoint.com/brin[...]-with-portable-utf8/[link10]
- https://www.utf8-chartable.de/
- W3C: Character Model for the World Wide Web: String Matching[link11]
- https://unicode.org
- Unicode Regular Expressions[link12]
- Unicode Bidirectional Algorithm[link13]
- Unicode Security Considerations[link14]
- Unicode Normalization Forms[link15]
- http://www.utf8everywhere.org/
- Quotes[link16]
̲ᴛ̲ʜ̲ᴇ̲ʀ̲ᴇ̲ ̲ɪ̲s̲ ̲ɴ̲ᴏ̲ ̲U̲ɴ̲ɪ̲ᴄ̲ᴏ̲ᴅ̲ᴇ̲ ̲ᴍ̲ᴀ̲ɢ̲ɪ̲ᴄ̲ ̲ʙ̲ᴜ̲ʟ̲ʟ̲ᴇ̲ᴛ̲ ̲
💩 𝔸 𝕤 𝕤 𝕦 𝕞 𝕖 𝔹 𝕣 𝕠 𝕜 𝕖 𝕟 𝕟 𝕖 𝕤 𝕤 💩
😈 ¡ƨdləɥ ƨᴉɥʇ ədoɥ puɐ ʻλɐp əɔᴉu ɐ əʌɐɥ ʻʞɔnl poo⅁ 😈
PHP 7.x/8.x
- String literals in PHP are still fundamentally sequences of bytes.
- It is up to developers to handle character encoding themselves, using mbstring[link17], iconv[link18], UConverter[link19], etc.
- The intl extension[link20] wraps much of the ICU-based functionality that was originally planned for PHP 6 and makes it available in PHP 7/8.
- PHP 7 helps by adding the Unicode code point escape syntax
\u{[0-9A-Fa-f]+}
for double-quoted strings and heredocs (the code point is emitted as UTF-8 bytes), and the IntlChar class[link21].
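The points above can be illustrated with a minimal sketch. The sample string and variable names are made up for illustration, and the mbstring and intl calls are guarded because those extensions may not be installed on every system.

```php
<?php
// "café" written with the PHP 7+ code point escape; PHP stores U+00E9 as the
// two UTF-8 bytes C3 A9 inside an ordinary byte string.
$s = "caf\u{E9}";

// strlen() counts bytes, not characters.
$byteLength = strlen($s);            // 5

// The raw bytes, shown as hex, make the UTF-8 encoding visible.
$hex = bin2hex($s);                  // "636166c3a9"

// mbstring counts code points once it is told the encoding.
$charLength = function_exists('mb_strlen')
    ? mb_strlen($s, 'UTF-8')         // 4
    : null;                          // mbstring not available

// IntlChar (intl extension) exposes Unicode character properties.
$codePoint = class_exists('IntlChar')
    ? IntlChar::ord("\u{E9}")        // 233 (0xE9)
    : null;                          // intl not available
$isAlpha = class_exists('IntlChar')
    ? IntlChar::isalpha("\u{E9}")    // true
    : null;

printf("bytes=%d hex=%s\n", $byteLength, $hex);
```

The gap between the byte count and the code point count is the reason byte-oriented functions such as strlen() or substr() should not be applied to UTF-8 text when character positions matter.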
- [link1] http://doc.cat-v.org/plan_9/4th_edition/papers/utf
- [link2] http://jkorpela.fi/chars.html
- [link3] https://en.wikipedia.org/wiki/Mojibake
- [link4] https://web.archive.org/web/20070622045515/http://www.phpwact.org/php/i18n/charsets
- [link5] https://web.archive.org/web/20070622045515/http://www.phpwact.org/php/i18n/utf-8
- [link6] http://www.w3.org/International/articles/unicode-migration/
- [link7] https://en.wikipedia.org/wiki/Unicode_block
- [link8] https://wackowiki.org/doc/Dev/Guidelines/UnicodeCheatSheet
- [link9] http://kunststube.net/encoding/
- [link10] https://www.sitepoint.com/bringing-unicode-to-php-with-portable-utf8/
- [link11] http://www.w3.org/TR/charmod-norm/
- [link12] https://unicode.org/reports/tr18/
- [link13] https://unicode.org/reports/tr9/
- [link14] https://unicode.org/reports/tr36/tr36-9.html#visual_spoofing
- [link15] http://unicode.org/reports/tr15/
- [link16] https://wackowiki.org/doc/Dev/Release/R6.0/Unicode/Quotes
- [link17] https://www.php.net/manual/en/book.mbstring.php
- [link18] https://www.php.net/manual/en/book.iconv.php
- [link19] https://www.php.net/manual/en/class.uconverter.php
- [link20] https://www.php.net/manual/en/book.intl.php
- [link21] https://www.php.net/manual/en/class.intlchar.php