WackoWiki: 𝔘𝔫𝔦𝔠𝔬𝔡𝔢 Resources

https://wackowiki.org/doc     Version: 16 (07/06/2025 14:02)

𝔘𝔫𝔦𝔠𝔬𝔡𝔢 Resources

  1. The original paper from Bell Labs on UTF-8[link1]
  2. A tutorial on character code issues[link2] - a must-read
  3. Mojibake[link3]
  4. Character Sets / Character Encoding Issues[link4]
  5. Handling UTF-8 with PHP[link5]
  6. Migrating to Unicode[link6]
  7. http://en.wikipedia.org/wiki/UTF-8
  8. Unicode block[link7]
  9. Unicode Cheat Sheet[link8]
  10. http://htmlpurifier.org/docs/enduser-utf8.html
  11. What every programmer absolutely, positively needs to know about encodings and character sets to work with text[link9]
  12. https://www.sitepoint.com/brin[...]-with-portable-utf8/[link10]
  13. W3C: Character Model for the World Wide Web: String Matching[link11]
  14. https://unicode.org
    1. Unicode Regular Expressions[link12]
    2. Unicode Bidirectional Algorithm[link13]
    3. Unicode Security Considerations[link14]
    4. Unicode Normalization Forms[link15]
  15. http://www.utf8everywhere.org/
  16. Quotes[link16]

̲ᴛ̲ʜ̲ᴇ̲ʀ̲ᴇ̲ ̲ɪ̲s̲ ̲ɴ̲ᴏ̲ ̲U̲ɴ̲ɪ̲ᴄ̲ᴏ̲ᴅ̲ᴇ̲ ̲ᴍ̲ᴀ̲ɢ̲ɪ̲ᴄ̲ ̲ʙ̲ᴜ̲ʟ̲ʟ̲ᴇ̲ᴛ̲ ̲

💩 𝔸 𝕤 𝕤 𝕦 𝕞 𝕖 𝔹 𝕣 𝕠 𝕜 𝕖 𝕟 𝕟 𝕖 𝕤 𝕤 💩

😈 ¡ƨdləɥ ƨᴉɥʇ ədoɥ puɐ ʻλɐp əɔᴉu ɐ əʌɐɥ ʻʞɔnl poo⅁ 😈

PHP 7.x/8.x

Unicode & IDNA

RFC 3986 supports neither IDNA nor non-ASCII characters. WHATWG URL supports both IDNA and Unicode characters, and it explicitly suggests that browsers should render the host component by displaying Unicode characters.

The recommendation is not just about user-friendliness: it is necessary for security reasons, because it reduces the human risk factor in exploits. For example, “xn--google.com” could mislead an uninitiated reader into thinking it is a Google domain, whereas the IDNA label actually decodes to “䕮䕵䕶䕱.com”.
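
The decoding mentioned above can be reproduced with a few lines of code. The following is a minimal sketch in Python (not PHP), assuming only the standard-library "punycode" codec; the helper names decode_label and decode_host are hypothetical and exist only for this illustration. It decodes the raw Punycode payload of each "xn--" label and deliberately skips the validation and mapping steps that a full IDNA implementation performs, so treat it as a demonstration rather than a safe way to display hostnames.

```python
# Sketch: show what an "xn--" label really contains.
# decode_label / decode_host are hypothetical helper names.

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded (IDNA) label


def decode_label(label: str) -> str:
    """Return the Unicode form of one DNS label, or the label unchanged."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    payload = label[len(ACE_PREFIX):]
    # The "punycode" codec decodes the payload only; it does not apply
    # any IDNA validation (assumption: acceptable for a demonstration).
    return payload.encode("ascii").decode("punycode")


def decode_host(host: str) -> str:
    """Decode every label of a dot-separated host name."""
    return ".".join(decode_label(part) for part in host.split("."))


if __name__ == "__main__":
    print(decode_host("xn--google.com"))
```

Because the label "google" contains no hyphen delimiter after the prefix, the decoder treats all six letters as encoded deltas rather than as literal ASCII, which is why the result bears no resemblance to the word the reader sees.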