𝔘𝔫𝔦𝔠𝔬𝔡𝔢 Resources

  1. The original paper from Bell Labs on UTF-8
  2. A tutorial on character code issues - the MUST READ
  3. Mojibake
  4. Character Sets / Character Encoding Issues
  5. Handling UTF-8 with PHP
  6. Migrating to Unicode
  7. http://en.wikipedia.org/wiki/UTF-8
  8. Unicode block
  9. Unicode Cheat Sheet
  10. http://htmlpurifier.org/docs/enduser-utf8.html
  11. What every programmer absolutely, positively needs to know about encodings and character sets to work with text
  12. https://www.sitepoint.com/brin[...]-with-portable-utf8/
  13. https://www.utf8-chartable.de/
  14. W3C: Character Model for the World Wide Web: String Matching
  15. https://unicode.org
    1. Unicode Regular Expressions
    2. Unicode Bidirectional Algorithm
    3. Unicode Security Considerations
    4. Unicode Normalization Forms
  16. http://www.utf8everywhere.org/
  17. Quotes

̲ᴛ̲ʜ̲ᴇ̲ʀ̲ᴇ̲ ̲ɪ̲s̲ ̲ɴ̲ᴏ̲ ̲U̲ɴ̲ɪ̲ᴄ̲ᴏ̲ᴅ̲ᴇ̲ ̲ᴍ̲ᴀ̲ɢ̲ɪ̲ᴄ̲ ̲ʙ̲ᴜ̲ʟ̲ʟ̲ᴇ̲ᴛ̲ ̲


💩 𝔸 𝕤 𝕤 𝕦 𝕞 𝕖 𝔹 𝕣 𝕠 𝕜 𝕖 𝕟 𝕟 𝕖 𝕤 𝕤 💩


😈 ¡ƨdləɥ ƨᴉɥʇ ədoɥ puɐ ʻλɐp əɔᴉu ɐ əʌɐɥ ʻʞɔnl poo⅁ 😈

PHP 7.x/8.x

  • Strings in PHP are still fundamentally sequences of bytes, not characters.
  • It is up to developers to deal with character encoding issues using mbstring, iconv, UConverter, etc.
  • The intl extension provides much of the Unicode functionality that was originally planned for PHP 6, and it is available in PHP 7/8.
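  • PHP 7 helps by adding the Unicode codepoint escape syntax \u{[0-9A-Fa-f]+}, which expands to UTF-8 bytes in double-quoted strings and heredocs, and the IntlChar class (part of the intl extension).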
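
The points above can be illustrated with a minimal sketch. It assumes the source file and all strings are UTF-8 and that the mbstring extension is loaded; the IntlChar part only runs if the intl extension is present. The variable names are purely illustrative.

```php
<?php
declare(strict_types=1);

// "\u{...}" is expanded to UTF-8 bytes inside double-quoted strings (PHP 7.0+).
$word = "na\u{00EF}ve"; // "naïve", with U+00EF written as an escape

// strlen() counts bytes; U+00EF occupies two bytes in UTF-8.
$byteCount = strlen($word);

// mb_strlen() counts code points when given the encoding (needs ext-mbstring).
$charCount = mb_strlen($word, 'UTF-8');

// Byte-oriented functions can cut a multi-byte character in half.
$brokenCut = substr($word, 0, 3);             // "na" plus the first byte of "ï"
$safeCut   = mb_substr($word, 0, 3, 'UTF-8'); // "naï"

var_dump($byteCount);                             // int(6)
var_dump($charCount);                             // int(5)
var_dump(mb_check_encoding($brokenCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($safeCut, 'UTF-8'));   // bool(true)

// IntlChar lives in ext-intl, so guard its use in case the extension is missing.
if (class_exists('IntlChar')) {
    // Build a character from its code point and compare with the escape form.
    var_dump(IntlChar::chr(0x1F4A9) === "\u{1F4A9}"); // bool(true)
}
```

The byte count and the character count differ as soon as a string leaves ASCII, which is why byte-oriented functions such as strlen() and substr() are risky on UTF-8 text.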
