==Migration to Unicode (UTF-8)==

|| {{toc numerate=1}} | {{tree}} ||

  * Unicode support comes from a mishmash of extensions and external libraries:
    1. (( PHP Multibyte String extension))
    1. (( iconv))
    1. (( Internationalization extension)) -- ICU library wrapper
    1. (( PHP UTF-8))
    1. (( Patchwork-UTF8))
    1. (( Portable UTF-8))
    1. (( HTML Purifier))
    1. (( forceutf8))
  * the PHP development culture means that people work on what they are interested in

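  1. PCRE UTF-8 support: WackoWiki will not run if your PHP installation was not compiled with UTF-8 support in the PCRE extension.
  2. Third-party libraries used by WackoWiki also require UTF-8 support.
  3. utf8mb4: MySQL versions prior to 5.7.7 and MariaDB versions prior to 10.2.2 do not have the ##innodb_large_prefix## option enabled by default. Without it, InnoDB index key prefixes are limited to 767 bytes, so a fully indexed utf8mb4 ##VARCHAR(255)## column (up to 1020 bytes) does not fit.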
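The arithmetic behind the utf8mb4 index limit can be sketched as follows. This is a minimal Python illustration, not WackoWiki code; it assumes InnoDB's 767-byte prefix limit without ##innodb_large_prefix## and 3072 bytes with it (DYNAMIC or COMPRESSED row format, default 16 KB pages). The function names are hypothetical.

```python
# Assumed InnoDB index key prefix limits in bytes (16 KB pages):
# 767 without innodb_large_prefix, 3072 with it (DYNAMIC/COMPRESSED row format).
SMALL_PREFIX_LIMIT = 767
LARGE_PREFIX_LIMIT = 3072

# Worst-case number of bytes a single character can take in each charset.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def max_indexable_chars(charset: str, large_prefix: bool) -> int:
    """Longest VARCHAR length (in characters) that fits in one index key prefix."""
    limit = LARGE_PREFIX_LIMIT if large_prefix else SMALL_PREFIX_LIMIT
    return limit // MAX_BYTES_PER_CHAR[charset]


def fits_in_index(charset: str, length: int, large_prefix: bool) -> bool:
    """True if a fully indexed VARCHAR(length) column stays within the limit."""
    return length <= max_indexable_chars(charset, large_prefix)


# Print the limits for the two UTF-8 charsets, with and without large prefixes.
for cs in ("utf8mb3", "utf8mb4"):
    for large in (False, True):
        label = "large_prefix" if large else "default"
        print(cs, label, max_indexable_chars(cs, large))
```

With the small limit, utf8mb4 allows at most 191 characters (767 // 4), which is why a ##VARCHAR(255)## key that fit under latin1 or utf8mb3 (255 * 3 = 765 bytes) fails after switching to utf8mb4.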

===Unicode normalization===

In essence, normalize all input from sources where you cannot be sure that it is already in a normal form. In most cases use NFC (Normalization Form C), because most data is already in NFC.
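A minimal Python sketch of normalizing input to NFC, using the standard ##unicodedata## module (Python 3.8 or later for ##is_normalized##). The helper name ##normalize_input## is hypothetical; in PHP the equivalent job is done by the Internationalization extension's ##Normalizer## class.

```python
import unicodedata


def normalize_input(text: str, form: str = "NFC") -> str:
    """Return text in the given Unicode normalization form (NFC by default)."""
    # Fast path: most input is already in NFC, so skip rebuilding the string.
    if unicodedata.is_normalized(form, text):
        return text
    return unicodedata.normalize(form, text)


decomposed = "Cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "Caf\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)                   # False: different code points
print(normalize_input(decomposed) == precomposed)  # True: NFC composes e + accent
```

Both strings render identically, so without normalization a search or a page-name comparison can silently fail to match.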

===Steps===
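  1. Convert all source files to UTF-8 __without__ BOM.
    * make sure that every included/required file is either ASCII or UTF-8 without BOM: PHP sends a BOM before the opening tag as literal output, which breaks later ##header()## calls ("headers already sent")
    * Tools: iconv
      * %% find . -name "*.php" -exec iconv -f ISO-8859-1 -t UTF-8 {} -o {}.new \; %%
      * this writes each converted file to ##*.new## next to the original; review the results before replacing the originals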
  1.#3 ((/Dev/Guidelines/UnicodeCheatSheet search code base for non-UTF8 compatible functions and replace them))
    * add ((!/MbExtends missing functions)) not covered by the PHP Multibyte String extension to ##lib/mb_extends/mb_extends.php##
  1. add optional support to replace SafeHtml with (( HTMLPurifier))
  1. write a ((/Dev/Release/R6.0/Upgrade/DatabaseConversion database conversion script)) that runs through the database and re-encodes everything as UTF-8
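  1. replace byte-based string access (##$str[$i]##, ##substr()##, ##strlen()##) with access by character (##mb_substr()##, ##mb_strlen()##)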
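  1. remove functions that are no longer needed once everything is UTF-8, since non-ASCII characters no longer have to be encoded as HTML entities (( notes)):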
    1. ##htmlentities##
    1. ##html_entity_decode##
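The BOM and string-access steps above can be checked with a small script. Below is a minimal Python sketch, not part of WackoWiki: it flags ##.php## files that start with a UTF-8 BOM or are not valid UTF-8, and ##char_length()## shows why string access must count characters rather than bytes (PHP's ##strlen()## counts bytes, ##mb_strlen()## counts characters). All function names are hypothetical.

```python
import codecs
from pathlib import Path


def classify_source(data: bytes) -> str:
    """Classify raw file bytes as 'ascii', 'utf-8', 'utf-8-bom' or 'not-utf-8'."""
    # A leading BOM is reported first, whatever follows it.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Probably a legacy encoding such as ISO-8859-1; convert with iconv.
        return "not-utf-8"


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def char_length(data: bytes) -> int:
    """Number of characters (code points) in UTF-8 data, not the number of bytes."""
    return len(data.decode("utf-8"))


def scan(root: str) -> dict:
    """Map every problematic *.php file under root to its classification."""
    problems = {}
    for path in Path(root).rglob("*.php"):
        kind = classify_source(path.read_bytes())
        if kind in ("utf-8-bom", "not-utf-8"):
            problems[str(path)] = kind
    return problems
```

For example, ##scan("path/to/wackowiki")## (a placeholder path) returns the files still needing the iconv conversion or a ##strip_bom()## pass, and ##char_length("Café".encode("utf-8"))## is 4 although the encoded string is 5 bytes long.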

====MySQL – Migrating database data already encoded in latin1 to UTF-8====
If you have an existing MySQL database whose data is encoded in latin1, here’s how to convert it to UTF-8:

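  * migrate all code and data in a single shot
  * plan for downtime while the conversion runs
  * do trial runs on a copy of the database first
  * watch for cp1252 vs. ISO-8859-1: MySQL's ##latin1## is actually cp1252, so bytes 0x80 to 0x9F (e.g. curly quotes, the euro sign) must be converted as cp1252 characters, not as ISO-8859-1 control codes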
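To see why cp1252 vs. ISO-8859-1 matters, here is a minimal Python sketch. Text typed on Windows often contains bytes in the 0x80 to 0x9F range that cp1252 maps to printable characters but ISO-8859-1 decodes as invisible C1 control characters. The helper names are hypothetical, and the fallback for the five bytes cp1252 leaves undefined is an assumption based on MySQL's documented mapping of them to the matching C1 code points; verify it against your server before relying on it.

```python
# Bytes 0x80-0x9F are where cp1252 and ISO-8859-1 disagree.
C1_RANGE = range(0x80, 0xA0)


def has_c1_bytes(data: bytes) -> bool:
    """True if data contains bytes whose meaning differs between cp1252 and ISO-8859-1."""
    return any(b in C1_RANGE for b in data)


def decode_mysql_latin1(data: bytes) -> str:
    """Decode bytes stored in a MySQL 'latin1' column.

    Each byte is decoded as cp1252; the five bytes cp1252 leaves undefined
    (0x81, 0x8D, 0x8F, 0x90, 0x9D) fall back to ISO-8859-1, i.e. the C1
    control code point with the same value (assumed MySQL behaviour).
    """
    chars = []
    for b in data:
        single = bytes([b])
        try:
            chars.append(single.decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(single.decode("latin-1"))
    return "".join(chars)
```

For example, ##decode_mysql_latin1(b"\x93Caf\xe9\x94")## returns “Café” with curly quotes, while ##b"\x93Caf\xe9\x94".decode("latin-1")## keeps the quotes as control characters; ##has_c1_bytes()## helps find the rows where the difference matters during a trial run.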

  1. ((/Dev/Release/R6.0/Unicode/utf8mb4 utf8mb4))
  1. ((/Dev/Release/R6.0/Unicode/DbConversion Converting your MySQL database to UTF-8))
  1. ((/Dev/Release/R6.0/Upgrade/DatabaseConversion Database Conversion Script))