Migration to Unicode (UTF-8)

#||
|| {{toc numerate=1}} | {{tree}} ||
||#

Disadvantages of Unicode handling in PHP
  * mishmash of external libraries
    1. ((http://www.php.net/manual/en/book.mbstring.php PHP Multibyte String extension))
    2. ((http://www.php.net/manual/en/ref.iconv.php iconv))
    3. ((https://www.php.net/manual/en/intro.intl.php Internationalization extension)) (wrapper for the ICU library)
    4. ((http://sourceforge.net/projects/phputf8/ PHP UTF-8))
    5. ((https://github.com/nicolas-grekas/Patchwork-UTF8 Patchwork-UTF8))
    6. ((http://pageconfig.com/post/portable-utf8 Portable UTF-8))
    7. ((http://htmlpurifier.org/ HTML Purifier))
    8. ((https://github.com/neitanod/forceutf8 forceutf8))
  * development culture: people work on what they are interested in, so these libraries overlap and differ in scope and maintenance

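Requirements
  1. PCRE UTF-8 support: WackoWiki will not run if your PHP installation is not compiled with UTF-8 support in the PCRE extension.
  2. Third-party libraries used by WackoWiki also require UTF-8 support.
  3. utf8mb4: MySQL versions prior to 5.7.7 and MariaDB versions prior to 10.2.2 do not have the innodb_large_prefix option enabled by default. Without it, InnoDB index keys are limited to 767 bytes, which at up to 4 bytes per utf8mb4 character is only 191 characters, too short to index a full VARCHAR(255) column.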
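
The first two requirements can be checked from PHP itself. This is a minimal sketch, not part of WackoWiki: the function name ##check_unicode_requirements()## is hypothetical, and it assumes that a PCRE build without UTF-8 support makes the ##u## modifier fail, so ##preg_match()## does not return 1.

```php
<?php
// Hypothetical helper: reports whether the PHP runtime meets the basic
// Unicode requirements listed above, as a map of check name => bool.
function check_unicode_requirements(): array
{
    // "\u{00F1}" is "ñ": two bytes in UTF-8 but a single character.
    // With PCRE UTF-8 support, /^.$/u matches it as one character;
    // without it, preg_match() fails (false) or does not match (0).
    $pcreUtf8 = @preg_match('/^.$/u', "\u{00F1}") === 1;

    return [
        'pcre_utf8' => $pcreUtf8,
        'mbstring'  => extension_loaded('mbstring'),
        'iconv'     => extension_loaded('iconv'),
    ];
}

// Print one status line per check.
foreach (check_unicode_requirements() as $check => $ok) {
    echo str_pad($check, 10), $ok ? 'ok' : 'MISSING', PHP_EOL;
}
```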


===Unicode normalization===
  1. http://www.w3.org/TR/charmod-norm/

In essence, normalize all input from sources where you can't be sure that it's in normal form. In most cases, you should use NFC because most data will be in NFC already.
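
A minimal sketch of that advice, assuming the intl extension (its ##Normalizer## class) is available; the function name ##normalize_input()## is hypothetical. It rejects invalid UTF-8 first and skips the work for input that is already in NFC, which, as noted above, is the common case.

```php
<?php
// Hypothetical helper: normalize untrusted input to NFC before storing or comparing it.
// Requires the intl extension for the Normalizer class.
function normalize_input(string $text): string
{
    // preg_match() with an empty pattern and the u modifier returns false on invalid UTF-8.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // Cheap check first: most input is already NFC.
    if (Normalizer::isNormalized($text, Normalizer::FORM_C)) {
        return $text;
    }
    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
    if ($normalized === false) {
        throw new RuntimeException('Unicode normalization failed');
    }
    return $normalized;
}
```

For example, a decomposed "e" followed by a combining acute accent (U+0065 U+0301) comes back as the single precomposed "é" (U+00E9), so both spellings compare equal after normalization.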

===Steps===
  1. Convert to UTF-8 __without__ BOM.
    * make sure that every included/required file is either ASCII or UTF-8 without BOM: PHP does not strip a BOM but sends its three bytes as output, which breaks later ##header()## calls ("headers already sent")
    * Tools: iconv
      * %% find . -name "*.php" -exec iconv -f ISO-8859-1 -t UTF-8 {} -o {}.new \; %% 
  1. ((/Dev/Guidelines/UnicodeCheatSheet search the code base for functions that are not UTF-8 safe and replace them))
    * add ((!/MbExtends missing functions)) that the PHP Multibyte String extension does not cover to ##lib/mb_extends/mb_extends.php##
  1. add optional support to replace SafeHtml with ((http://htmlpurifier.org/ HTMLPurifier))
  1. write a ((/Dev/Release/R6.0/Upgrade/DatabaseConversion database conversion script)) that runs through the database and re-encodes everything as UTF-8
  1. access strings by character instead of by byte (e.g. ##mb_substr()## instead of ##$str[$i]##)
  1. remove unneeded functions ((https://web.archive.org/web/20150310212247/http://www.phpwact.org/php/i18n/charsets#common_problem_areas_with_utf-8 notes))
    1. ##htmlentities##
    1. ##html_entity_decode##
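
For step 1, the conversion can also be done from PHP, for example inside a migration script. The sketch below strips a UTF-8 BOM and converts only content that is not already valid UTF-8. The helper name ##to_utf8_without_bom()## is hypothetical; like the ##iconv## command above, it assumes that anything which is not valid UTF-8 is ISO-8859-1, and it needs the iconv extension.

```php
<?php
// Hypothetical helper for step 1: return file contents as UTF-8 without a BOM.
// Assumes non-UTF-8 input is ISO-8859-1 unless another $fallback is given.
function to_utf8_without_bom(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    // Drop a leading UTF-8 byte order mark (EF BB BF).
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3);
    }
    // Already valid UTF-8: leave it alone so the function is safe to rerun.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }
    $converted = iconv($fallback, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $fallback to UTF-8");
    }
    return $converted;
}
```

A file could then be rewritten in place with ##file_put_contents($path, to_utf8_without_bom(file_get_contents($path)))##. Checking for valid UTF-8 first means a second run does not double-encode files that were already converted.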
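
Step 2 adds functions that the Multibyte String extension lacks to ##lib/mb_extends/mb_extends.php##. The actual contents of that file are not reproduced here; as a hypothetical example in the same spirit, a polyfill for ##mb_ucfirst()##, which PHP only ships natively from 8.4 onward, could look like this. It assumes mbstring is loaded.

```php
<?php
// Hypothetical polyfill in the style of lib/mb_extends/mb_extends.php.
// PHP 8.4+ provides mb_ucfirst() itself, so only define it when missing.
if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst(string $string, ?string $encoding = null): string
    {
        $encoding ??= mb_internal_encoding();
        // Title-case the first character and keep the rest unchanged.
        $first = mb_substr($string, 0, 1, $encoding);
        $rest  = mb_substr($string, 1, null, $encoding);
        return mb_convert_case($first, MB_CASE_TITLE, $encoding) . $rest;
    }
}
```

Plain ##ucfirst()## only looks at the first byte, so it leaves "élan" unchanged, while ##mb_ucfirst()## returns "Élan".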
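
Replacing functions that are not UTF-8 safe and accessing strings by character both come down to the difference between bytes and characters. The following sketch (assumes mbstring, and PHP 7.4+ for ##mb_str_split()##) shows what byte-based functions and offsets return for a UTF-8 string compared with their multibyte counterparts.

```php
<?php
// Byte-oriented vs character-oriented access on a UTF-8 string.
// "\u{00FC}" is "ü", which takes 2 bytes in UTF-8.
$word = "M\u{00FC}nchen";

$bytes = strlen($word);              // counts bytes
$chars = mb_strlen($word, 'UTF-8');  // counts characters

$byteAt1 = $word[1];                        // only the first byte of "ü": "\xC3"
$charAt1 = mb_substr($word, 1, 1, 'UTF-8'); // the whole character "ü"

// Iterate per character instead of per byte.
$letters = mb_str_split($word, 1, 'UTF-8');

// prints "8 bytes, 7 characters, 7 letters"
printf("%d bytes, %d characters, %d letters\n", $bytes, $chars, count($letters));
```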
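
##htmlentities()## and ##html_entity_decode()## become unnecessary because, with UTF-8 pages and data, non-ASCII characters can be output as they are; only the characters that are special in HTML need escaping. A minimal sketch comparing the two approaches; the helper ##escape_html()## is hypothetical.

```php
<?php
// Hypothetical helper: with UTF-8 pages, escaping the HTML-special
// characters (& < > " ') is enough; other characters are output as-is.
function escape_html(string $text): string
{
    // ENT_SUBSTITUTE replaces invalid UTF-8 with U+FFFD instead of returning "".
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$input = "Caf\u{00E9} <b>&</b>";

$escaped  = escape_html($input);                       // "Café &lt;b&gt;&amp;&lt;/b&gt;"
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8'); // "Caf&eacute; &lt;b&gt;&amp;&lt;/b&gt;"

echo $escaped, PHP_EOL, $entities, PHP_EOL;
```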

====MySQL – Migrating database data that is already encoded in latin1 to UTF-8====
If you have an existing MySQL database whose data is already encoded in latin1, here is how to convert it to UTF-8:

  * migrate all code and data in a single step
  * plan for downtime
  * do trial runs on a copy of the database first
  * watch out for cp1252 vs. ISO-8859-1 (they differ in the bytes 0x80 to 0x9F)
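
The cp1252 vs. ISO-8859-1 point matters because MySQL's ##latin1## character set is in fact Windows-1252 (cp1252): bytes 0x80 to 0x9F hold printable characters such as the euro sign and curly quotes, whereas in ISO-8859-1 they are invisible C1 control codes. Re-encoding such data as ISO-8859-1 silently turns those characters into control codes. A minimal sketch, assuming mbstring; ##latin1_to_utf8()## is a hypothetical name.

```php
<?php
// Hypothetical helper: re-encode "latin1" bytes as UTF-8.
// Defaults to Windows-1252 because MySQL's latin1 is cp1252.
function latin1_to_utf8(string $bytes, string $source = 'Windows-1252'): string
{
    return mb_convert_encoding($bytes, 'UTF-8', $source);
}

// 0x93/0x94 are curly quotes and 0x80 is the euro sign in Windows-1252.
$asCp1252 = latin1_to_utf8("\x93Hi\x94 \x80");
// Read as ISO-8859-1, the same 0x80 byte becomes the control character U+0080.
$asIso = latin1_to_utf8("\x80", 'ISO-8859-1');
```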

  1. ((/Dev/Release/R6.0/Unicode/utf8mb4 utf8mb4))
  2. ((/Dev/Release/R6.0/Unicode/DbConversion Converting your MySQL database to UTF8))
  3. ((/Dev/Release/R6.0/Upgrade/DatabaseConversion Database Conversion Script))
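
Related to the utf8mb4 conversion: MySQL's older ##utf8## character set (##utf8mb3##) stores at most 3 bytes per character, so characters outside the Basic Multilingual Plane, such as emoji, can only be stored in ##utf8mb4## columns. During a trial run, a check like the following (hypothetical helper, using only PCRE) can find values that would not survive a ##utf8mb3## column.

```php
<?php
// Hypothetical helper: true if the UTF-8 string contains a character
// outside the Basic Multilingual Plane (4 bytes in UTF-8),
// which requires utf8mb4 rather than utf8/utf8mb3 in MySQL.
function needs_utf8mb4(string $text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}
```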

===Resources===
((!/Resources))