#||
|| {{toc numerate=1}} | {{tree}} ||
||#
===Disadvantages===
* mishmash of external libraries
1. ((http://www.php.net/manual/en/book.mbstring.php PHP Multibyte String extension))
2. ((http://www.php.net/manual/en/ref.iconv.php iconv))
3. ((https://www.php.net/manual/en/intro.intl.php Internationalization extension)) -- ICU library wrapper
4. ((http://sourceforge.net/projects/phputf8/ PHP UTF-8))
5. ((https://github.com/nicolas-grekas/Patchwork-UTF8 Patchwork-UTF8))
6. ((http://pageconfig.com/post/portable-utf8 Portable UTF-8))
7. ((http://htmlpurifier.org/ HTML Purifier))
8. ((https://github.com/neitanod/forceutf8 forceutf8))
* development culture means that people work on what they’re interested in
===Requirements===
1. PCRE UTF-8 support: WackoWiki will not run if the PCRE extension of your PHP installation was not compiled with UTF-8 support.
2. Third-party libraries used by WackoWiki also require UTF-8 support.
3. utf8mb4: MySQL versions prior to 5.7.7 and MariaDB versions prior to 10.2.2 do not enable the ##innodb_large_prefix## option by default. Without it, an InnoDB index key prefix is limited to 767 bytes, which is only 191 utf8mb4 characters.
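A little arithmetic shows why ##innodb_large_prefix## matters for utf8mb4. The sketch below uses Python purely for illustration. It assumes the InnoDB limits of 767 bytes per index key prefix without the option and 3072 bytes with it, and at most 4 bytes per utf8mb4 character (3 for the older utf8 charset). The function name is hypothetical.

```python
# Illustrative arithmetic only. The limits are assumptions based on the
# InnoDB documentation, not values read from a live server.
SMALL_PREFIX_BYTES = 767    # index key prefix limit without innodb_large_prefix
LARGE_PREFIX_BYTES = 3072   # limit with innodb_large_prefix enabled


def max_indexable_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Longest column prefix, in characters, that fits into one index key."""
    return limit_bytes // bytes_per_char


print(max_indexable_chars(SMALL_PREFIX_BYTES, 3))  # utf8, 3 bytes/char: 255
print(max_indexable_chars(SMALL_PREFIX_BYTES, 4))  # utf8mb4, 4 bytes/char: 191
print(max_indexable_chars(LARGE_PREFIX_BYTES, 4))  # utf8mb4, large prefix: 768
```

Under these assumptions, an index on a 255-character column that fits with utf8 no longer fits with utf8mb4 unless the larger limit is available or the indexed prefix is shortened to 191 characters.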
===Unicode normalization===
1. http://www.w3.org/TR/charmod-norm/
In essence, normalize all input from sources where you can't be sure that it's in normal form. In most cases, you should use NFC because most data will be in NFC already.
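The effect of normalizing to NFC can be sketched as follows. Python's standard ##unicodedata## module is used here only for illustration. In PHP the counterpart would be ##Normalizer::normalize()## from the Internationalization extension. The helper name ##to_nfc## is hypothetical.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


decomposed = "Cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT, 5 code points
composed = "Caf\u00e9"     # precomposed 'é', 4 code points

print(decomposed == composed)          # False: different code point sequences
print(to_nfc(decomposed) == composed)  # True: equal after normalization
print(to_nfc(composed) == composed)    # True: NFC input is left unchanged
```

Both strings render identically, but only the normalized forms compare equal, which matters for things like page names, searches and database lookups.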
===Steps===
1. Convert to UTF-8 __without__ BOM.
* make sure that every included/required file is either ASCII or UTF-8 without BOM, as PHP does not handle a BOM in source files well (it is sent as output before any headers)
* Tools: iconv
* %% find . -name "*.php" -exec iconv -f ISO-8859-1 -t UTF-8 {} -o {}.new \; %%
1.#3 ((/Dev/Guidelines/UnicodeCheatSheet search code base for non-UTF8 compatible functions and replace them))
* add ((!/MbExtends missing functions)) that are not covered by the PHP Multibyte String extension to ##lib/mb_extends/mb_extends.php##
1. add optional support to replace SafeHtml with ((http://htmlpurifier.org/ HTMLPurifier))
1. write a ((/Dev/Release/R6.0/Upgrade/DatabaseConversion database conversion script)) that runs through the database and re-encodes everything as UTF-8
1. String access by character
1. remove unneeded functions ((https://web.archive.org/web/20150310212247/http://www.phpwact.org/php/i18n/charsets#common_problem_areas_with_utf-8 notes))
1. ##htmlentities##
1. ##html_entity_decode##
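The "String access by character" step is easiest to see with bytes next to characters. PHP's ##$string[$i]##, ##strlen()## and ##substr()## work on bytes, while ##mb_strlen()## and ##mb_substr()## work on characters. The sketch below mimics that difference in Python, for illustration only. Both helper names are hypothetical.

```python
def truncate_bytes(raw: bytes, n: int) -> str:
    """Cut after n bytes, as a byte-oriented substr() would."""
    return raw[:n].decode("utf-8", errors="replace")


def truncate_chars(raw: bytes, n: int) -> str:
    """Cut after n characters, as a multibyte-aware substr would."""
    return raw.decode("utf-8")[:n]


raw = "Grüße".encode("utf-8")  # 5 characters stored as 7 bytes

print(len(raw.decode("utf-8")), len(raw))  # 5 7
print(truncate_bytes(raw, 3))  # 'Gr' plus U+FFFD: the 2-byte 'ü' was cut in half
print(truncate_chars(raw, 3))  # 'Grü'
```

Any code that indexes, truncates or measures strings by byte offset can split a multibyte character like this, which is why such calls need to be reviewed during the conversion.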
====MySQL – Migrating database data that is already encoded in latin1 to UTF-8====
If you have an existing MySQL database that is already encoded in latin1, here’s how to convert it to UTF-8:
* migrate all code and data in a single shot
* downtime needed
* trial runs needed
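* watch for cp1252 vs. ISO-8859-1: bytes 0x80 to 0x9F are printable characters (such as the euro sign) in cp1252 but control characters in ISO-8859-1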
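The cp1252 vs. ISO-8859-1 pitfall can be demonstrated with two bytes. The sketch below is Python for illustration only and is not the conversion script itself. The sample bytes and the helper name ##to_utf8## are made up. It also assumes that data labelled latin1 was really written as cp1252, which is common because MySQL's ##latin1## charset is documented as cp1252-based.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode raw bytes from source_encoding to UTF-8 (no BOM is added)."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical column content: 0x80 and 0x96 lie in the range where
# cp1252 and ISO-8859-1 disagree.
raw = b"\x80 5 \x96 10"

print(to_utf8(raw, "cp1252"))      # b'\xe2\x82\xac 5 \xe2\x80\x93 10' (euro sign, en dash)
print(to_utf8(raw, "iso-8859-1"))  # b'\xc2\x80 5 \xc2\x96 10' (C1 control characters)
```

Both conversions succeed without an error, so a wrong source charset goes unnoticed until the text is displayed. This is one reason the trial runs listed above should include a visual check of converted content.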
1. ((/Dev/Release/R6.0/Unicode/utf8mb4 utf8mb4))
1. ((/Dev/Release/R6.0/Unicode/DbConversion Converting your MySQL database to UTF8))
2. ((/Dev/Release/R6.0/Upgrade/DatabaseConversion Database Conversion Script))
===Resources===
((!/Resources))