Migration to Unicode (UTF-8)


Disadvantages


Requirements

  1. PCRE UTF-8 support: WackoWiki will not run if the PCRE extension of your PHP installation is not compiled with UTF-8 support.
  2. Third-party libraries used by WackoWiki must also support UTF-8.
  3. utf8mb4: MySQL versions prior to 5.7.7 and MariaDB versions prior to 10.2.2 do not have the innodb_large_prefix option enabled by default, which limits an index key prefix to 767 bytes, i.e. 191 utf8mb4 characters.
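
The arithmetic behind requirement 3 can be sketched as follows. This is an illustration in Python only, not part of WackoWiki: `max_indexable_chars` is a hypothetical helper, and the constants are InnoDB's index key prefix limits (767 bytes without large prefixes, 3072 bytes with them on DYNAMIC or COMPRESSED row formats).

```python
# Index key prefix limits of InnoDB, in bytes.
PREFIX_LIMIT_DEFAULT = 767   # without innodb_large_prefix
PREFIX_LIMIT_LARGE = 3072    # with innodb_large_prefix (DYNAMIC/COMPRESSED rows)

# Worst-case storage per character for each MySQL character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset, large_prefix=False):
    """Return how many characters of `charset` fit into one index key prefix."""
    limit = PREFIX_LIMIT_LARGE if large_prefix else PREFIX_LIMIT_DEFAULT
    return limit // MAX_BYTES_PER_CHAR[charset]


print(max_indexable_chars("utf8"))                        # 255
print(max_indexable_chars("utf8mb4"))                     # 191
print(max_indexable_chars("utf8mb4", large_prefix=True))  # 768
```

In other words, a fully indexed VARCHAR(255) column still fits under the 3-byte utf8 character set, but under utf8mb4 it has to be shortened to 191 characters or indexed by prefix unless large prefixes are available.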

1. Unicode normalization

  1. http://www.w3.org/TR/charmod-norm/
  2. Unicode Test Installation

In essence, normalize all input from sources where you cannot be sure that it is already in a normalized form. In most cases you should use NFC, because most data will be in NFC already.
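
A minimal sketch of this rule, written in Python for illustration (`to_nfc` is a hypothetical helper name). In PHP the corresponding calls would be Normalizer::isNormalized() and Normalizer::normalize() from the intl extension.

```python
import unicodedata


def to_nfc(text):
    """Return `text` in NFC, skipping the work when it is already normalized."""
    # is_normalized() requires Python 3.8+; it is a quick check compared
    # with re-normalizing every incoming string.
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)


composed = "caf\u00e9"      # "é" as a single code point (already NFC)
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

print(composed == decomposed)                    # False
print(to_nfc(composed) == to_nfc(decomposed))    # True
print(len(decomposed), len(to_nfc(decomposed)))  # 5 4
```

Both strings render identically, but only after normalization do they compare equal, which matters for page names, search, and unique keys.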

2. Steps

  1. Convert to UTF-8 without BOM.
    • Make sure that every included/required file is in either ASCII or UTF-8 without BOM, as PHP does not handle non-ASCII files very well.
    • Tools: iconv
      •  find . -name "*.php" -exec iconv -f ISO-8859-1 -t UTF-8 {} -o {}.new \;
  2. Search the code base for functions that are not UTF-8 compatible and replace them.
  3. Add optional support to replace SafeHtml with HTMLPurifier.
  4. Write a conversion script that runs through the database and re-encodes everything as UTF-8.
  5. Check out our cloned branch (more soon):
    •  hg clone https://bitbucket.org/wackowiki/wackowiki-dev
  6. String access by character
  7. Remove unneeded functions (notes)
    1. htmlentities
    2. html_entity_decode
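
One caveat for step 1: the iconv command treats every file as ISO-8859-1, so a file that is already UTF-8 would be encoded a second time. The following Python sketch shows one way to guard against that. It is an assumption-laden illustration, not the project's conversion tool: `to_utf8_no_bom` and `convert_tree` are hypothetical names, and "valid UTF-8 means already converted" is a heuristic that can misjudge rare inputs.

```python
import codecs
from pathlib import Path


def to_utf8_no_bom(raw, legacy_encoding="iso-8859-1"):
    """Return the bytes `raw` re-encoded as UTF-8 without a BOM."""
    # Drop a leading UTF-8 BOM (EF BB BF) if present.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        # Valid UTF-8 (which includes pure ASCII) is kept unchanged.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise assume the legacy encoding and transcode.
        return raw.decode(legacy_encoding).encode("utf-8")


def convert_tree(root, pattern="*.php"):
    """Rewrite matching files under `root` in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        raw = path.read_bytes()
        converted = to_utf8_no_bom(raw)
        if converted != raw:
            path.write_bytes(converted)
            changed.append(str(path))
    return changed


print(to_utf8_no_bom(b"f\xfcr"))                  # b'f\xc3\xbcr'
print(to_utf8_no_bom(b"\xef\xbb\xbff\xc3\xbcr"))  # b'f\xc3\xbcr'
```

Because `convert_tree` overwrites files in place, it is best run on a clean working copy so that the result can be reviewed with the version control diff and reverted if needed.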

2.1. MySQL – Migrating database data that is already encoded in latin1 to UTF-8

If you have an existing MySQL database that is already encoded in latin1, here is how to convert it to UTF-8:


  • migrate all code and data in a single shot
  • downtime needed
  • trial runs needed
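  • watch for cp1252 vs. ISO-8859-1: MySQL's latin1 is based on cp1252, which assigns printable characters (e.g. € and curly quotes) to bytes 0x80-0x9F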
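
The last point is easy to underestimate, so here is a small Python illustration of where the two encodings disagree. The sample bytes and the helper name `has_c1_bytes` are made up for this sketch; a trial run could use a check like this to find rows that need the cp1252 interpretation.

```python
# Curly quotes (0x93, 0x94) and the euro sign (0x80) as stored by a
# Windows client; all three bytes lie in the range 0x80-0x9F.
raw = b"\x93price: 5 \x80\x94"

as_cp1252 = raw.decode("cp1252")      # printable punctuation and symbols
as_latin1 = raw.decode("iso-8859-1")  # invisible C1 control characters

print(as_cp1252 == "\u201cprice: 5 \u20ac\u201d")  # True
print(repr(as_latin1))                             # '\x93price: 5 \x80\x94'


def has_c1_bytes(data):
    """True if `data` contains a byte in 0x80-0x9F, where the encodings differ."""
    return any(0x80 <= byte <= 0x9F for byte in data)


print(has_c1_bytes(raw))        # True
print(has_c1_bytes(b"f\xfcr"))  # False
```

Data without such bytes converts identically under both assumptions, so only the flagged rows need a closer look.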

  1. utf8mb4
  2. Converting your MySQL database to UTF8
  3. Database Conversion Script

3. Resources

!/Resources