Database UTF-8 conversion

Also available in ((!/de Deutsch)) and ((!/ru Русский))

{{toc numerate=1}}
file:/unicode.svg?right&100

Guide
  * Migrate all code and data in a single pass
  * Downtime is required
  * Trial runs are required

===Steps===

====Pre-Upgrade====
file:/utf8_conversion_script_1.png?right&400
  1. Convert all tables from MyISAM to InnoDB if you have existing MyISAM tables
    * Admin Panel -> Database -> Convert
  2. Create a ((/Doc/English/Backup backup)) of your current wiki (database & files)
  3. Create a local installation (or a remote one) to avoid timeouts and script terminations
  4. Import the backup into the **!!(green)5.5.17!!** installation (ca. 15 minutes)
  5. Run the **((source:6.0/community/action/admin_utf8.php conversion script))** as Admin (-> see action ##""{{admin_utf8}}""## below)
    1. Alters the tables ##file##, ##page##, ##page_link## and ##revision## to work without key prefixes longer than 767 bytes (**optional**)
      * MySQL versions prior to **5.7.7** and MariaDB versions prior to **10.2.2** do not support index key prefixes of up to **3072** bytes by default.
      * If you can upgrade your database to meet the requirements for the large key prefix, you are advised to do so.
    2. Converts your database and tables to ##utf8mb4##
    3. Converts cross-charset records (e.g. Greek saved as Latin1)
      * This is only necessary if you have run your wiki in multilanguage mode and pages with a different charset have been created.
  6. **!!CHECK!!** all relevant columns via phpMyAdmin or Adminer to confirm that the conversion was successful **before you continue**!
    * E.g. the script may fail to convert the ##body## column due to a timeout or script termination.
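
The 767-byte and 3072-byte key prefix limits mentioned above translate into character limits. The following is a rough sketch of that arithmetic, assuming that utf8mb4 reserves up to 4 bytes per character and the legacy 3-byte utf8 charset up to 3. The function name ##max_indexable_chars## is made up for this illustration and is not part of the conversion script.

```python
def max_indexable_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Return how many characters of a column fit into an index key prefix."""
    if limit_bytes <= 0 or bytes_per_char <= 0:
        raise ValueError("both arguments must be positive")
    return limit_bytes // bytes_per_char


# Key prefix limits named in the guide (in bytes).
OLD_LIMIT = 767
LARGE_LIMIT = 3072

print(max_indexable_chars(OLD_LIMIT, 3))    # legacy 3-byte utf8 (assumption)
print(max_indexable_chars(OLD_LIMIT, 4))    # utf8mb4 under the 767-byte limit
print(max_indexable_chars(LARGE_LIMIT, 4))  # utf8mb4 with large key prefixes
```

Under these assumptions, an indexed column longer than 191 characters no longer fits under the 767-byte limit once it is stored as utf8mb4, which is presumably why the affected tables are altered before the charset conversion.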

====Upgrade to R6.0====
file:/wackowiki_r6_en_upgrade_1.png?right&400&caption
  1. Switch the repository from the **5.5** to the **6.0** branch (keep your config.php!) OR remove the old version and replace it with R6.0.x
  2. Delete all data in the ##_cache/## folder
  3. Upgrade to 6.0.x via the installer

====Post-Upgrade====
file:/utf8_conversion_script_2.png?right&400
  1. Run the post-upgrade conversion scripts (action ##""{{admin_utf8}}""##)
    1. Reset up-sized TEXT columns back to TEXT or MEDIUMTEXT
    2. Convert HTML entities to their corresponding Unicode characters
    3. Remove column ##converted## from tables
  2. Remove the action **admin_utf8.php** from the ##action/## folder again
  3. Synchronize the upgraded data via the Admin Panel
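
The entity conversion in the post-upgrade routines can be illustrated with ##html.unescape## from Python's standard library. This is only a sketch of the idea, not the logic used by ##admin_utf8.php##, and the sample string is invented.

```python
import html

# Invented sample: text stored with named and numeric HTML entities,
# as it might appear in a page body written before the Unicode migration.
stored = "Gr&uuml;&szlig;e, &#1055;&#x440;&#1080;&#1074;&#1077;&#1090;"

# Replace every entity with the Unicode character it stands for.
converted = html.unescape(stored)
print(converted)
```

Note that ##html.unescape## also decodes markup-significant entities such as ##&amp;lt;## and ##&amp;amp;##, so a real migration would have to decide whether those should stay escaped.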

((/Dev/Release/R6.0/Upgrade <-- back to Upgrade notices for 6.0.x))
===Script===

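To make the action available, symlink the conversion script into the ##action/## folder (run from within that folder):
%%ln -s ../../community/action/admin_utf8.php admin_utf8.php%%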

{{include page="/Dev/Release/R6.0/Upgrade/DatabaseConversion/Script"}}

===ToDo===
  * Comments whose language tag differs from their actual charset (page [ru] -> comment written in Russian but tagged [en]) end up as Mojibake.
    * The comment is not converted because its author was logged in with English as the profile language, so the comment was saved with [en] as its language. Previously the page set the encoding, so this was not an issue, but now it is.
    * %%(sql) SELECT page_id, tag, title, body FROM prefix_page;%%
  * Improve the process for regular users, add a better description and add localized versions in German and Russian.
  * Add a check that analyzes the database before showing the suggested actions.
  * Currently the script uses only the MySQL API to update the cross-charset records.
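
The Mojibake issue with mis-tagged comments can be reproduced in a few lines. This is a minimal sketch, assuming that Russian content was written as windows-1251 bytes and that records tagged [en] are treated as Latin1. The helper ##repair## is hypothetical, and Python's ##latin-1## codec only approximates MySQL's ##latin1## charset.

```python
def repair(text: str, stored_as: str, actual: str) -> str:
    """Re-interpret text whose bytes were decoded with the wrong charset."""
    return text.encode(stored_as).decode(actual)


original = "Привет"

# The bytes were written as windows-1251 but later read as Latin1.
mojibake = original.encode("cp1251").decode("latin-1")
print(mojibake)

# Comment tagged [ru]: the real source charset is used, the text is restored.
print(repair(mojibake, "latin-1", "cp1251"))

# Comment tagged [en]: Latin1 is assumed as the source, the text stays garbled.
print(repair(mojibake, "latin-1", "latin-1"))
```

The last call shows why a per-record language tag matters: with the wrong tag the conversion is a no-op and the garbled text is carried over into the UTF-8 database.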
====Localization====
%%
['en']
	'Utf8Utilities'					=> 'Unicode conversion utilities',
	'Utf8PreUpgrade'				=> 'Pre-Upgrade Routines for R6.x',
	'Utf8KeyPrefixes'				=> 'Alter tables to work without key prefixes longer than 767 bytes',
	'Utf8AlterTablesStarted'		=> 'Alter tables started',
	'Utf8Tables'					=> 'Tables:  file, page, page_link and revision',
	'Utf8ConvertDatabase'			=> 'Convert database and tables based on charset to',
	'Utf8TableConversionStarted'	=> 'Table conversion started',
	'Utf8TablesConverted'			=> 'Tables converted.',
	'Utf8ConvertCrossCharset'		=> 'Convert all cross charset records',
	'Utf8TablesConversionStarted'	=> 'Tables record conversion started',
	'Utf8RecordsConverted'			=> 'Records converted',
	'Utf8TablesRecordsConverted'	=> 'Tables records converted.',
	'Utf8PostUpgrade'				=> 'Post-Upgrade Routines for R6.x',
	'Utf8RemoveColumnConverted'		=> 'Remove column \'converted\' from tables',
	'Utf8DropedColumnConverted'		=> 'Dropped column \'converted\' from tables',
	'Utf8ResetTextColumns'			=> 'Reset up-sized TEXT columns back to TEXT or MEDIUMTEXT',
	'Utf8StartedTextConversion'		=> 'Started conversion of TEXT columns',
	'Utf8ResetTextColumns'			=> 'Reset converted TEXT columns',
	'Utf8ConvertHtmlEntities'		=> 'Convert HTML entities to their corresponding Unicode characters',
%%
%%
['de']
	'Utf8Utilities'					=> 'Funktionen zur Unicode-Konvertierung',
	'Utf8PreUpgrade'				=> 'Pre-Upgrade-Routinen für R6.x',
	'Utf8KeyPrefixes'				=> 'Änderung der Tabellen, damit sie ohne Schlüsselpräfixe funktionieren, die länger als 767 Bytes sind.',
	'Utf8AlterTablesStarted'		=> 'Ändern von Tabellen gestartet',
	'Utf8Tables'					=> 'Tabellen:  file, page, page_link und revision',
	'Utf8ConvertDatabase'			=> 'Konvertiert Datenbank und Tabellen basierend auf dem Zeichensatz nach',
	'Utf8TableConversionStarted'	=> 'Konvertierung der Tabelle gestartet',
	'Utf8TablesConverted'			=> 'Tabellen konvertiert.',
	'Utf8ConvertCrossCharset'		=> 'Konvertiert alle Zeichensatz-übergreifenden Datensätze.',
	'Utf8TablesConversionStarted'	=> 'Konvertierung der Datensätze in den Tabellen gestartet',
	'Utf8RecordsConverted'			=> 'Datensätze konvertiert',
	'Utf8TablesRecordsConverted'	=> 'Datensätze in den Tabellen konvertiert.',
	'Utf8PostUpgrade'				=> 'Post-Upgrade-Routinen für R6.x',
	'Utf8RemoveColumnConverted'		=> 'Löscht die Spalte \'converted\' aus den Tabellen.',
	'Utf8DropedColumnConverted'		=> 'Die Spalte \'converted\' wurde aus den Tabellen entfernt.',
	'Utf8ResetTextColumns'			=> 'Zurücksetzen der vergrößerten TEXT-Spalten auf TEXT oder MEDIUMTEXT',
	'Utf8StartedTextConversion'		=> 'Konvertierung der TEXT-Spalten gestartet',
	'Utf8ResetTextColumns'			=> 'Zurücksetzen der konvertierten TEXT-Spalten',
	'Utf8ConvertHtmlEntities'		=> 'Konvertierung von HTML-Entitäten in die entsprechenden Unicode-Zeichen',
%%