Database UTF-8 conversion
Also available in Deutsch and Русский
Guide
- migrate all code and data in a single shot
- downtime needed
- trial runs needed
1. Steps
1.1. Pre-Upgrade
- Convert all tables from MyISAM to InnoDB if you have existing MyISAM tables
- Admin Panel -> Database -> Convert
- create a backup of your current wiki (database & files)
- create a local installation (or remote - to avoid timeouts and script terminations)
- import the backup -> 5.5.17 (ca. 15 minutes)
- run the conversion script as Admin (-> see action
{{admin_utf8}}
below)- Alters table
file
,page
,page_link
andrevision
to work without key prefixes longer than 767 bytes (optional)- MySQL versions prior to 5.7.7 or MariaDB 10.2.2 do not support index key prefixes up to 3072 bytes by default.
- If you have the possibility to upgrade your database to meet the requirements for the large key prefix, you are advised to do so.
- Converts your database and tables to
utf8mb4
- Converts cross charset records (e.g. Greek saved as Latin1)
- This is only necessary if you have run your wiki in multilanguage mode and pages with a different charset have been created.
- Alters table
- CHECK via phpMyAdmin or Adminer all relevant columns if the conversion was successful before you continue!
- E.g. the scripts may fail to convert the
body
column due timeout or script termination.
- E.g. the scripts may fail to convert the
1.2. Upgrade to R6.0
- switch repo from 5.5 to 6.0 branch (keep! config.php) OR remove the old version and replace it with R6.0.x
- empty all data in
_cache/
folder - upgrade via installer to 6.0.x
1.3. Post-Upgrade
- run the post-upgrade conversion scripts
{{admin_utf8}}
- Reset up-sized TEXT columns back to TEXT or MEDIUMTEXT
- Convert HTML entities to their corresponding Unicode characters
- Remove column
converted
from tables
- Remove the action admin_utf8.php again from the
action/
folder - synchronize the upgraded data via the Admin Panel
<-- back to Upgrade notices for 6.0.x
2. Script
ln -s ../../community/action/admin_utf8.php admin_utf8.php
<?php
if (!defined('IN_WACKO'))
{
exit;
}
##########################################################
## WackoWiki UTF-8 Conversion Routines ##
##########################################################
/*
* status: stable
* https://wackowiki.org/doc/Dev/Release/R6.0/Upgrade/DatabaseConversion
* modify the script for your needs, please contribute your improvements
*
* place the script under action/admin_utf8.php
* call the action via {{admin_utf8}} as Administrator
*
* 1. Pre-Upgrade Routines for R6.x
* 1.0. Alter tables to work without key prefixes longer than 767 bytes
* 1.1. Convert all tables based on charset to utf8mb4
* 1.2. Convert all cross charset records
* 2. Post-Upgrade Routines for R6.x
* 2.1. Reset upsized TEXT columns back to TEXT or MEDIUMTEXT
* 2.2. Convert HTML entities to their corresponding Unicode characters
* 2.3. Remove column 'converted' from tables
*/
/* TODO:
* set progress in config
* add check to analyzed the database prior to show the suggested actions
* currently it uses only the Mysqli API to update the cross charset records
*/
$prefix = $this->db->table_prefix;
$charset = 'utf8mb4';
$collation = 'utf8mb4_unicode_520_ci'; // Unicode (UCA 5.2.0), case-insensitive, https://dev.mysql.com/doc/refman/8.0/en/charset-collation-names.html
// get MariaDB / MySQL version
$_db_version = $this->db->load_single("SELECT version()");
$db_version = $_db_version['version()'];
$large_prefix = false;
echo '<h1>Unicode conversion utilities</h1>';
$info[] = ['WackoWiki version', $this->format('**!!(green)' . $this->db->wacko_version . '!!**', 'wacko')];
$info[] = ['MariaDB / MySQL version', $db_version];
$info[] = ['Database charset', $this->db->database_charset];
$info[] = ['Database collation', $this->db->database_collation];
echo '<table style="max-width:800px; border-spacing: 1px; border-collapse: separate; padding: 4px;" class="formation lined">' . "\n";
foreach ($info as $value)
{
echo '<tr>' .
'<td class="label"><strong>' . $value[0] . '</strong></td>' .
'<td> </td>' .
'<td>' . $value[1] . '</td>' . "\n";
}
echo '</table>' . "\n";
echo '<h2>1. Pre-Upgrade Routines for R6.x</h2>';
if ($this->is_admin())
{
// MySQL versions prior to 5.7.7 or MariaDB 10.2.2 do not support index key prefixes up to 3072 bytes by default.
$min_db_version = preg_match('/MariaDB/', $db_version, $matches)
? '10.2.2'
: '5.7.7';
if (version_compare($db_version, $min_db_version , '>='))
{
$large_prefix = true;
}
if (! $large_prefix)
{
echo '<h3>1.0. Alter tables to work without key prefixes longer than 767 bytes:</h3>';
if (!isset($_POST['set_large_prefix_tables']))
{
echo $this->form_open('large_prefix');
echo '<button type="submit" name="set_large_prefix_tables">' . $this->_t('UpdateButton') . '</button>';
echo $this->form_close();
}
else if (isset($_POST['set_large_prefix_tables']))
{
$results =
'<strong>' . date('H:i:s') . ' - ' . 'Alter tables started' . "\n" .
'================================================</strong>' . "\n";
$results .=
'<strong>' . 'VARCHAR(191): ' . "\n" .
'Tables: file, page, page_link and revision:</strong>' . "\n\n";
$this->db->sql_query("
ALTER TABLE {$prefix}file
CHANGE file_name file_name VARCHAR(191) NOT NULL DEFAULT '';");
$this->db->sql_query("
ALTER TABLE {$prefix}page
CHANGE title title VARCHAR(191) NOT NULL DEFAULT '',
CHANGE tag tag VARCHAR(191) BINARY NOT NULL DEFAULT '';");
$this->db->sql_query("
ALTER TABLE {$prefix}page_link
CHANGE to_tag to_tag VARCHAR(191) BINARY NOT NULL DEFAULT '';");
$this->db->sql_query("
ALTER TABLE {$prefix}revision
CHANGE title title VARCHAR(191) NOT NULL DEFAULT '',
CHANGE tag tag VARCHAR(191) BINARY NOT NULL DEFAULT '';");
$results .=
'<strong>' . date('H:i:s') . ' - ' . 'Tables altered.' . "\n" .
'================================================</strong>' . "\n";
echo
'<div class="code">' .
'<pre>' . $results . '</pre>' .
'</div><br>';
}
}
########################################################
## Convert charset for tables ##
########################################################
echo '<h3>1.1. Convert database and tables based on charset to ' . $charset . ':</h3>';
if (!isset($_POST['set_charset_tables']))
{
echo $this->form_open('charset_tables');
echo '<button type="submit" name="set_charset_tables">' . $this->_t('UpdateButton') . '</button>';
echo $this->form_close();
}
else if (isset($_POST['set_charset_tables']))
{
$tables = [
'acl',
'auth_token',
'cache',
'category',
'category_assignment',
'config',
'external_link',
'file',
'file_link',
'log',
'menu',
'page',
'page_link',
'poll',
'rating',
'referrer',
'revision',
'user',
'usergroup',
'usergroup_member',
'user_setting',
'watch',
'word',
];
$results =
'<strong>' . date('H:i:s') . ' - ' . 'Table conversion started' . "\n" .
'================================================</strong>' . "\n";
$results .=
'<strong>' . 'Charset: ' . $charset . "\n" .
'Collation: ' . ' ' . $collation . ':</strong>' . "\n\n";
// Database
$this->db->sql_query("ALTER DATABASE {$this->db->database_database} CHARACTER SET = {$charset} COLLATE = '{$collation}';");
// Tables
foreach ($tables as $table)
{
$this->db->sql_query("ALTER TABLE " . $prefix . $table . " DEFAULT CHARACTER SET {$charset} COLLATE {$collation};");
$this->db->sql_query("ALTER TABLE " . $prefix . $table . " CONVERT TO CHARACTER SET {$charset} COLLATE {$collation};");
// B: convert records in a different charset -> see next routine
// C: HTML entities -> see post upgrade routine
// TODO: convert mojibake
// TODO: deal with old supertag links
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . $table."\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
}
$results .=
'<strong>' . date('H:i:s') . ' - ' . 'Tables converted.' . "\n" .
'================================================</strong>' . "\n";
echo
'<div class="code">' .
'<pre>' . $results . '</pre>' .
'</div><br>';
}
########################################################
## Convert strings of cross charset records ##
########################################################
if ($this->db->database_charset != 'utf8mb4')
{
echo '<h3>1.2. Convert all cross charset records</h3>';
if (!isset($_POST['set_charset_record']))
{
echo $this->form_open('charset_records');
echo '<button type="submit" name="set_charset_record">' . $this->_t('UpdateButton') . '</button>';
echo $this->form_close();
}
else if (isset($_POST['set_charset_record']))
{
# set_time_limit(3600);
// iso-8859-1 -> windows-1252
$_charset = [
'bg' => 'windows-1251',
'da' => 'windows-1252',
'de' => 'windows-1252',
'el' => 'iso-8859-7',
'en' => 'windows-1252',
'es' => 'windows-1252',
'et' => 'windows-1257',
'fr' => 'windows-1252',
'hu' => 'iso-8859-2',
'it' => 'windows-1252',
'nl' => 'windows-1252',
'pl' => 'iso-8859-2',
'pt' => 'windows-1252',
'ru' => 'windows-1251',
];
// selects the subset of languages to convert according your 'database_charset'
if ($this->db->database_charset == 'latin1') // Latin1 (iso-8859-1)
{
$_lang_set = "'bg', 'el', 'et', 'hu', 'pl', 'ru'";
}
else if ($this->db->database_charset == 'latin2') // Latin2 (iso-8859-2)
{
$_lang_set = "'bg', 'da', 'de', 'el', 'en', 'es', 'et', 'fr', 'it', 'nl', 'pt', 'ru'";
}
else if ($this->db->database_charset == 'cp1251') // Kyrillic (windows-1251)
{
$_lang_set = "'da', 'de', 'el', 'en', 'es', 'et', 'fr', 'hu', 'it', 'nl', 'pl', 'pt'";
}
else if ($this->db->database_charset == 'cp1257') // Baltic (windows-1257)
{
$_lang_set = "'bg', 'da', 'de', 'el', 'en', 'es', 'fr', 'hu', 'it', 'nl', 'pl', 'pt', 'ru'";
}
else if ($this->db->database_charset == 'greek') // Greek (iso-8859-7)
{
$_lang_set = "'bg', 'da', 'de', 'en', 'es', 'et', 'fr', 'hu', 'it', 'nl', 'pl', 'pt', 'ru'";
}
/* Tables: sets value 1 in converted record to avoid double conversion
* 1 - utf8
* 2 - HTML entities
* 3 - mojibake
*/
$tables = [
'category',
'file',
'menu',
'page',
'revision',
'user',
'usergroup',
];
// add field 'converted'
foreach ($tables as $table)
{
$this->db->sql_query("ALTER TABLE " . $prefix . $table . " ADD converted TINYINT(1) UNSIGNED NOT NULL DEFAULT '0';");
}
#########################################################################################################################
// DATABASE utf8mb4 link
$sql_modes = !empty($this->db->sql_mode_strict) ? SQL_MODE_STRICT : SQL_MODE_PERMISSIVE;
$dblink = mysqli_connect($this->db->database_host, $this->db->database_user, $this->db->database_password, null, $this->db->database_port);
mysqli_select_db($dblink, $this->db->database_database);
mysqli_set_charset($dblink, $charset);
mysqli_query($dblink, "SET SESSION sql_mode='$sql_modes'");
#########################################################################################################################
$results =
'<strong>' . date('H:i:s') . ' - ' . 'Tables record conversion started ' . "\n" .
'================================================</strong>' . "\n";
## 1 .CATEGORIES ##
if ($categories = $this->db->load_all(
"SELECT category_id, category, category_description, category_lang " .
"FROM {$prefix}category " .
"WHERE " .
"category_lang IN (" . $_lang_set . ") " .
"AND converted <> 1 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'category' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($categories as $category)
{
$category_name = iconv($_charset[$category['category_lang']], 'UTF-8', $category['category']);
$category_description = iconv($_charset[$category['category_lang']], 'UTF-8', $category['category_description']);
#echo $category_name . '<br>';
// update category
mysqli_query($dblink,
"UPDATE {$prefix}category SET " .
"category = " . $this->db->q($category_name) . ", " .
"category_description = " . $this->db->q($category_description) . ", " .
"converted = 1 " .
"WHERE category_id = " . (int) $category['category_id'] . " " .
"LIMIT 1");
# echo mysqli_error($dblink) . '<br>';
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
## 2. FILES ##
if ($files = $this->db->load_all(
"SELECT file_id, file_name, file_description, caption, file_lang " .
"FROM {$prefix}file " .
"WHERE " .
"file_lang IN (" . $_lang_set . ") " .
"AND converted <> 1 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'file' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($files as $file)
{
$file_description = iconv($_charset[$file['file_lang']], 'UTF-8', $file['file_description']);
$caption = iconv($_charset[$file['file_lang']], 'UTF-8', $file['caption']);
# echo $file['file_name'] . '<br>';
// update file meta data
mysqli_query($dblink,
"UPDATE {$prefix}file SET " .
"file_description = " . $this->db->q($file_description) . ", " .
"caption = " . $this->db->q($caption) . ", " .
"converted = 1 " .
"WHERE file_id = " . (int) $file['file_id'] . " " .
"LIMIT 1");
# echo mysqli_error($dblink) . '<br>';
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
## 3. MENU ##
if ($items = $this->db->load_all(
"SELECT menu_id, menu_title, menu_lang " .
"FROM {$prefix}menu " .
"WHERE " .
"menu_lang IN (" . $_lang_set . ") " .
"AND converted <> 1 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'menu' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($items as $item)
{
$menu_title = iconv($_charset[$item['menu_lang']], 'UTF-8', $item['menu_title']);
# echo $menu_title . '<br>';
// update menu title
mysqli_query($dblink,
"UPDATE {$prefix}menu SET " .
"menu_title = " . $this->db->q($menu_title) . ", " .
"converted = 1 " .
"WHERE menu_id = " . (int) $item['menu_id'] . " " .
"LIMIT 1");
# echo mysqli_error($dblink) . '<br>';
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
## 4. PAGES ##
if ($pages = $this->db->load_all(
"SELECT page_id, tag, title, body, edit_note, description, keywords, page_lang " .
"FROM {$prefix}page " .
"WHERE " .
"page_lang IN (" . $_lang_set . ") " .
"AND converted <> 1 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'page' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($pages as $page)
{
$tag = iconv($_charset[$page['page_lang']], 'UTF-8', $page['tag']);
$title = iconv($_charset[$page['page_lang']], 'UTF-8', $page['title']);
$body = iconv($_charset[$page['page_lang']], 'UTF-8', $page['body']);
$edit_note = iconv($_charset[$page['page_lang']], 'UTF-8', $page['edit_note']);
$description = iconv($_charset[$page['page_lang']], 'UTF-8', $page['description']);
$keywords = iconv($_charset[$page['page_lang']], 'UTF-8', $page['keywords']);
# echo $tag . '<br>';
// update current page copy
mysqli_query($dblink,
"UPDATE {$prefix}page SET " .
"tag = " . $this->db->q($tag) . ", " .
"title = " . $this->db->q($title) . ", " .
"body = " . $this->db->q($body) . ", " .
"edit_note = " . $this->db->q($edit_note) . ", " .
"description = " . $this->db->q($description) . ", " .
"keywords = " . $this->db->q($keywords) . ", " .
"converted = 1 " .
"WHERE page_id = " . (int) $page['page_id'] . " " .
"LIMIT 1");
# echo mysqli_error($dblink) . '<br>';
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
## 5. REVISIONS ##
if ($revisions = $this->db->load_all(
"SELECT revision_id, tag, title, body, edit_note, description, keywords, page_lang " .
"FROM {$prefix}revision " .
"WHERE " .
"page_lang IN (" . $_lang_set . ") " .
"AND converted <> 1 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'revision' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($revisions as $revision)
{
$tag = iconv($_charset[$revision['page_lang']], 'UTF-8', $revision['tag']);
$title = iconv($_charset[$revision['page_lang']], 'UTF-8', $revision['title']);
$body = iconv($_charset[$revision['page_lang']], 'UTF-8', $revision['body']);
$edit_note = iconv($_charset[$revision['page_lang']], 'UTF-8', $revision['edit_note']);
$description = iconv($_charset[$revision['page_lang']], 'UTF-8', $revision['description']);
$keywords = iconv($_charset[$revision['page_lang']], 'UTF-8', $revision['keywords']);
# echo $revision['revision_id'] . ': ' . $tag . '<br>';
// update revision
mysqli_query($dblink,
"UPDATE {$prefix}revision SET " .
"tag = " . $this->db->q($tag) . ", " .
"title = " . $this->db->q($title) . ", " .
"body = " . $this->db->q($body) . ", " .
"edit_note = " . $this->db->q($edit_note) . ", " .
"description = " . $this->db->q($description) . ", " .
"keywords = " . $this->db->q($keywords) . ", " .
"converted = 1 " .
"WHERE revision_id = " . (int) $revision['revision_id'] . " " .
"LIMIT 1");
# echo mysqli_error($dblink) . '<br>';
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
## 6. USERS ##
if ($users = $this->db->load_all(
"SELECT user_id, user_name, real_name, account_lang " .
"FROM {$prefix}user " .
"WHERE " .
"account_lang IN (" . $_lang_set . ") " .
"AND converted <> 1 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'user' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($users as $user)
{
$user_name = iconv($_charset[$user['account_lang']], 'UTF-8', $user['user_name']);
$real_name = iconv($_charset[$user['account_lang']], 'UTF-8', $user['real_name']);
# echo $user_name . '<br>';
// update user
mysqli_query($dblink,
"UPDATE {$prefix}user SET " .
"user_name = " . $this->db->q($user_name) . ", " .
"real_name = " . $this->db->q($real_name) . ", " .
"converted = 1 " .
"WHERE user_id = " . (int) $user['user_id'] . " " .
"LIMIT 1");
# echo mysqli_error($dblink) . '<br>';
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
## 7. USERGROUPS ##
if ($usergroups = $this->db->load_all(
"SELECT group_id, group_name, description, group_lang " .
"FROM {$prefix}usergroup " .
"WHERE " .
"group_lang IN (" . $_lang_set . ") " .
"AND converted <> 1 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'usergroup' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($usergroups as $group)
{
$group_name = iconv($_charset[$group['group_lang']], 'UTF-8', $group['group_name']);
$description = iconv($_charset[$group['group_lang']], 'UTF-8', $group['description']);
# echo $group_name . '<br>';
// update user group
mysqli_query($dblink,
"UPDATE {$prefix}usergroup SET " .
"group_name = " . $this->db->q($group_name) . ", " .
"description = " . $this->db->q($description) . ", " .
"converted = 1 " .
"WHERE group_id = " . (int) $group['group_id'] . " " .
"LIMIT 1");
# echo mysqli_error($dblink) . '<br>';
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
$results .=
'<strong>' . date('H:i:s') . ' - ' . 'Tables records converted.' . "\n" .
'================================================</strong>' . "\n";
echo
'<div class="code">' .
'<pre>' . $results . '</pre>' .
'</div><br>';
}
}
echo '<h2>2. Post-Upgrade Routines for R6.x</h2>';
echo '<h3>2.1. Reset upsized TEXT columns back to TEXT or MEDIUMTEXT:</h3>';
if (!isset($_POST['reset_text_column']))
{
echo $this->form_open('reset_text');
echo '<button type="submit" name="reset_text_column">' . $this->_t('UpdateButton') . '</button>';
echo $this->form_close();
}
else if (isset($_POST['reset_text_column']))
{
/*
* tables having converted to utf8mb4 auto upsized the TEXT columns to avoid insufficient storage
* TEXT -> MEDIUMTEXT -> LONGTEXT
* most of these columns store Latin1 strings and do not require that storage size, set them back to default size
* LONGTEXT -> MEDIUMTEXT -> TEXT
*/
$sql = [
"ALTER TABLE {$prefix}acl CHANGE list list TEXT COLLATE {$collation} NOT NULL AFTER privilege;",
"ALTER TABLE {$prefix}config CHANGE config_value config_value TEXT COLLATE {$collation} NULL AFTER config_name;",
"ALTER TABLE {$prefix}external_link CHANGE link link TEXT COLLATE {$collation} NOT NULL AFTER page_id;",
"ALTER TABLE {$prefix}file CHANGE caption caption TEXT COLLATE {$collation} NOT NULL AFTER file_description;",
"ALTER TABLE {$prefix}log CHANGE message message TEXT COLLATE {$collation} NOT NULL AFTER ip;",
"ALTER TABLE {$prefix}page CHANGE body body MEDIUMTEXT COLLATE {$collation} NOT NULL AFTER modified,
CHANGE body_r body_r MEDIUMTEXT COLLATE {$collation} NOT NULL AFTER body,
CHANGE body_toc body_toc TEXT COLLATE {$collation} NOT NULL AFTER body_r;",
"ALTER TABLE {$prefix}referrer CHANGE user_agent user_agent TEXT COLLATE {$collation} NOT NULL AFTER ip;",
"ALTER TABLE {$prefix}revision CHANGE body body MEDIUMTEXT COLLATE {$collation} NOT NULL AFTER modified,
CHANGE body_r body_r MEDIUMTEXT COLLATE {$collation} NOT NULL AFTER body;",
];
$results =
'<strong>' . date('H:i:s') . ' - ' . 'Started conversion of TEXT columns' . "\n" .
'================================================</strong>' . "\n";
// reset TEXT columns
foreach ($sql as $query)
{
$this->db->sql_query($query);
}
$results .=
'<strong>' . date('H:i:s') . ' - ' . 'Reset converted TEXT columns' . "\n" .
'================================================</strong>' . "\n";
echo
'<div class="code">' .
'<pre>' . $results . '</pre>' .
'</div><br>';
}
if (version_compare($this->db->wacko_version, '6.0.beta1' , '>='))
{
echo '<h3>2.2. Convert HTML entities to their corresponding Unicode characters:</h3>';
if (!isset($_POST['convert_html_entities']))
{
echo $this->form_open('html_entities');
echo '<button type="submit" name="convert_html_entities">' . $this->_t('UpdateButton') . '</button>';
echo $this->form_close();
}
else if (isset($_POST['convert_html_entities']))
{
# set_time_limit(3600);
$convert_entities = function($input)
{
return preg_replace_callback('/(&#[0-9]+;)/', function($m) { return mb_convert_encoding($m[1], 'UTF-8', 'HTML-ENTITIES'); }, $input);
};
/* Tables: sets value 2 in converted record to avoid double conversion
* 1 - utf8
* 2 - HTML entities
* 3 - mojibake
*/
$tables = [
'file',
'page',
'revision',
];
// add field 'converted'
foreach ($tables as $table)
{
$this->db->sql_query("ALTER TABLE " . $prefix . $table . " ADD converted TINYINT(1) UNSIGNED NOT NULL DEFAULT '0';");
}
$results =
'<strong>' . date('H:i:s') . ' - ' . 'Tables record conversion started ' . "\n" .
'================================================</strong>' . "\n";
## 1. FILES ##
if ($files = $this->db->load_all(
"SELECT file_id, file_name, file_description, caption " .
"FROM {$prefix}file " .
"WHERE converted <> 2 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'file' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($files as $file)
{
$file_description = $convert_entities($file['file_description']);
$caption = $convert_entities($file['caption']);
# echo $file['file_name'] . '<br>';
// update file meta data
$this->db->sql_query(
"UPDATE {$prefix}file SET " .
"file_description = " . $this->db->q($file_description) . ", " .
"caption = " . $this->db->q($caption) . ", " .
"converted = 2 " .
"WHERE file_id = " . (int) $file['file_id'] . " " .
"LIMIT 1");
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
## 2. PAGES ##
if ($pages = $this->db->load_all(
"SELECT page_id, title, body, edit_note, description, keywords " .
"FROM {$prefix}page " .
"WHERE converted <> 2 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'page' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($pages as $page)
{
$title = $convert_entities($page['title']);
$body = $convert_entities($page['body']);
$edit_note = $convert_entities($page['edit_note']);
$description = $convert_entities($page['description']);
$keywords = $convert_entities($page['keywords']);
# echo $tag . '<br>';
// update current page copy
$this->db->sql_query(
"UPDATE {$prefix}page SET " .
"title = " . $this->db->q($title) . ", " .
"body = " . $this->db->q($body) . ", " .
"edit_note = " . $this->db->q($edit_note) . ", " .
"description = " . $this->db->q($description) . ", " .
"keywords = " . $this->db->q($keywords) . ", " .
"converted = 2 " .
"WHERE page_id = " . (int) $page['page_id'] . " " .
"LIMIT 1");
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
## 3. REVISIONS ##
if ($revisions = $this->db->load_all(
"SELECT revision_id, title, body, edit_note, description, keywords " .
"FROM {$prefix}revision " .
"WHERE converted <> 2 ")
)
{
$total = 0;
$results .=
"\t" . '<strong>' . date('H:i:s') . ' - ' . 'revision' . "\n" .
"\t" . '------------------------------------------------</strong>' . "\n";
foreach($revisions as $revision)
{
$title = $convert_entities($revision['title']);
$body = $convert_entities($revision['body']);
$edit_note = $convert_entities($revision['edit_note']);
$description = $convert_entities($revision['description']);
$keywords = $convert_entities($revision['keywords']);
# echo $revision['revision_id'] . ': ' . $tag . '<br>';
// update revision
$this->db->sql_query(
"UPDATE {$prefix}revision SET " .
"title = " . $this->db->q($title) . ", " .
"body = " . $this->db->q($body) . ", " .
"edit_note = " . $this->db->q($edit_note) . ", " .
"description = " . $this->db->q($description) . ", " .
"keywords = " . $this->db->q($keywords) . ", " .
"converted = 2 " .
"WHERE revision_id = " . (int) $revision['revision_id'] . " " .
"LIMIT 1");
$total++;
}
$results .= "\t" . '<strong>' . date('H:i:s') . ' - ' . 'Records converted' . ': ' . $total . '</strong>' . "\n\n\n";
}
$results .=
'<strong>' . date('H:i:s') . ' - ' . 'Tables records converted.' . "\n" .
'================================================</strong>' . "\n";
echo
'<div class="code">' .
'<pre>' . $results . '</pre>' .
'</div><br>';
}
echo '<h3>2.3. Remove column \'converted\' from tables:</h3>';
if (!isset($_POST['remove_converted_column']))
{
echo $this->form_open('converted_column');
echo '<button type="submit" name="remove_converted_column">' . $this->_t('UpdateButton') . '</button>';
echo $this->form_close();
}
else if (isset($_POST['remove_converted_column']))
{
// tables having 'converted' column to avoid double conversion
$tables = [
'category',
'file',
'menu',
'page',
'revision',
'user',
'usergroup',
];
// drop field 'converted'
foreach ($tables as $table)
{
$this->db->sql_query("ALTER TABLE " . $prefix . $table . " DROP converted;");
}
$results =
'<strong>' . date('H:i:s') . ' - ' . 'Droped column \'converted\' from tables' . "\n" .
'================================================</strong>' . "\n";
echo
'<div class="code">' .
'<pre>' . $results . '</pre>' .
'</div><br>';
}
}
}
3. ToDo
- comments with another charset (page [ru] -> comment in Russian [en]) -> Mojibake
- the comment was not converted because the user writing the comments was logged in with English as user profile language and therefore the comment was saved with [en] as comment language, since the page set the encoding and that was not an issue but it is NOW
-
SELECT page_id, tag, title, body FROM prefix_page;
- improve process for regular users, add better description and add localized versions in German and Russian.
- add check to analyze the database prior to show the suggested actions
- currently it uses only the MySQL API to update the cross charset records
3.1. Localization
['en'] 'Utf8Utilities' => 'Unicode conversion utilities', 'Utf8PreUpgrade' => 'Pre-Upgrade Routines for R6.x', 'Utf8KeyPrefixes' => 'Alter tables to work without key prefixes longer than 767 bytes', 'Utf8AlterTablesStarted' => 'Alter tables started', 'Utf8Tables' => 'Tables: file, page, page_link and revision', 'Utf8ConvertDatabase' => 'Convert database and tables based on charset to', 'Utf8TableConversionStarted' => 'Table conversion started', 'Utf8TablesConverted' => 'Tables converted.', 'Utf8ConvertCrossCharset' => 'Convert all cross charset records', 'Utf8TablesConversionStarted' => 'Tables record conversion started', 'Utf8RecordsConverted' => 'Records converted', 'Utf8TablesRecordsConverted' => 'Tables records converted.', 'Utf8PostUpgrade' => 'Post-Upgrade Routines for R6.x', 'Utf8RemoveColumnConverted' => 'Remove column \'converted\' from tables', 'Utf8DropedColumnConverted' => 'Droped column \'converted\' from tables', 'Utf8ResetTextColumns' => 'Reset up-sized TEXT columns back to TEXT or MEDIUMTEXT', 'Utf8StartedTextConversion' => 'Started conversion of TEXT columns', 'Utf8ResetTextColumns' => 'Reset converted TEXT columns', 'Utf8ConvertHtmlEntities' => 'Convert HTML entities to their corresponding Unicode characters',
['de'] 'Utf8Utilities' => 'Funktionen zur Unicode-Konvertierung', 'Utf8PreUpgrade' => 'Pre-Upgrade-Routinen für R6.x', 'Utf8KeyPrefixes' => 'Änderung der Tabellen, damit sie ohne Schlüsselpräfixe funktionieren, die länger als 767 Bytes sind.', 'Utf8AlterTablesStarted' => 'Ändern von Tabellen gestartet', 'Utf8Tables' => 'Tabllen: file, page, page_link und revision', 'Utf8ConvertDatabase' => 'Konvertiert Datenbank und Tabellen basierend auf dem Zeichensatz nach', 'Utf8TableConversionStarted' => 'Konvertierung der Tabelle gestartet', 'Utf8TablesConverted' => 'Tabellen konvertiert.', 'Utf8ConvertCrossCharset' => 'Konvertiert alle Zeichensatz-übergreifenden Datensätze.', 'Utf8TablesConversionStarted' => 'Konvertierung der Datensätze in den Tabellen gestartet', 'Utf8RecordsConverted' => 'Datensätze konvertiert', 'Utf8TablesRecordsConverted' => 'Datensätze in den Tabellen konvertiert.', 'Utf8PostUpgrade' => 'Post-Upgrade-Routinen für R6.x', 'Utf8RemoveColumnConverted' => 'Löscht die Spalte \'converted\' aus den Tabellen.', 'Utf8DropedColumnConverted' => 'Die Spalte \'converted\' wurde aus den Tabellen entfernt.', 'Utf8ResetTextColumns' => 'Zurücksetzen der vergrößerten TEXT-Spalten auf TEXT oder MEDIUMTEXT', 'Utf8StartedTextConversion' => 'Konvertierung der TEXT-Spalten gestartet', 'Utf8ResetTextColumns' => 'Zurücksetzen der konvertierten TEXT-Spalten', 'Utf8ConvertHtmlEntities' => 'Konvertierung von HTML-Entitäten in die entsprechenden Unicode-Zeichen',
Estonian: R6.0 Unicode Migration and cp1257 Windows Baltic
cp1257
) as their default language.In earlier versions the installer did not offered 'cp1257 Windows Baltic' as codepage for the database. While this not affected the final output in the Wiki because the database did the conversion according the requested charset in the connection, it does now because it converts the tables in the database itself.
latin1 --> utf8mb4 (wrong)
cp1257 --> utf8mb4 (right)
'cp1257 Windows Baltic'
now for all your tables BEFORE you run the Unicode conversion script, else the script uses the wrong charset!action/convert_cp1257.php
Do not forget to remove the action
{{convert_cp1257}}
before you continue with with the Pre-Upgrade conversion script{{admin_utf8}}
.