utf8mb4
🐌🍏🍎🐝
1. Check the maximum length of columns and index keys
This is probably the most tedious part of the whole upgrading process.
When converting from utf8
to utf8mb4
, the maximum length of a column or index key is unchanged in terms of bytes. Therefore, it is smaller in terms of characters, because the maximum length of a character is now four bytes instead of three.
For example, a TINYTEXT
column can hold up to 255 bytes, which correlates to 85 three-byte or 63 four-byte characters. Let’s say you have a TINYTEXT
column that uses utf8 but must be able to contain more than 63 characters. Given this requirement, you can’t convert this column to utf8mb4 unless you also change the data type to a longer type such as TEXT
— because if you’d try to fill it with four-byte characters, you’d only be able to enter 63 characters, but not more.
The same goes for index keys. The InnoDB storage engine has a maximum index length of 767 bytes, so for utf8 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8 columns with indexes longer than 191 characters, you will need to index a smaller number of characters when using utf8mb4. (Because of this, I had to change some indexed VARCHAR(255)
columns to VARCHAR(191)
.)
1.1. innodb_large_prefix
Enable this option to allow index key prefixes longer than 767 bytes (up to 3072 bytes) for InnoDB tables that use DYNAMIC or COMPRESSED row format. (Creating such tables also requires the option values innodb_file_format=barracuda and innodb_file_per_table=true.)
Barracuda is the newest file format, it supports index key prefixes up to 3072 bytes.
MySQL versions prior to 5.7.7 or MariaDB 10.2.2 do not support index key prefixes up to 3072 bytes by default. To enable it in one of these versions, add the following to your my.cnf file:
[mysqld] innodb_large_prefix=1 innodb_file_format=barracuda innodb_file_per_table=true
https://dev.mysql.com/doc/refm[...]code-conversion.html
SET @@global.innodb_large_prefix = 1;
SET GLOBAL innodb_file_format=Barracuda; SET GLOBAL innodb_file_per_table=on; SET GLOBAL innodb_large_prefix=on;
2. Types
Type | Size | Description |
---|---|---|
CHAR[Length] | Length bytes | A fixed-length field from 0 to 255 characters long |
VARCHAR[Length] | String length + 1 or 2 bytes | A variable-length field from 0 to 65,535 characters long |
TINYTEXT | String length + 1 bytes | A string with a maximum length of 255 characters |
TEXT | String length + 2 bytes | A string with a maximum length of 65,535 characters |
MEDIUMTEXT | String length + 3 bytes | A string with a maximum length of 16,777,215 characters |
LONGTEXT | String length + 4 bytes | A string with a maximum length of 4,294,967,295 characters |
types | bytes | utf8 | utf8mb4 |
---|---|---|---|
TINYTEXT , CHAR | 255 | 85 | 64 |
TEXT , VARCHAR | 65.535 | 21.845 | 16.384 |
MEDIUMTEXT | 16.777.215 | 5.592.405 | 4.194.304 |
LONGTEXT | 4.294.967.295 | 1.431.655.765 | 1.073.741.824 |
3. Settings
'database_charset' => 'utf8mb4',
'database_collation' => 'utf8mb4_unicode_520_ci',
For Unicode character sets, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example:
-
utf8mb4_0900_ai_ci
is based on UCA 9.0.0 weight keys ( http://www.unicode.org/Public/UCA/9.0.0/allkeys.txt ). -
utf8mb4_unicode_520_ci
is based on UCA 5.2.0 weight keys ( http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt ). -
utf8mb4_unicode_ci
(with no version named) is based on UCA 4.0.0 weight keys ( http://www.unicode.org/Public/[...].0/allkeys-4.0.0.txt ).