utf8mb4


🐌🍏🍎🐝

1. Check the maximum length of columns and index keys


This is probably the most tedious part of the whole upgrading process.


When converting from utf8 to utf8mb4, the maximum length of a column or index key is unchanged in terms of bytes. Therefore, it is smaller in terms of characters, because the maximum length of a character is now four bytes instead of three.


For example, a TINYTEXT column can hold up to 255 bytes, which correlates to 85 three-byte or 63 four-byte characters. Let’s say you have a TINYTEXT column that uses utf8 but must be able to contain more than 63 characters. Given this requirement, you can’t convert this column to utf8mb4 unless you also change the data type to a longer type such as TEXT — because if you’d try to fill it with four-byte characters, you’d only be able to enter 63 characters, but not more.


The same goes for index keys. The InnoDB storage engine has a maximum index length of 767 bytes, so for utf8 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8 columns with indexes longer than 191 characters, you will need to index a smaller number of characters when using utf8mb4. (Because of this, I had to change some indexed VARCHAR(255) columns to VARCHAR(191).)

1.1. innodb_large_prefix

Enable this option to allow index key prefixes longer than 767 bytes (up to 3072 bytes) for InnoDB tables that use DYNAMIC or COMPRESSED row format. (Creating such tables also requires the option values innodb_file_format=barracuda and innodb_file_per_table=true.)


Barracuda is the newest file format, it supports index key prefixes up to 3072 bytes.


MySQL versions prior to 5.7.7 or MariaDB 10.2.2 do not support index key prefixes up to 3072 bytes by default. To enable it in one of these versions, add the following to your my.cnf file:

[mysqld]
innodb_large_prefix=1
innodb_file_format=barracuda
innodb_file_per_table=true	

https://dev.mysql.com/doc/refm[...]code-conversion.html


 SET @@global.innodb_large_prefix = 1;	

SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=on;
SET GLOBAL innodb_large_prefix=on;	

2. Types


Type Size Description
CHAR[Length] Length bytes A fixed-length field from 0 to 255 characters long
VARCHAR[Length] String length + 1 or 2 bytes A variable-length field from 0 to 65,535 characters long
TINYTEXT String length + 1 bytes A string with a maximum length of 255 characters
TEXT String length + 2 bytes A string with a maximum length of 65,535 characters
MEDIUMTEXT String length + 3 bytes A string with a maximum length of 16,777,215 characters
LONGTEXT String length + 4 bytes A string with a maximum length of 4,294,967,295 characters

types bytes utf8 utf8mb4
TINYTEXT, CHAR 255 85 64
TEXT, VARCHAR 65.535 21.845 16.384
MEDIUMTEXT 16.777.215 5.592.405 4.194.304
LONGTEXT 4.294.967.295 1.431.655.765 1.073.741.824

3. Settings


'database_charset' => 'utf8mb4',
'database_collation' => 'utf8mb4_unicode_520_ci',


For Unicode character sets, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example:



4. Resources

  1. https://dev.mysql.com/doc/refm[...]code-conversion.html
  2. https://stackoverflow.com/ques[...]all-text-based-field
  3. http://mysql.rjweb.org/doc.php[...]osing_charset_issues
  4. MySQL: Data Type Storage Requirements
  5. MySQL: Collation Naming Conventions