View source for utf8mb4

{{toc numerate=1}}
🐌🍏🍎🐝

===Check the maximum length of columns and index keys===

This is probably the most tedious part of the whole upgrading process.

When converting from ##utf8## to ##utf8mb4##, the maximum length of a column or index key is unchanged in terms of bytes. Therefore, it is smaller in terms of characters, because the maximum length of a character is now **four bytes** instead of **three**.

For example, a ##TINYTEXT## column can hold up to 255 bytes, which correlates to 85 three-byte or 63 four-byte characters. Let’s say you have a ##TINYTEXT## column that uses utf8 but must be able to contain more than 63 characters. Given this requirement, you can’t convert this column to utf8mb4 unless you also change the data type to a longer type such as ##TEXT## — because if you’d try to fill it with four-byte characters, you’d only be able to enter 63 characters, but not more.

The same goes for index keys. The InnoDB storage engine has a maximum index length of 767 bytes, so for utf8 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8 columns with indexes longer than **191** characters, you will need to index a smaller number of characters when using utf8mb4. (Because of this, I had to change some indexed ##VARCHAR(255)## columns to ##VARCHAR(191)##.)

====innodb_large_prefix====
Enable this option to allow index key prefixes longer than 767 bytes (up to 3072 bytes) for InnoDB tables that use DYNAMIC or COMPRESSED row format. (Creating such tables also requires the option values innodb_file_format=barracuda and innodb_file_per_table=true.)

((https://dev.mysql.com/doc/refman/5.6/en/innodb-file-format.html Barracuda is the newest file format, it  supports index key prefixes up to 3072 bytes.))

MySQL versions prior to **5.7.7** or MariaDB **10.2.2** do not support index key prefixes up to **3072 bytes** by default. To enable it in one of these versions, add the following to your my.cnf file:
%%
[mysqld]
innodb_large_prefix=1
innodb_file_format=barracuda
innodb_file_per_table=true
%%

https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-conversion.html


%% SET @@global.innodb_large_prefix = 1; %%
%%
SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=on;
SET GLOBAL innodb_large_prefix=on;
%%

===Types===

#|
*| Type | Size | Description |*
|| ##CHAR[Length]## | Length bytes | A fixed-length field from 0 to 255 characters long ||
|| ##VARCHAR[Length]## | String length + 1 or 2 bytes | A variable-length field from 0 to 65,535 characters long ||
|| ##TINYTEXT## | String length + 1 bytes | A string with a maximum length of 255 characters ||
|| ##TEXT## | String length + 2 bytes | A string with a maximum length of 65,535 characters ||
|| ##MEDIUMTEXT## | String length + 3 bytes | A string with a maximum length of 16,777,215 characters ||
|| ##LONGTEXT## | String length + 4 bytes | A string with a maximum length of 4,294,967,295 characters ||
|#

#|
*| types | bytes | utf8 | utf8mb4 |*
|| ##TINYTEXT##, ##CHAR## | 255 | 85 | 64 ||
|| ##TEXT##, ##VARCHAR## | 65.535 | 21.845 | 16.384 ||
|| ##MEDIUMTEXT## | 16.777.215 | 5.592.405 | 4.194.304 ||
|| ##LONGTEXT## | 4.294.967.295 | 1.431.655.765 | 1.073.741.824 ||
|#

===Settings===

##'database_charset' => 'utf8mb4',##
##'database_collation' => 'utf8mb4_unicode_520_ci',##


For Unicode character sets, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example:

  * ##utf8mb4_0900_ai_ci## is based on UCA 9.0.0 weight keys ( http://www.unicode.org/Public/UCA/9.0.0/allkeys.txt ).
  * ##utf8mb4_unicode_520_ci## is based on UCA 5.2.0 weight keys ( http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt ).
  * ##utf8mb4_unicode_ci## (with no version named) is based on UCA 4.0.0 weight keys ( http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt ). 


===Resources===
  1. https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-conversion.html
  2. https://stackoverflow.com/questions/262238/are-there-disadvantages-to-using-a-generic-varchar255-for-all-text-based-field
  3. http://mysql.rjweb.org/doc.php/charcoll#diagnosing_charset_issues
  4. ((https://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html MySQL: Data Type Storage Requirements))
  5. ((https://dev.mysql.com/doc/refman/8.0/en/charset-collation-names.html MySQL: Collation Naming Conventions))