UTF-8 Action Plan
1. PHP
Version | Release | Supported until | Notes |
---|---|---|---|
7.4 | November 2019 | November 2022 | Typed Properties 2.0, Preloading, Null Coalescing Assignment Operator, improved openssl_random_pseudo_bytes(), Weak References, FFI (Foreign Function Interface) |
8.0 | November 2020 | November 2023 | Just In Time compilation (JIT), arrays starting with a negative index, consistent type errors for internal functions, fatal error for incompatible method signatures |
Supported Versions[link1]
Unicode Cheat Sheet[link2]
2. Questions
2.1. Where do we need Unicode support?
- all text fields in the database
- all text files
2.2. Where do we need Normalization?
- tag
- file_name
- user_name
Normalization[link3]
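A minimal sketch of why these fields need normalization: the same visible text can be stored as different code point sequences, so an unnormalized tag or user_name may fail an equality check or slip past a uniqueness constraint. The helper name normalize_nfc() is hypothetical, the choice of NFC is an assumption, and the Normalizer class requires PHP's intl extension.

```php
<?php

/**
 * Normalize a string to Unicode Normalization Form C (NFC).
 * normalize_nfc() is a hypothetical helper name, not a WackoWiki function.
 */
function normalize_nfc(string $text): string
{
    // Without the intl extension there is no Normalizer class; return the input unchanged.
    if (!class_exists('Normalizer')) {
        return $text;
    }

    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);

    // Normalizer::normalize() returns false on failure (e.g. malformed UTF-8).
    return $normalized === false ? $text : $normalized;
}

// "é" as one precomposed code point vs. "e" followed by a combining acute accent.
$composed   = "\u{00E9}";
$decomposed = "e\u{0301}";

var_dump($composed === $decomposed);                               // bool(false)
var_dump(normalize_nfc($composed) === normalize_nfc($decomposed)); // bool(true) when intl is loaded
```

Normalizing once on input, before the value is compared or stored, keeps lookups on tag, file_name, and user_name consistent.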
2.3. Where do we need Transliteration support?
- file names for attachments
  - optional transliteration[link4] via the upload_translit setting
    - All control characters and Unicode characters should be removed from file names and their extensions without exception. Special characters such as ";", ":", ">", "<", "/", "\", additional ".", "*", "%", "$", and so on should be discarded as well. If Unicode characters are not needed, it is highly recommended to accept only alphanumeric characters and a single dot in the file name and extension, where neither the file name nor the extension may be empty (regular expression: [a-zA-Z0-9]{1,200}\.[a-zA-Z0-9]{1,10}).
- user names
- supertag (dropped)
  - supertag is no longer needed and is not suited as a unique identifier
Transliteration[link4]
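The strict file name rule quoted above could be sketched as follows. Both function names are hypothetical and not part of WackoWiki. The \A and \z anchors are an addition to the quoted pattern, so that the whole name must match rather than a substring of it.

```php
<?php

/**
 * Check a file name against the strict pattern: 1-200 ASCII letters or digits,
 * exactly one dot, then an extension of 1-10 ASCII letters or digits.
 * is_safe_upload_name() is a hypothetical helper name.
 */
function is_safe_upload_name(string $name): bool
{
    // \A and \z anchor the whole string; unlike $, \z does not accept a trailing newline.
    return preg_match('/\A[a-zA-Z0-9]{1,200}\.[a-zA-Z0-9]{1,10}\z/', $name) === 1;
}

/**
 * Strip everything except ASCII letters and digits, keeping only the last dot.
 * Returns null when no valid name remains. sanitize_upload_name() is hypothetical.
 */
function sanitize_upload_name(string $name): ?string
{
    // Split at the last dot so that exactly one dot survives.
    $dot = strrpos($name, '.');
    if ($dot === false) {
        return null; // no extension at all
    }

    // Byte-wise replacement: control characters, punctuation, and every byte of a
    // multibyte UTF-8 sequence are removed.
    $base = (string) preg_replace('/[^a-zA-Z0-9]/', '', substr($name, 0, $dot));
    $ext  = (string) preg_replace('/[^a-zA-Z0-9]/', '', substr($name, $dot + 1));

    $candidate = substr($base, 0, 200) . '.' . substr($ext, 0, 10);

    return is_safe_upload_name($candidate) ? $candidate : null;
}

var_dump(sanitize_upload_name('report.final.pdf')); // string(15) "reportfinal.pdf"
var_dump(sanitize_upload_name('../../etc/passwd')); // NULL
var_dump(sanitize_upload_name('фото.jpg'));         // NULL
```

The last call shows why transliteration matters: a purely non-Latin name is stripped to nothing, so it has to be transliterated before this filter runs or the upload is rejected.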
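2.4. Why might you want to limit the allowed characters?
Restricting input to the Common and Latin scripts rejects look-alike characters from other scripts, for example in user names.
if (preg_match('/[^\p{Common}\p{Latin}]/u', $_POST['text']) === 0) {
    // Only Common and Latin characters were found: post your data to the database.
}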
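As an illustration, the script check can be wrapped in a small helper that returns true only when the text is acceptable. The name is_common_or_latin() is hypothetical. The sample strings show a look-alike letter from another script being caught.

```php
<?php

/**
 * Return true when $text contains only characters from the Common and Latin scripts.
 * is_common_or_latin() is a hypothetical helper name.
 */
function is_common_or_latin(string $text): bool
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example malformed UTF-8 with the /u modifier), so compare against 0.
    return preg_match('/[^\p{Common}\p{Latin}]/u', $text) === 0;
}

var_dump(is_common_or_latin('paypal'));        // bool(true): all Latin letters
var_dump(is_common_or_latin("p\u{0430}ypal")); // bool(false): U+0430 is Cyrillic
var_dump(is_common_or_latin('Grüße, 2024!'));  // bool(true): Latin plus Common punctuation and digits
```

Combining marks belong to the Inherited script, so decomposed text such as "e" followed by U+0301 would be rejected by this pattern; normalizing to NFC first, or adding \p{Inherited} to the character class, avoids that.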
3. Core
- convert all PHP files to UTF-8 - DONE
- set UTF-8 as default encoding - DONE
- remove do_unicode_entities() calls - DONE
- remove all supertag usage - DONE
- Templatest filters -> Unicode
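For the default encoding item, a minimal sketch of what such a setting might look like at runtime. This is an assumption about the general approach, not a copy of WackoWiki's bootstrap code, and the mbstring calls are guarded because that extension is optional.

```php
<?php

// Make UTF-8 the default charset for functions such as htmlspecialchars()
// and for the Content-Type header PHP sends.
ini_set('default_charset', 'UTF-8');

// The mbstring extension is optional, so guard the call.
if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8');
}

// Byte length and character length differ for multibyte text.
$word = 'Grüße';
echo strlen($word), "\n";        // 7 (bytes)
if (function_exists('mb_strlen')) {
    echo mb_strlen($word), "\n"; // 5 (characters)
}
```

The difference between the two lengths is the reason byte-oriented string functions need review once all content is UTF-8.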
4. Database
- drop support for MyISAM - DONE
- drop supertag fields - DONE
- utf8mb4[link5]
- Database UTF-8 conversion script[link6] - DONE
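To illustrate the utf8mb4 item, here is a hedged sketch that builds a per-table conversion statement and detects text that needs four-byte storage. It is not the linked conversion script. The function names, the table name wacko_page, and the default collation utf8mb4_unicode_ci are all assumptions.

```php
<?php

/**
 * Build the MySQL/MariaDB statement that converts one table to utf8mb4.
 * build_utf8mb4_alter() and the default collation are assumptions.
 */
function build_utf8mb4_alter(string $table, string $collation = 'utf8mb4_unicode_ci'): string
{
    // Identifiers cannot be bound as query parameters, so accept only a conservative set.
    if (preg_match('/\A[A-Za-z0-9_]+\z/', $table) !== 1) {
        throw new InvalidArgumentException("Unexpected table name: $table");
    }
    if (preg_match('/\Autf8mb4_[a-z0-9_]+\z/', $collation) !== 1) {
        throw new InvalidArgumentException("Unexpected collation: $collation");
    }

    return "ALTER TABLE `$table` CONVERT TO CHARACTER SET utf8mb4 COLLATE $collation";
}

/**
 * True when the text contains a code point outside the Basic Multilingual Plane,
 * i.e. one that UTF-8 encodes in four bytes.
 */
function needs_utf8mb4(string $text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

echo build_utf8mb4_alter('wacko_page'), "\n";
var_dump(needs_utf8mb4('🍏🍎🙂🐉')); // bool(true)
var_dump(needs_utf8mb4('Grüße'));    // bool(false)
```

The connection itself also has to use utf8mb4, for example via charset=utf8mb4 in a PDO MySQL DSN; otherwise four-byte characters can still be mangled in transit.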
5. Libraries
- php-diff - add Unicode support - DONE
- diff - DONE
6. Migration
🍏🍎🙂🐉
7. Testing
- notifications with diff - DONE
- [link1] https://www.php.net/supported-versions.php
- [link2] https://wackowiki.org/doc/Dev/Guidelines/UnicodeCheatSheet
- [link3] https://wackowiki.org/doc/Dev/Release/R6.0/Unicode/Normalization
- [link4] https://wackowiki.org/doc/Dev/Release/R6.0/Unicode/Transliteration
- [link5] https://wackowiki.org/doc/Dev/Release/R6.0/Unicode/utf8mb4
- [link6] https://wackowiki.org/doc/Dev/Release/R6.0/Upgrade/DatabaseConversion