UTF-8 Action Plan


1. PHP

Version | Release       | Supported until | Notes
--------|---------------|-----------------|------
7.4     | November 2019 | November 2022   | Typed Properties 2.0, Preloading, Null Coalescing Assignment Operator, Improve openssl_random_pseudo_bytes, Weak References, FFI – Foreign Function Interface
8.0     | November 2020 | November 2023   | Just-In-Time compilation (JIT), arrays starting with a negative index, consistent type errors for internal functions, fatal error for incompatible method signatures

Supported Versions

2. Questions

2.1. Where do we need Unicode support?


  1. all text fields in the database
  2. all text files
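
Whether a database field or a text file actually contains UTF-8 can be checked before it is stored or converted. The following is a minimal sketch, not the project's own code; the helper name is_valid_utf8() is hypothetical. It relies on the fact that preg_match() with the /u modifier returns false when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper: true if $bytes is a well-formed UTF-8 string.
function is_valid_utf8(string $bytes): bool
{
    // The empty pattern matches any valid subject (result 1);
    // with /u, invalid UTF-8 makes preg_match() return false instead.
    return preg_match('//u', $bytes) === 1;
}

$utf8   = "Gr\u{00FC}\u{00DF}e"; // "Grüße" encoded as UTF-8
$latin1 = "Gr\xFC\xDFe";         // the same word in ISO-8859-1 bytes

var_dump(is_valid_utf8($utf8));
var_dump(is_valid_utf8($latin1));
```

The same check could be applied to the result of file_get_contents() to find text files that still need conversion.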

2.2. Where do we need Normalization?

  1. tag 
  2. file_name
  3. user_name

Normalization
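
Normalization matters for these fields because the same visible text can be stored as different code point sequences, which breaks lookups and uniqueness checks. Here is a minimal sketch, assuming the intl extension is available and that NFC is the chosen form (the plan does not say which form is intended); normalize_identifier() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: bring a tag, file name or user name into NFC.
function normalize_identifier(string $value): string
{
    // Assumption: without the intl extension the value is passed through unchanged.
    if (!class_exists('Normalizer')) {
        return $value;
    }

    // Normalizer::normalize() returns false for input it cannot process.
    $normalized = Normalizer::normalize($value, Normalizer::FORM_C);

    return $normalized === false ? $value : $normalized;
}

$decomposed = "Cafe\u{0301}"; // "e" followed by a combining acute accent
$composed   = "Caf\u{00E9}";  // precomposed "é"

// The raw strings differ byte for byte although they render identically.
var_dump($decomposed === $composed);

// With intl loaded, both normalize to the same NFC string.
var_dump(normalize_identifier($decomposed) === normalize_identifier($composed));
```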

2.3. Where do we need Transliteration support?


  1. file names for attachments
    • Remove all control characters and Unicode characters from file names and their extensions, without exception. Also discard special characters such as ";", ":", ">", "<", "/", "\", "*", "%", "$", and any additional ".". Where Unicode characters are not needed, it is highly recommended to accept only alphanumeric characters and a single dot separating the file name from the extension, with neither part empty (regular expression, matched against the whole name: ^[a-zA-Z0-9]{1,200}\.[a-zA-Z0-9]{1,10}$).
  2. user names
  3. supertag (?)

Transliteration
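
The file name rule for attachments could be sketched as follows. This is an illustration under assumptions, not the project's implementation: sanitize_attachment_name() is a hypothetical helper, the transliteration step uses the intl function transliterator_transliterate() only when it is available, and everything before the last dot is treated as the file name.

```php
<?php
// Hypothetical helper: returns a safe "name.ext" string, or null if none can be built.
function sanitize_attachment_name(string $name): ?string
{
    // Optional step: transliterate to ASCII when the intl extension is loaded.
    if (function_exists('transliterator_transliterate')) {
        $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII', $name);
        if ($ascii !== false) {
            $name = $ascii;
        }
    }

    // The last dot separates the file name from the extension.
    $dot = strrpos($name, '.');
    if ($dot === false) {
        return null;
    }

    // Discard everything that is not alphanumeric, including additional dots.
    $base = preg_replace('/[^a-zA-Z0-9]/', '', substr($name, 0, $dot));
    $ext  = preg_replace('/[^a-zA-Z0-9]/', '', substr($name, $dot + 1));
    $candidate = $base . '.' . $ext;

    // Final check against the recommended pattern, anchored to the whole string.
    $ok = preg_match('/^[a-zA-Z0-9]{1,200}\.[a-zA-Z0-9]{1,10}$/', $candidate) === 1;

    return $ok ? $candidate : null;
}

var_dump(sanitize_attachment_name('report;v2.final.pdf')); // string "reportv2final.pdf"
var_dump(sanitize_attachment_name('.htaccess'));           // NULL, the file name part is empty
```

Returning null instead of inventing a fallback name leaves the decision (reject the upload or generate a name) to the caller.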

2.4. Why might you want to limit the allowed characters?

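$text = $_POST['text'] ?? '';

// preg_match() returns 0 only if no character outside the Common and Latin
// scripts was found; it returns false for invalid UTF-8, which is rejected too.
if (preg_match('/[^\p{Common}\p{Latin}]/u', $text) === 0)
{
	// post your data to the database.
}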
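
One plausible reason for such a restriction, offered here as an assumption about what this section has in mind, is that characters from other scripts can be visually indistinguishable from Latin letters. The sketch below uses a hypothetical helper, uses_only_common_and_latin(), built on the same character class.

```php
<?php
// Hypothetical helper: true if $text contains only Common and Latin script characters.
function uses_only_common_and_latin(string $text): bool
{
    return preg_match('/[^\p{Common}\p{Latin}]/u', $text) === 0;
}

$latin   = 'admin';
$spoofed = "\u{0430}dmin"; // starts with CYRILLIC SMALL LETTER A, looks like "admin"

var_dump($latin === $spoofed);                    // bool(false)
var_dump(uses_only_common_and_latin($latin));     // bool(true)
var_dump(uses_only_common_and_latin($spoofed));   // bool(false)
```

Combining marks such as U+0301 belong to the Inherited script, so decomposed accented letters would be rejected by this class; normalizing the input to NFC first avoids that for letters that have a precomposed form.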

3. Core

  • convert all PHP files to UTF-8 – DONE
  • set UTF-8 as the default encoding – DONE
  • remove do_unicode_entities() calls – DONE
  • remove all supertag usage – DONE
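
How the default encoding was set in the core is not stated here. As a minimal sketch of one common approach, assuming the setting is applied once during bootstrap:

```php
<?php
// Assumption: this runs once at startup, before any output is produced.
ini_set('default_charset', 'UTF-8');

// mbstring is optional; only configure it when the extension is loaded.
if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8');
}
```

Functions such as htmlspecialchars() pick up default_charset when no encoding argument is passed, so setting it centrally keeps individual calls short.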

4. Database


5. Libraries

6. Migration

7. Testing