UTF-8 Action Plan

1. PHP

Version | Release       | Supported until | Notes
7.4     | November 2019 | November 2022   | Typed Properties 2.0, preloading, null coalescing assignment operator (??=), improved openssl_random_pseudo_bytes(), weak references, FFI (Foreign Function Interface)
8.0     | November 2020 | November 2023   | Just-In-Time compilation (JIT), arrays starting with a negative index, consistent type errors for internal functions, fatal error for incompatible method signatures

See also: Supported Versions, Unicode Cheat Sheet

2. Questions

2.1. Where do we need Unicode support?

  1. all text fields in the database
  2. all text files
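Supporting Unicode in database fields and text files starts with rejecting bytes that are not well-formed UTF-8. The sketch below is one way to check this in plain PHP; the helper names is_valid_utf8() and strip_utf8_bom() are hypothetical and not part of the project.

```php
<?php
// Minimal sketch: check that text destined for a database field or a text
// file is well-formed UTF-8, and strip a leading byte order mark (BOM).
// Both function names are hypothetical helpers.

function is_valid_utf8(string $text): bool
{
    // With the /u modifier, preg_match() returns false when the subject is
    // not valid UTF-8; the empty pattern otherwise always matches (returns 1).
    return preg_match('//u', $text) === 1;
}

function strip_utf8_bom(string $text): string
{
    // The UTF-8 BOM is the byte sequence EF BB BF.
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0 ? substr($text, 3) : $text;
}
```

If the mbstring extension is available, mb_check_encoding($text, 'UTF-8') performs an equivalent validity check.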

2.2. Where do we need Normalization?

  1. tag
  2. file_name
  3. user_name
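Normalization matters for these fields because visually identical strings can have different byte sequences, so lookups and uniqueness checks on tag, file_name, or user_name can fail. A minimal sketch, assuming the intl extension is installed and that NFC is the chosen form (the plan does not name one); normalize_identifier() is a hypothetical helper.

```php
<?php
// Minimal sketch, assuming the intl extension (Normalizer class) and NFC as
// the target form. normalize_identifier() is a hypothetical helper name.

function normalize_identifier(string $value): string
{
    $normalized = Normalizer::normalize($value, Normalizer::FORM_C);

    // normalize() returns false on failure, e.g. for malformed UTF-8.
    if ($normalized === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $normalized;
}

// Precomposed "é" (U+00E9) and "e" + combining acute accent (U+0301) render
// the same but differ byte for byte until normalized.
$precomposed = "Caf\u{00E9}";
$decomposed  = "Cafe\u{0301}";
var_dump($precomposed === $decomposed);                       // bool(false)
var_dump(normalize_identifier($decomposed) === $precomposed); // bool(true)
```

Normalizing once, before storing or comparing, keeps the stored value canonical so later comparisons can stay simple byte comparisons.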


2.3. Where do we need Transliteration support?

  1. file names for attachments
    • Optional transliteration via the upload_translit setting.
    • All control characters, ASCII and Unicode alike, must be removed from file names and their extensions, without exception.
    • Special characters such as ";", ":", ">", "<", "/", "\", "*", "%", "$", and any additional "." must be discarded as well.
    • Where Unicode characters are not needed, accept only alphanumeric characters and exactly one dot, with neither the file name nor the extension empty (regular expression, anchored so it must match the whole name: ^[a-zA-Z0-9]{1,200}\.[a-zA-Z0-9]{1,10}$).
  2. user names
  3. supertag (dropped)
    • supertag is no longer needed, nor is it suited as a unique identifier
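The attachment file name rules above can be sketched as a single function. Assumptions: the intl extension supplies transliteration, the $transliterate flag stands in for the upload_translit setting, and sanitize_attachment_name() is a hypothetical name, not the project's code. It discards disallowed characters rather than rejecting the whole name, then enforces the whitelist.

```php
<?php
// Minimal sketch of the attachment file name rules. $transliterate mirrors
// the upload_translit setting (assumption); the function name is hypothetical.

function sanitize_attachment_name(string $name, bool $transliterate): ?string
{
    // Optional step: convert any script to Latin, then Latin to plain ASCII.
    // Silently skipped if the intl extension is not installed.
    if ($transliterate && function_exists('transliterator_transliterate')) {
        $converted = transliterator_transliterate('Any-Latin; Latin-ASCII', $name);
        if ($converted === false) {
            return null;
        }
        $name = $converted;
    }

    // Split at the last dot; earlier dots count as disallowed characters.
    // Searching bytes is safe: "." never occurs inside a multibyte UTF-8 sequence.
    $dot = strrpos($name, '.');
    if ($dot === false) {
        return null; // No extension.
    }
    $base = substr($name, 0, $dot);
    $ext  = substr($name, $dot + 1);

    // Discard every byte outside A-Z, a-z, 0-9: control characters, all bytes
    // of non-ASCII characters, and specials such as ; : < > / \ * % $ and ".".
    $base = preg_replace('/[^a-zA-Z0-9]/', '', $base);
    $ext  = preg_replace('/[^a-zA-Z0-9]/', '', $ext);

    // Whitelist check. \A and \z are used because "$" would also accept a
    // trailing newline.
    $clean = $base . '.' . $ext;
    return preg_match('/\A[a-zA-Z0-9]{1,200}\.[a-zA-Z0-9]{1,10}\z/', $clean) === 1 ? $clean : null;
}
```

For example, "my;report<1>.pdf" becomes "myreport1.pdf", while ".htaccess" and names without an extension are rejected with null. Without transliteration, accented letters are dropped entirely, which is why the upload_translit option is useful for non-English file names.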


2.4. Why might you want to limit the allowed characters?

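$text = $_POST['text'] ?? '';
if (preg_match('/[^\p{Common}\p{Latin}]/u', $text) === 0) {
	// No character outside the Common and Latin scripts (and valid UTF-8): post your data to the database.
}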
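The check above can be wrapped in a reusable function. One common reason for restricting scripts is to block look-alike letters from other scripts, such as Cyrillic "а" (U+0430) posing as Latin "a". This sketch also allows \p{Inherited}, the script of combining marks, so decomposed text like "e" + U+0301 is not rejected; that addition is an assumption beyond the original check, and uses_allowed_scripts() is a hypothetical name.

```php
<?php
// Minimal sketch: accept text only if every character is in the Latin,
// Common, or Inherited script. Allowing Inherited (combining marks) is an
// assumption; uses_allowed_scripts() is a hypothetical helper.

function uses_allowed_scripts(string $text): bool
{
    // preg_match() returns 1 if a disallowed character is found, 0 if none,
    // and false if $text is not valid UTF-8, so only 0 means "accept".
    return preg_match('/[^\p{Common}\p{Inherited}\p{Latin}]/u', $text) === 0;
}
```

Digits, spaces, and ASCII punctuation belong to the Common script, so ordinary Latin text such as "Hello, world! 123" passes, while "p\u{0430}ypal" (with a Cyrillic "а") does not.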

3. Core

  • convert all PHP files to UTF-8 - DONE
  • set UTF-8 as the default encoding - DONE
  • remove do_unicode_entities() calls - DONE
  • remove all supertag usage - DONE
  • Templatest filters -> Unicode
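The plan does not say how the default encoding is set in the project. As an assumption, the sketch below shows the standard PHP settings that usually cover it: default_charset, which PHP uses for the Content-Type header and as the default encoding of htmlspecialchars() and related functions, and the mbstring internal encoding.

```php
<?php
// Minimal sketch using standard PHP settings; not the project's actual code.

// Used for the default Content-Type header and as the default encoding of
// htmlspecialchars(), html_entity_decode(), and similar functions.
ini_set('default_charset', 'UTF-8');

// Default encoding for mbstring functions (requires the mbstring extension).
if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8');
}

// htmlspecialchars() now treats its input as UTF-8 and leaves "ü" and "ß" intact.
echo htmlspecialchars("<b>Gr\u{00FC}\u{00DF}e</b>", ENT_QUOTES, 'UTF-8'), PHP_EOL;
// prints &lt;b&gt;Grüße&lt;/b&gt;
```

Setting default_charset in php.ini instead of at runtime has the same effect and also covers scripts that never call ini_set().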

4. Database

5. Libraries

  • php-diff - add Unicode support - DONE
  • diff - DONE
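A character-level diff needs Unicode support because byte-oriented functions cut multibyte UTF-8 characters apart. The sketch below illustrates the difference; it is not php-diff's actual change, which the plan does not show, and utf8_chars() is a hypothetical helper.

```php
<?php
// Minimal sketch: split UTF-8 text into code points instead of bytes.
// utf8_chars() is a hypothetical helper, not part of php-diff.

function utf8_chars(string $text): array
{
    // An empty pattern with /u splits between code points; PREG_SPLIT_NO_EMPTY
    // drops the empty pieces at the start and end.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $chars;
}

var_dump(count(str_split("M\u{00FC}ll")));  // int(5): bytes, "ü" is split in two
var_dump(count(utf8_chars("M\u{00FC}ll"))); // int(4): characters
```

On PHP 7.4 and later, mb_str_split() from mbstring gives the same result. Splitting by code point still separates a decomposed "e" + U+0301 into two units, which is another reason to normalize text before diffing.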

6. Migration


7. Testing

  • notifications with diff - DONE