WackoWiki: UTF-8 Action Plan

https://wackowiki.org/doc     Version: 60 (24.01.2024 10:34)


1. PHP

Version | Release       | Supported until | Notes
7.4     | November 2019 | November 2022   | Typed Properties 2.0, Preloading, Null Coalescing Assignment Operator, Improve openssl_random_pseudo_bytes, Weak References, FFI (Foreign Function Interface)
8.0     | November 2020 | November 2023   | Just In Time compilation (JIT), arrays starting with a negative index, consistent type errors for internal functions, fatal error for incompatible method signatures

Supported Versions[link1]
Unicode Cheat Sheet[link2]

2. Questions

2.1. Where do we need Unicode support?

  1. all text fields in the database
  2. all text files

2.2. Where do we need Normalization?

  1. tag
  2. file_name
  3. user_name

Normalization[link3]
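
The fields listed above act as identifiers, so two visually identical strings should compare equal. A minimal sketch of that idea, assuming the intl extension is available and that NFC is the chosen form (the plan does not state which form is intended); normalize_identifier() is a hypothetical helper name, not an existing WackoWiki function:

```php
<?php
// Hypothetical helper: bring an identifier (tag, file_name, user_name)
// into Unicode Normalization Form C before storing or comparing it.
// Assumption: the intl extension, which provides the Normalizer class, is loaded.
function normalize_identifier(string $value): string
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension is required for normalization.');
    }

    // Normalizer::normalize() returns false when the input is not valid UTF-8.
    $normalized = Normalizer::normalize($value, Normalizer::FORM_C);

    if ($normalized === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $normalized;
}
```

With this in place, "Cafe" followed by U+0301 (combining acute accent) and "Caf" followed by U+00E9 (precomposed é) map to the same byte sequence, so a lookup by tag or user name does not depend on how the client encoded the accent.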

2.3. Where do we need Transliteration support?


  1. file names for attachments
    • optional transliteration[link4] via upload_translit setting
    • Remove all control characters and Unicode characters from file names and their extensions, without exception. Also discard special characters such as ";", ":", ">", "<", "/", "\", "*", "%", "$" and any additional ".". If Unicode characters are not needed, it is highly recommended to accept only alphanumeric characters and a single dot for the file name and extension, neither of which may be empty (regular expression: [a-zA-Z0-9]{1,200}\.[a-zA-Z0-9]{1,10}).
  2. user names
  3. supertag (dropped)
    • supertag is no longer needed, and no longer suitable, as a unique identifier

Transliteration[link4]
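
The strict file name rule for attachments can be sketched as follows. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the pattern is the one quoted above with start and end anchors added, and non-ASCII bytes are simply dropped rather than transliterated (an upload_translit step would run before this and is not shown).

```php
<?php
// Pattern from the recommendation above, anchored so the whole name must match.
// \A and \z are used instead of ^ and $ so a trailing newline is rejected.
const SAFE_FILE_NAME_PATTERN = '/\A[a-zA-Z0-9]{1,200}\.[a-zA-Z0-9]{1,10}\z/';

// Hypothetical helper: true only for "name.ext" made of ASCII letters and digits.
function is_safe_file_name(string $name): bool
{
    return preg_match(SAFE_FILE_NAME_PATTERN, $name) === 1;
}

// Hypothetical helper: strip everything except ASCII letters and digits from
// the base name and the extension; returns null if nothing usable remains.
function sanitize_file_name(string $name): ?string
{
    // Treat the last dot as the separator between base name and extension.
    $dot = strrpos($name, '.');
    if ($dot === false) {
        return null;
    }

    // Byte-wise replacement: control characters, punctuation, extra dots and
    // all bytes of multi-byte UTF-8 sequences are removed.
    $base = preg_replace('/[^a-zA-Z0-9]/', '', substr($name, 0, $dot));
    $ext  = preg_replace('/[^a-zA-Z0-9]/', '', substr($name, $dot + 1));

    $candidate = substr($base, 0, 200) . '.' . substr($ext, 0, 10);

    return is_safe_file_name($candidate) ? $candidate : null;
}
```

For example, sanitize_file_name('my report (final).v2.PDF') yields 'myreportfinalv2.PDF', while a name without an extension or with an empty base name yields null and should be rejected or replaced by a generated name.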

2.4. Why might you want to limit the allowed characters?

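// Accept the text only if it contains no character outside the Common and Latin scripts.
// preg_match() returns 0 when nothing disallowed is found and false on invalid UTF-8.
if (preg_match('/[^\p{Common}\p{Latin}]/u', $_POST['text']) === 0)
{
	// post your data to the database.
}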
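
One reason for such a restriction is that letters from other scripts can look identical to Latin ones. The following sketch wraps the same script check in a function to show the effect; uses_only_common_and_latin() is a hypothetical name, and whether WackoWiki should restrict scripts at all remains an open question of this plan.

```php
<?php
// Hypothetical helper: true if every character of $text belongs to the
// Common script (digits, punctuation, spaces) or the Latin script.
// Returns false for invalid UTF-8, because preg_match() then returns false.
function uses_only_common_and_latin(string $text): bool
{
    return preg_match('/\A[\p{Common}\p{Latin}]*\z/u', $text) === 1;
}

// Looks like "paypal", but the second letter is CYRILLIC SMALL LETTER A (U+0430).
$spoofed = "p\u{0430}ypal";

var_dump(uses_only_common_and_latin('paypal'));  // Latin letters only
var_dump(uses_only_common_and_latin($spoofed));  // contains a Cyrillic letter
```

Note that combining marks such as U+0301 belong to the Inherited script, so decomposed input would be rejected by this pattern unless it is normalized to a composed form first.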

3. Core

4. Database


5. Libraries

6. Migration

🍏🍎🙂🐉

7. Testing