BINARY tag


Ðỉäçȑîṭĭčś

1. Current situation


BINARY

For reasons that are no longer clear, R4.3 changed the tag column in the page table from
VARCHAR(250) BINARY to VARCHAR(250).


Until R6.0, WackoWiki used Translit and the supertag (Latin1, case-insensitive) as a cross-charset reference.

Currently you can create any tag you like, but you then cannot rename it to, or create a new page under, a variant of the same word that differs only in its diacritics.

  • Português / Portugues (same)
  • Krąków / Kraków (different locations)

Either way, the link resolves to the existing page, because the default SQL collation treats diacritics as not significant (_ci versus _bin).


Link            Page            Note
_ci:
PrawaDostępu    PrawaDostępu    accent-insensitive, case-insensitive
PrawaDostepu    PrawaDostępu
Prawadostępu    PrawaDostępu
Prawadostepu    PrawaDostępu
prawadostępu    PrawaDostępu
prawadostepu    PrawaDostępu
_bin:
PrawaDostępu    PrawaDostępu    accent-sensitive, case-sensitive
PrawaDostepu    PrawaDostepu
Prawadostępu    Prawadostępu
Prawadostepu    Prawadostepu
prawadostępu    prawadostępu
prawadostepu    prawadostepu
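
The two behaviours in the table can be approximated outside the database. The following is a minimal Python sketch, not MySQL's actual collation algorithm: the helper names `fold_accents_and_case` and `resolve` are made up for illustration, and stripping combining marks after NFD normalization is only an assumption about what "accent-insensitive" means for these examples.

```python
import unicodedata


def fold_accents_and_case(tag: str) -> str:
    """Approximate an accent- and case-insensitive comparison key (hypothetical helper)."""
    # NFD splits "ę" into "e" plus a combining ogonek; dropping combining marks removes the accent.
    decomposed = unicodedata.normalize("NFD", tag)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def resolve(links, key):
    """Group links by comparison key; each group stands for one page the links resolve to."""
    pages = {}
    for link in links:
        pages.setdefault(key(link), []).append(link)
    return pages


links = ["PrawaDostępu", "PrawaDostepu", "Prawadostępu",
         "Prawadostepu", "prawadostępu", "prawadostepu"]

ci_pages = resolve(links, fold_accents_and_case)  # _ci-like: variants share one key
bin_pages = resolve(links, lambda tag: tag)       # _bin-like: every spelling is distinct

print(len(ci_pages))   # 1
print(len(bin_pages))  # 6
```

Under the folded key all six links collapse onto a single page, while the exact comparison keeps six separate pages, matching the _ci and _bin halves of the table.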

2. Make the page tag accent and case-sensitive


This may render existing internal and incoming external links invalid.

ALTER TABLE {$pref}page CHANGE tag tag VARCHAR(250) BINARY NOT NULL DEFAULT '';

commit:0c346e9


translates with current defaults into:

ALTER TABLE `prefix_page` CHANGE `tag` `tag` VARCHAR(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT '';

3. Considerations


We could show a hint or warning when someone creates a page whose tag differs from an existing one only in diacritics or letter case:

Hint

A similar page already exists:
-> Krąków
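
One way such a hint could be produced is sketched below in Python. This is an illustration under assumptions, not WackoWiki code: `similarity_key` and `find_similar_tags` are hypothetical names, the list of existing tags (including `HomePage`) is invented sample data, and "similar" is taken to mean "equal after removing combining marks and case".

```python
import unicodedata


def similarity_key(tag: str) -> str:
    """Hypothetical key: the tag without combining marks, case-folded."""
    decomposed = unicodedata.normalize("NFD", tag)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def find_similar_tags(new_tag, existing_tags):
    """Return existing tags that differ from new_tag only by accents or case."""
    key = similarity_key(new_tag)
    return [t for t in existing_tags if t != new_tag and similarity_key(t) == key]


existing = ["Krąków", "Português", "HomePage"]  # invented sample data
similar = find_similar_tags("Kraków", existing)
if similar:
    print("A similar page already exists:")
    for tag in similar:
        print("->", tag)
```

Because the check only reports candidates and does not block creation, it would still let Krąków and Kraków coexist as different locations.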

There are the following cases to consider:

  • user_name
  • file_name
  • tag

All three can constitute a part of an internal link.


Open questions: how should we proceed with the issue, and what behavior is desired in each case (e.g. preventing user_name misuse)?

3.1. Search

Make search accent and case-insensitive.

3.2. Tag naming policies

4. VARCHAR BINARY versus VARBINARY

The BINARY and VARBINARY Types


The BINARY attribute causes the binary collation for the column character set to be used, and the column itself contains nonbinary character strings rather than binary byte strings.
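
The difference between character strings and byte strings shows up in how length is measured: a VARCHAR(250) limit counts characters, while a VARBINARY(250) limit counts bytes. The short Python sketch below illustrates that distinction for one tag. It assumes UTF-8 storage (as with utf8mb4) and is not a statement about how WackoWiki stores tags.

```python
import unicodedata

# Normalize to NFC so that "ę" is a single code point (U+0119).
tag = unicodedata.normalize("NFC", "PrawaDostępu")

char_length = len(tag)                  # unit of a VARCHAR limit: characters
byte_length = len(tag.encode("utf-8"))  # unit of a VARBINARY limit: bytes

print(char_length, byte_length)  # 12 13
```

The accented letter takes two bytes in UTF-8, so a tag made largely of such letters would hit a byte limit well before reaching 250 characters.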


5. Collation

Collation Suffix Meanings

Suffix  Meaning
_ai     Accent-insensitive
_as     Accent-sensitive
_ci     Case-insensitive
_cs     Case-sensitive
_bin    Binary

Examples

  • utf8mb4_0900_ai_ci
  • utf8mb4_0900_as_ci
  • utf8mb4_0900_as_cs
  • utf8mb4_0900_bin
  • utf8mb4_bin
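
To make the suffix table concrete, here is a small Python sketch that reads the properties off a collation name. `describe_collation` is a hypothetical helper, not part of MySQL or WackoWiki. It also applies MySQL's documented naming rule that, when a name has no explicit _ai or _as, _ci implies accent-insensitive and _cs implies accent-sensitive.

```python
SUFFIX_MEANINGS = {
    "ai": "accent-insensitive",
    "as": "accent-sensitive",
    "ci": "case-insensitive",
    "cs": "case-sensitive",
    "bin": "binary",
}


def describe_collation(name: str) -> list[str]:
    """List the sensitivity properties encoded in a collation name (hypothetical helper)."""
    parts = name.split("_")
    # Skip the first part (the character set) and keep only known suffixes.
    found = [p for p in parts[1:] if p in SUFFIX_MEANINGS]
    # Without an explicit _ai/_as, accent sensitivity follows case sensitivity.
    if "ai" not in found and "as" not in found:
        if "ci" in found:
            found.insert(0, "ai")
        elif "cs" in found:
            found.insert(0, "as")
    return [SUFFIX_MEANINGS[p] for p in found]


for name in ["utf8mb4_0900_as_cs", "utf8mb4_unicode_520_ci", "utf8mb4_bin"]:
    print(name, "->", ", ".join(describe_collation(name)))
```

This is why a collation such as utf8mb4_unicode_520_ci ignores diacritics even though its name carries no _ai suffix.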


6. Tools