BINARY tag


Ðỉäçȑîṭĭčś

1. Current situation


BINARY

For some reason we changed in R4.3 the tag in the page table from
VARCHAR(250) BINARY to VARCHAR(250).


Til R6.0 WackoWiki used Translit and the supertag (Latin1, case-insensitive) as cross charset reference.

Currently you could create any tag you like, but then you couldn't rename it or create a new page with or without diacritics of the same word.

  • Português / Portugues (same)
  • Krąków / Kraków (different locations)

Either way it directed to the existing version since the default SQL coalition treats diacritics as not significant (_ci versus _bin).


Link Page Note
_ci:
PrawaDostępu
PrawaDostepu
Prawadostępu
Prawadostepu
prawadostępu
prawadostepu
PrawaDostępu accent-insensitive, case-insensitive
_bin:
PrawaDostępu
PrawaDostepu
Prawadostępu
Prawadostepu
prawadostępu
prawadostepu
PrawaDostępu
PrawaDostepu
Prawadostępu
Prawadostepu
prawadostępu
prawadostepu
accent-sensitive, case-sensitive

2. Make the page tag accent and case-sensitive


This may render existing internal and incoming external links invalid.

ALTER TABLE {$pref}page CHANGE tag tag VARCHAR(250) BINARY NOT NULL DEFAULT '';	

commit:0c346e9


translates with current defaults into utf8mb4_bin:

ALTER TABLE `prefix_page` CHANGE `tag` `tag` VARCHAR(250CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '';	

2.1. Backup script

Backup script ignored COLUMNS collation, but page tag columns require the BINARY collation. The tag must be accent and case-sensitive.



Set back collation via VARCHAR BINARY manually

ALTER TABLE `wacko_page` CHANGE `tag` `tag` VARCHAR(250) BINARY NOT NULL
ALTER TABLE `wacko_page` CHANGE `keywords` `keywords` VARCHAR(250) BINARY NOT NULL
ALTER TABLE `wacko_revision` CHANGE `tag` `tag` VARCHAR(250) BINARY NOT NULL
ALTER TABLE `wacko_revision` CHANGE `keywords` `keywords` VARCHAR(250) BINARY NOT NULL
ALTER TABLE `wacko_page_link` CHANGE `to_tag` `to_tag` VARCHAR(250) BINARY NOT NULL;

FIXED: commit:833adea67dde8a1b9fe64ea8c4563e1660018ef5

3. Considerations


We could show a hint / warning if someone creates a similar page with just different diacritics or case folding:

Hint

A similar page already exists:
-> Krąków

There are the following cases to consider:

  • user_name
  • file_name
  • tag

All three can constitute a part of a internal link.


How we should deal, proceed, with the issue, what behavior is desired for each case, e.g. avoid user_name misuse.

3.1. Search & Sorting

Make search and sorting accent and case-insensitive.


COLLATE tag column case-insensitive for ORDER BY and LIKE
  •  ORDER BY tag COLLATE utf8mb4_unicode_520_ci ASC	
  •  WHERE tag COLLATE utf8mb4_unicode_520_ci LIKE	

3.2. Tag naming policies

4. VARCHAR BINARY versus VARBINARY

The BINARY and VARBINARY Types


The BINARY attribute cause the binary collation for the column character set to be used, and the column itself contains nonbinary character strings rather than binary byte strings.


5. Collation

Collation Suffix Meanings

Suffix Meaning
_ai Accent-insensitive
_as Accent-sensitive
_ci Case-insensitive
_cs Case-sensitive
_bin Binary

Examples (MySQL 8.0)

  • utf8mb4_0900_ai_ci
  • utf8mb4_0900_as_ci
  • utf8mb4_0900_as_cs
  • utf8mb4_0900_bin
  • utf8mb4_bin

Unicode Collation Algorithm (UCA)