BINARY tag
Ðỉäçȑîṭĭčś
1. Current situation
BINARY
For some reason we changed in R4.3 the tag
in the page table from
VARCHAR(250) BINARY
to VARCHAR(250)
.
Til R6.0 WackoWiki used Translit and the
supertag
(Latin1, case-insensitive) as cross charset reference.
Currently you could create any tag you like, but then you couldn't rename it or create a new page with or without diacritics of the same word.
-
Português
/Portugues
(same) -
Krąków
/Kraków
(different locations)
Either way it directed to the existing version since the default SQL coalition treats diacritics as not significant (_ci
versus _bin
).
Link | Page | Note |
---|---|---|
_ci : | ||
PrawaDostępu PrawaDostepu Prawadostępu Prawadostepu prawadostępu prawadostepu | PrawaDostępu | accent-insensitive, case-insensitive |
_bin : | ||
PrawaDostępu PrawaDostepu Prawadostępu Prawadostepu prawadostępu prawadostepu | PrawaDostępu PrawaDostepu Prawadostępu Prawadostepu prawadostępu prawadostepu | accent-sensitive, case-sensitive |
2. Make the page tag accent and case-sensitive
ALTER TABLE {$pref}page CHANGE tag tag VARCHAR(250) BINARY NOT NULL DEFAULT '';
translates with current defaults into utf8mb4_bin
:
ALTER TABLE `prefix_page` CHANGE `tag` `tag` VARCHAR(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '';
2.1. Backup script
Backup script ignored COLUMNS collation, but page tag columns require the BINARY
collation. The tag
must be accent and case-sensitive.
Set back collation via VARCHAR BINARY
manually
ALTER TABLE `wacko_page` CHANGE `tag` `tag` VARCHAR(250) BINARY NOT NULL;
ALTER TABLE `wacko_page` CHANGE `keywords` `keywords` VARCHAR(250) BINARY NOT NULL;
ALTER TABLE `wacko_revision` CHANGE `tag` `tag` VARCHAR(250) BINARY NOT NULL;
ALTER TABLE `wacko_revision` CHANGE `keywords` `keywords` VARCHAR(250) BINARY NOT NULL;
ALTER TABLE `wacko_page_link` CHANGE `to_tag` `to_tag` VARCHAR(250) BINARY NOT NULL;
FIXED: commit:833adea67dde8a1b9fe64ea8c4563e1660018ef5
3. Considerations
We could show a hint / warning if someone creates a similar page with just different diacritics or case folding:
Hint
A similar page already exists:-> Krąków
There are the following cases to consider:
- user_name
- file_name
- tag
All three can constitute a part of a internal link.
How we should deal, proceed, with the issue, what behavior is desired for each case, e.g. avoid user_name misuse.
3.1. Search & Sorting
Make search and sorting accent and case-insensitive.
COLLATE tag column case-insensitive for ORDER BY and LIKE
-
ORDER BY tag COLLATE utf8mb4_unicode_520_ci ASC
-
WHERE tag COLLATE utf8mb4_unicode_520_ci LIKE
3.2. Tag naming policies
4. VARCHAR BINARY versus VARBINARY
The BINARY and VARBINARY Types
The BINARY
attribute cause the binary collation for the column character set to be used, and the column itself contains nonbinary character strings rather than binary byte strings.
5. Collation
Collation Suffix Meanings
Suffix | Meaning |
---|---|
_ai | Accent-insensitive |
_as | Accent-sensitive |
_ci | Case-insensitive |
_cs | Case-sensitive |
_bin | Binary |
Examples (MySQL 8.0)
-
utf8mb4_0900_ai_ci
-
utf8mb4_0900_as_ci
-
utf8mb4_0900_as_cs
-
utf8mb4_0900_bin
-
utf8mb4_bin
Unicode Collation Algorithm (UCA)