Quotes about Unicode

Even the simple-minded minimal measures that (some) people seem to think they want are guaranteed to miserably break millions of lines of code, code that has no chance to “upgrade” to your spiffy new Brave New World modernity.


It is way way way more complicated than people pretend. I’ve thought about this a huge, whole lot over the past few years. I would love to be shown that I am wrong. But I don’t think I am. Unicode is fundamentally more complex than the model that you would like to impose on it, and there is complexity here that you can never sweep under the carpet. If you try, you’ll break either your own code or somebody else’s. At some point, you simply have to break down and learn what Unicode is about. You cannot pretend it is something it is not.


[...]


Assume Brokenness


And that’s not all. There are a million broken assumptions that people make about Unicode. Until they understand these things, their code will be broken.
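
To make a few of those assumptions concrete, here is a minimal Python sketch. It is not taken from any of the quoted sources, and the helper name same_text is hypothetical. It shows three things people commonly take for granted that do not hold: that strings which look identical compare equal, that one character is one code point or one byte, and that changing case preserves length.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Broken assumption 1: strings that look the same compare equal.
print(composed == decomposed)           # False
print(same_text(composed, decomposed))  # True

# Broken assumption 2: one character is one code point, or one byte.
print(len(composed), len(decomposed))   # 1 2
print(len(composed.encode("utf-8")))    # 2

# Broken assumption 3: changing case preserves length.
print("ß".upper())                      # SS
```

Normalizing before comparison fixes only the first case. Counting user-perceived characters needs grapheme cluster segmentation, which Python's standard library does not provide.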


The lack of Unicode support in PHP remains a serious problem, but at least we’re no longer nursing a dying animal (PHP 6) in the vain hopes of making it work when the entire surrounding situation has changed. This opens the door for new ideas on how to fix the issue.


When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.