
Is there any decisive criticism against using UTF-8 everywhere as standard?

submitted by SithEmpire to programming 2.0 years ago, May 13, 2022 15:04:47 (+1/-0)

Infogalactic has UTF-8 accounting for 85% of websites in 2015, and Wikipedia claims 98% presently. The latter whores itself out and has no criticism of UTF-8 at all, and the former only really covers the obvious points: some operations are mildly less convenient with a variable-width encoding, and an Asian glyph sometimes takes 3 bytes instead of the 2 it would take in UTF-16. Everyone recommends it as a standard, even Microsoft.
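To put numbers on the "3 bytes instead of 2" point, here is a minimal sketch that compares encoded sizes for a few sample characters. The helper name `encoded_sizes` and the sample characters are my own choices for illustration, not anything taken from either wiki.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the size of `text` in code points, UTF-8 bytes and UTF-16 bytes."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no byte-order mark is counted.
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    # ASCII letter, accented Latin letter, CJK ideograph, emoji.
    for sample in ("A", "é", "漢", "😀"):
        print(sample, encoded_sizes(sample))
```

For text that is mostly ASCII (markup, JSON, source code) UTF-8 comes out smaller, and for text that is mostly CJK it comes out larger than UTF-16, which is about the extent of the size criticism.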

I have no direct problem supporting UTF-8 and running tests against UTF-8 input and those icon characters it has, but its dominance has me very suspicious. If it has the blessing of that much of the world, that must include communists and degenerates. This almost never happens unless it's a format lock-in to secure future compliance of the developer, or incur royalties if a piece of a program happens to be able to encode it (MP3 is that way).

I don't doubt its usefulness, I just get concerned at the lack of anyone presenting any real contrary position at all. Is my suspicion unfounded, or does UTF-8 have some deep rabbit-hole beneath the surface?


13 comments

Interesting stuff, though I think that is almost entirely a UTF-16 problem with JS, which I only noticed a bit later, before posting that. I would have thought that unless the server is actually processing the HTML content somehow, it shouldn't need to care about UTF-8; it can just deliver the bytes and it will work.
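Assuming the UTF-16 problem in question is the usual one, where a character outside the Basic Multilingual Plane takes two 16-bit code units and so counts as two in any length measured in UTF-16 units (as JS string lengths are), a rough sketch of the mismatch looks like this. `utf16_code_units` is a hypothetical helper written for this illustration.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store `text` as UTF-16."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids counting a BOM.
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    for sample in ("abc", "漢字", "😀"):
        # Python's len() counts code points, so the two numbers differ
        # only when a surrogate pair is involved.
        print(sample, len(sample), utf16_code_units(sample))
```

A server that only passes the bytes through never sees any of this, since the mismatch only shows up once something decodes the text and starts indexing into it.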

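I also found out that CPython 3 (since 3.3) stores each string with a fixed width of 1, 2, or 4 bytes per character, chosen by the widest character the string contains. Building a string that includes a single character outside Latin-1 therefore widens the storage for the entire string, presumably blighting normal programs and server backends alike.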
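The width change can be observed from within Python. This is a sketch that assumes CPython 3.3 or later, since `sys.getsizeof` reports implementation-specific sizes and other interpreters may behave differently. `bytes_per_char` is a hypothetical helper that estimates the per-character storage by comparing two strings of different lengths, so that the fixed object header cancels out.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate how many bytes CPython uses per character for a string of `ch`."""
    # Subtracting the two sizes cancels the fixed per-object overhead.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    for sample in ("a", "漢", "😀"):
        print(sample, bytes_per_char(sample))

    # One wide character is enough to widen the whole string.
    ascii_only = "a" * 1000
    widened = ascii_only + "😀"
    print(sys.getsizeof(ascii_only), sys.getsizeof(widened))
```

Python strings are immutable, so nothing is resized in place: the wider layout is chosen when the new string is built. Whether that cost matters in practice depends on how much string building a program does.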