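I have a simple question: what is the difference between UTF-8, UTF-16, and UTF-32? I know the encoded strings come out at different sizes, but what are UTF-16 and UTF-32 for? Can't UTF-8 handle all languages correctly? And how does UTF-7 fit into this?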


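OK, I now understand the technical side of the whole thing better, but I still don't see a reason to use UTF-16 rather than UTF-8 in my application. So my question is: what are the practical uses of encodings other than UTF-8?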

This article by the well-known Joel Spolsky explains it: http://www.joelonsoftware.com/articles/Unicode.html


There are hundreds of traditional encodings which can only store some code points correctly and change all the other code points into question marks. Some popular encodings of English text are Windows-1252 (the Windows 9x standard for Western European languages) and ISO-8859-1, aka Latin-1 (also useful for any Western European language). But try to store Russian or Hebrew letters in these encodings and you get a bunch of question marks. UTF 7, 8, 16, and 32 all have the nice property of being able to store any code point correctly.
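To make the quoted point concrete, here is a minimal Python sketch. The helper names `lossy_round_trip` and `utf_round_trip` are made up for illustration, and the codec names are the ones Python's standard library happens to use. It encodes a Russian word with a legacy single-byte encoding, where every unsupported character is replaced by a question mark, and then with the four UTF encodings, which should all decode back to the original text.

```python
def lossy_round_trip(text: str, codec: str) -> str:
    """Encode with a legacy codec, substituting '?' for characters it cannot store."""
    return text.encode(codec, errors="replace").decode(codec)


def utf_round_trip(text: str, codec: str) -> str:
    """Encode and decode with a UTF codec; no substitution should be needed."""
    return text.encode(codec).decode(codec)


if __name__ == "__main__":
    sample = "Привет"  # Russian for "hello"

    # Latin-1 has no Cyrillic letters, so each one becomes a question mark.
    print("latin-1:", lossy_round_trip(sample, "latin-1"))

    # Each UTF encoding can represent every code point in the sample.
    for codec in ("utf-7", "utf-8", "utf-16", "utf-32"):
        encoded = sample.encode(codec)
        survived = utf_round_trip(sample, codec) == sample
        print(f"{codec}: {len(encoded)} bytes, round trip ok: {survived}")
```

The byte counts printed by the loop differ from one encoding to the next, which is the size difference the question mentions. Python's `utf-16` and `utf-32` codecs also prepend a byte order mark, so those two counts include it.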