呢個係喺 http://bugs.mysql.com/bug.php?id=14052 mysql官方網上睇到嘅由於我雞腸唔多掂,所以各位睇下係唔係有辦法解決論壇顯示唔到 unicode漢字??有人話 顯示唔到唔係discuz嘅bug而係mysql!
Description: At Wikipedia we have some data which contains high Unicode characters beyond the Basic Multilingual Plane (>65536). In UTF-8 encoding these take up 4 bytes; in UTF-16 they would be stored as a "surrogate pair" of two 16-bit pseudocharacters.
Currently MySQL's Unicode charset support doesn't seem to allow storing these characters in text-encoded fields when using UTF-8 to communicate to the server: * ucs2 stores four question marks "????" in place of the char * utf8 truncates the string at the point the char appears
Some quick testing indicates that I can store surrogate pseudocharacters if I explicitly code for them in pseudo-UTF-8, but this complicates communicating with the server.
How to repeat: Tested with PHP 5.1.0RC1: <?php mysql_connect("localhost", "unitest"); mysql_select_db("unitest"); mysql_query("SET NAMES utf8"); mysql_query("CREATE TABLE demo( wide VARCHAR(50) CHARACTER SET ucs2, utf VARCHAR(50) CHARACTER SET utf8, raw VARBINARY(50) )"); $char = "\xf0\xa8\xa7\x80"; mysql_query("INSERT INTO demo(wide,utf,raw) VALUES ('$char','$char','$char')"); $result = mysql_query("SELECT * FROM demo"); $row = mysql_fetch_array($result, MYSQL_ASSOC); foreach($row as $field => $val) { $match = ($val == $char) ? "OK" : "FAILED"; print "$field: $match ($val)\n"; } ?>
Outputs: wide: FAILED (????) utf: FAILED () raw: OK (
Suggested fix: A sufficient compromise for our purposes would be for the ucs2 charset to be enhanced (or a second utf16 charset made available) to do encoding and decoding of UTF-16 surrogate pairs when communicating with the server in UTF-8.
So if we send the UTF-8 string: "\xf0\xa8\xa7\x80" It should interpret it as the UTF-16 string: "\ud862\uddc0" And on select we should get back UTF-8: "\xf0\xa8\xa7\x80"
Continuing to use UCS-2 collation semantics would be good enough for what we need; this would allow us to use collatable Unicode text fields for page titles and usernames without rare but existing data getting corrupted. |