依然顯示唔到超大字庫集嘅生僻字

余OK · 發表於 2007-10-7 21:41:34

　　寫帖時若果有生僻字，發帖後嗰隻字後邊所有內容都會顯示唔到，睇吓有冇辦法解決？

Ultra · 發表於 2007-10-8 00:02:01

下面咪有解决辦法囉，不過紛繁復雜又危險啫，免費嘅嘢就係噉，冇辦法。

http://bbs.cantonese.asia/viewthread.php?tid=7978&extra=page%3D1

huang · 發表於 2007-10-8 22:00:20

我對論壇呢個軟件唔熟，唔知將個字體改為宋體再加海峰嘅大字集唔知得唔得？？

芬1012 · 發表於 2007-10-10 00:30:12

希望微軟早日出粵文版操作系統啦。

huang · 發表於 2007-12-9 18:06:56

原帖由 Ultra 於 2007-10-8 00:02 發表下面咪有解决辦法囉，不過紛繁復雜又危險啫，免費嘅嘢就係噉，冇辦法。 http://bbs.cantonese.asia/viewthread.php?tid=7978&extra=page%3D1

唔支漢典論壇係唔係用呢種方法？？漢典同樣係 Discuz! 6.0.0

如果佢哋有其它辦法的話可以去話詢問一下！！

huang · 發表於 2007-12-14 12:32:14

呢個係喺 http://bugs.mysql.com/bug.php?id=14052 mysql官方網上睇到嘅由於我雞腸唔多掂，所以各位睇下係唔係有辦法解決論壇顯示唔到 unicode漢字？？有人話顯示唔到唔係discuz嘅bug而係mysql！

Description:
At Wikipedia we have some data which contains high Unicode characters beyond the Basic
Multilingual Plane (>65536). In UTF-8 encoding these take up 4 bytes; in UTF-16 they would
be stored as a "surrogate pair" of two 16-bit pseudocharacters.

Currently MySQL's Unicode charset support doesn't seem to allow storing these characters
in text-encoded fields when using UTF-8 to communicate to the server:
* ucs2 stores four question marks "????" in place of the char
* utf8 truncates the string at the point the char appears

Some quick testing indicates that I can store surrogate pseudocharacters if I explicitly
code for them in pseudo-UTF-8, but this complicates communicating with the server.

How to repeat:
Tested with PHP 5.1.0RC1:
<?php
mysql_connect("localhost", "unitest");
mysql_select_db("unitest");
mysql_query("SET NAMES utf8");
mysql_query("CREATE TABLE demo(
    wide VARCHAR(50) CHARACTER SET ucs2,
    utf VARCHAR(50) CHARACTER SET utf8,
    raw VARBINARY(50)
)");
$char = "\xf0\xa8\xa7\x80";
mysql_query("INSERT INTO demo(wide,utf,raw) VALUES ('$char','$char','$char')");
$result = mysql_query("SELECT * FROM demo");
$row = mysql_fetch_array($result, MYSQL_ASSOC);
foreach($row as $field => $val) {
    $match = ($val == $char) ? "OK" : "FAILED";
    print "$field: $match ($val)\n";
}
?>

Outputs:
wide: FAILED (????)
utf: FAILED ()
raw: OK (

Suggested fix:
A sufficient compromise for our purposes would be for the ucs2 charset to be enhanced (or
a second utf16 charset made available) to do encoding and decoding of UTF-16 surrogate
pairs when communicating with the server in UTF-8.

So if we send the UTF-8 string: "\xf0\xa8\xa7\x80"
It should interpret it as the UTF-16 string: "\ud862\uddc0"
And on select we should get back UTF-8: "\xf0\xa8\xa7\x80"

Continuing to use UCS-2 collation semantics would be good enough for what we need; this
would allow us to use collatable Unicode text fields for page titles and usernames without
rare but existing data getting corrupted.

		自動登錄	找回密碼
密碼			註冊