Writing with the 中州韻 input method engine on Windows - Page 2 - Jyutping, Character Codes and Input Methods (Cantonese Computing) - 粵語協會

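佛振 posted on 2010-8-30 13:35:11

Before the eighth day of the eighth lunar month, the following improvements are planned for 【小狼毫】:
＊On first installation, prompt the user to choose which input schemas to import
＊Jyutping: support tones (thanks to thhui for providing the code table) and abbreviated spelling (簡拼)
＊Add a Wu (Shanghainese) input schema (thanks to 上海閒話ABC for providing the code table)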

thhui posted on 2010-8-30 20:00:44

My 劉式港拼 schema,
lsc-hk.txt,
revised as follows:

# ZIME Schema v3
Schema = lsc-hk
DisplayName = 劉式港拼
Dict = jyutping
Parser = roman
AutoPrompt = yes
AutoDelimit = yes
Delimiter = [ ']
MaxKeywordLength = 6
FuzzyRule = aa$ a
FuzzyRule = a$ aa
SpellingRule = yu ue
SpellingRule = ui$ ooi
SpellingRule = un$ oon
SpellingRule = ut$ oot
SpellingRule = oe$ euh
SpellingRule = oeng$ eung
SpellingRule = oe eu
SpellingRule = eoi$ ui
SpellingRule = eon$ un
SpellingRule = eot$ ut
SpellingRule = o$ oh
SpellingRule = ou$ o
#SpellingRule = ^()w \1u
SpellingRule = ^j y
SpellingRule = ^z j
#AlternativeRule = ^j z
#AlternativeRule = ^()u() \1w\2
AlternativeRule = ^c ch
AlternativeRule = ^s sh
#FuzzyRule = ^l n
#FuzzyRule = ^n l
FuzzyRule = u$ oo
#SplitRule = $ ^
#SplitRule = $ ^
# punctuation
Punct = , ，
Punct = . 。
Punct = < 《〈
Punct = > 》〉
Punct = / ／
Punct = ? ？
Punct = ; ；
Punct = : ：
Punct = ' 『~』
Punct = " 「~」
Punct = \ 、
Punct = | ｜
Punct = ` ｀
Punct = ~ ～
Punct = ! ！
Punct = @ ＠
Punct = # ＃
Punct = % ％
Punct = $ ￥
Punct = ^ ……
Punct = & ＆
Punct = * ＊
Punct = ( （
Punct = ) ）
Punct = - －
Punct = _ ——
Punct = + ＋
Punct = = ＝
Punct = [ 「【［
Punct = ] 」】］
Punct = { 『｛
Punct = } 』｝
# edit keys
#EditKey = bracketleft Left
#EditKey = bracketright Right
EditKey = minus Up
EditKey = equal Down
EditKey = comma Page_Up
EditKey = period Page_Down
EditKey = I Up
EditKey = K Down
EditKey = J Left
EditKey = L Right
EditKey = U Page_Up
EditKey = O Page_Down
EditKey = H Home
EditKey = N End
EditKey = P Escape

Create a file named
populate_db_lsc.bat in c:\weasel\data\
with the following contents:
call ..\env.bat
create-schema.py -v lsc-hk.txt
pause

Then double-click populate_db_lsc.bat to install the 劉式港拼 schema.
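
One way to read the SpellingRule lines in lsc-hk.txt is as regular-expression substitutions applied, in file order, to each jyutping syllable from the dictionary. That reading is an assumption (the thread does not spell out how ZIME evaluates the rules), and the helper `to_lau` below is hypothetical. A minimal Python sketch under that assumption:

```python
import re

# (pattern, replacement) pairs copied from the active SpellingRule lines.
SPELLING_RULES = [
    ("yu", "ue"), ("ui$", "ooi"), ("un$", "oon"), ("ut$", "oot"),
    ("oe$", "euh"), ("oeng$", "eung"), ("oe", "eu"),
    ("eoi$", "ui"), ("eon$", "un"), ("eot$", "ut"),
    ("o$", "oh"), ("ou$", "o"), ("^j", "y"), ("^z", "j"),
]

def to_lau(syllable):
    """Apply every rule once, in listed order (assumed semantics)."""
    for pattern, replacement in SPELLING_RULES:
        syllable = re.sub(pattern, replacement, syllable)
    return syllable

for jyutping in ["jyut", "gou", "go", "hoeng", "zi"]:
    print(jyutping, "->", to_lau(jyutping))
```

Under this reading the order of the rules matters: `o$ oh` has to run before `ou$ o`, otherwise a syllable such as gou would first become go and then goh.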

thhui posted on 2010-8-30 20:43:06

劉式港拼 with abbreviated spelling (簡拼):
lsc-hk.txt,
revised as follows:

# ZIME Schema v3
Schema = lsc-hk
DisplayName = 劉式港拼
Dict = jyutping
Parser = roman
AutoPrompt = yes
AutoDelimit = yes
Delimiter = [ ']
MaxKeywordLength = 6
FuzzyRule = aa$ a
FuzzyRule = a$ aa
#SpellingRule = a a'
#SpellingRule = A a
#SpellingRule = ^jyut$ yuet
SpellingRule = yu ue
SpellingRule = ui$ ooi
SpellingRule = un$ oon
SpellingRule = ut$ oot
SpellingRule = oe$ euh
SpellingRule = oeng$ eung
SpellingRule = oe eu
SpellingRule = eoi$ ui
SpellingRule = eon$ un
SpellingRule = eot$ ut
SpellingRule = o$ oh
SpellingRule = ou$ o
#SpellingRule = ^()w \1u
SpellingRule = ^j y
SpellingRule = ^z j
#FuzzyRule = ^l n
#FuzzyRule = ^n l
FuzzyRule = u$ oo
# abbreviated spelling (簡拼)
FuzzyRule = ^(ng).+$ \1
FuzzyRule = ^().+$ \1
# compatible spelling forms
#AlternativeRule = ^j z
#AlternativeRule = ^()u() \1w\2
AlternativeRule = ^c ch
AlternativeRule = ^s sh
#SplitRule = $ ^
#SplitRule = $ ^
# punctuation
Punct = , ，
Punct = . 。
Punct = < 《〈
Punct = > 》〉
Punct = / ／
Punct = ? ？
Punct = ; ；
Punct = : ：
Punct = ' 『~』
Punct = " 「~」
Punct = \ 、
Punct = | ｜
Punct = ` ｀
Punct = ~ ～
Punct = ! ！
Punct = @ ＠
Punct = # ＃
Punct = % ％
Punct = $ ￥
Punct = ^ ……
Punct = & ＆
Punct = * ＊
Punct = ( （
Punct = ) ）
Punct = - －
Punct = _ ——
Punct = + ＋
Punct = = ＝
Punct = [ 「【［
Punct = ] 」】］
Punct = { 『｛
Punct = } 』｝
# edit keys
#EditKey = bracketleft Left
#EditKey = bracketright Right
EditKey = minus Up
EditKey = equal Down
EditKey = comma Page_Up
EditKey = period Page_Down
EditKey = I Up
EditKey = K Down
EditKey = J Left
EditKey = L Right
EditKey = U Page_Up
EditKey = O Page_Down
EditKey = H Home
EditKey = N End
EditKey = P Escape

That is, a phrase can now be typed by entering only the first letter of each syllable.
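
The abbreviation behaviour can be sketched in Python. Note that the rule `^().+$ \1` in the listing shows an empty group, so its original pattern is not recoverable from this page; the sketch assumes it captured the first letter, which matches the description of typing the first letter. The function name is hypothetical.

```python
import re

# Patterns modelled on the two FuzzyRule lines under the abbreviation comment.
# The second pattern is an assumption: the listing shows an empty group there.
ABBREVIATION_RULES = [r"^(ng).+$", r"^([a-z]).+$"]

def abbreviations(syllable):
    """Return every abbreviated form the rules would produce for one syllable."""
    forms = set()
    for pattern in ABBREVIATION_RULES:
        match = re.match(pattern, syllable)
        if match:
            forms.add(match.group(1))
    return forms

print(abbreviations("heung"))  # a single-letter form
print(abbreviations("ngoh"))   # both "ng" and "n" under these assumptions
```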

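佛振 posted on 2010-8-30 20:48:05

Reply to 22# thhui

DIY is encouraged.
Just remember that the 「中州韻」 input method is GPL software: a modified program must keep the license text, publish its source code, and automatically becomes free software itself.
(The core algorithm and the input schema source files are already in the installation package; the source code of the Windows part of 【小狼毫】 will be published at the official release.)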

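thhui posted on 2010-8-30 21:33:10

Brother 佛振,
could the phrase-composition rules be changed to the following?
ce2=p11+p12+p13+p21+p22+p23
ce3=p11+p12+p21+p22+p31+p32
ce4=p11+p12+p21+p31+p41+p42
ca5=p11+p12+p21+p31+p41+p51

ce2 = two-character phrase = 1st code of the 1st character + 2nd code of the 1st character + 3rd code of the 1st character + 1st code of the 2nd character + 2nd code of the 2nd character + 3rd code of the 2nd character
ca5 = phrase of five or more characters = 1st code of the 1st character + 2nd code of the 1st character + 1st code of the 2nd character + 1st code of the 3rd character + 1st code of the 4th character + 1st code of the 5th character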
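
The four rules can be written down as a small Python sketch, where each pair (i, j) stands for p_ij, the j-th code of the i-th character. Two points are assumptions not stated in the post: positions that a short character code does not have are skipped, and the ca5 rule is reused for phrases longer than five characters. The name `phrase_code` is hypothetical.

```python
# (character index, code index) pairs, 1-based, copied from the four rules.
RULES = {
    2: [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)],
    3: [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)],
    4: [(1, 1), (1, 2), (2, 1), (3, 1), (4, 1), (4, 2)],
    5: [(1, 1), (1, 2), (2, 1), (3, 1), (4, 1), (5, 1)],
}

def phrase_code(char_codes):
    """Build a phrase code from the full codes of its characters."""
    if len(char_codes) < 2:
        raise ValueError("a phrase needs at least two characters")
    rule = RULES[min(len(char_codes), 5)]  # ca5 reused for longer phrases (assumption)
    picked = []
    for char_index, code_index in rule:
        code = char_codes[char_index - 1]
        if code_index <= len(code):  # skip positions a short code lacks (assumption)
            picked.append(code[code_index - 1])
    return "".join(picked)

print(phrase_code(["abc", "def"]))                  # ce2
print(phrase_code(["ab", "cd", "ef", "gh", "ij"]))  # ca5
```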

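佛振 posted on 2010-8-30 22:21:57

Last edited by 佛振 on 2010-8-30 22:36

Reply to 25# thhui

Please use a table-based input method generator to implement encoding rules like these.
The current technical direction of the 中州韻 input method engine is to break down the boundary between phrases and sentences and to do intelligent whole-sentence input with phonetic codes. Taking the trailing codes of a phrase would inevitably degrade it into a plain code-table method without intelligent phrase composition: if the user's input string corresponds to a sentence made up of several phrases, it is very hard to decide which part of the code string is a phrase's trailing code. And for users, they would have to know which phrases are encoded in the lexicon before they could type segment by segment; where is the pleasure of typing continuously in one go? :D

The biggest problem with this kind of encoding rule is that it is hard to keep the way of typing "phrases in the dictionary" consistent with the way of typing "user-defined new phrases". How is the user supposed to know whether what they want to type is encoded in the input method?
Even when it is certainly a dictionary phrase, the user still has to work out the code by the rules on the spot every time: mentally taxing, and in the end not pleasant to use!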

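thhui posted on 2010-8-31 15:21:23

Last edited by thhui on 2010-8-31 19:00

Actually, the abbreviated phrase codes above are only meant to shorten the codes, because typing the full code is sometimes rather time-consuming.
Could these abbreviated codes be added by hand to the keywords or phrases of a custom code table?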

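thhui posted on 2010-8-31 19:04:55

Last edited by thhui on 2010-8-31 20:03

Also, after a character has been typed, could the nine most frequent associated characters or phrases be offered as candidates?
This is common in traditional Windows input methods
but is missing on Linux systems.
It is very popular with ordinary Windows users
and a great help to people who are not familiar with input methods.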

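xiss posted on 2010-8-31 19:54:19

Has 「小狼毫」 been officially released? Where can I download it, and how is it installed? It looks rather complicated, but never mind, I want to try it early :D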

thhui posted on 2010-8-31 20:12:08

The latest version of 小狼毫,
weasel-preview-20100827.zip,
is much easier to install than earlier versions.
install.bat is very useful.

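佛振 posted on 2010-8-31 20:56:52

Reply to 28# thhui

Heh, that is still the code-table way of thinking. The association feature, as I understand it, is a compensation for not being able to type continuously. However, ZIME has a feature that is currently not enabled: completion of candidates (not a list of associated options after text is committed, but complete phrase candidates that appear once the first few syllables have been typed). It is enabled in Plume; the desktop version does not use it yet. With that, plus abbreviated spelling, the inconvenience of typing long phrases in full can be solved completely.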
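
The candidate completion described here (whole phrases offered once their first few syllables are typed) can be illustrated with a short Python sketch. The tiny lexicon and the function `complete` are made up for illustration and are not ZIME's actual data or API.

```python
def complete(typed_syllables, lexicon):
    """Return phrases that are longer than the input and start with the typed syllables."""
    n = len(typed_syllables)
    return [phrase for phrase, syllables in lexicon
            if len(syllables) > n and syllables[:n] == typed_syllables]

# Hypothetical lexicon: (phrase, jyutping syllables without tones).
LEXICON = [
    ("有限公司", ["jau", "haan", "gung", "si"]),
    ("有限", ["jau", "haan"]),
    ("有心", ["jau", "sam"]),
]

print(complete(["jau", "haan"], LEXICON))
```

A real engine would look the prefix up in an index instead of scanning the whole list, and would rank the completions by frequency.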

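佛振 posted on 2010-8-31 20:59:42

Reply to 29# xiss

And here I had added Xiss to the zime-devel mailing list; it turns out you never read the messages I sent about 小狼毫... In that case, just follow the announcements on the ZIME home page: http://code.google.com/p/zime/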

xiss posted on 2010-8-31 21:44:45

Reply to xiss

And here I had added Xiss to the zime-devel mailing list; it turns out you never read the messages I sent about 小狼毫......
佛振 posted on 2010-8-31 20:59
Hehe, sorry, I am not a working person and do not check my email often, hehe :$

xiss posted on 2010-8-31 22:31:34

Then
uninstall weasel
by clicking xp_uninstall.bat
reinstall weasel
by clicking xp_install.bat

Then the 9 options will be effective
thhui posted on 2010-8-27 18:37
Actually, you could say all this in Chinese too~~

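佛振 posted on 2010-9-1 00:20:17

I have started trying to add user interaction to the new installation script: asking for the operating system version (I do not know how to obtain it) and letting the user choose which input schemas to install. It will also check automatically whether Python and the ZIME database are already installed.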

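thhui posted on 2010-9-1 14:44:53

Last edited by thhui on 2010-9-1 19:36

Brother 佛振,
I noticed that the jyutping keywords contain only Traditional characters, no Simplified ones.
Should a Simplified version be added?
I can give you Jyutping code tables already sorted for both (Traditional-first and Simplified-first).
Please check your gmail account.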

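thhui posted on 2010-9-1 19:38:55

Brother 佛振,

I found that 速成 (Quick) cannot generate
first-code abbreviations the way Jyutping or 劉式港拼 can.
Could you advise how to make that work?

For example:
有限公司
kbnvcisr -> kncs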

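佛振 posted on 2010-9-1 21:11:58

Reply to 37# thhui

It is a matter of syllable segmentation. This input method does not match directly against the code string of a phrase; it first segments the input into syllables and then queries by syllable codes (so that all forms of abbreviated spelling and fuzzy sounds can be supported). Given the input abcd, it segments like this: there is a character coded ab and a character coded cd, so the input is split as ab | cd.
Once abbreviation rules are defined, a batch of single-letter codes is added and the segmentation is no longer unique:
a | b | c | d
a | b | cd
a | bc | d
ab | c | d
ab | cd
All of these are valid in principle, because abbreviated spelling should cover input that mixes full and abbreviated syllables.
To keep the amount of computation from growing too large, a single full-spelling syllable that could be split into a combination of several syllable codes is, by default, not split.
That is, chang is treated only as chang, not as c | hang, ch | ang, chan | g and so on. This matches how people think about it.
If a split is needed, for example to allow xian --> xi | an, set DivideRule to specify which forms of syllable may be split in the middle.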
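
The growth in the number of segmentations can be reproduced with a small Python sketch. It is an illustration only, not ZIME's actual algorithm, and it assumes that bc is also a full code, since the list includes a | bc | d. The names `segmentations`, `full_codes` and `short_codes` are hypothetical.

```python
def segmentations(text, codes):
    """Yield every way to split text into pieces that are all in codes."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in codes:
            for rest in segmentations(text[end:], codes):
                yield [head] + rest

full_codes = {"ab", "bc", "cd"}      # hypothetical full spellings
short_codes = {"a", "b", "c", "d"}   # single-letter codes added by abbreviation rules

print([" | ".join(s) for s in segmentations("abcd", full_codes)])
for s in segmentations("abcd", full_codes | short_codes):
    print(" | ".join(s))
```

With only the full codes there is a single segmentation; adding the single-letter codes yields the five splits listed in the post. The count grows quickly with input length, which is why splitting inside a full syllable is off by default.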

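The input parsing mode for shape-based codes, table, differs from the romanization mode, roman: during continuous input, a code is cut off as soon as its length reaches the maximum code length, and a code shorter than the maximum length must be terminated with a delimiter. This is because shape-based codes mostly have a fixed length, and phrase codes follow rules of their own. Those rules differ considerably from phonetic input and are not easy to unify into one algorithm. So shape-based codes get only limited support.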
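
The cutting behaviour of the table mode described above can be sketched as follows. This is a minimal illustration of the description, not ZIME's implementation; the function name and the choice of a space as delimiter are assumptions.

```python
def split_table_input(keys, max_code_length, delimiter=" "):
    """Cut keystrokes into codes: at a delimiter, or when a code reaches max length."""
    codes, current = [], ""
    for key in keys:
        if key == delimiter:
            if current:
                codes.append(current)
            current = ""
            continue
        current += key
        if len(current) == max_code_length:
            codes.append(current)
            current = ""
    if current:
        codes.append(current)  # trailing code that is still unfinished
    return codes

print(split_table_input("kbnvcisr", 2))
print(split_table_input("k nvci", 2))
```

With a maximum code length of 2, the string kbnvcisr from the thread is cut into kb, nv, ci, sr without any delimiter, while a one-key code needs the delimiter after it.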

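thhui posted on 2010-9-1 21:44:02

For shape-based codes, could a separate divide-rule be defined,
so that the first codes serve as the abbreviation, with an extra key to mark the input as a phrase, for example z or ;
e.g.
有限公司
kb nv ci sr -> kncsz or kncs;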

thhui posted on 2010-9-1 22:02:32

Reply to 35# 佛振

Run batch file checkver.bat

ver > .\ver.txt

You will find something like below in the ver.txt file

Microsoft Windows XP [版本 5.1.2600]
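
If the installation script then needs the numbers from ver.txt, they could be extracted with a few lines of Python. This is a sketch under the assumption that the output always contains a dotted version such as 5.1.2600; the function name is hypothetical.

```python
import re

def parse_ver_output(text):
    """Extract (major, minor, build) from the output of the `ver` command, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

print(parse_ver_output("Microsoft Windows XP [版本 5.1.2600]"))
```

Matching only the digits keeps the check independent of the localized word for "version" in the output. On Windows, Python's `sys.getwindowsversion()` reports the same numbers without going through a text file.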
