fix Vietnamese Telex: qu-/gi- onset tone placement and dd+tone

transvi fixes:

1. qu-/gi- onset tone placement. The u after q, and the i after g when a
   vowel follows, are onset glides rather than the rime nucleus, so the
   tone must skip them: qua -> quá (was qúa), gia -> giá. The onset was
   previously passed straight through to the app, so transvi never saw it
   and toned the glide. Keep the onset in the preedit by adding qu-/gi-
   clusters to telex.map (mktelex.py onsets(), appended additively to the
   curated map), and add onsetglide() so transvi skips the glide. gi- with
   no following vowel keeps i as the nucleus (gì, gìn).

2. A tone key on a vowel-less preedit (e.g. "đ" from dd) now commits the
   preedit and lets the tone key pass through (eat=0), matching the
   engine's commit-on-passthrough invariant. Previously the tone key was
   eaten into the commit.
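
As a rough illustration of the two rules above (not transvi's actual
code; rime_start, base and has_nucleus are hypothetical names), this
Python sketch finds where the rime begins so a tone never lands on the
onset glide, and reports when a preedit has no vowel to tone:

```python
import unicodedata

VOWELS = set("aeiouy")

def base(ch):
    """Reduce a character to its unaccented lowercase base letter."""
    if ch in "đĐ":
        return "d"  # đ has no Unicode decomposition, so map it by hand
    return unicodedata.normalize("NFD", ch)[0].lower()

def rime_start(preedit):
    """Index of the first character that may carry a tone (assumed rule)."""
    s = [base(c) for c in preedit]
    # qu-: the u is always an onset glide.
    if s[:2] == ["q", "u"]:
        return 2
    # gi-: the i is a glide only when another vowel follows it.
    if s[:2] == ["g", "i"] and len(s) > 2 and s[2] in VOWELS:
        return 2
    # Otherwise the rime starts at the first vowel, if there is one.
    for k, c in enumerate(s):
        if c in VOWELS:
            return k
    return len(s)

def has_nucleus(preedit):
    """False for vowel-less preedits such as "đ", where a tone key has nothing to mark."""
    return rime_start(preedit) < len(preedit)
```

Under these assumptions rime_start("qua") and rime_start("gia") both
skip two characters, while "gi" and "gin" keep the i as nucleus, and
has_nucleus("đ") is false, which is the case where the tone key should
pass through.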

Verified against the running engine: qua/quan/quay/quê/quên/quyển,
gia/già/giàu/giữ/giúp/giống, gì/gìn, dd+s; unchanged mua->mùa, của, lúa;
all non-qu/gi words byte-identical to before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 13:14:31 +09:00
parent aefe9bc618
commit a94d55c541
3 changed files with 951 additions and 3 deletions

@@ -31,7 +31,10 @@ upper = str.maketrans(
def addtone(v, t):
    return v.translate(tone[t])
entries = []
def emit(input, output):
    entries.append((input, output))
    print(f"{input}\t{output}")
def up(s):
    c = s[0].translate(upper)
@@ -147,6 +150,24 @@ def final():
    for t in tone:
        emit(i+c+t, o.replace(v, addtone(v, t), 1)+c)
def onsets():
    # Keep the qu-/gi- onset in the preedit so the tone lands on the rime
    # nucleus, not on the onset glide (qua->quá not qúa, gia->giá not gía).
    # transvi (onsetglide) knows to skip the glide; here we only need the
    # composed clusters to exist so the preedit accumulates them.
    vowels = set("aeiouy")
    tones = set("sfrxj")
    base = [(i, o) for (i, o) in list(entries)
            if i and i[0] in vowels and not (set(i) & tones)]
    for i, o in base:
        if i[0] != 'u':  # no qu+u syllable; u is the glide
            emit("qu" + i, "qu" + o)
        emit("gi" + i, "gi" + o)
    # gi- with the i as nucleus (no following vowel): gì, gìn, gìm, ...
    for c in ["", "c", "m", "n", "p", "t", "ch", "ng", "nh"]:
        if c:
            emit("gi" + c, "gi" + c)
vowel1()
vowel2()
vowel3()
@@ -157,3 +178,4 @@ tone1mod()
tone2mod()
escape()
final()
onsets()
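
To make the effect of onsets() concrete, here is a small standalone
sketch of the same expansion applied to a handful of entries. The sample
entries are illustrative assumptions, not taken from the curated
telex.map, and expand_onsets is a hypothetical name that returns the new
pairs instead of printing them:

```python
def expand_onsets(entries):
    """Return the qu-/gi- cluster entries derived from vowel-initial entries."""
    vowels = set("aeiouy")
    tones = set("sfrxj")
    # Only untoned, vowel-initial inputs are used as rimes.
    base = [(i, o) for (i, o) in entries
            if i and i[0] in vowels and not (set(i) & tones)]
    out = []
    for i, o in base:
        if i[0] != "u":  # no qu+u syllable; u is the glide
            out.append(("qu" + i, "qu" + o))
        out.append(("gi" + i, "gi" + o))
    # gi- with the i as nucleus, followed by a final consonant
    for c in ["c", "m", "n", "p", "t", "ch", "ng", "nh"]:
        out.append(("gi" + c, "gi" + c))
    return out

# Assumed sample: three untoned rimes and one toned entry that is skipped.
sample = [("a", "a"), ("uw", "ư"), ("as", "á"), ("ee", "ê")]
result = expand_onsets(sample)
for i, o in result:
    print(f"{i}\t{o}")
```

The toned entry "as" produces nothing, and "uw" produces only the gi-
form, so tones on qu-/gi- syllables are left to the engine rather than
precomposed in the map.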