[Tux4kids-tuxtype-dev] Localising tux type to Telugu regarding

Thu Aug 30 04:52:48 UTC 2012

On Wed, Aug 29, 2012 at 9:42 PM, David Bruce wrote:
> instead of immediately trying to look up the correct Unicode
> character when a keypress is received, the keypress gets sent to the IM
> state machine, which may or may not emit a Unicode character, depending on
> whether a valid sequence has been completed.  The IM state machine functions
> kind of like a telephone receptionist who listens to the phone until a
> complete message has been received, and then passes the completed message
> (the desired Unicode char) on to the rest of the program.  At least that's
> how I understand it.

David is correct, except that the state machine can also return
intermediary Unicode character(s) after each keystroke.  That way the
user sees that each keystroke produces a response from the software.
The IM state machine knows how many characters are in the intermediary
state, and tells the caller how many characters to replace with the
new character(s) (or redraw).
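
To make the "replace count" idea concrete, here is a rough sketch in
Python (not the C in im.c, purely for brevity).  Everything in it is
hypothetical: the ToyStateMachine class, the feed() and type_keys()
names, and the choice to echo the raw keys as the intermediary output.
The real state machine may show something different while a sequence
is still incomplete.

```python
# Toy table: key sequence -> output text (illustrative entries only).
TABLE = {"wa": "\u308f", "kkya": "\u30c3\u30ad\u30e3"}


class ToyStateMachine:
    """Hypothetical stand-in for the IM state machine, not the im.c code."""

    def __init__(self, table):
        self.table = table
        self.buffer = ""  # keys typed so far in the current sequence
        self.shown = 0    # intermediary characters currently on screen

    def _is_prefix(self, keys):
        return any(seq.startswith(keys) for seq in self.table)

    def feed(self, key):
        """Return (n_replace, text) for a single keystroke."""
        candidate = self.buffer + key
        if not self._is_prefix(candidate):
            # Dead end: leave what is on screen, start over with this key.
            self.shown = 0
            candidate = key
            if not self._is_prefix(candidate):
                self.buffer = ""
                return 0, key
        # A completed sequence yields its output; otherwise echo the keys.
        text = self.table.get(candidate, candidate)
        n_replace = self.shown
        if any(s != candidate and s.startswith(candidate) for s in self.table):
            self.buffer, self.shown = candidate, len(text)  # still pending
        else:
            self.buffer, self.shown = "", 0                 # sequence done
        return n_replace, text


def type_keys(machine, keys):
    """Play the caller: replace the last n_replace characters each time."""
    screen = ""
    for key in keys:
        n_replace, text = machine.feed(key)
        screen = screen[:len(screen) - n_replace] + text
    return screen
```

The caller never needs to know the sequences themselves; it only
deletes as many characters as it is told to and draws the new text.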

> For Telugu, we need to create an IM character map for that language.  I
> don't have any specific knowledge of how to do this.  The character maps in
> tuxtype's source tree were just copied from Tux Paint.

The basic format of each line in a *.im file is:

    <unicode_in_hex>    <character_sequence>    -

For example, if you want Unicode 0x308F to be generated by the
character sequence "wa", then you want this line:

    308F    wa    -

These lines form the states of the state machine - the character
sequences are the transitions, and the Unicode is the output of the
state.  If you want the output to be more than a single Unicode
character, you can separate multiple Unicode characters with a colon.
For example,

    30C3:30AD:30E3    kkya    -

... generates three Unicode characters when the character sequence
"kkya" is typed.

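As a sketch of how one such line breaks down, the Python below splits
an entry into its three fields and turns the colon-separated hex code
points into a string.  parse_im_line is a hypothetical helper, and it
assumes the fields are separated by whitespace; the actual parser in
im.c may be stricter or more lenient than this.

```python
def parse_im_line(line):
    """Split one *.im entry into (output_text, key_sequence, flag)."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    codepoints, sequence, flag = fields
    # Colon-separated hex code points become a single output string.
    output = "".join(chr(int(cp, 16)) for cp in codepoints.split(":"))
    return output, sequence, flag
```

For instance, parse_im_line("30C3:30AD:30E3    kkya    -") gives a
three-character output string, the sequence "kkya", and the flag "-".
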
Some languages require more than one state machine.  Japanese, for
example, has two state machines, one for the Hiragana keyboard layout
and another for the Katakana layout.  For this reason, we provide
multiple sections in the *.im file, one per state machine.  Each
section begins with this single line:

    section

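A rough Python sketch of reading the sections follows.  split_sections
is a hypothetical name, and two behaviours are my assumptions rather
than anything im.c promises: blank lines are skipped, and entries that
appear before the first "section" line are put in an implicit first
section.

```python
def split_sections(text):
    """Group *.im entries into one list per 'section' marker."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line == "section":
            sections.append([])  # a new state machine starts here
            continue
        if not sections:
            sections.append([])  # assumption: tolerate a missing marker
        sections[-1].append(tuple(line.split()))
    return sections


# Illustrative two-section file: same key, different output per section.
SAMPLE = """\
section
3042    a    -
section
30A2    a    -
"""
```

With SAMPLE, split_sections returns two lists of one entry each, which
is the shape a two-layout language like Japanese would need.
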
But how do you switch between the state machines when there is more
than one?  This is where the language-specific event handler comes in.
The IM event handler for Japanese (the im_event_ja function in im.c),
for example, switches between the Hiragana, Katakana, and English
keyboard layouts each time the Right-Alt key is pressed.  It keeps an
internal variable to remember which layout it is in, toggles to the
next one each time Right-Alt is pressed, prints the current language
to inform the user, etc.  It also calls the first state machine when
it's in the Hiragana layout, calls the second state machine when in
the Katakana layout, or just prints the key as-is when in the English
layout.  The IM event handler is something you'll need to write for
each new language you add, because it's so specific to each language.
I recommend studying the Japanese code, since it's the simplest
handler that still makes use of the state machine.
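
The toggling logic described above might look roughly like this in
Python.  This is not im_event_ja: ToyEventHandler, the "RALT" key name,
the bracketed status text, and the starting layout are all made up for
illustration, and each "state machine" is reduced to a plain function
that maps one key to its output.

```python
LAYOUTS = ["hiragana", "katakana", "english"]


class ToyEventHandler:
    """Hypothetical handler that cycles layouts on a toggle key."""

    def __init__(self, machines):
        self.machines = machines  # layout name -> function(key) -> text
        self.index = 0            # assumption: start in the first layout

    def handle(self, key):
        if key == "RALT":
            # Move to the next layout and report it to the user.
            self.index = (self.index + 1) % len(LAYOUTS)
            return "[" + LAYOUTS[self.index] + "]"
        machine = self.machines.get(LAYOUTS[self.index])
        if machine is None:
            return key  # no state machine for this layout: key as-is
        return machine(key)


# Stand-ins for the two state machines (single-key lookups only).
handler = ToyEventHandler({
    "hiragana": lambda key: {"a": "\u3042"}.get(key, key),
    "katakana": lambda key: {"a": "\u30a2"}.get(key, key),
})
```

A handler for a new language would follow the same outline: decide
which key switches layouts, remember the current one, and pass every
other key to the matching state machine.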

You may have noticed the hyphen at the end of each example line above.
The hyphen is an extra argument returned from the state machine to the
IM event handler.  Its meaning depends on how you write the event
handler.  I needed it for Korean, and I don't see it being too useful
in other languages, but it's there if you need it.

Best,
Mark