[debiandoc-sgml-pkgs] [Fwd: Re: UTF-8 transition (debiandoc-sgml) and request for help with LaTeX (CJK)]

"Danai SAE-HAN (韓達耐)" danai.sae-han at edpnet.be
Sat Aug 25 23:11:28 UTC 2007


[Resending]

-------- Original Message --------
Subject: Re: UTF-8 transition (debiandoc-sgml) and request for help with LaTeX (CJK)
Date: Fri, 17 Aug 2007 02:39:29 +0200
From: "Danai SAE-HAN (韓達耐)" <danai.sae-han at edpnet.be>
To: Osamu Aoki <osamu at debian.org>
References: <20070811070653.GA16465 at debian.org> <46BD8D7B.2030506 at edpnet.be>
<20070811111450.GA24216 at debian.org>

[Sorry for this messy email; I'm a bit tired.]

I've checked DD-SGML version 1.2.4, and here's what I recommend changing in
order to get ja_JP.UTF-8 working.

In /tools/lib/Locale/ja_JP.UTF-8/LaTeX, the following changes need to be made:

## ----------------------------------------------------------------------
%locale = (
           'babel' => '',
           'inputenc' => '',
           'abstract' => '概要',
           'copyright notice' => '著作権表示',
           'before begin document' => '\\usepackage{CJKutf8}
\\usepackage[CJK, overlap]{ruby}
\\renewcommand{\rubysep}{-0.2ex}',

You could add [T1], as in \usepackage[T1]{CJKutf8}; that is the same as
\usepackage[T1]{fontenc}, so this line is actually two \usepackage commands in
one.  But since debiandoc-sgml already provides a fontenc line, the [T1] isn't
necessary.

The line after it enables ruby text (furigana).  Pretty cool, I'd say, but I'm
not sure whether that's available in SGML.  I think that for XML you need to
load an extra Ruby module, but I'm not sure.  You could leave these two lines
out if you don't intend to support Ruby tags in DD-SGML.

\rubysep (re)defines the space between the kanji and the furigana above.
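Something like the following should exercise the ruby lines on their own.  It
is an untested sketch that assumes the CJK bundle (CJKutf8 and its ruby
package) and a UTF-8 Japanese "min" font are installed; the \ruby{base}{reading}
form is an assumption about the CJK ruby package, not something DD-SGML
generates today.

```latex
\documentclass{article}
\usepackage{CJKutf8}
\usepackage[CJK, overlap]{ruby}
\renewcommand{\rubysep}{-0.2ex}  % gap between the kanji and the furigana
\begin{document}
\begin{CJK*}{UTF8}{min}
\ruby{漢字}{かんじ}を読む。      % furigana かんじ set above 漢字
\end{CJK*}
\end{document}
```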


Then:

           'after begin document' => '\\begin{CJK*}{UTF8}{min}

{CJK*} instead of {CJK} makes sure that the Japanese text contains no spaces
between the kanji.  Use * if the body of the text is Chinese, Japanese or
Korean; it looks much prettier this way.  To get a space, e.g. when you have an
English word in the middle of a sentence, use the tilde (~).

Example: ...韓達耐~Han Danai~韓達耐...

To get a non-breaking space (what ~ normally produces in TeX), use \nbs.

When you have a block of English text, you could switch it off with
\CJKnospace.  When the next Japanese paragraph starts again, use \CJKspace to
activate it again.
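A minimal test document for the ~ and \nbs behaviour might look like this.  It
is only a sketch: it assumes CJKutf8 and the same UTF-8 "min" font family used
in the locale setting above are installed, and it leaves \CJKspace and
\CJKnospace out.

```latex
\documentclass{article}
\usepackage{CJKutf8}           % UTF-8 input for the CJK package
\begin{document}
\begin{CJK*}{UTF8}{min}        % starred: spaces after CJK characters are dropped
韓達耐~Han Danai~韓達耐        % ~ gives an ordinary, breakable space here
韓達耐\nbs Han Danai           % \nbs gives a non-breaking space
\end{CJK*}
\end{document}
```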


When you use CJKutf8, you don't need [dnp] to get the Wadalab fonts.


For PDF hyperreferences, use:
  \usepackage[unicode]{hyperref}
if you want to use "latex+dvipdfmx".

Or use this line if you want to use "pdflatex" instead:
  \usepackage[pdftex,unicode]{hyperref}

I'm not sure if the [pdftex] option is necessary.  But I don't see why you
have this \ifpdf clause, where you only use [unicode] for PDF output.

And here are a few interesting options you can set with hyperref:

% Just a few test strings I found on the net.
\hypersetup{pdfauthor={李果正 Edward G.J. Lee},
            pdftitle={中文 PDF outline 測試},
            pdfsubject={Title},
            a4paper=true,
            colorlinks=true}
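For the pdflatex route, a minimal document combining CJKutf8 with the hyperref
line and a few of the options above could look like this.  It is an untested
sketch; the metadata strings and document content are placeholders, not
DD-SGML output.

```latex
\documentclass{article}
\usepackage{CJKutf8}
% pdflatex route; for latex+dvipdfmx the text above suggests [unicode] only
\usepackage[pdftex,unicode]{hyperref}
\hypersetup{pdfauthor={韓達耐 Danai SAE-HAN},
            pdftitle={DD-SGML UTF-8 テスト},
            colorlinks=true}
\begin{document}
\begin{CJK*}{UTF8}{min}
\section{概要}                 % section titles also become PDF bookmarks
本文。
\end{CJK*}
\end{document}
```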

To get translations for things like Part, Chapter, Section and the TOC, add:
           'after begin document' => '\\begin{CJK*}{UTF8}{min}
\\CJKcaption{ja}

There's one catch, though: they only work with the KOMA-Script classes.  That's
a bundle I would really recommend, because it makes creating LaTeX documents so
much easier.
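A small test of \CJKcaption with a KOMA-Script class might look like this.  It
is a sketch that assumes scrartcl, CJKutf8 and the ja caption file are
installed; whether every heading gets translated depends on what ja.cpx
defines.

```latex
\documentclass{scrartcl}       % a KOMA-Script class
\usepackage{CJKutf8}
\begin{document}
\begin{CJK*}{UTF8}{min}
\CJKcaption{ja}                % loads the ja caption file (ja.cpx)
\tableofcontents               % the TOC heading should now be in Japanese
\section{概要}
本文。
\end{CJK*}
\end{document}
```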

The CJK caption files that exist (leave out the .cpx suffix when you pass the
name to \CJKcaption):
Bg5.cpx (zh_TW.Big5)
GB.cpx (zh_CN.GB2312)
JIS.cpx (ja_JP.EUCJP)
ja.cpx (ja_JP.UTF-8)
zh-Hans.cpx (zh_CN.UTF-8)
zh-Hant.cpx (zh_TW.UTF-8)

Other files (not of any use in the current DD-SGML build):
hangul2.cpx
hangul.cpx
hanja.cpx
SJIS.cpx
ko-Hang2.cpx (UTF-8)
ko-Hang.cpx (UTF-8)
ko-Hani.cpx (UTF-8)

Of course, the implementation of KOMA and the CJKcaptions is totally up to the
DD-SGML developers.


I'm afraid that ko_KR.UTF-8 will have to wait, because I haven't yet figured
out how to get all the Korean fonts in Unicode using CJK.  Most build alright,
but a few show some problems.

WRT zh_*.UTF-8, I think I can manage to get it working by this weekend.


Cheerio



-- 
Danai SAE-HAN (韓達耐)
--
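Title: "Random Thoughts While Living in Spring" (春居雜興), two poems
Author: Wang Yucheng (王禹偁, 954-1001)

            I

Two trees, a peach and an apricot, lean aslant against the fence,
adorning the home of the Deputy Commissioner at Shangshan.
Why is it that the spring wind cannot suffer them?
Orioles and all, it blows and snaps off several flowering sprays.

            II

Spring clouds look like beasts, and then again like birds;
under sun and wind they turn now light, now deep.
Who says that, having no intent, they drift along at ease?
They too turn over and over, like the heart of a petty man.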


