[debiandoc-sgml-pkgs] Re: Debiandoc/zh-cn fix + UTF-8 modifications

Osamu Aoki osamu at debian.org
Thu Apr 12 14:38:40 UTC 2007


On Thu, Apr 12, 2007 at 03:24:50AM +0200, Danai SAE-HAN wrote:
> 
> Hi!
> 
> 1.
> zh_CN.GB2312 doesn't work because of one character: ä.
> The a with an umlaut doesn't exist in GB2312.
> So I'd like to ask to change "Esko Arajärvi" into "Esko Arajärvi"
> twice in zh-cn/append.sgml [qref].
> 
> I could do it myself, but I ought to go to bed now. =_=.zZ
> 
> 2.
> I've built qref with TeXLive2007 and it works perfectly.
> What we need to do now is to find out exactly which packages qref
> needs from the TL packages.  "texlive-base-bin" and
> "texlive-latex-extra" look like obvious candidates, but how about
> other packages?  Perhaps texlive-lang-*?
> 
> 3.
> I have reencoded the zh_CN files into UTF-8, and it works (with TL2007
> + CJK4.7.0).  I have also made a few changes locally in my qref and
> debiandoc-sgml tree to allow zh_CN.UTF-8.  I could build all packages
> from qref without breaking anything; the resulting PDF and PS files
> compiled without problems.  All the fonts are embedded.
> 
> If you want, I could upload these changes to qref and debiandoc-sgml
> tomorrow, and if it works also for you, then I'll do the same for
> zh_TW and ja.
> 
> I'm not sure if 'charset' in tools/lib/Locale/{SG,XML}
> [debiandoc-sgml] should be changed or not.

I think you are on the right path, but you need to be careful not to
break the old behavior.

'charset' in tools/lib/Locale/{SG,XML} uses the traditional non-UTF-8
encoding for each locale: EUC-JP for Japan, Latin-1 for Western Europe,
KOI-8 for Russia.
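
Each of these traditional encodings covers only a limited set of
characters, which is the cause of the "ä" failure under zh_CN.GB2312
reported in point 1.  A minimal Python sketch to check this (the helper
name can_encode is hypothetical and not part of debiandoc-sgml; the
codec names are those of Python's standard library):

```python
# Illustration only: check whether a string fits a given encoding.
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


name = "Esko Arajärvi"
for enc in ("gb2312", "latin-1", "utf-8"):
    # Prints the codec name and whether the name is representable in it.
    print(enc, can_encode(name, enc))
```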

This is what we should do.

We convert all locale-specific data to UTF-8 and use it as the base
data.

We also generate the traditional non-UTF-8 encoded data at package
build time, so that the traditional behavior remains available.
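
The build-time conversion could look roughly like the Python sketch
below.  It is only an illustration under assumptions: the function name
utf8_to_legacy is hypothetical, and replacing unrepresentable
characters with numeric character references is one possible policy,
not something debiandoc-sgml is known to do.

```python
def utf8_to_legacy(data: bytes, encoding: str) -> bytes:
    """Re-encode UTF-8 bytes into a traditional encoding.

    Characters missing from the target encoding become numeric
    character references (e.g. &#228;) instead of aborting the build.
    This fallback policy is an assumption made for the sketch.
    """
    text = data.decode("utf-8")
    return text.encode(encoding, errors="xmlcharrefreplace")


# Japanese text converts cleanly to EUC-JP.
euc = utf8_to_legacy("日本語".encode("utf-8"), "euc_jp")

# "ä" is not in GB2312, so it is emitted as a character reference.
gb = utf8_to_legacy("Arajärvi".encode("utf-8"), "gb2312")
```

Whether a character reference is acceptable in the generated output
depends on the backend, so a stricter build might prefer to fail
instead.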

With a new script option (e.g. -u), or when the full locale name is
specified with ".utf-8", the script should accept UTF-8 encoded data.
Oh, the HTML generation code needs to be switchable too.
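
The selection logic described above can be sketched in Python as
follows.  Everything here is hypothetical: the table LEGACY_CHARSETS,
the function resolve_charset and its force_utf8 parameter (standing in
for the proposed -u option) are illustrative names, and the table
lists only a few locales, with KOI8-R assumed for Russian.

```python
# Hypothetical table of traditional encodings, not the real contents
# of tools/lib/Locale/{SG,XML}.
LEGACY_CHARSETS = {
    "ja": "EUC-JP",
    "zh_CN": "GB2312",
    "ru": "KOI8-R",
    "de": "ISO-8859-1",
}


def resolve_charset(locale_name: str, force_utf8: bool = False) -> str:
    """Pick the input charset for a locale such as 'zh_CN.UTF-8'."""
    base, _, codeset = locale_name.partition(".")
    # An explicit ".utf-8" suffix or the -u style switch selects UTF-8.
    if force_utf8 or codeset.lower().replace("-", "") == "utf8":
        return "UTF-8"
    # Otherwise keep the old behavior: try the full locale first,
    # then the bare language code.
    if base in LEGACY_CHARSETS:
        return LEGACY_CHARSETS[base]
    return LEGACY_CHARSETS[base.split("_")[0]]


print(resolve_charset("zh_CN.UTF-8"))
print(resolve_charset("zh_CN"))
print(resolve_charset("ja_JP", force_utf8=True))
```

Keeping the legacy lookup as the default path means existing callers
that pass a bare locale name see no change.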

Another, easier and safer approach is to create a new UTF-8 version of
debiandoc-sgml (say, a debiandoc-sgml-utf8 package conflicting with
debiandoc-sgml).  Simply change the encoding, then fix the HTML header
and LaTeX code generation.  This is more like what you are thinking.

Once you are successful, start filing bugs against all packages that
depend on debiandoc-sgml, asking them to switch to the new UTF-8 version
while converting their source text to UTF-8.

I was thinking of the first option, but that may be too complicated.
Your approach may be good for migration, since we would still have the
old package for a while.

Osamu



