[Pkg-shadow-devel] Bug#333993: 'man faillog' typo: "occured"

Fri Nov 4 20:12:52 UTC 2005

On Tue, 18 Oct 2005 00:35:33 +0200
Nicolas François <nicolas.francois at centraliens.net> wrote:

> Given the results, I'm pretty sure it is ready for a public release.
> Maybe you should contact the debian-i18n at lists.debian.org list. Someone(s)
> are already developing a spellchecker for the PO (used in translation) in
> the various languages.

Very glad to know it's well received, but it'll be a little while
yet before I split the code up for public use.  By split, I mean
abstract it into useful parts, as with the "software tools" school of
programming. 

> Your checker could be important for translating the man pages into correct
> English.

That's one of the reasons for it.  It's funny, because I sometimes feel
as though it were too petty to submit a one-word bug, but when
translation is factored in, that one word is multiplied many times;
e.g. some poor translator whose second language is English may have to
waste minutes verifying that "resursively" isn't a word and should have
been "recursively".

> > (**for example, it needs a way to know what typos have already been
> > reported, otherwise the BTS could be clogged with redundant reports.)
> 
> This is "just" an infrastructure issue.

Typo bugs have an interesting property that's unlike most other bugs:
they're unambiguous -- every report must include the typo, its
location, and its correction.  Even using the BTS we now have, it should
be possible to search for any given typo bug.  It wouldn't be an
efficient search, but it needs no infrastructure, and until such an
infrastructure exists it would be better than nothing.
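
To make that concrete, here is a minimal sketch in Python of treating a
typo report as a (typo, location, correction) triple and checking it
against reports already known.  The names TypoReport and
already_reported are hypothetical, the list of known reports is typed in
by hand, and nothing here talks to the real BTS.

```python
from typing import Iterable, NamedTuple


class TypoReport(NamedTuple):
    """One typo bug: the misspelling, where it occurs, and the fix."""
    typo: str
    location: str
    correction: str


def already_reported(candidate: TypoReport,
                     known_reports: Iterable[TypoReport]) -> bool:
    """Plain linear scan: not efficient, but it needs no infrastructure."""
    return any(candidate == known for known in known_reports)


# Hand-entered example data; a real list would have to come from the BTS.
known = [TypoReport("occured", "man faillog", "occurred")]
```

A report whose three fields all match an existing one is a duplicate;
anything else would be treated as new.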

> I can give you an example of what is done in the Debian French translation
> team:
>   * A robot checks which files need to be translated
>   * Another robot checks what has been done so far (a mailing list, with
>     special tags is used for that), and whether the patch reported on the
>     BTS was applied or not. It also allows files to be assigned to
>     translators.
> 
> I'm pretty sure we can set up such an infrastructure for the spellchecking
> of the man pages (e.g. on Alioth).
> And you could get help from other people in this HUGE task.
> (Unless you are trying to be the first bug submitter ;)

If I can help by making a tool or two, that'd be good.  By "HUGE task"
you mean translation, right?  I'm not sure spellchecking is so huge, or
rather maybe it needn't be once the proper tools exist -- but I
agree that brute force (manual proofreading) is a huge task.

BTW, lately I've been thinking about translation, and had this crazy
idea for machine-aided translation of man pages (which may be old and
rejected for all I know -- I'm not an expert).  The idea, or ideas, as
they come to mind:

1) Technical docs are an UNUSUAL type of language because they don't
require any ambiguity.  All terms and their relations should be
unambiguous.  So provided we know the "one meaning" of a tech noun, it
can be given an exact translation, when one exists.  The same should be
true of any unambiguous grammatical relation.

2) Computers are (currently) lousy at ambiguity anyway.

3) Make up an ad hoc, unambiguous, extra-detailed, tech-oriented
meta-language, have humans translate a given source text into this
meta-language, and have the computer translate that to other human
languages, which could then be refined by human translators.

What might such a meta-language look like?  I'm thinking that if it
were based on English, (it needn't be), we'd start with a source
line like:

	"My dog's name is Fido, he likes people."

...and get something like, (only better):

	"My(possessive pronoun=A Costa, of next noun 'dog') dog(noun, 5
Webster)'s(possessive of next noun 'name') name(noun, 3 Webster) is
(present tense verb, 2 Webster) Fido(proper noun, leave alone), he
(pronoun masculine=Fido) likes(present tense singular verb, 2 Webster)
people(plural noun, 6 Webster)."

Every first-order pronoun would be identified with its noun.  Any word
with more than one sense would be attached to a specific definition in
a big dictionary; in the above case "dog(noun, 5 Webster)" would mean
"look in Webster's under 'dog', the noun definition, (not the verb), and
sense #5 is what was meant."  (Note: I don't know if sense #5 in any
Webster's is what's needed, it's a fake example.)
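
For illustration only, the annotations above could be held in a small
data structure like the following Python sketch.  The class name
AnnotatedWord and its field names are made up for this example, and the
sense numbers are the same fake ones used in the sentence above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotatedWord:
    """One word of the meta-language: the text plus what pins it down."""
    text: str
    part_of_speech: str
    sense: Optional[int] = None       # sense number in the big dictionary
    refers_to: Optional[str] = None   # the noun a pronoun stands for
    leave_alone: bool = False         # proper nouns are copied unchanged

    def sense_key(self) -> Tuple[str, str, Optional[int]]:
        """Unambiguous lookup key, e.g. ("dog", "noun", 5)."""
        return (self.text.lower(), self.part_of_speech, self.sense)


# Part of the Fido sentence, with the fake sense numbers from the text.
sentence = [
    AnnotatedWord("My", "possessive pronoun", refers_to="A Costa"),
    AnnotatedWord("dog", "noun", sense=5),
    AnnotatedWord("name", "noun", sense=3),
    AnnotatedWord("Fido", "proper noun", leave_alone=True),
    AnnotatedWord("he", "pronoun", refers_to="Fido"),
]
```

The point of sense_key() is that two uses of "dog" with different
senses get different keys, so each can have its own translation.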

Obviously just translating it to a meta-language would be like five
times harder than plain old translation.  What is the gain if it's
more work than the old way?  The gain ought to be that the computer
would then have an unambiguous text it could work with from there, so
the time spent translating "upstream" to the meta-language would save
much more time translating a text "downstream" to a real language.

So it first translates the grammar, (I'm assuming that's possible, or
that we reserve this technique only for languages where it's possible,
and that there are enough such languages to make it worthwhile), then
it translates each unambiguous term.  When it turns out that, say, a
French dictionary has no equivalent for "dog(noun, 5 Webster)", it
prompts the human translator for one, adds it to the dictionary, and
now it "knows" how to translate it.  When it turns out the destination
language itself has no equivalent, the translator can highlight the
term somehow (italics, quotes, or whatnot), or attempt to coin their
own term.
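
The term-by-term step could be sketched roughly as below, in Python.
This assumes a dictionary keyed by (word, part of speech, sense number)
and a callback that stands in for prompting the human translator; the
names translate_term and ask_translator are hypothetical, and
highlighting with quotes is just one of the options mentioned above.

```python
from typing import Callable, Dict, Optional, Tuple

SenseKey = Tuple[str, str, int]  # (word, part of speech, sense number)


def translate_term(key: SenseKey,
                   dictionary: Dict[SenseKey, str],
                   ask_translator: Callable[[SenseKey], Optional[str]]) -> str:
    """Translate one unambiguous term, asking a human only when needed."""
    if key not in dictionary:
        answer = ask_translator(key)
        if answer is None:
            # No equivalent in the destination language: highlight the
            # source word in quotes rather than guess.
            return '"' + key[0] + '"'
        # Remember the answer so the same term is never asked about twice.
        dictionary[key] = answer
    return dictionary[key]
```

The first time a term is missing the translator is asked; after that
the dictionary "knows" it and answers on its own.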

Fringe benefit:  if the meta-language were logical enough, algorithms
that simplify needlessly complex expressions might be run on a poorly
written, redundant sentence, and the result translated back to the
source language better than the original.

> I'm definitely still interested;)
> And it could be interesting to make it speak French.

Thanks for the feedback and interest!  I'll certainly keep you posted
when there's something fit to post.