Bug#700914: Index process error in morphstr() function

YunQiang Su wzssyqa at gmail.com
Tue Feb 19 09:15:03 UTC 2013


Package: wordnet
Severity: serious

Forwarded from launchpad by Sundaram Ramaswamy
LP: #305407

---------------------------------------------------------------

Hi,
I am working on Wordnet for a particular project. I installed Wordnet
in Ubuntu via Synaptic, the latest package to date. I tried searching
for "automata" in the Wordnet Browser (shell command: wnb) and it
returned 0 results, while Wordnet installed on Windows (installer from
Wordnet's site) shows a couple of definitions for "automata". In fact,
the latest version of Wordnet for Windows is just 2.1 while Linux's is
3.0.

Basically, Wordnet's function morphstr() is supposed to give the root
words for a given inflected word. For example, when "knifes" is given
to morphstr, it returns "knife". Likewise for "axes" it should return
"ax", "axe" and "axis". It first searches an exceptions list file
(because of peculiar cases like axes); if the word has an entry there,
it returns the file's results. If the word is not found in the list,
it tries to predict the root. While the prediction part (e.g. knifes)
works fine in Ubuntu, the search-from-file part doesn't (e.g. axes,
automata, etc.)
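
The two-step lookup described above can be sketched roughly as follows.
This is not Wordnet's morph.c, only a minimal illustration: the names
exc_entry, lookup_exception, strip_plural_s and base_form are
hypothetical, the exception table is a tiny in-memory stand-in for the
exception list file, and the prediction step is reduced to a single toy
rule (drop a trailing "s"). It also returns only the first base form,
whereas "axes" should yield several.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one line of the exception list file:
 * an inflected form and the first base form listed for it. */
struct exc_entry { const char *inflected; const char *base; };

/* Illustrative sample data only, not the real file contents. */
static const struct exc_entry noun_exceptions[] = {
    { "automata", "automaton" },
    { "axes",     "ax" },
};

/* Step 1: exact match in the exception table, or NULL when absent. */
static const char *lookup_exception(const char *word)
{
    size_t n = sizeof noun_exceptions / sizeof noun_exceptions[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(noun_exceptions[i].inflected, word) == 0)
            return noun_exceptions[i].base;
    return NULL;
}

/* Step 2: toy prediction rule that strips one trailing "s" into buf.
 * Returns NULL when the rule does not apply or buf is too small. */
static const char *strip_plural_s(const char *word, char *buf, size_t buflen)
{
    size_t len = strlen(word);
    if (len < 2 || word[len - 1] != 's' || len > buflen)
        return NULL;
    memcpy(buf, word, len - 1);
    buf[len - 1] = '\0';
    return buf;
}

/* Exceptions first, prediction only as a fallback. */
const char *base_form(const char *word, char *buf, size_t buflen)
{
    assert(word != NULL && buf != NULL);
    const char *hit = lookup_exception(word);
    if (hit != NULL)
        return hit;
    return strip_plural_s(word, buf, buflen);
}
```

With this sketch, "knifes" goes through the fallback rule while "axes"
and "automata" are answered from the table, which is the half of the
lookup that fails in the Ubuntu package.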

When I compared the source code of Wordnet (morph.c of Windows and
Linux), it's the same for both OSs (they have just used preprocessor
switches for the differences). This needs to be fixed from our side,
since Wordnet's source code doesn't have any errors/diffs, as the same
code is present on both OSs. The Windows installer was packaged by the
Wordnet guys themselves, while the deb was packaged from their source
by one of the Ubuntu/Debian guys, I guess.

PS: When I wrote my own code and tried using morphstr(), I could spot
the error with Ubuntu's packaged wordnet.lib. morphstr() takes two
args; 1: inflected word, 2: POS (Part of Speech - NOUN, VERB, etc.).
E.g. morphstr("knifes", NOUN); will return "knife" using the
prediction technique (works right in Ubuntu). When I call
morphstr("automata", NOUN) it returns NULL, but when I call
morphstr("automata", NOUN - 1); it returns "automata". Likewise, for
any word which has an entry in the exception list file, when we pass
the actual POS value minus 1, we get the proper values. It has some
array indexing issue, I believe. The reason why Wordnet Browser
doesn't show the definitions of "automata" in Linux is that
morphstr(), when called with the proper POS value, returns NULL, while
in Windows it returns correct values for the same set of arguments, so
Wordnet Browser in Windows shows it.
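
One way such an off-by-one could produce exactly this symptom is
sketched below. It is a guess at the mechanism, not the packaged code:
the per-POS table array, the init_tables_* functions and exc_lookup are
hypothetical names, the sample entries are illustrative, and the POS
values (NOUN = 1, VERB = 2) are assumed. The tables are stored one slot
too low while the lookup still indexes by the POS code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* POS codes; the numeric values are an assumption of this sketch. */
enum { NOUN = 1, VERB = 2, NUMPARTS = 2 };

struct exc_entry { const char *inflected; const char *base; };

/* Illustrative sample data, each table ends with a NULL sentinel. */
static const struct exc_entry noun_exc[] = {
    { "automata", "automaton" }, { NULL, NULL }
};
static const struct exc_entry verb_exc[] = {
    { "went", "go" }, { NULL, NULL }
};

/* Slot 0 is meant to stay unused so a table is indexed by POS code. */
static const struct exc_entry *tables[NUMPARTS + 1];

/* Buggy setup: every table lands one slot too low (pos - 1). */
void init_tables_buggy(void)
{
    tables[NOUN - 1] = noun_exc;
    tables[VERB - 1] = verb_exc;
    tables[NUMPARTS] = NULL;
}

/* Intended setup: the table for a POS lives at index pos. */
void init_tables_fixed(void)
{
    tables[0] = NULL;
    tables[NOUN] = noun_exc;
    tables[VERB] = verb_exc;
}

/* Lookup always indexes by pos, so it only works with the fixed setup. */
const char *exc_lookup(const char *word, int pos)
{
    assert(word != NULL && pos >= 0 && pos <= NUMPARTS);
    const struct exc_entry *t = tables[pos];
    if (t == NULL)
        return NULL;
    for (; t->inflected != NULL; t++)
        if (strcmp(t->inflected, word) == 0)
            return t->base;
    return NULL;
}
```

After init_tables_buggy(), exc_lookup("automata", NOUN) searches the
verb table and returns NULL, while NOUN - 1 happens to hit the noun
table. After init_tables_fixed() the situation is reversed, which is
the behaviour the caller expects.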


-------------------------------------------------------------
I noticed that 51_overflows.patch modifies the index, but the modified
index is not handled correctly, and the change is also not needed.

The attachment is the new 51_overflows.patch with some hunks dropped.
It works well now.

--
YunQiang
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 51_overflows.patch
Type: application/octet-stream
Size: 22147 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/debian-science-maintainers/attachments/20130219/9b51911d/attachment-0001.obj>

