New tags for biology and medicine.

Thu Sep 6 11:30:22 UTC 2007

Dear all,

I was a bit lazily waiting for the conversation to settle before trying
to answer :)

> +Tag: field::biology:bioinformatics
> +Tag: field::biology:molecular
> +Tag: field::biology:structural
> 
> This is probably a reasonable distinction, though we have to decide if
> we want such a fine-grained separation of the "field" facet. We would
> also end up with needing the same level of detail for electronics,
> chemistry, physics,...

I think that I would take a pragmatic approach: fine-graining as long
as there is a consensual demand. By this I mean that fine-graining a
facet should not become a hassle for the package maintainers who are not
interested in it. In the case of the Debian-Med project, I think that
each time we propose this kind of tag, it will mean that we have
people dedicated to screening all the parent tags and assigning the
fine-grained ones where necessary.

(By the way, could there be a subscription mechanism to monitor the
addition and removal of tags?)

> +Tag: field::medicine:imaging

I support creating field::medicine:imaging, and using field::biology +
use::analysis and works-with::image instead of field::biology:imaging. I
think that this makes sense as long as we do not package software such
as microscope control tools. field::medicine:imaging, on the other
hand, already has candidate packages whose usage is broader than just
taking and viewing pictures.
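
As a rough sketch of how the combination of existing tags could stand in
for field::biology:imaging: the package names and tag assignments below
are invented for illustration, and the query is plain set arithmetic in
Python, not the Debtags API.

```python
# Hypothetical package -> tag-set mapping; names and tags are made up.
PACKAGES = {
    "bio-image-analyser": {"field::biology", "use::analysis", "works-with::image"},
    "seq-aligner": {"field::biology", "use::analysis", "works-with::sequence"},
    "med-viewer": {"field::medicine:imaging", "works-with::image"},
}


def packages_with_all(tags, packages=PACKAGES):
    """Return the sorted names of packages carrying every tag in `tags`."""
    wanted = set(tags)
    return sorted(name for name, have in packages.items() if wanted <= have)


# The three existing tags together select the "biology imaging" packages.
biology_imaging = packages_with_all(
    ["field::biology", "use::analysis", "works-with::image"]
)
print(biology_imaging)
```

With this made-up data, only the package carrying all three tags is
selected, so no dedicated field::biology:imaging tag is needed to find it.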

> +Tag: made-of::algorithm:dynamic-programming
> +Tag: made-of::algorithm:hashing
> +Tag: made-of::algorithm:hidden-markov-model
> +Tag: made-of::algorithm:neural-network

I like the idea, but I see that it is not consensual, and I think that
we have not reached a critical mass yet. I propose to postpone: let us
keep the proposal in debian-med's SVN, and see in a few months, when we
have improved our coverage.

> +Tag: works-with::sequence
> +Description: Sequence
> + The program manipulates data made of a sequence of elements from a
> finite set.
> 
> Somehow this is different to the current tags in works-with, but I
> believe it could fit in. E.g. sorting applications could also fit in
> here?

I think that this is exactly the goal. Sometimes innovative research is
done by taking tools for analysing genome sequences and using them on
written language, or vice versa. I see this tag as having a high level
of abstraction.

> +Tag: works-with::sequence:nuceleic
> +Description: Nucleic acids
> + Sequence of nucleic acids: DNA, RNA but also non-natural nucleic acids
> such as PNA or LNA.
> +
> +Tag: works-with::sequence:peptidic
> +Description: Proteins
> + Sequence of aminoacids: peptides and proteins.
> 
> Quite detailed, though otherwise, people probably won't pick
> works-with::sequence if searching for algorithms working on a DNA.

I made this proposal having in mind that we will have a lot of
debian-med packages which manipulate sequences. In that context, the
biologist would naturally want to distinguish between proteins and
nucleic acids: this is a very common distinction. But shall we wait
until we have, say, 50 packages which have field::biology and
works-with::sequence?

> +Tag: works-with-format::plaintext:aln
> +Tag: works-with-format::plaintext:fasta
> +Tag: works-with-format::plaintext:nexus

This is definitely an area where there is an overlap between MIME types
and tags. But I would definitely be excited if debtags could propose
toolchains which are connected by the formats they accept. Once again,
we do not have the critical mass yet...
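
A minimal sketch of what "connected by the formats they accept" could
mean in practice. It assumes that two packages can be chained whenever
they share a works-with-format tag (the tags do not say whether a format
is read or written), and the package names are invented for
illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: package names and their format tags are made up.
FORMATS = {
    "aligner-a": {
        "works-with-format::plaintext:fasta",
        "works-with-format::plaintext:aln",
    },
    "tree-builder-b": {
        "works-with-format::plaintext:aln",
        "works-with-format::plaintext:nexus",
    },
    "tree-viewer-c": {"works-with-format::plaintext:nexus"},
}


def format_links(formats=FORMATS):
    """Return sorted (package, package, shared format tag) triples."""
    by_format = defaultdict(set)
    for package, tags in formats.items():
        for tag in tags:
            by_format[tag].add(package)
    links = set()
    for tag, members in by_format.items():
        # Every pair of packages sharing a format is a candidate link.
        for left, right in combinations(sorted(members), 2):
            links.add((left, right, tag))
    return sorted(links)


for left, right, tag in format_links():
    print(f"{left} <-> {right} via {tag}")
```

Following the links from one package to the next gives a candidate
toolchain; whether the data can really flow in that direction would
still have to be checked by hand.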

> I am not sure it is a good idea to put those beneath "plaintext". There
> are the two cases: 
>      1. Someone searching for a tool for editing plaintext would end up
>         with the special purpose plaintext:aln editors, which IMO is
>         undesirable.
>      2. Someone searching for a special purpose plaintext:aln editor
>         could deduce from the tag name, that he could also use
>         plaintext, and if he knows that ALN is a plaintext format he
>         could navigate there smoothly (which assumes that the tags are
>         displayed in a hierarchical manner).
> 
> So the formats could as well be top level. Though this would mean
> cluttering the works-with-format facet. Could there be a
> works-with-format::special-purpose:* group?
> Do we need a way to express relationships between tags like: show
> works-with-format::plaintext:aln only if field::biology or
> field::medicine is selected? Or do we want to cover this by requiring
> sophisticated UIs, which detect this in an automatic fashion.

I will bravely let you choose, as you know Debtags much better than I
do. I think that it could be useful to know that fasta, nexus, aln, ...
are plaintext formats.
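
For illustration, here is a small Python sketch of how a tool could
deduce the parent from the tag name alone. It assumes that a tag is
written facet::name and that each ":" inside the name introduces one
more level of refinement, as in the proposals quoted above; the function
names are hypothetical.

```python
def ancestors(tag):
    """Return the parent tags of `tag`, nearest parent first."""
    facet, sep, name = tag.partition("::")
    if not sep:
        return []  # not of the form facet::name
    parts = name.split(":")
    return [
        facet + "::" + ":".join(parts[:depth])
        for depth in range(len(parts) - 1, 0, -1)
    ]


def expand(tags):
    """Return `tags` together with all the parents they imply."""
    expanded = set(tags)
    for tag in tags:
        expanded.update(ancestors(tag))
    return expanded


print(ancestors("works-with-format::plaintext:aln"))
```

With such an expansion, a package tagged only with the fine-grained
format would still be found by a search on the plaintext parent, and a
user interface could choose whether to show or hide it.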

> +Tag: use::comparison:alignment
> +Description: Alignment
> + To identify similarities in two objects by maximising the overlap of
> identical parts.
> +
> +Tag: use::comparison:phylogeny
> +Description: Phylogenetic analysis
> + To infer lineage relationships.
> 
> Those seem to be covered by use analysis to me.

Alignment and phylogeny are very different (although one often does the
first before the second), and comparison is very broad. A researcher
will never search just for a comparison tool when looking for alignment
software or phylogenetic analysis software. Of course, looking at all
the packages tagged field::biology which have alignment in their
description can also do the job in most cases, so the need for these
tags depends on whether you feel that tag searches should be usable
independently of, or as a complement to, other tools. Note that, in
the case of software with many functions, such as EMBOSS, the keyword
may be absent from the description.
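
The difference between the two kinds of search can be sketched as
follows. The catalogue is invented: "big-suite" stands for any
multi-function package whose description does not mention alignment,
and the tag is the one proposed above, not an existing one.

```python
# Hypothetical catalogue; names, descriptions and tags are made up.
CATALOGUE = {
    "small-aligner": {
        "tags": {"field::biology", "use::comparison:alignment"},
        "description": "Pairwise sequence alignment tool",
    },
    "big-suite": {
        "tags": {
            "field::biology",
            "use::comparison:alignment",
            "use::comparison:phylogeny",
        },
        "description": "Large collection of molecular biology programs",
    },
}


def by_keyword(word, catalogue=CATALOGUE):
    """Packages whose description contains `word` (case-insensitive)."""
    word = word.lower()
    return sorted(
        name
        for name, entry in catalogue.items()
        if word in entry["description"].lower()
    )


def by_tag(tag, catalogue=CATALOGUE):
    """Packages carrying exactly the tag `tag`."""
    return sorted(name for name, entry in catalogue.items() if tag in entry["tags"])


print(by_keyword("alignment"))
print(by_tag("use::comparison:alignment"))
```

In this example the keyword search misses the suite, while the tag
search finds both packages.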

A few words on the proposals you made in another mail:

> * ::bioinformatics, ::molecular-biology, ::structural-biology
I would rather see field::biology:molecular than
field::biology:molecular-biology, but it is a matter of taste.
biology::molecular-biology:structural instead of
biology::structural(-biology) may horrify some of our colleagues, though.

>       * ::emboss
I strongly advocate suite::emboss: we will get the critical mass for it.

In conclusion, about the possibility of managing our sets of tags
ourselves. In everyday work, one has a very narrow point of view of
one's tools. I use a PCR machine to "make a PCR", I use a Pipetman® to
"pipette",... this could be expressed by biology::PCR and
biology::pipetting. But if we think harder, we can take a higher point
of view. Instead of biology::PCR it would be use::amplification, or
use::diagnostic, for instance, because the PCR machine produces DNA, but
sometimes we want to keep it as a reagent, and at other times we just
want to see its size and then throw it away.

So the questions I am wondering about are:

 - What is the most powerful approach?
 - What are the expectations of our users?
 - How can we interest our users in an unexpected and powerful usage of
   DebTags?

Biological research is very conservative in the tools it uses, and some
software has an enormous advantage over others just because of oral
tradition. Standard usage of DebTags will help us to show alternative
software to our userbase - just look at how much more popular clustalw,
which is non-free, still is than programs which are either faster, more
precise, or both:

http://people.debian.org/~igloo/popcon-graphs/index.php?packages=hmmer+boxshade+clustalw+clustalx+seaview+muscle+t-coffee+sim4+sibsim4+arb+dialign+kalign+probcons+wise+amap-align+poa&show_installed=on&want_legend=on&from_date=&to_date=&hlght_date=&date_fmt=&beenhere=1

I think that an advanced usage of Debtags is the only way to bring
the attention of users and ourselves to programs which we do not expect
to be relevant to their fields. This is why I am pushing a bit for more
fine-grained tags in multiple official facets, rather than a private
biology:: facet in which we will reproduce the idiosyncrasies of our
disciplines...

Many thanks for your patience with our long mails :)

-- 
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan