[buildd-tools-devel] Bug#834736: sbuild: Use basic format for ISO 8601 timestamps (for build logs filenames)

Johannes Schauer josch at debian.org
Fri Aug 19 11:50:49 UTC 2016


Hi,

Quoting Santiago Vila (2016-08-19 11:20:45)
> retitle 834736 sbuild: Being able to choose format for build log filenames
> thanks

If you use the Control pseudo-header, you don't also have to CC
control at bugs.debian.org; you just write:

Control: retitle -1 sbuild: Being able to choose format for build log filenames

as the first line of your mail body.

> Fine, but I said AWK. It would be something like this:
> 
>   date = substr(stamp,1,4) substr(stamp,6,2) substr(stamp,9,2)
>   time = substr(stamp,12,2) substr(stamp,15,2) substr(stamp,18,2)

Indeed, this is more complex than the code needed to parse the other time
format. But here is the trade-off. Which weighs more:

 - everyone who parses build log file names in awk or shell having a slightly
   worse time (awk people have to write a few more characters of code, and
   shell people get slow execution when they process thousands of build logs)

 - everyone who uses sbuild having to cope with yet another piece of
   configurable complexity that they might not need

> I have thousands of build logs. If I have to invoke the "date" command for
> each one of them, the whole thing is going to be very slow.

Indeed, invoking date 38000 times takes 18 seconds on my machine.
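
For comparison, the awk snippet quoted earlier amounts to plain string slicing,
which needs no external process at all. A minimal Python sketch of the same
idea, assuming the stamp looks like 2016-08-19T11:50:49Z (split_stamp is a
hypothetical name used for illustration, not something sbuild provides):

```python
def split_stamp(stamp):
    """Slice an extended ISO 8601 stamp into basic-format date and time."""
    # Python slices are 0-based and end-exclusive, while awk's substr is
    # 1-based with a length, so substr(stamp,6,2) becomes stamp[5:7].
    date = stamp[0:4] + stamp[5:7] + stamp[8:10]
    time = stamp[11:13] + stamp[14:16] + stamp[17:19]
    return date, time
```

Called as split_stamp("2016-08-19T11:50:49Z"), this returns the pair
("20160819", "115049") without validating the input in any way.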

> Anyway, I'm changing the title to something which I believe it is more
> generic and realistic.
> 
> Suppose that I have more than 38000 build logs already (this is really
> the truth), and the new logs that are generated have a different
> format. Suppose that for convenience I would like to keep using the
> old format so that I don't have a mix of old and new formats.

I hereby offer to write you a script that converts your existing collection of
build logs from the old format to the new one. Would that help you?
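
Such a script could be quite small. The following is only a sketch under
stated assumptions: it assumes the old names end in -YYYYMMDD-HHMM.build and
the new ones in -YYYY-MM-DDTHH:MM:SSZ.build, and since the assumed old format
carries no seconds it fills in 00. Both layouts and the name convert_name are
assumptions for illustration and would have to be checked against your logs.

```python
import re

# Assumed old-style name: <prefix>-YYYYMMDD-HHMM.build (hypothetical layout).
OLD_NAME = re.compile(r"^(?P<prefix>.+)-(?P<date>\d{8})-(?P<time>\d{4})\.build$")

def convert_name(name):
    """Return the new-style file name, or None if name is not old-style."""
    m = OLD_NAME.match(name)
    if m is None:
        return None
    d, t = m.group("date"), m.group("time")
    # The assumed old format has no seconds, so "00" is filled in.
    stamp = f"{d[0:4]}-{d[4:6]}-{d[6:8]}T{t[0:2]}:{t[2:4]}:00Z"
    return f"{m.group('prefix')}-{stamp}.build"
```

The function only computes the new name; the actual renaming (for example
with os.rename) is left out on purpose so that a dry run is the default.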

> How would I modify sbuild for that?
> 
> By looking at the source, it seems that this is decided in line 2 of
> open_build_log function in lib/Sbuild/Build.pm, which reads like this:
> 
>   my $date = strftime_c "%FT%TZ", gmtime($self->get('Pkg Start Time'));

That is the one, yes.

> So: Instead of harcoding the timestamp format, would not be possible to just
> make "%FT%TZ" the default value for a variable that could be changed in
> .sbuildrc?

Possible, yes. But every additional knob that can be turned via a configuration
file entry adds to the overall complexity. Indeed, one could make *everything*
configurable, but where does one stop? The question here is which weighs more:
the cost of the added complexity or the advantage of having the knob.

> > [...] Why does the colon have to be escaped?
> 
> You are right, it does not have to be escaped, I was just confused
> about the fact that bash escapes it when I use tab for completing
> the filename.

I suspect it might do that so the name is not accidentally interpreted as the
':' "command"?

> > > In either case: Nobody asked for a way to specify the filename format in
> > > a flexible way?
> > 
> > Correct, nobody did.
> 
> Well, maybe because the previous format was ok for almost everybody.
> Now that you changed it, I guess somebody had to be the first one to ask for
> this to be customizable.

That is all fair, and that's why I am discussing the trade-offs with you.

One could also argue: why should sbuild add complexity to itself for the sake of
consumers that use a programming language lacking strptime functionality (as
shell and awk apparently do)? As you described it, the problem is limited to
these languages. The question then becomes: how much worse is the life of an
awk programmer, for example, who has to add a bit of extra code, compared to
the life of everybody using sbuild who gets yet another configuration setting
in their ~/.sbuildrc? And could the consumer of the build logs not rather switch
to a language that is better suited to the task of parsing timestamps?

I can imagine that you are not the only one consuming the filenames of build
logs, but do all the others use awk or shell? For example, you could simply
resort to Perl and write:

   perl -MTime::Piece -ne 'chomp; print Time::Piece->strptime($_, "%Y-%m-%dT%H:%M:%SZ")."\n"'

And that would parse 38000 of these strings in 0.3 seconds. Or you could do it
in Python and write:

   python3 -c 'import datetime,sys;print([datetime.datetime.strptime(line, "%Y-%m-%dT%H:%M:%SZ\n") for line in sys.stdin])'

And that would parse your 38000 strings in 0.5 seconds.
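
If the input is whole file names rather than bare timestamps, the stamp has to
be located first. A hedged sketch of that step, assuming the new-style stamp
appears somewhere in the name (parse_stamp and the example file name layout
are assumptions for illustration, not taken from sbuild):

```python
import re
from datetime import datetime

# Matches an extended ISO 8601 UTC stamp such as 2016-08-19T11:50:49Z.
STAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")

def parse_stamp(name):
    """Return the timestamp embedded in name, or None if there is none."""
    m = STAMP.search(name)
    if m is None:
        return None
    # The result is a naive datetime; the trailing Z says it is UTC.
    return datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%SZ")
```

Names without a stamp yield None, so a caller that wants to sort logs by
start time would filter those out first.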

So what I need from you is an argument that shows me that adding this feature
gives a benefit great enough to outweigh the cost of additional complexity in
sbuild. As far as I can see (and I might be wrong, so please correct me if I
am), this change makes only a very small number of people unhappy, and even
the unhappy people have workarounds (like yours for awk) that are trivial to
implement. Are a few more characters in your awk script worth a change to
sbuild that we will have to keep maintaining for years?

What do you think?

Thanks!

cheers, josch