[Daca-general] scan-build and metrics gsoc proposals and DACA

Raphael Geissert geissert at debian.org
Thu Mar 21 09:23:59 UTC 2013


Hi Zack,

On 18 March 2013 11:01, Stefano Zacchiroli <zack at debian.org> wrote:
> On Sun, Mar 17, 2013 at 06:46:38PM +0100, Raphael Geissert wrote:
[...]
> There is one piece of the puzzle missing, which I've discussed with
> Sylvestre. At IRILL (http://www.irill.org) I work closely with
> Coccinelle authors (http://coccinelle.lip6.fr/) and we are going to have
> a student working this "summer" (May->July) on periodic Coccinelle runs
> on all the C code found in Debian.
>
> That's our main interest, but I'm myself interested in having something
> more organic to easily plug in other static analysis tools. Hence DACA
> comes to mind. We are quite flexible, we can go from a minimal setup
> where we only run Coccinelle on a local source mirror, but I would very
> much prefer proper integration in a more suitable framework. In addition
> to that, we are also going to need a sort of source.debian.org service,
> that does syntax highlighting for multiple languages, to be used as a
> cross-reference service to pinpoint errors to specific lines of code. I
> hope to keep this as a separate piece of the puzzle, and to offer an API
> that would allow attaching "pop-up" messages via some JavaScript
> hackery. We also aim to produce output compatible with firehose
> (https://github.com/fedora-static-analysis/firehose), but that too is
> just a small piece of the puzzle.
>
> The most important part is clearly the infrastructure. We're considering
> DACA and I've looked at the list archives just a few days ago. Honestly,
> it didn't seem to me that DACA was much alive, but I wanted to check
> with you. At this point in time, do you still consider DACA architecture
> the right one, worth investing in? Or do you rather think
> something else should be designed at this point?

I've already mentioned a few times that the current design is not
scalable. There was not much of a design to begin with: the idea was
first to see what we could get from running some tools, and then
publish the results. Firehose would solve one piece of the puzzle,
provided it actually delivers what it promises.
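For readers unfamiliar with firehose: it defines a common XML format for
static-analyser results, so that different tools' output can be merged and
post-processed uniformly. The fragment below is an illustrative sketch from
memory of the project's examples; the exact element and attribute names
should be checked against the current schema in the firehose repository.

```xml
<analysis>
  <metadata>
    <!-- which tool produced these results -->
    <generator name="cppcheck" version="1.58"/>
    <!-- the software under test; names here are made up -->
    <sut>
      <source-rpm name="example-pkg" version="1.0" release="1" build-arch="x86_64"/>
    </sut>
  </metadata>
  <results>
    <issue test-id="nullPointer">
      <message>Possible null pointer dereference: p</message>
      <location>
        <file given-path="src/example.c"/>
        <function name="do_something"/>
        <point line="42" column="8"/>
      </location>
    </issue>
  </results>
</analysis>
```

The per-line `location` data is what would let a source.debian.org-style
cross-reference service pinpoint each issue in highlighted source.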

> In the former case, we'll be happy to direct our efforts toward DACA
> integration. We also have quite some computing power to offer, in case
> that's welcome to run DACA jobs. For context reasons we need to develop
> in Python, but gearman-Python integration doesn't seem to be a problem
> (I'm no gearman expert, but that's what a quick search reveals).

Yes, that's one of the advantages of gearman: you can easily provide
any function in a wide variety of languages.
Note, however, that the gearman-based approach was just a local
experiment; setting up something like it on d.o infrastructure would
probably take quite a bit of work.

> Regarding my metrics proposal, that's something I'd like to keep
> separate, as I'd hope it could be used to graph much more than "only"
> statistics gathered from static analysis.

I know, but the point was exactly that. A more generic infrastructure
would allow you to accomplish all that in a distributed manner without
duplicating the core infrastructure.
Hence my suggestion of building on Hadoop & friends: they should solve
some of the architectural problems and leave us with only the task of
implementing what we actually want to do, whether that is running
static analysers, dynamic analysers, data extraction and gathering
tools, or any post-processing we need.
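The pipeline shape being described (dispatch an analyser over many packages,
then aggregate the results for metrics or publication) can be sketched in a
few lines of Python, with a local thread pool standing in for the
Hadoop/gearman workers. Everything here (`analyse`, the fake issue counts,
the package list) is hypothetical and for illustration only:

```python
from concurrent.futures import ThreadPoolExecutor

def analyse(package):
    # Hypothetical stand-in for unpacking a package's source and
    # invoking a static analyser (e.g. cppcheck) on it.
    issues = len(package) % 3  # fake issue count, illustration only
    return (package, issues)

def post_process(results):
    # Aggregation step: per-package results become a summary that a
    # separate metrics/graphing service could consume.
    return dict(results)

if __name__ == "__main__":
    packages = ["bash", "coreutils", "dpkg"]  # hypothetical work queue
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(analyse, packages))
    print(post_process(results))  # prints {'bash': 1, 'coreutils': 0, 'dpkg': 1}
```

The point of the generic framework is that only `analyse` and
`post_process` change between use cases; the dispatching, scheduling and
result collection are shared infrastructure.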

Cheers,
-- 
Raphael Geissert - Debian Developer
www.debian.org - get.debian.net
