Performance of find methods, comparison with linear search

Sylvain Bougerel sylvain.bougerel.devel at gmail.com
Wed Nov 21 01:48:42 UTC 2007


forgot the attachement...

On Nov 21, 2007 9:47 AM, Sylvain Bougerel
<sylvain.bougerel.devel at gmail.com> wrote:
> On Nov 20, 2007 9:59 PM, Paul Harris <paulharris at computer.org> wrote:
> > attached
> >
> >
> > On Nov 20, 2007 9:38 PM, Renaud Detry <renaudjdetry at airpost.net> wrote:
> > > > Then I moved
> > > >  triplet nearest = random_points[0];
> > > > to above the for loop, and added a
> > > >   std::cout << "PCH: " << nearest << std::endl;
> > > > below the for loop... I got this result
> > > >
>
> Hi,
>
> Trying Renaud's original version I also get:
>  - snip -
>  <Stopwatch> kdtree [lap_iterative_linear_find_nearest]: 0s (0 clock ticks)
>  lap_iterative_linear_find_nearest = [ lap_iterative_linear_find_nearest 0 ];
>  <Stopwatch> kdtree [lap_recursive_linear]: 112.57s (112570000 clock ticks)
>  lap_recursive_linear = [ lap_recursive_linear 1.1257e+08 ];
>
> That's an amazing result, and when I try Paul's version I get (full log):
> N: 100000
> N_PROBES: 10000
> Range: 2.15443
> N = [ N 100000 ];
> N_PROBES = [ N_PROBES 10000 ];
> <Stopwatch> kdtree [lap_building_vectors]: 0.02s (20000 clock ticks)
> lap_building_vectors = [ lap_building_vectors 20000 ];
> <Stopwatch> kdtree [lap_building_tree]: 0.21s (210000 clock ticks)
> lap_building_tree = [ lap_building_tree 210000 ];
> <Stopwatch> kdtree [lap_find_within_range]: 0.35s (350000 clock ticks)
> lap_find_within_range = [ lap_find_within_range 350000 ];
> Mean count 7
> <Stopwatch> kdtree [lap_count_within_range]: 0.35s (350000 clock ticks)
> lap_count_within_range = [ lap_count_within_range 350000 ];
> <Stopwatch> kdtree [lap_find_nearest]: 14.53s (14530000 clock ticks)
> lap_find_nearest = [ lap_find_nearest 1.453e+07 ];
> # comparisons: 32907492
> <Stopwatch> kdtree [lap_optimizing]: 1.06s (1060000 clock ticks)
> lap_optimizing = [ lap_optimizing 1.06e+06 ];
> <Stopwatch> kdtree [lap_find_within_range_opt]: 0.23s (230000 clock ticks)
> lap_find_within_range_opt = [ lap_find_within_range_opt 230000 ];
> Mean count 7
> <Stopwatch> kdtree [lap_count_within_range_opt]: 0.23s (230000 clock ticks)
> lap_count_within_range_opt = [ lap_count_within_range_opt 230000 ];
> <Stopwatch> kdtree [lap_find_nearest_opt]: 13.81s (13810000 clock ticks)
> lap_find_nearest_opt = [ lap_find_nearest_opt 1.381e+07 ];
> # comparisons: 31189607
> <Stopwatch> kdtree [lap_iterative_linear_find_within_range]: 20.09s
> (20090000 clock ticks)
> lap_iterative_linear_find_within_range = [
> lap_iterative_linear_find_within_range 2.009e+07 ];
> PCH: 50880000
> <Stopwatch> kdtree [lap_iterative_linear_find_nearest]: 19.94s
> (19940000 clock ticks)
> lap_iterative_linear_find_nearest = [
> lap_iterative_linear_find_nearest 1.994e+07 ];
> PCH: 70820000
> PCH: (26.0063,39.9494,46.6781)
> # comparisons: 1000000000
> <Stopwatch> kdtree [lap_recursive_linear]: 109.1s (109100000 clock ticks)
> lap_recursive_linear = [ lap_recursive_linear 1.091e+08 ];
>
> Wow, it went from 0.0s to 19.94s for the
> lap_iterative_linear_find_nearest. This difference in the loop is:
>
> @@ -208,13 +212,17 @@
>    }
>    sw.lap("iterative_linear_find_within_range");
>
> -  triplet nearest = random_points[0];
> +
> +  std::cout << "PCH: " << clock() << std::endl;
> + triplet nearest = random_points[0];
> + int num_com_linear = 0;
>    for (int i = 0; i < N_PROBES; ++i)
>    {
>      for (int j = 0; j < N; ++j)
>      {
>        double d = distance(random_probes[i], random_points[j]);
> +      ++num_com_linear;
>        if (d < mindist)
>        {
>          mindist = d;
>
> I was surprised and I tried again without the counter...
>
> @@ -208,13 +212,17 @@
>    }
>    sw.lap("iterative_linear_find_within_range");
>
> + // std::cout << "PCH: " << clock() << std::endl;
>    triplet nearest = random_points[0];
>    int num_com_linear = 0;
>    for (int i = 0; i < N_PROBES; ++i)
>    {
>      for (int j = 0; j < N; ++j)
>      {
>        double d = distance(random_probes[i], random_points[j]);
> +      //++num_com_linear;
>        if (d < mindist)
>        {
>          mindist = d;
>
> And I still get the same results (around 19s). So I don't know how I
> can get these 0.0s at first. I'm puzzled.
>
>
>
> Contrary to Paul, I do not believe that kdtree find_nearest is
> "normally" slow because of memory lookup. We do not allocate many
> things in this example. "triplet" only weight 12 octect of memory...
> 10000 of them does not represent much memory space.
>
> To prove my point, I designed another performance test (in
> attachement). This one purely test the performance of the tree. I ran
> it with "valgrind --tool=cachegrind". Below is the output without
> valgrind:
>
> [sylvain at sapiens examples]$ ./perf_kdtree 100000 1000
> For 100000 triplets and 1000 tries:
> find_nearest()      takes 1.84
> find_within_range() takes 22.48
> optimize()          takes 1.55
> find_nearest()      takes 2.88
> find_within_range() takes 23.47
>
> Where it showes again that after optimisation, it takes longer to get
> an average find_nearest() and find_within_range() response. This is
> clearly a fault in our algorithm! It should not be the case! Now the
> output of valgrind:
>
> [sylvain at sapiens examples]$ valgrind --tool=cachegrind ./perf_kdtree 100000 1000
> ==7511== Cachegrind, an I1/D1/L2 cache profiler.
> ==7511== Copyright (C) 2002-2007, and GNU GPL'd, by Nicholas Nethercote et al.
> ==7511== Using LibVEX rev 1732, a library for dynamic binary translation.
> ==7511== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
> ==7511== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
> ==7511== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
> ==7511== For more details, rerun with: -v
> ==7511==
> For 100000 triplets and 1000 tries:
> find_nearest()      takes 66.14
> find_within_range() takes 1439.35
> optimize()          takes 117.32
> find_nearest()      takes 183.93
> find_within_range() takes 1411.65
> ==7511==
> ==7511== I   refs:      83,179,497,189
> ==7511== I1  misses:             1,893
> ==7511== L2i misses:             1,885
> ==7511== I1  miss rate:           0.00%
> ==7511== L2i miss rate:           0.00%
> ==7511==
> ==7511== D   refs:      57,363,357,908  (33,633,704,194 rd + 23,729,653,714 wr)
> ==7511== D1  misses:       109,279,903  (   107,674,718 rd +      1,605,185 wr)
> ==7511== L2d misses:        83,157,268  (    82,998,770 rd +        158,498 wr)
> ==7511== D1  miss rate:            0.1% (           0.3%   +            0.0%  )
> ==7511== L2d miss rate:            0.1% (           0.2%   +            0.0%  )
> ==7511==
> ==7511== L2 refs:          109,281,796  (   107,676,611 rd +      1,605,185 wr)
> ==7511== L2 misses:         83,159,153  (    83,000,655 rd +        158,498 wr)
> ==7511== L2 miss rate:             0.0% (           0.0%   +            0.0%  )
>
>
> If you've never seen a valgrind output before, what it means is that
> there is simply NO significant hopping in memory. In other words, when
> you build your tree in a block, like in the precedent test, hopping in
> memory will not happen. Therefore, the long execution time we get for
> find_nearest() and find_within_range() may be due to an error in the
> algorithm.
>
>
> That's when I started to look deeper at find_nearest() and I realized
> the following:
> 1. The algorithm is recursive. Recursive is slower than iterative, so
> iterative would improve
>     performance.
> 2. The algorithm seems to behave this way:
>        at each node, compare the hyperrectangle formed by the center
> point and the node value
>        with the current best hyperrectange. Then if the current best
> hyperrectangle intersect with
>        the left node and the right node value, continue in each node.
>     I think it's better to get the approximate best hyperrectangle
> during iterative deepening
>     search like when you want to insert the point (if it was an
> insertion). Now walk up the tree
>     from that point on and go into the node that intersect the HYPERSPHERE only.
>     This is where the current algorithm looses efficiency IMHO.
>
> Please comment on this.
>
> During the weekend I will try to reimplement find_nearest taking these
> two points into consideration. I'll send you the result over the week
> end.
>
>
> Sylvain
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: perf_kdtree.cpp
Type: text/x-c++src
Size: 4028 bytes
Desc: not available
Url : http://lists.alioth.debian.org/pipermail/libkdtree-devel/attachments/20071121/91edf987/attachment.cpp 


More information about the libkdtree-devel mailing list