[med-svn] [r-cran-fastcluster] 02/04: New upstream version 1.1.24

Andreas Tille tille at debian.org
Tue Oct 10 09:42:48 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-cran-fastcluster.

commit 7dad9b45075f169ed049ac22811216d7f6d7da49
Author: Andreas Tille <tille at debian.org>
Date:   Tue Oct 10 11:24:45 2017 +0200

    New upstream version 1.1.24
---
 DESCRIPTION                       |  11 +++--
 INSTALL                           |   5 ++-
 LICENSE                           |   4 +-
 MD5                               |  38 ++++++++---------
 NEWS                              |  17 +++++++-
 R/fastcluster.R                   |   9 ++--
 README                            |   5 ++-
 build/vignette.rds                | Bin 206 -> 206 bytes
 inst/doc/fastcluster.Rtex         |  16 ++++++-
 inst/doc/fastcluster.pdf          | Bin 115699 -> 117210 bytes
 src/fastcluster.cpp               |   5 ++-
 src/fastcluster_R.cpp             |  68 ++++++++++++++++++++---------
 src/python/fastcluster.py         |  15 ++++---
 src/python/fastcluster_python.cpp |   5 ++-
 src/python/setup.py               |  87 ++++++++++++++++++++++++++------------
 src/python/tests/nantest.py       |   6 ++-
 src/python/tests/test.py          |   6 ++-
 src/python/tests/vectortest.py    |   6 ++-
 tests/test_fastcluster.R          |   5 ++-
 vignettes/fastcluster.Rtex        |  16 ++++++-
 20 files changed, 223 insertions(+), 101 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index e13fd99..1eed5c6 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,10 +1,13 @@
 Package: fastcluster
 Encoding: UTF-8
 Type: Package
-Version: 1.1.22
-Date: 2016-12-06
+Version: 1.1.24
+Date: 2017-08-14
 Title: Fast Hierarchical Clustering Routines for R and Python
 Authors@R: person("Daniel", "Müllner", email = "daniel at danifold.net", role = c("aut", "cph", "cre"))
+Copyright: Until package version 1.1.23: © 2011 Daniel Müllner
+        <http://danifold.net>. All changes from version 1.1.24 on: ©
+        Google Inc. <http://google.com>.
 Enhances: stats, flashClust
 Depends: R (>= 3.0.0)
 Description: This is a two-in-one package which provides interfaces to
@@ -20,8 +23,8 @@ Description: This is a two-in-one package which provides interfaces to
 License: FreeBSD | GPL-2 | file LICENSE
 URL: http://danifold.net/fastcluster.html
 NeedsCompilation: yes
-Packaged: 2016-12-08 21:53:33 UTC; muellner
+Packaged: 2017-08-20 21:16:52 UTC; muellner
 Author: Daniel Müllner [aut, cph, cre]
 Maintainer: Daniel Müllner <daniel at danifold.net>
 Repository: CRAN
-Date/Publication: 2016-12-09 08:58:55
+Date/Publication: 2017-08-21 10:36:48 UTC
diff --git a/INSTALL b/INSTALL
index b5b3b12..6ec2861 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,7 +1,8 @@
 fastcluster: Fast hierarchical clustering routines for R and Python
 
-Copyright © 2011 Daniel Müllner
-<http://danifold.net>
+Copyright:
+  * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+  * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 
 
 Installation
diff --git a/LICENSE b/LICENSE
index 00e38cb..d2a750f 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,4 +1,6 @@
-Copyright © 2011, Daniel Müllner
+Copyright:
+  * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+  * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
diff --git a/MD5 b/MD5
index 9e910da..f3ae076 100644
--- a/MD5
+++ b/MD5
@@ -1,28 +1,28 @@
-265c61458ac650adb0f404d7a7bdd25d *DESCRIPTION
-9a38ac493894f68adbc5dee13675861d *INSTALL
-59e6750f727695ec043303b3a9f37fa4 *LICENSE
+c003d3dcbd395ef849e3af680c14ea04 *DESCRIPTION
+f42049b61f5700e04db55fea48c3f172 *INSTALL
+f4abec074fd2a5f5df26d4ea11206493 *LICENSE
 da8e9d68585993250a9c29c3e9bff50b *NAMESPACE
-316558259c1fa33949fd200ec52e9ce5 *NEWS
-5fd601c6a56b9625b79593f71463d50c *R/fastcluster.R
-50c4d271555e475a4561b10f60f3e667 *README
-cddee6a56d5faa6bb111a27fe8961c6e *build/vignette.rds
+4be4155d9678f4ddfb76071de259a2ac *NEWS
+e17871d8f0d7650d3d588c7d4fd7cdc0 *R/fastcluster.R
+e1e421b365b092b958761b9aa4542751 *README
+787b94de9a3092c7dd7763d3a5e64414 *build/vignette.rds
 459081fd7078ab4eadf2e3ce7e45bab1 *inst/CITATION
-286973e7961f4ae6bbc108d6600e1f1f *inst/doc/fastcluster.Rtex
-7d5b215b4eb4130a1dd0bca81134d80b *inst/doc/fastcluster.pdf
+1c31e2352078833f8d2f664fa4d92222 *inst/doc/fastcluster.Rtex
+504632b4c6500b0994acaf07ec3bd865 *inst/doc/fastcluster.pdf
 3eed5fa276cbf58077d5304bb8ed0eb7 *man/fastcluster.Rd
 14abdf33b799d6d48057f19d1974a6bc *man/hclust.Rd
 a6ca386b8617952d163ef83abb8b6819 *man/hclust.vector.Rd
 97bb0f9bf046e498c47423129fc3691a *src/Makevars
 7b8a328733afe582986d5292e9c91278 *src/Makevars.win
-4ed03c1958f69282a6651054055fe72e *src/fastcluster.cpp
-b8aa08a74bc7b0a67d1c658fe707320f *src/fastcluster_R.cpp
-01a3be172cb49cbc2b9a75cd8d693db6 *src/python/fastcluster.py
-a4d720df2ebd0af3c66b0bcf4b3849f1 *src/python/fastcluster_python.cpp
-c744cda507fdf4795ad2e168c8a5dc9f *src/python/setup.py
+60cb0a90da9ab22ad5871ae71b434a2f *src/fastcluster.cpp
+08eeb0c1683b6dea8fbb7840c7aaf2f1 *src/fastcluster_R.cpp
+3206c9ffac28920af50a7d8049ee302b *src/python/fastcluster.py
+b72007eb0c73f20ef901f84bdd32f1ae *src/python/fastcluster_python.cpp
+2d4ab7ae984ecc57fe6448a6bb2f83d1 *src/python/setup.py
 0553f404a601c5830f33d7f2216c7530 *src/python/tests/__init__.py
-649566b0471a200d87f456009a9c38bb *src/python/tests/nantest.py
-9e43116e64bd2b19ebfe91ab62855c44 *src/python/tests/test.py
-3aeb0aee74de797c6969b7521ec37081 *src/python/tests/vectortest.py
-7dba0c8af8d88099a7898cf95b049fe9 *tests/test_fastcluster.R
+68604314cc18b0aa691934edd94eebff *src/python/tests/nantest.py
+c8c9a929ee8a22b8219e376de6677020 *src/python/tests/test.py
+3b1cf8f33d62292394f1ae56b37b2022 *src/python/tests/vectortest.py
+7862ca89f826da64aedcc585521795c4 *tests/test_fastcluster.R
 9cbb544a7574e9d55aed550e5f3608a4 *vignettes/Makefile
-286973e7961f4ae6bbc108d6600e1f1f *vignettes/fastcluster.Rtex
+1c31e2352078833f8d2f664fa4d92222 *vignettes/fastcluster.Rtex
diff --git a/NEWS b/NEWS
index 56e3156..cc1bb9c 100644
--- a/NEWS
+++ b/NEWS
@@ -1,7 +1,8 @@
 fastcluster: Fast hierarchical clustering routines for R and Python
 
-Copyright © 2011 Daniel Müllner
-<http://danifold.net>
+Copyright:
+  • Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+  • All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 
 
 Version history
@@ -185,3 +186,15 @@ Version 1.1.22, 06/12/2016
 
   • No fenv header usage if software floating-point emulation is used (bug
     report: NaN test failed on Debian armel).
+
+Version 1.1.23, 03/24/2017
+
+  • setup.py: Late NumPy import for better dependency management.
+
+Version 1.1.24, 08/04/2017
+
+ • R 3.5 corrects the formula for the “Canberra” metric. See
+   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17285.
+   The formula in the fastcluster package was changed accordingly. This
+   concerns only the R interface. SciPy and fastcluster's Python interface
+   always had the correct formula.
diff --git a/R/fastcluster.R b/R/fastcluster.R
index b1cfafc..a4d0862 100644
--- a/R/fastcluster.R
+++ b/R/fastcluster.R
@@ -1,7 +1,8 @@
 #  fastcluster: Fast hierarchical clustering routines for R and Python
 #
-#  Copyright © 2011 Daniel Müllner
-#  <http://danifold.net>
+#  Copyright:
+#    * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+#    * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 
 hclust <- function(d, method="complete", members=NULL)
 {
@@ -42,10 +43,12 @@ hclust.vector <- function(X, method='single', members=NULL, metric='euclidean',
   METRICS <- c("euclidean", "maximum", "manhattan", "canberra", "binary",
                "minkowski")
   metric = pmatch(metric, METRICS)
-  if (is.na(metric))
+  if (is.na(metric) || metric > 6)
     stop("Invalid metric.")
   if (metric == -1)
     stop("Ambiguous metric.")
+  if (metric == 4 && getRversion() < "3.5.0")
+    metric <- as.integer(7) # special metric code for backwards compatibility
 
   if (methodidx!=1 && metric!=1)
     stop("The Euclidean methods 'ward', 'centroid' and 'median' require the 'euclidean' metric.")
diff --git a/README b/README
index b8f2f79..987d0f7 100644
--- a/README
+++ b/README
@@ -1,7 +1,8 @@
 fastcluster: Fast hierarchical clustering routines for R and Python
 
-Copyright © 2011 Daniel Müllner
-<http://danifold.net>
+Copyright:
+  * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+  * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 
 The fastcluster package is a C++ library for hierarchical, agglomerative
 clustering. It efficiently implements the seven most widely used clustering
diff --git a/build/vignette.rds b/build/vignette.rds
index 28c75fd..bddf3a0 100644
Binary files a/build/vignette.rds and b/build/vignette.rds differ
diff --git a/inst/doc/fastcluster.Rtex b/inst/doc/fastcluster.Rtex
index f3bcb18..47fe152 100644
--- a/inst/doc/fastcluster.Rtex
+++ b/inst/doc/fastcluster.Rtex
@@ -1,4 +1,4 @@
-\def\fastclusterversion{1.1.22}
+\def\fastclusterversion{1.1.24}
 \documentclass[fontsize=10pt,paper=letter,BCOR=-6mm]{scrartcl}
 \usepackage[utf8]{inputenc}
 \usepackage{lmodern}
@@ -89,7 +89,7 @@
 %\VignetteIndexEntry{User's manual}
 \title{The \textit{fastcluster} package: User's manual}
 \author{\href{http://danifold.net}{Daniel Müllner}}
-\date{December 6, 2016}
+\date{August 14, 2017}
 \subtitle{Version \fastclusterversion}
 \maketitle
 
@@ -295,6 +295,18 @@ is equivalent to
 \end{quote}
 but uses less memory and is equally fast. Ties may be resolved differently, ie.\ if two pairs of nodes have equal, minimal dissimilarity values at some point, in the specific computer's representation for floating point numbers, either pair may be chosen for the next merging step in the dendrogram.
 
+Note that the formula for the \textit{\q canberra\q} metric changed in R 3.5.0: Before R version 3.5.0, the \textit{\q canberra\q} metric was computed as
+\[
+ d(u,v) = \sum_j\frac{|u_j-v_j|}{|u_j+v_j|}.
+\]
+Starting with R version 3.5.0, the formula was corrected to
+\[
+ d(u,v) = \sum_j\frac{|u_j-v_j|}{|u_j|+|v_j|}.
+\]
+Summands with $u_j=v_j=0$ always contribute 0 to the sum. The second, newer formula equals SciPy's definition.
+
+The fastcluster package detects the R version at runtime and chooses the formula accordingly, so that fastcluster and the \href{http://stat.ethz.ch/R-manual/R-patched/library/stats/html/dist.html}{\texttt{dist}} method always use the same formula for a given R version.
+
 If \textit{method} is one of \textit{\q centroid\q}, \textit{\q median\q}, \textit{\q ward\q}, clustering is performed with respect to Euclidean distances. In this case, the parameter \textit{metric} must be \textit{\q euclidean\q}. Notice that \texttt{hclust.vector} operates on Euclidean distances for compatibility reasons with the \dist{} method, while \hyperref[hclust]{\texttt{hclust}} assumes \textbf{squared} Euclidean distances for compatibility with the \href{http://stat.ethz.ch/R- [...]
 \phantomsection\label{squared}
 \begin{quote}
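
Note on the vignette hunk above: the two Canberra formulas differ only in the denominator, so they agree whenever u_j and v_j have the same sign and diverge otherwise. The following is a minimal Python sketch of both formulas, not the package's implementation. The function name `canberra` and the flag `old_formula` are made up for illustration. It assumes plain float inputs and leaves out the NA handling and rescaling that the C++ code in src/fastcluster_R.cpp performs.

```python
import math


def canberra(u, v, old_formula=False):
    """Canberra distance between two equal-length sequences of floats.

    old_formula=True uses |u_j + v_j| as the denominator (R before 3.5.0).
    The default uses |u_j| + |v_j| (R 3.5.0 and later, and SciPy).
    """
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        diff = abs(a - b)
        denom = abs(a + b) if old_formula else abs(a) + abs(b)
        if diff == 0.0 and denom == 0.0:
            continue  # summands with u_j = v_j = 0 contribute 0
        # With the old formula, u_j = -v_j != 0 gives a zero denominator;
        # treat that summand as infinite, as floating-point division would.
        total += diff / denom if denom != 0.0 else math.inf
    return total


u = [1.0, -2.0, 0.0]
v = [3.0, 1.0, 0.0]
new_value = canberra(u, v)                    # 2/4 + 3/3 + 0
old_value = canberra(u, v, old_formula=True)  # 2/4 + 3/1 + 0
```

For these vectors the second coordinate has mixed signs, which is where the two results differ.
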
diff --git a/inst/doc/fastcluster.pdf b/inst/doc/fastcluster.pdf
index 55c8f3c..611f6af 100644
Binary files a/inst/doc/fastcluster.pdf and b/inst/doc/fastcluster.pdf differ
diff --git a/src/fastcluster.cpp b/src/fastcluster.cpp
index 8834bc2..586d228 100644
--- a/src/fastcluster.cpp
+++ b/src/fastcluster.cpp
@@ -1,8 +1,9 @@
 /*
   fastcluster: Fast hierarchical clustering routines for R and Python
 
-  Copyright © 2011 Daniel Müllner
-  <http://danifold.net>
+  Copyright:
+    * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+    * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 
   This library implements various fast algorithms for hierarchical,
   agglomerative clustering methods:
diff --git a/src/fastcluster_R.cpp b/src/fastcluster_R.cpp
index 2863357..394085e 100644
--- a/src/fastcluster_R.cpp
+++ b/src/fastcluster_R.cpp
@@ -1,8 +1,9 @@
 /*
   fastcluster: Fast hierarchical clustering routines for R and Python
 
-  Copyright © 2011 Daniel Müllner
-  <http://danifold.net>
+  Copyright:
+    * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+    * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 */
 #if __GNUC__ > 4 || (__GNUC__ == 4 && (__GNUC_MINOR__ >= 6))
 #define HAVE_DIAGNOSTIC 1
@@ -173,12 +174,13 @@ void generate_R_dendrogram(int * const merge, double * const height, int * const
 */
 
 enum {
-  METRIC_R_EUCLIDEAN = 0,
-  METRIC_R_MAXIMUM   = 1,
-  METRIC_R_MANHATTAN = 2,
-  METRIC_R_CANBERRA  = 3,
-  METRIC_R_BINARY    = 4,
-  METRIC_R_MINKOWSKI = 5
+  METRIC_R_EUCLIDEAN     = 0,
+  METRIC_R_MAXIMUM       = 1,
+  METRIC_R_MANHATTAN     = 2,
+  METRIC_R_CANBERRA      = 3,
+  METRIC_R_BINARY        = 4,
+  METRIC_R_MINKOWSKI     = 5,
+  METRIC_R_CANBERRA_OLD  = 6
 };
 
 #if HAVE_DIAGNOSTIC
@@ -248,6 +250,9 @@ public:
         distfn = &R_dissimilarity::minkowski;
         postprocessfn = &cluster_result::power;
         break;
+      case METRIC_R_CANBERRA_OLD:
+        distfn = &R_dissimilarity::canberra_old;
+        break;
       default:
         throw std::runtime_error(std::string("Invalid method."));
       }
@@ -475,23 +480,46 @@ private:
     double * p2 = x+i2*nc;
     for(j = 0 ; j < nc ; ++j) {
       if(both_non_NA(*p1, *p2)) {
+        sum = std::abs(*p1) + std::abs(*p2);
+        diff = std::abs(*p1 - *p2);
+        if (sum > DBL_MIN || diff > DBL_MIN) {
+          dev = diff/sum;
+          if(!ISNAN(dev) ||
+             (!R_FINITE(diff) && diff == sum &&
+              /* use Inf = lim x -> oo */ (dev = 1., true))) {
+            dist += dev;
+            ++count;
+          }
+        }
+      }
+      ++p1;
+      ++p2;
+    }
+    if(count == 0) return NA_REAL;
+    if(count != nc) dist /= (static_cast<double>(count)/static_cast<double>(nc));
+    return dist;
+  }
+
+  double canberra_old(t_index i1, t_index i2) const {
+    double dev, dist, sum, diff;
+    int count, j;
+
+    count = 0;
+    dist = 0;
+    double * p1 = x+i1*nc;
+    double * p2 = x+i2*nc;
+    for(j = 0 ; j < nc ; ++j) {
+      if(both_non_NA(*p1, *p2)) {
         sum = std::abs(*p1 + *p2);
         diff = std::abs(*p1 - *p2);
         if (sum > DBL_MIN || diff > DBL_MIN) {
           dev = diff/sum;
-#if HAVE_DIAGNOSTIC
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wfloat-equal"
-#endif
           if(!ISNAN(dev) ||
              (!R_FINITE(diff) && diff == sum &&
-              /* use Inf = lim x -> oo */ (dev = 1.))) {
+              /* use Inf = lim x -> oo */ (dev = 1., true))) {
             dist += dev;
             ++count;
           }
-#if HAVE_DIAGNOSTIC
-#pragma GCC diagnostic pop
-#endif
         }
       }
       ++p1;
@@ -759,7 +787,7 @@ extern "C" {
       if (!IS_INTEGER(metric_) || LENGTH(metric_)!=1)
         Rf_error("'metric' must be a single integer.");
       int metric = INTEGER_VALUE(metric_) - 1; // index-0 based;
-      if (metric<0 || metric>5 ||
+      if (metric<0 || metric>6 ||
           (method!=METHOD_VECTOR_SINGLE && metric!=0) ) {
         Rf_error("Invalid metric index.");
       }
@@ -920,14 +948,16 @@ extern "C" {
 #if HAVE_VISIBILITY
 #pragma GCC visibility push(default)
 #endif
-  void R_init_fastcluster(DllInfo * const info)
+  void R_init_fastcluster(DllInfo * const dll)
   {
     R_CallMethodDef callMethods[]  = {
       {"fastcluster", (DL_FUNC) &fastcluster, 4},
       {"fastcluster_vector", (DL_FUNC) &fastcluster_vector, 5},
       {NULL, NULL, 0}
     };
-    R_registerRoutines(info, NULL, callMethods, NULL, NULL);
+    R_registerRoutines(dll, NULL, callMethods, NULL, NULL);
+    R_useDynamicSymbols(dll, FALSE);
+    R_forceSymbols(dll, TRUE);
   }
 #if HAVE_VISIBILITY
 #pragma GCC visibility pop
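
Note on the metric codes: the R wrapper and the C++ code have to agree on the numbering. R passes a 1-based code, with 7 as the special value for the pre-3.5.0 Canberra formula, and the C++ side subtracts 1, so 7 arrives as METRIC_R_CANBERRA_OLD = 6. The Python sketch below only illustrates that mapping. The helper names `metric_code` and `cpp_index` are hypothetical, the R version is modelled as a tuple, and exact name matching stands in for R's partial matching with pmatch.

```python
METRICS = ("euclidean", "maximum", "manhattan", "canberra", "binary",
           "minkowski")
CANBERRA_OLD_CODE = 7  # 1-based special code used for R < 3.5.0


def metric_code(metric, r_version):
    """Return the 1-based metric code that the R wrapper would pass on."""
    if metric not in METRICS:
        raise ValueError("Invalid metric.")
    code = METRICS.index(metric) + 1
    if metric == "canberra" and r_version < (3, 5, 0):
        code = CANBERRA_OLD_CODE  # keep the old formula on older R versions
    return code


def cpp_index(code):
    """Convert the 1-based code to the 0-based enum value used in C++."""
    index = code - 1
    if index < 0 or index > 6:
        raise ValueError("Invalid metric index.")
    return index


old_r = cpp_index(metric_code("canberra", (3, 4, 4)))  # METRIC_R_CANBERRA_OLD
new_r = cpp_index(metric_code("canberra", (3, 5, 0)))  # METRIC_R_CANBERRA
```
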
diff --git a/src/python/fastcluster.py b/src/python/fastcluster.py
index a218e9d..cad2df0 100644
--- a/src/python/fastcluster.py
+++ b/src/python/fastcluster.py
@@ -1,8 +1,9 @@
 # -*- coding: utf-8 -*-
 __doc__ = """Fast hierarchical clustering routines for R and Python
 
-Copyright © 2011 Daniel Müllner
-<http://danifold.net>
+Copyright:
+Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 
 This module provides fast hierarchical clustering routines. The "linkage"
 method is designed to provide a replacement for the “linkage” function and
@@ -19,7 +20,7 @@ also be obtained at <http://danifold.net/fastcluster.html>.
 """
 
 __all__ = ['single', 'complete', 'average', 'weighted', 'ward', 'centroid', 'median', 'linkage', 'linkage_vector']
-__version_info__ = ('1', '1', '22')
+__version_info__ = ('1', '1', '24')
 __version__ = '.'.join(__version_info__)
 
 from numpy import double, empty, array, ndarray, var, cov, dot, bool, \
@@ -80,7 +81,7 @@ mthidx = {'single'   : 0,
           'median'   : 6 }
 
 def linkage(X, method='single', metric='euclidean', preserve_input=True):
-    '''Hierarchical, agglomerative clustering on a dissimilarity matrix or on
+    r'''Hierarchical, agglomerative clustering on a dissimilarity matrix or on
 Euclidean data.
 
 Apart from the argument 'preserve_input', the method has the same input
@@ -234,8 +235,8 @@ and simply ignores the mask.'''
         NN = len(X)
         N = int(ceil(sqrt(NN*2)))
         if (N*(N-1)//2) != NN:
-            raise ValueError('The length of the condensed distance matrix '
-                             'must be (k \choose 2) for k data points!')
+            raise ValueError(r'The length of the condensed distance matrix '
+                             r'must be (k \choose 2) for k data points!')
     else:
         assert X.ndim==2
         N = len(X)
@@ -274,7 +275,7 @@ booleanmetrics = ('yule', 'matching', 'dice', 'kulsinski', 'rogerstanimoto',
                   'sokalmichener', 'russellrao', 'sokalsneath', 'kulsinski')
 
 def linkage_vector(X, method='single', metric='euclidean', extraarg=None):
-    '''Hierarchical (agglomerative) clustering on Euclidean data.
+    r'''Hierarchical (agglomerative) clustering on Euclidean data.
 
 Compared to the 'linkage' method, 'linkage_vector' uses a memory-saving
 algorithm. While the linkage method requires Θ(N^2) memory for
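
Note on the r'' prefixes in the hunks above: the commit does not say why they were added. A likely reason, stated here as an assumption, is that sequences such as "\c" in "\choose" are not valid escape sequences, and Python 3.6 and later warn about them in ordinary string literals. A raw literal keeps the backslash without relying on that fallback. A small sketch of the equivalence:

```python
# An escaped backslash in an ordinary literal and a raw literal
# produce the same string; the raw form is easier to read.
escaped = 'must be (k \\choose 2) for k data points!'
raw = r'must be (k \choose 2) for k data points!'

same_text = (escaped == raw)
backslash_count = raw.count('\\')
```
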
diff --git a/src/python/fastcluster_python.cpp b/src/python/fastcluster_python.cpp
index 9c08a3d..9e378da 100644
--- a/src/python/fastcluster_python.cpp
+++ b/src/python/fastcluster_python.cpp
@@ -1,8 +1,9 @@
 /*
   fastcluster: Fast hierarchical clustering routines for R and Python
 
-  Copyright © 2011 Daniel Müllner
-  <http://danifold.net>
+  Copyright:
+    * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+    * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 */
 
 // for INT32_MAX in fastcluster.cpp
diff --git a/src/python/setup.py b/src/python/setup.py
index c65a427..8361dbc 100644
--- a/src/python/setup.py
+++ b/src/python/setup.py
@@ -2,7 +2,11 @@
 # -*- coding: utf-8 -*-
 import os
 import sys
-import numpy
+
+#import distutils.debug
+#distutils.debug.DEBUG = 'yes'
+from setuptools import setup, Extension
+
 if sys.hexversion < 0x03000000: # uniform unicode handling for both Python 2.x and 3.x
     def u(x):
         return x.decode('utf-8')
@@ -16,21 +20,37 @@ else:
 u('''
   fastcluster: Fast hierarchical clustering routines for R and Python
 
-  Copyright © 2011 Daniel Müllner
-  <http://danifold.net>
+  Copyright:
+    * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+    * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 ''')
-#import distutils.debug
-#distutils.debug.DEBUG = 'yes'
-from setuptools import setup, Extension
 
 with textfileopen('fastcluster.py') as f:
     for line in f:
-        if line.find('__version_info__ =')==0:
+        if line.find('__version_info__ =') == 0:
             version = '.'.join(line.split("'")[1:-1:2])
             break
 
 print('Version: ' + version)
 
+
+def get_include_dirs():
+    """ Avoid importing numpy until here, so that users can run "setup.py install"
+    without having numpy installed yet. """
+    def is_special_command():
+        special_list = ('--help-commands',
+                        'egg_info',
+                        '--version',
+                        'clean')
+        return ('--help' in sys.argv[1:] or
+                sys.argv[1] in special_list)
+
+    if len(sys.argv) >= 2 and is_special_command():
+        return []
+
+    import numpy
+    return [numpy.get_include()]
+
 setup(name='fastcluster',
       version=version,
       py_modules=['fastcluster'],
@@ -62,18 +82,31 @@ page <http://www.lfd.uci.edu/~gohlke/pythonlibs/#fastcluster>`_.
 from now on. If some years from now there have not been any updates, this
 does not necessarily mean that the package is unmaintained but maybe it just
 was not necessary to correct anything. Of course, please still report potential
-bugs and incompatibilities to daniel at danifold.net.**
+bugs and incompatibilities to daniel at danifold.net. You may also use**
+`my GitHub repository <https://github.com/dmuellner/fastcluster/>`_
+**for bug reports, pull requests etc.**
+
+Note that PyPI and my GitHub repository host the source code for the Python
+interface only. The archive with both the R and the Python interface is
+available on `CRAN
+<https://CRAN.R-project.org/package=fastcluster>`_ and the
+GitHub repository `“cran/fastcluster”
+<https://github.com/cran/fastcluster>`_. Even though I appear as the author also
+of this second GitHub repository, this is just an automatic, read-only mirror
+of the CRAN archive, so please do not attempt to report bugs or contact me via
+this repository.
 
 Reference: Daniel Müllner, *fastcluster: Fast Hierarchical, Agglomerative
 Clustering Routines for R and Python*, Journal of Statistical Software, **53**
 (2013), no. 9, 1–18, http://www.jstatsoft.org/v53/i09/.
 """),
       requires=['numpy'],
+      install_requires=["numpy>=1.9"],
       provides=['fastcluster'],
       ext_modules=[Extension('_fastcluster',
                              ['fastcluster_python.cpp'],
-                             extra_compile_args=['/EHsc'] if os.name=='nt' else [],
-                             include_dirs=[numpy.get_include()],
+                             extra_compile_args=['/EHsc'] if os.name == 'nt' else [],
+                             include_dirs=get_include_dirs(),
 # Feel free to uncomment the line below if you use the GCC.
 # This switches to more aggressive optimization and turns
 # more warning switches on. No warning should appear in
@@ -92,23 +125,25 @@ Clustering Routines for R and Python*, Journal of Statistical Software, **53**
 # Linker optimization
 #extra_link_args=['-Wl,--strip-all'],
       )],
-      keywords=['dendrogram', 'linkage', 'cluster', 'agglomerative', 'hierarchical', 'hierarchy', 'ward'],
+      keywords=['dendrogram', 'linkage', 'cluster', 'agglomerative',
+                'hierarchical', 'hierarchy', 'ward'],
       author=u("Daniel Müllner"),
       author_email="daniel at danifold.net",
       license="BSD <http://opensource.org/licenses/BSD-2-Clause>",
-      classifiers = ["Topic :: Scientific/Engineering :: Information Analysis",
-                     "Topic :: Scientific/Engineering :: Artificial Intelligence",
-                     "Topic :: Scientific/Engineering :: Bio-Informatics",
-                     "Topic :: Scientific/Engineering :: Mathematics",
-                     "Programming Language :: Python",
-                     "Programming Language :: Python :: 2",
-                     "Programming Language :: Python :: 3",
-                     "Programming Language :: C++",
-                     "Operating System :: OS Independent",
-                     "License :: OSI Approved :: BSD License",
-                     "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
-                     "Intended Audience :: Science/Research",
-                     "Development Status :: 5 - Production/Stable"],
-      url = 'http://danifold.net',
-      test_suite='tests',
+      classifiers=[
+          "Topic :: Scientific/Engineering :: Information Analysis",
+          "Topic :: Scientific/Engineering :: Artificial Intelligence",
+          "Topic :: Scientific/Engineering :: Bio-Informatics",
+          "Topic :: Scientific/Engineering :: Mathematics",
+          "Programming Language :: Python",
+          "Programming Language :: Python :: 2",
+          "Programming Language :: Python :: 3",
+          "Programming Language :: C++",
+          "Operating System :: OS Independent",
+          "License :: OSI Approved :: BSD License",
+          "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
+          "Intended Audience :: Science/Research",
+          "Development Status :: 5 - Production/Stable"],
+      url='http://danifold.net',
+      test_suite='tests.fastcluster_test',
 )
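
Note on the late NumPy import in setup.py: the idea is to decide from the command line whether the NumPy headers are needed at all, and to import numpy only in that case. The sketch below restates the pattern with the argument list passed in explicitly so that it can be tried without running setup.py. The name SPECIAL_COMMANDS and the `argv` parameter are additions for illustration; the command list itself is the one from the diff.

```python
SPECIAL_COMMANDS = ('--help-commands', 'egg_info', '--version', 'clean')


def is_special_command(argv):
    """True if the command line only asks for help or metadata."""
    args = argv[1:]
    return bool(args) and ('--help' in args or args[0] in SPECIAL_COMMANDS)


def get_include_dirs(argv):
    """Return the NumPy include directory, importing numpy only if needed."""
    if is_special_command(argv):
        return []  # no compilation will happen, so no headers are required
    import numpy  # deferred so that metadata commands work without NumPy
    return [numpy.get_include()]


metadata_dirs = get_include_dirs(['setup.py', 'egg_info'])
```

With this structure, a command such as "setup.py egg_info" can run before NumPy is installed, which is what lets an installer read install_requires and fetch NumPy first.
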
diff --git a/src/python/tests/nantest.py b/src/python/tests/nantest.py
index 465f8bb..4bfdd50 100644
--- a/src/python/tests/nantest.py
+++ b/src/python/tests/nantest.py
@@ -4,11 +4,13 @@
 and raises a FloatingPointError.'''
 print('''
 Test program for the 'fastcluster' package.
-Copyright (c) 2011 Daniel Müllner, <http://danifold.net>''')
+Copyright:
+  * Until package version 1.1.23: (c) 2011 Daniel Müllner <http://danifold.net>
+  * All changes from version 1.1.24 on: (c) Google Inc. <http://google.com>''')
 import numpy as np
 import fastcluster
 
-version = '1.1.22'
+version = '1.1.24'
 if fastcluster.__version__ != version:
     raise ValueError('Wrong module version: {} instead of {}.'.format(fastcluster.__version__, version))
 
diff --git a/src/python/tests/test.py b/src/python/tests/test.py
index 54f50f7..c831fb2 100644
--- a/src/python/tests/test.py
+++ b/src/python/tests/test.py
@@ -2,14 +2,16 @@
 # -*- coding: utf-8 -*-
 print('''
 Test program for the 'fastcluster' package.
-Copyright (c) 2011 Daniel Müllner, <http://danifold.net>''')
+Copyright:
+  * Until package version 1.1.23: (c) 2011 Daniel Müllner <http://danifold.net>
+  * All changes from version 1.1.24 on: (c) Google Inc. <http://google.com>''')
 import sys
 import fastcluster as fc
 import numpy as np
 from scipy.spatial.distance import pdist, squareform
 import math
 
-version = '1.1.22'
+version = '1.1.24'
 if fc.__version__ != version:
     raise ValueError('Wrong module version: {} instead of {}.'.format(fc.__version__, version))
 
diff --git a/src/python/tests/vectortest.py b/src/python/tests/vectortest.py
index 1cd3c3d..d63126e 100644
--- a/src/python/tests/vectortest.py
+++ b/src/python/tests/vectortest.py
@@ -4,14 +4,16 @@
 # TBD test single on integer matrices for hamming/jaccard
 print('''
 Test program for the 'fastcluster' package.
-Copyright (c) 2011 Daniel Müllner, <http://danifold.net>''')
+Copyright:
+  * Until package version 1.1.23: (c) 2011 Daniel Müllner <http://danifold.net>
+  * All changes from version 1.1.24 on: (c) Google Inc. <http://google.com>''')
 import sys
 import fastcluster as fc
 import numpy as np
 from scipy.spatial.distance import pdist, squareform
 import math
 
-version = '1.1.22'
+version = '1.1.24'
 if fc.__version__ != version:
     raise ValueError('Wrong module version: {} instead of {}.'.format(fc.__version__, version))
 
diff --git a/tests/test_fastcluster.R b/tests/test_fastcluster.R
index 93ef2e5..6b63b21 100644
--- a/tests/test_fastcluster.R
+++ b/tests/test_fastcluster.R
@@ -1,7 +1,8 @@
 #  fastcluster: Fast hierarchical clustering routines for R and Python
 #
-#  Copyright © 2011 Daniel Müllner
-#  <http://danifold.net>
+#  Copyright:
+#    * Until package version 1.1.23: © 2011 Daniel Müllner <http://danifold.net>
+#    * All changes from version 1.1.24 on: © Google Inc. <http://google.com>
 #
 # Test script for the R interface
 
diff --git a/vignettes/fastcluster.Rtex b/vignettes/fastcluster.Rtex
index f3bcb18..47fe152 100644
--- a/vignettes/fastcluster.Rtex
+++ b/vignettes/fastcluster.Rtex
@@ -1,4 +1,4 @@
-\def\fastclusterversion{1.1.22}
+\def\fastclusterversion{1.1.24}
 \documentclass[fontsize=10pt,paper=letter,BCOR=-6mm]{scrartcl}
 \usepackage[utf8]{inputenc}
 \usepackage{lmodern}
@@ -89,7 +89,7 @@
 %\VignetteIndexEntry{User's manual}
 \title{The \textit{fastcluster} package: User's manual}
 \author{\href{http://danifold.net}{Daniel Müllner}}
-\date{December 6, 2016}
+\date{August 14, 2017}
 \subtitle{Version \fastclusterversion}
 \maketitle
 
@@ -295,6 +295,18 @@ is equivalent to
 \end{quote}
 but uses less memory and is equally fast. Ties may be resolved differently, ie.\ if two pairs of nodes have equal, minimal dissimilarity values at some point, in the specific computer's representation for floating point numbers, either pair may be chosen for the next merging step in the dendrogram.
 
+Note that the formula for the \textit{\q canberra\q} metric changed in R 3.5.0: Before R version 3.5.0, the \textit{\q canberra\q} metric was computed as
+\[
+ d(u,v) = \sum_j\frac{|u_j-v_j|}{|u_j+v_j|}.
+\]
+Starting with R version 3.5.0, the formula was corrected to
+\[
+ d(u,v) = \sum_j\frac{|u_j-v_j|}{|u_j|+|v_j|}.
+\]
+Summands with $u_j=v_j=0$ always contribute 0 to the sum. The second, newer formula equals SciPy's definition.
+
+The fastcluster package detects the R version at runtime and chooses the formula accordingly, so that fastcluster and the \href{http://stat.ethz.ch/R-manual/R-patched/library/stats/html/dist.html}{\texttt{dist}} method always use the same formula for a given R version.
+
 If \textit{method} is one of \textit{\q centroid\q}, \textit{\q median\q}, \textit{\q ward\q}, clustering is performed with respect to Euclidean distances. In this case, the parameter \textit{metric} must be \textit{\q euclidean\q}. Notice that \texttt{hclust.vector} operates on Euclidean distances for compatibility reasons with the \dist{} method, while \hyperref[hclust]{\texttt{hclust}} assumes \textbf{squared} Euclidean distances for compatibility with the \href{http://stat.ethz.ch/R- [...]
 \phantomsection\label{squared}
 \begin{quote}

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-cran-fastcluster.git


