[libstatistics-descriptive-perl] 01/04: Imported Upstream version 3.0607
Harlan Lieberman-Berg
H.LiebermanBerg at gmail.com
Thu Mar 13 17:59:26 UTC 2014
This is an automated email from the git hooks/post-receive script.
hlieberman-guest pushed a commit to branch master
in repository libstatistics-descriptive-perl.
commit b125d2117d3ada16e35ed2418b71d89bb67622e8
Author: Harlan Lieberman-Berg <H.LiebermanBerg at gmail.com>
Date: Thu Mar 13 13:28:24 2014 -0400
Imported Upstream version 3.0607
---
Build.PL | 3 +
Changes | 139 ++++-
LICENSE | 396 +++++++++++++
MANIFEST | 15 +
META.json | 89 +++
META.yml | 23 +-
Makefile.PL | 5 +-
examples/statistical-analysis.pl | 8 +-
inc/Test/Run/Builder.pm | 4 +-
lib/Statistics/Descriptive.pm | 629 +++++++++++++++------
lib/Statistics/Descriptive/Smoother.pm | 156 +++++
lib/Statistics/Descriptive/Smoother/Exponential.pm | 118 ++++
.../Descriptive/Smoother/Weightedexponential.pm | 170 ++++++
scripts/bump-version-number.pl | 35 ++
scripts/tag-release.pl | 9 +-
t/cpan-changes.t | 12 +
t/descr.t | 253 ++++++---
t/descr_smooth_methods.t | 95 ++++
t/freq_distribution-1-rt-34999.t | 4 +-
t/lib/Utils.pm | 146 +++++
t/median_absolute_deviation.t | 26 +
t/outliers.t | 144 +++++
t/quantile.t | 2 +-
t/smoother.t | 165 ++++++
t/smoother_exponential.t | 114 ++++
t/smoother_weightedexponential.t | 131 +++++
t/style-trailing-space.t | 29 +
27 files changed, 2635 insertions(+), 285 deletions(-)
diff --git a/Build.PL b/Build.PL
index 0369644..5f4cf69 100644
--- a/Build.PL
+++ b/Build.PL
@@ -20,7 +20,10 @@ my $builder = Test::Run::Builder->new(
},
requires => {
'Carp' => 0,
+ 'List::MoreUtils' => 0,
+ 'List::Util' => 0,
'POSIX' => 0,
+ 'perl' => '5.006',
'strict' => 0,
'vars' => 0,
'warnings' => 0,
diff --git a/Changes b/Changes
index bcbcac9..1aa0253 100644
--- a/Changes
+++ b/Changes
@@ -1,38 +1,119 @@
Revision history for Perl extension Statistics::Descriptive.
-3.0202 July 23, 2011
+3.0607 2014-02-01
+ - Fix the List::MoreUtils prereqs.
+ - http://www.cpantesters.org/cpan/report/365b752c-8adf-11e3-bd14-e3bee4621ba3
+ - Thanks to Chris Williams (BINGOS) for the CPAN Testers Report.
+
+3.0606 2014-01-31
+ - Implement the median_absolute_deviation method.
+ - https://bitbucket.org/shlomif/perl-statistics-descriptive/pull-request/5/median-absolute-deviation-method/diff
+ - Thanks to Kang-min Liu.
+ - Minimal version of perl set to 5.6.0 (CPANTS).
+ - Add standalone LICENSE file (CPANTS).
+
+3.0605 2013-05-21
+ - Add t/style-trailing-space.t .
+ - Add t/cpan-changes.t .
+ - Convert Changes to it.
+
+3.0604 2012-07-14
+ - Correct a misspelling of "weight" in
+ lib/Statistics/Descriptive/Smoother/Weightedexponential.pm
+ - Thanks to Wilhelm for the report.
+ - Update the scripts/tag-release.pl file for Mercurial.
+
+3.0603 2012-05-15
+ - Use in_between to compare decimal numbers
+ - Smoothing tests were failing because of rounding problems
+ - Thanks to Andreas J. König for reporting it and to
+ Fabio Ponciroli for fixing it.
+
+3.0602 2012-05-12
+ - Correct a typo:
+ - https://rt.cpan.org/Ticket/Display.html?id=77145
+ - Thanks to Salvatore Bonaccorso and the Debian Perl Group
+ for the report.
+
+3.0601 2012-05-11
+ - No longer using Test::Exception in the tests.
+ - It was used by the tests and not specified in
+ test_requires/build_requires.
+ - Thanks to hsk at fli-leibniz.de for the report.
+
+3.0600 2012-05-11
+ - Add the smoothing functionality.
+ - Add the following public methods: add_data_with_samples(),
+ set_smoother(), get_smoothed_data() to the main module.
+ - Add the lib/Statistics/Descriptive/Smoother.pm and
+ lib/Statistics/Descriptive/Smoother/Exponential.pm
+ lib/Statistics/Descriptive/Smoother/Weightedexponential.pm
+ modules.
+ - Thanks to Fabio Ponciroli
+ - Add the scripts/bump-version-number.pl to facilitate bumping the
+ version number.
+
+3.0500 2012-05-03
+ - Add the get_data_without_outliers() and the set_outlier_filter()
+ methods.
+ - See https://bitbucket.org/barbasa/perl-statistics-descriptive/overview
+ - Thanks to Fabio Ponciroli
+
+3.0400 2012-03-01
+ - Fix https://rt.cpan.org/Ticket/Display.html?id=74890
+ - some function should return undef() in list context so they can
+ be easily assigned to values in hash initialisations.
+ - thanks to SLAFFAN for a preliminary patch which was modified
+ by SHLOMIF (the current Statistics-Descriptive maintainer).
+
+3.0300 2012-02-11
+ - Now mean() and median() and other routines return undef() if there are
+ no data.
+ - Somewhat incompatible change: some methods that returned undef() under
+ list context now return an empty list (so it will be false).
+ - it is generally not recommended to call such methods in list context
+ as they should always be called in scalar context.
+ - Resolves https://rt.cpan.org/Ticket/Display.html?id=74693
+ - thanks to Shawn Laffan for the report and the patch.
+
+3.0203 2011-11-17
+ - Fix https://rt.cpan.org/Ticket/Display.html?id=72495 .
+ - percentile should not die and should return undef if there are
+ no elements in the collection.
+
+3.0202 2011-07-23
- Moved tag-release.pl to scripts/tag-release.pl (though we now use
Mercurial instead of Subversion.)
- Add t/mode.t to test the ->mode() method.
- Documented ->mode() better.
- Optimized ->mode().
-3.0201 October 14, 2010
+3.0201 2010-10-14
- Add some documentation clarifying the 0th percentile return, as it
returns undef() for representing -inf:
- Fix https://rt.cpan.org/Ticket/Display.html?id=62055
- Thanks to Dave Breimann for reporting it.
- Add the tag-release.pl to tag a release using Subversion.
-3.0200 June 18, 2010
+3.0200 2010-06-18
- Added skewness and kurtosis
- https://rt.cpan.org/Ticket/Display.html?id=58187
- Thanks to Shawn Laffan.
- - Removed the Changes / Revision log from the .pm file.
+ - Removed the Changes / Revision log from the .pm file.
-3.0102 June 15, 2010
+3.0102 2010-06-15
- Add the $VERSION variable to Statistics::Descriptive::Sparse and
Statistics::Descriptive::Full. This was done to silence the CPAN indexer.
-3.0101 June 15, 2010
+3.0101 2010-06-15
- Moved the trimmed_mean caching test (that used the Benchmark.pm module)
to rejects/descr.t , because it kept failing.
-3.0100 July 20, 2009
+3.0100 2009-07-20
- Added the quantile method - thanks to Djibril Ousmanou (DJIBEL).
- https://rt.cpan.org/Ticket/Display.html?id=47948
-3.0000 May 29, 2009
+3.0000 2009-05-29
- Added tests (for ->count, ->sum, ->sumsq, ->min, ->max)
- Localized the scope of $stat and other variables in t/descr.t
- Got rid of AUTOLOAD in favour of individual accessors.
@@ -42,7 +123,7 @@ Revision history for Perl extension Statistics::Descriptive.
- Some refactoring of the lib/Statistics/Descriptive.pm module
(without breaking the documented API).
-2.9 May 13, 2009
+2.9 2009-05-13
- Fixed bug https://rt.cpan.org/Public/Bug/Display.html?id=46026 :
- standard_deviation failing due to a variance that got evaluated
@@ -50,26 +131,26 @@ Revision history for Perl extension Statistics::Descriptive.
- Kwalitee : added a LICENSE section to the POD.
- Kwalitee (CPANTS) : added an examples/ directory with a script.
-2.8 May 09, 2009
+2.8 2009-05-09
- Enabled "./Build runtest" and "./Build distruntest" (using Test::Run)
in the distribution.
- Fixed incomplete/broken tests in t/descr.t.
-2.7 May 03, 2009
+2.7 2009-05-03
- Converted the distribution to Build.PL and re-organized it to
put everything under its proper place. Started maintaining it in:
- http://svn.berlios.de/svnroot/repos/web-cpan/Statistics-Descriptive/
- Converted t/descr.t to use "use strict;" and "use warnings;".
-
+
- Converted t/descr.t to use Test::More.
- Cleaned up the "use" statement of lib/Statistics/Descriptive.pm.
- - Added more explicit dependencies (core, though) to Build.PL.
+ - Added more explicit dependencies (core, though) to Build.PL.
- Fixed RT bug #34999: freq distribution generated too many bins.
- https://rt.cpan.org/Ticket/Display.html?id=34999
@@ -77,21 +158,21 @@ Revision history for Perl extension Statistics::Descriptive.
- Added some keywords and resources to the META.yml, using Build.PL's
meta_merge.
- - Fixed https://rt.cpan.org/Ticket/Display.html?id=32183
+ - Fixed https://rt.cpan.org/Ticket/Display.html?id=32183
- more authoritative (and non-broken) link to the RFC.
- - Applied the patch in https://rt.cpan.org/Ticket/Display.html?id=9160
- - {{#9160: Variance and Standard Deviation use costly pseudo-variance,
+ - Applied the patch in https://rt.cpan.org/Ticket/Display.html?id=9160
+ - {{#9160: Variance and Standard Deviation use costly pseudo-variance,
instead of computing real variance}}.
-2.6 October 10, 2002
+2.6 2002-10-10
- Fixed caching in trimmed mean and modified code to allow trimming
0% from upper bound. Formerly if 0 was requested then it used the
lower bound!
- POD format patch from ddunlap
-2.5 Wednesday, May 12 1999
+2.5 1999-05-12
- Forgot to document change in v2.4, which included fixing
percentile so that it worked right and added to the test
harness.
@@ -102,40 +183,40 @@ Revision history for Perl extension Statistics::Descriptive.
- Turned off caching for least_squares_fit because there's no
way to generate a unique key for memorization.
-2.3 Thursday Nov 12 1998
+2.3 1998-11-12
- Fix for frequency distribution.
Changed Makefile.PL to ease ActiveState distribution of the module.
- Andrea's code for preventing division by zero and other
+ Andrea's code for preventing division by zero and other
improvements. He also wrote a great test bench.
Added code from Warren Matthews to calculate percentile.
-2.2 Monday Feb 23 1998
+2.2 1998-02-23
- Multiple bug fixes:
Fixed min/max bug with '0' vs defined.
Provided fix for bug with AUTOLOAD/DESTROY/Carp problem.
-2.1 Tues Sep 02 1997
+2.1 1997-09-02
- Multiple bug fixes:
- Cleaned up syntax error with my scoping.
+ Cleaned up syntax error with my scoping.
Fixed errors in least_squares_fit and median methods
-2.00 Wed Aug 20 13:22:51 1997
+2.00 1997-08-20
- new version; created by h2xs 1.16
- Complete rewrite of OO interface by Colin Kuskie.
- Now has 2 classes instead of 1.5, a base class without data
storage and a class that inherits the base methods and
extends them with data storage and more statistics.
-1.1 April 1995
+1.1 1995-04-01
- Added LeastSquaresFit and FrequencyDistribution.
-1.0 March 1995
+1.0 1995-03-01
- Released to comp.lang.perl and placed on archive sites.
-0.20 December 1994
- - Complete rewrite after extensive and invaluable e-mail
+0.20 1994-12-01
+ - Complete rewrite after extensive and invaluable e-mail
correspondence with Anno Siegel.
-0.10 December 1994
+0.10 1994-12-01
- Initital concept, released to perl5-porters list.
- Jason Kastner <jkastner at tuba.aix.calpoly.edu>
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..0ba8d2b
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,396 @@
+Terms of Perl itself
+
+a) the GNU General Public License as published by the Free
+ Software Foundation; either version 1, or (at your option) any
+ later version, or
+b) the "Artistic License"
+
+----------------------------------------------------------------------------
+
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+
+----------------------------------------------------------------------------
+
+The Artistic License
+
+Preamble
+
+The intent of this document is to state the conditions under which a Package
+may be copied, such that the Copyright Holder maintains some semblance of
+artistic control over the development of the package, while giving the users of the
+package the right to use and distribute the Package in a more-or-less customary
+fashion, plus the right to make reasonable modifications.
+
+Definitions:
+
+- "Package" refers to the collection of files distributed by the Copyright
+ Holder, and derivatives of that collection of files created through textual
+ modification.
+- "Standard Version" refers to such a Package if it has not been modified,
+ or has been modified in accordance with the wishes of the Copyright
+ Holder.
+- "Copyright Holder" is whoever is named in the copyright or copyrights for
+ the package.
+- "You" is you, if you're thinking about copying or distributing this Package.
+- "Reasonable copying fee" is whatever you can justify on the basis of
+ media cost, duplication charges, time of people involved, and so on. (You
+ will not be required to justify it to the Copyright Holder, but only to the
+ computing community at large as a market that must bear the fee.)
+- "Freely Available" means that no fee is charged for the item itself, though
+ there may be fees involved in handling the item. It also means that
+ recipients of the item may redistribute it under the same conditions they
+ received it.
+
+1. You may make and give away verbatim copies of the source form of the
+Standard Version of this Package without restriction, provided that you duplicate
+all of the original copyright notices and associated disclaimers.
+
+2. You may apply bug fixes, portability fixes and other modifications derived from
+the Public Domain or from the Copyright Holder. A Package modified in such a
+way shall still be considered the Standard Version.
+
+3. You may otherwise modify your copy of this Package in any way, provided
+that you insert a prominent notice in each changed file stating how and when
+you changed that file, and provided that you do at least ONE of the following:
+
+ a) place your modifications in the Public Domain or otherwise
+ make them Freely Available, such as by posting said modifications
+ to Usenet or an equivalent medium, or placing the modifications on
+ a major archive site such as ftp.uu.net, or by allowing the
+ Copyright Holder to include your modifications in the Standard
+ Version of the Package.
+
+ b) use the modified Package only within your corporation or
+ organization.
+
+ c) rename any non-standard executables so the names do not
+ conflict with standard executables, which must also be provided,
+ and provide a separate manual page for each non-standard
+ executable that clearly documents how it differs from the Standard
+ Version.
+
+ d) make other distribution arrangements with the Copyright Holder.
+
+4. You may distribute the programs of this Package in object code or executable
+form, provided that you do at least ONE of the following:
+
+ a) distribute a Standard Version of the executables and library
+ files, together with instructions (in the manual page or equivalent)
+ on where to get the Standard Version.
+
+ b) accompany the distribution with the machine-readable source of
+ the Package with your modifications.
+
+ c) accompany any non-standard executables with their
+ corresponding Standard Version executables, giving the
+ non-standard executables non-standard names, and clearly
+ documenting the differences in manual pages (or equivalent),
+ together with instructions on where to get the Standard Version.
+
+ d) make other distribution arrangements with the Copyright Holder.
+
+5. You may charge a reasonable copying fee for any distribution of this Package.
+You may charge any fee you choose for support of this Package. You may not
+charge a fee for this Package itself. However, you may distribute this Package in
+aggregate with other (possibly commercial) programs as part of a larger
+(possibly commercial) software distribution provided that you do not advertise
+this Package as a product of your own.
+
+6. The scripts and library files supplied as input to or produced as output from
+the programs of this Package do not automatically fall under the copyright of this
+Package, but belong to whomever generated them, and may be sold
+commercially, and may be aggregated with this Package.
+
+7. C or perl subroutines supplied by you and linked into this Package shall not
+be considered part of this Package.
+
+8. The name of the Copyright Holder may not be used to endorse or promote
+products derived from this software without specific prior written permission.
+
+9. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
+IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
+WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.
+
+The End
+
diff --git a/MANIFEST b/MANIFEST
index 2533625..ee4a37e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -1,6 +1,8 @@
Build.PL
Changes
+LICENSE
MANIFEST
+META.json
META.yml
Makefile.PL
README
@@ -8,12 +10,25 @@ UserSurvey.txt
examples/statistical-analysis.pl
inc/Test/Run/Builder.pm
lib/Statistics/Descriptive.pm
+lib/Statistics/Descriptive/Smoother.pm
+lib/Statistics/Descriptive/Smoother/Exponential.pm
+lib/Statistics/Descriptive/Smoother/Weightedexponential.pm
rejects/descr.t
+scripts/bump-version-number.pl
scripts/tag-release.pl
t/00-load.t
+t/cpan-changes.t
t/descr.t
+t/descr_smooth_methods.t
t/freq_distribution-1-rt-34999.t
+t/lib/Utils.pm
+t/median_absolute_deviation.t
t/mode.t
+t/outliers.t
t/pod-coverage.t
t/pod.t
t/quantile.t
+t/smoother.t
+t/smoother_exponential.t
+t/smoother_weightedexponential.t
+t/style-trailing-space.t
diff --git a/META.json b/META.json
new file mode 100644
index 0000000..c92cc3f
--- /dev/null
+++ b/META.json
@@ -0,0 +1,89 @@
+{
+ "abstract" : "Module of basic descriptive statistical functions.",
+ "author" : [
+ "Shlomi Fish <shlomif at iglu.org.il>"
+ ],
+ "dynamic_config" : 1,
+ "generated_by" : "Module::Build version 0.4007, CPAN::Meta::Converter version 2.132830",
+ "keywords" : [
+ "average",
+ "distribution",
+ "mean",
+ "median",
+ "statistics",
+ "stats",
+ "stddev",
+ "standard deviation"
+ ],
+ "license" : [
+ "perl_5"
+ ],
+ "meta-spec" : {
+ "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
+ "version" : "2"
+ },
+ "name" : "Statistics-Descriptive",
+ "prereqs" : {
+ "build" : {
+ "requires" : {
+ "Benchmark" : "0",
+ "Test::More" : "0",
+ "lib" : "0"
+ }
+ },
+ "configure" : {
+ "requires" : {
+ "Module::Build" : "0.36"
+ }
+ },
+ "runtime" : {
+ "requires" : {
+ "Carp" : "0",
+ "List::MoreUtils" : "0",
+ "List::Util" : "0",
+ "POSIX" : "0",
+ "perl" : "5.006",
+ "strict" : "0",
+ "vars" : "0",
+ "warnings" : "0"
+ }
+ }
+ },
+ "provides" : {
+ "Statistics::Descriptive" : {
+ "file" : "lib/Statistics/Descriptive.pm",
+ "version" : "3.0607"
+ },
+ "Statistics::Descriptive::Full" : {
+ "file" : "lib/Statistics/Descriptive.pm",
+ "version" : "3.0607"
+ },
+ "Statistics::Descriptive::Smoother" : {
+ "file" : "lib/Statistics/Descriptive/Smoother.pm",
+ "version" : "3.0607"
+ },
+ "Statistics::Descriptive::Smoother::Exponential" : {
+ "file" : "lib/Statistics/Descriptive/Smoother/Exponential.pm",
+ "version" : "3.0607"
+ },
+ "Statistics::Descriptive::Smoother::Weightedexponential" : {
+ "file" : "lib/Statistics/Descriptive/Smoother/Weightedexponential.pm",
+ "version" : "3.0607"
+ },
+ "Statistics::Descriptive::Sparse" : {
+ "file" : "lib/Statistics/Descriptive.pm",
+ "version" : "3.0607"
+ }
+ },
+ "release_status" : "stable",
+ "resources" : {
+ "homepage" : "http://web-cpan.berlios.de/modules/Statistics-Descriptive/",
+ "license" : [
+ "http://dev.perl.org/licenses/"
+ ],
+ "repository" : {
+ "url" : "https://bitbucket.org/shlomif/perl-statistics-descriptive"
+ }
+ },
+ "version" : "3.0607"
+}
diff --git a/META.yml b/META.yml
index b455c88..e95487b 100644
--- a/META.yml
+++ b/META.yml
@@ -8,7 +8,8 @@ build_requires:
lib: 0
configure_requires:
Module::Build: 0.36
-generated_by: 'Module::Build version 0.3617'
+dynamic_config: 1
+generated_by: 'Module::Build version 0.4007, CPAN::Meta::Converter version 2.132830'
keywords:
- average
- distribution
@@ -26,16 +27,28 @@ name: Statistics-Descriptive
provides:
Statistics::Descriptive:
file: lib/Statistics/Descriptive.pm
- version: 3.0202
+ version: 3.0607
Statistics::Descriptive::Full:
file: lib/Statistics/Descriptive.pm
- version: 3.0202
+ version: 3.0607
+ Statistics::Descriptive::Smoother:
+ file: lib/Statistics/Descriptive/Smoother.pm
+ version: 3.0607
+ Statistics::Descriptive::Smoother::Exponential:
+ file: lib/Statistics/Descriptive/Smoother/Exponential.pm
+ version: 3.0607
+ Statistics::Descriptive::Smoother::Weightedexponential:
+ file: lib/Statistics/Descriptive/Smoother/Weightedexponential.pm
+ version: 3.0607
Statistics::Descriptive::Sparse:
file: lib/Statistics/Descriptive.pm
- version: 3.0202
+ version: 3.0607
requires:
Carp: 0
+ List::MoreUtils: 0
+ List::Util: 0
POSIX: 0
+ perl: 5.006
strict: 0
vars: 0
warnings: 0
@@ -43,4 +56,4 @@ resources:
homepage: http://web-cpan.berlios.de/modules/Statistics-Descriptive/
license: http://dev.perl.org/licenses/
repository: https://bitbucket.org/shlomif/perl-statistics-descriptive
-version: 3.0202
+version: 3.0607
diff --git a/Makefile.PL b/Makefile.PL
index b377ab6..458eb10 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -1,4 +1,5 @@
-# Note: this file was auto-generated by Module::Build::Compat version 0.36_17
+# Note: this file was auto-generated by Module::Build::Compat version 0.4007
+require 5.006;
use ExtUtils::MakeMaker;
WriteMakefile
(
@@ -7,6 +8,8 @@ WriteMakefile
'PREREQ_PM' => {
'Benchmark' => 0,
'Carp' => 0,
+ 'List::MoreUtils' => 0,
+ 'List::Util' => 0,
'POSIX' => 0,
'Test::More' => 0,
'lib' => 0,
diff --git a/examples/statistical-analysis.pl b/examples/statistical-analysis.pl
index 0d30fd6..052907c 100644
--- a/examples/statistical-analysis.pl
+++ b/examples/statistical-analysis.pl
@@ -1,6 +1,6 @@
#!/usr/bin/perl
-# This script analyses two distributions from tab-separated input
+# This script analyses two distributions from tab-separated input
# and outputs some basic statistcs.
#
# It was used to analyse the distribution of the number of moves in
@@ -13,9 +13,9 @@ use Statistics::Descriptive;
my $num_fields = 2;
-my @stats =
- (map
- { Statistics::Descriptive::Full->new(); }
+my @stats =
+ (map
+ { Statistics::Descriptive::Full->new(); }
(1 .. $num_fields)
);
diff --git a/inc/Test/Run/Builder.pm b/inc/Test/Run/Builder.pm
index 6dc24b9..504490b 100644
--- a/inc/Test/Run/Builder.pm
+++ b/inc/Test/Run/Builder.pm
@@ -54,7 +54,7 @@ sub do_test_run_tests
Test::Run::CmdLine::Iface->new(
{
'test_files' => [glob("t/*.t")],
- }
+ }
# 'backend_params' => $self->_get_backend_params(),
);
@@ -63,7 +63,7 @@ sub do_test_run_tests
sub ACTION_tags
{
- return
+ return
system(qw(
ctags -f tags --recurse --totals
--exclude=blib/ --exclude=t/lib
diff --git a/lib/Statistics/Descriptive.pm b/lib/Statistics/Descriptive.pm
index 1b3c6b3..f5ce22a 100644
--- a/lib/Statistics/Descriptive.pm
+++ b/lib/Statistics/Descriptive.pm
@@ -5,23 +5,24 @@ use warnings;
##This module draws heavily from perltoot v0.4 from Tom Christiansen.
-require 5.00404; ##Yes, this is underhanded, but makes support for me easier
- ##Not only that, but it's the latest "safe" version of
- ##Perl5. 01-03 weren't bug free.
-use vars (qw($VERSION $Tolerance));
+use 5.006;
-$VERSION = '3.0202';
+use vars (qw($VERSION $Tolerance $Min_samples_number));
+
+$VERSION = '3.0607';
$Tolerance = 0.0;
+$Min_samples_number = 4;
package Statistics::Descriptive::Sparse;
use vars qw($VERSION);
-$VERSION = '3.0202';
+$VERSION = '3.0607';
use vars qw(%fields);
use Carp;
+use Statistics::Descriptive::Smoother;
sub _make_accessors
{
@@ -76,15 +77,15 @@ sub _make_private_accessors
##Define the fields to be used as methods
%fields = (
count => 0,
- mean => 0,
- sum => 0,
- sumsq => 0,
+ mean => undef,
+ sum => undef,
+ sumsq => undef,
min => undef,
max => undef,
mindex => undef,
maxdex => undef,
sample_range => undef,
- variance => undef,
+ variance => undef,
);
__PACKAGE__->_make_accessors( [ grep { $_ ne "variance" } keys(%fields) ] );
@@ -111,75 +112,76 @@ sub _is_permitted
}
sub add_data {
- my $self = shift; ##Myself
- my $oldmean;
- my ($min,$mindex,$max,$maxdex,$sum,$sumsq,$count);
- my $aref;
+ my $self = shift; ##Myself
+ my $oldmean;
+ my ($min,$mindex,$max,$maxdex,$sum,$sumsq,$count);
+ my $aref;
- if (ref $_[0] eq 'ARRAY') {
- $aref = $_[0];
- }
- else {
- $aref = \@_;
- }
+ if (ref $_[0] eq 'ARRAY') {
+ $aref = $_[0];
+ }
+ else {
+ $aref = \@_;
+ }
- ##If we were given no data, we do nothing.
- return 1 if (!@{ $aref });
+ ##If we were given no data, we do nothing.
+ return 1 if (!@{ $aref });
- ##Take care of appending to an existing data set
-
- if (!defined($min = $self->min()))
- {
- $min = $aref->[$mindex = 0];
- }
- else
- {
- $mindex = $self->mindex();
- }
+ ##Take care of appending to an existing data set
- if (!defined($max = $self->max()))
- {
- $max = $aref->[$maxdex = 0];
- }
- else
- {
- $maxdex = $self->maxdex();
- }
+ if (!defined($min = $self->min()))
+ {
+ $min = $aref->[$mindex = 0];
+ }
+ else
+ {
+ $mindex = $self->mindex();
+ }
- $sum = $self->sum();
- $sumsq = $self->sumsq();
- $count = $self->count();
-
- ##Calculate new mean, sumsq, min and max;
- foreach ( @{ $aref } ) {
- $sum += $_;
- $sumsq += $_**2;
- $count++;
- if ($_ >= $max) {
- $max = $_;
- $maxdex = $count-1;
+ if (!defined($max = $self->max()))
+ {
+ $max = $aref->[$maxdex = 0];
}
- if ($_ <= $min) {
- $min = $_;
- $mindex = $count-1;
+ else
+ {
+ $maxdex = $self->maxdex();
}
- }
- $self->min($min);
- $self->mindex($mindex);
- $self->max($max);
- $self->maxdex($maxdex);
- $self->sample_range($max - $min);
- $self->sum($sum);
- $self->sumsq($sumsq);
- $self->mean($sum / $count);
- $self->count($count);
- ##indicator the value is not cached. Variance isn't commonly enough
- ##used to recompute every single data add.
- $self->_variance(undef());
- return 1;
+ $sum = $self->sum();
+ $sumsq = $self->sumsq();
+ $count = $self->count();
+
+ ##Calculate new mean, sumsq, min and max;
+ foreach ( @{ $aref } ) {
+ $sum += $_;
+ $sumsq += $_**2;
+ $count++;
+ if ($_ >= $max) {
+ $max = $_;
+ $maxdex = $count-1;
+ }
+ if ($_ <= $min) {
+ $min = $_;
+ $mindex = $count-1;
+ }
+ }
+
+ $self->min($min);
+ $self->mindex($mindex);
+ $self->max($max);
+ $self->maxdex($maxdex);
+ $self->sample_range($max - $min);
+ $self->sum($sum);
+ $self->sumsq($sumsq);
+ $self->mean($sum / $count);
+ $self->count($count);
+ ##indicator the value is not cached. Variance isn't commonly enough
+ ##used to recompute every single data add.
+ $self->_variance(undef);
+ return 1;
}
+
sub standard_deviation {
my $self = shift; ##Myself
return undef if (!$self->count());
@@ -188,32 +190,34 @@ sub standard_deviation {
##Return variance; if needed, compute and cache it.
sub variance {
- my $self = shift; ##Myself
- my $div = @_ ? 0 : 1;
- my $count = $self->count();
- if ($count < 1 + $div) {
- return 0;
- }
+ my $self = shift; ##Myself
- if (!defined($self->_variance())) {
- my $variance = ($self->sumsq()- $count * $self->mean()**2);
+ my $count = $self->count();
- # Sometimes due to rounding errors we get a number below 0.
- # This makes sure this is handled as gracefully as possible.
- #
- # See:
- #
- # https://rt.cpan.org/Public/Bug/Display.html?id=46026
- if ($variance < 0)
- {
- $variance = 0;
- }
+ return undef if !$count;
- $variance /= $count - $div;
+ return 0 if $count == 1;
- $self->_variance($variance);
- }
- return $self->_variance();
+ if (!defined($self->_variance())) {
+ my $variance = ($self->sumsq()- $count * $self->mean()**2);
+
+ # Sometimes due to rounding errors we get a number below 0.
+ # This makes sure this is handled as gracefully as possible.
+ #
+ # See:
+ #
+ # https://rt.cpan.org/Public/Bug/Display.html?id=46026
+
+ $variance = $variance < 0 ? 0 : $variance / ($count - 1);
+
+ $self->_variance($variance);
+
+ # Return now to avoid re-entering this sub
+ # (and therefore save time when many objects are used).
+ return $variance;
+ }
+
+ return $self->_variance();
}
##Clear a stat. More efficient than destroying an object and calling
@@ -223,7 +227,7 @@ sub clear {
my $key;
return if (!$self->count());
- while (my($field, $value) = each %fields) {
+ while (my($field, $value) = each %fields) { # could use a slice assignment here
$self->{$field} = $value;
}
}
@@ -234,28 +238,32 @@ package Statistics::Descriptive::Full;
use vars qw($VERSION);
-$VERSION = '3.0202';
+$VERSION = '3.0607';
use Carp;
-
use POSIX ();
+use Statistics::Descriptive::Smoother;
use vars qw(@ISA $a $b %fields);
@ISA = qw(Statistics::Descriptive::Sparse);
+use List::MoreUtils ();
+use List::Util ();
+
##Create a list of fields not to remove when data is updated
%fields = (
_permitted => undef, ##Place holder for the inherited key hash
data => undef, ##Our data
+ samples => undef, ##Number of samples for each value of the data set
presorted => undef, ##Flag to indicate the data is already sorted
_reserved => undef, ##Place holder for this lookup hash
);
__PACKAGE__->_make_private_accessors(
- [qw(data frequency geometric_mean harmonic_mean
+ [qw(data samples frequency geometric_mean harmonic_mean
least_squares_fit median mode
- skewness kurtosis
+ skewness kurtosis median_absolute_deviation
)
]
);
@@ -267,6 +275,7 @@ sub _clear_fields
# Empty array ref for holding data later!
$self->_data([]);
+ $self->_samples([]);
$self->_reserved(\%fields);
$self->presorted(0);
$self->_trimmed_mean_cache(+{});
@@ -280,7 +289,7 @@ sub new {
my $proto = shift;
my $class = ref($proto) || $proto;
# Create my self re SUPER
- my $self = $class->SUPER::new();
+ my $self = $class->SUPER::new();
bless ($self, $class); #Re-anneal the object
$self->_clear_fields();
return $self;
@@ -298,15 +307,18 @@ sub _delete_all_cached_keys
{
my $self = shift;
+ my %keys = %{ $self };
+
+ # Remove reserved keys for this class from the deletion list
+ delete @keys{keys %{$self->_reserved}};
+ delete @keys{keys %{$self->_permitted}};
+ delete $keys{_trimmed_mean_cache};
+
KEYS_LOOP:
- foreach my $key (keys %{ $self }) { # Check each key in the object
- # If it's a reserved key for this class, keep it
- if ($self->_is_reserved($key) || $self->_is_permitted($key))
- {
- next KEYS_LOOP;
- }
- delete $self->{$key}; # Delete the out of date cached key
+ foreach my $key (keys %keys) { # Check each key in the object
+ delete $self->{$key}; # Delete any out of date cached key
}
+ $self->{_trimmed_mean_cache} = {}; # just reset this one
return;
}
@@ -327,30 +339,209 @@ sub clear {
}
sub add_data {
- my $self = shift;
- my $aref;
+ my $self = shift; ##Myself
- if (ref $_[0] eq 'ARRAY') {
- $aref = $_[0];
- }
- else {
- $aref = \@_;
- }
- $self->SUPER::add_data($aref); ##Perform base statistics on the data
- push @{ $self->_data() }, @{ $aref };
- ##Clear the presorted flag
- $self->presorted(0);
+ my $aref;
- $self->_delete_all_cached_keys();
+ if (ref $_[0] eq 'ARRAY') {
+ $aref = $_[0];
+ }
+ else {
+ $aref = \@_;
+ }
- return 1;
+ ##If we were given no data, we do nothing.
+ return 1 if (!@{ $aref });
+
+ my $oldmean;
+ my ($min, $max, $sum, $sumsq);
+ my $count = $self->count;
+
+ # $count is modified lower down, but we need this flag after that
+ my $has_existing_data = $count;
+
+ # Take care of appending to an existing data set
+ if ($has_existing_data) {
+ $min = $self->min();
+ $max = $self->max();
+ $sum = $self->sum();
+ $sumsq = $self->sumsq();
+ }
+ else {
+ $min = $aref->[0];
+ $max = $aref->[0];
+ $sum = 0;
+ $sumsq = 0;
+ }
+
+ # need to allow for already having data
+ $sum += List::Util::sum (@$aref);
+ $sumsq += List::Util::sum (map {$_ ** 2} @$aref);
+ $max = List::Util::max ($max, @$aref);
+ $min = List::Util::min ($min, @$aref);
+ $count += scalar @$aref;
+ my $mean = $sum / $count;
+
+ $self->min($min);
+ $self->max($max);
+ $self->sample_range($max - $min);
+ $self->sum($sum);
+ $self->sumsq($sumsq);
+ $self->mean($mean);
+ $self->count($count);
+
+ ##Variance isn't commonly enough
+ ##used to recompute every single data add, so just clear its cache.
+ $self->_variance(undef);
+
+ push @{ $self->_data() }, @{ $aref };
+
+ # no need to clear keys if we are a newly populated object,
+ # and profiling shows it takes a long time when creating
+ # and populating many stats objects
+ if ($has_existing_data) {
+ ##Clear the presorted flag
+ $self->presorted(0);
+ $self->_delete_all_cached_keys();
+ }
+
+ return 1;
+}
+
+
+sub add_data_with_samples {
+ my ($self,$aref_values) = @_;
+
+ return 1 if (!@{ $aref_values });
+
+ my $aref_data = [map { keys %$_ } @{ $aref_values }];
+ my $aref_samples = [map { values %$_ } @{ $aref_values }];
+
+ $self->add_data($aref_data);
+ push @{ $self->_samples() }, @{ $aref_samples };
+
+ return 1;
}
+
sub get_data {
my $self = shift;
return @{ $self->_data() };
}
+sub get_data_without_outliers {
+ my $self = shift;
+
+ if ($self->count() < $Statistics::Descriptive::Min_samples_number) {
+ carp("Need at least $Statistics::Descriptive::Min_samples_number samples\n");
+ return;
+ }
+
+ if (!defined $self->{_outlier_filter}) {
+ carp("Outliers filter not defined\n");
+ return;
+ }
+
+ my $outlier_candidate_index = $self->_outlier_candidate_index;
+ my $possible_outlier = ($self->_data())->[$outlier_candidate_index];
+ my $is_outlier = $self->{_outlier_filter}->($self, $possible_outlier);
+
+ return $self->get_data unless $is_outlier;
+ # Removing the outlier from the dataset
+ my @good_indexes = grep { $_ != $outlier_candidate_index } (0 .. $self->count() - 1);
+
+ my @data = $self->get_data;
+ my @filtered_data = @data[@good_indexes];
+ return @filtered_data;
+}
+
+sub set_outlier_filter {
+ my ($self, $code_ref) = @_;
+
+ if (!$code_ref || ref($code_ref) ne "CODE") {
+ carp("Need to pass a code reference");
+ return;
+ }
+
+ $self->{_outlier_filter} = $code_ref;
+ return 1;
+}
+
+sub _outlier_candidate_index {
+ my $self = shift;
+
+ my $mean = $self->mean();
+ my $outlier_candidate_index = 0;
+ my $max_std_deviation = abs(($self->_data())->[0] - $mean);
+ foreach my $idx (1 .. ($self->count() - 1) ) {
+ my $curr_value = ($self->_data())->[$idx];
+ if ($max_std_deviation < abs($curr_value - $mean) ) {
+ $outlier_candidate_index = $idx;
+ $max_std_deviation = abs($curr_value - $mean);
+ }
+ }
+ return $outlier_candidate_index;
+}
+
+sub set_smoother {
+ my ($self, $args) = @_;
+
+ $args->{data} = $self->_data();
+ $args->{samples} = $self->_samples();
+
+ $self->{_smoother} = Statistics::Descriptive::Smoother->instantiate($args);
+}
+
+sub get_smoothed_data {
+ my ($self, $args) = @_;
+
+ if (!defined $self->{_smoother}) {
+ carp("Smoother object not defined\n");
+ return;
+ }
+ $self->{_smoother}->get_smoothed_data();
+}
+
+sub maxdex {
+ my $self = shift;
+
+ return undef if !$self->count;
+ my $maxdex;
+
+ if ($self->presorted) {
+ $maxdex = $self->count - 1;
+ }
+ else {
+ my $max = $self->max;
+ $maxdex = List::MoreUtils::first_index {$_ == $max} $self->get_data;
+ }
+
+ $self->{maxdex} = $maxdex;
+
+ return $maxdex;
+}
+
+sub mindex {
+ my $self = shift;
+
+ return undef if !$self->count;
+ #my $maxdex = $self->{maxdex};
+ #return $maxdex if defined $maxdex;
+ my $mindex;
+
+ if ($self->presorted) {
+ $mindex = 0;
+ }
+ else {
+ my $min = $self->min;
+ $mindex = List::MoreUtils::first_index {$_ == $min} $self->get_data;
+ }
+
+ $self->{mindex} = $mindex;
+
+ return $mindex;
+}
+
sub sort_data {
my $self = shift;
@@ -359,31 +550,35 @@ sub sort_data {
##Sort the data in descending order
$self->_data([ sort {$a <=> $b} @{$self->_data()} ]);
$self->presorted(1);
- ##Fix the maxima and minima indices
- $self->mindex(0);
- $self->maxdex($#{$self->_data()});
+ ##Fix the maxima and minima indices - no, this is unnecessary now we have methods
+ #$self->mindex(0);
+ #$self->maxdex($#{$self->_data()});
}
return 1;
}
sub percentile {
- my $self = shift;
- my $percentile = shift || 0;
- ##Since we're returning a single value there's no real need
- ##to cache this.
+ my $self = shift;
+ my $percentile = shift || 0;
+ ##Since we're returning a single value there's no real need
+ ##to cache this.
- ##If the requested percentile is less than the "percentile bin
- ##size" then return undef. Check description of RFC 2330 in the
- ##POD below.
- my $count = $self->count();
- return undef if $percentile < 100 / $count;
+ ##If the requested percentile is less than the "percentile bin
+ ##size" then return undef. Check description of RFC 2330 in the
+ ##POD below.
+ my $count = $self->count();
+
+ if ((! $count) || ($percentile < 100 / $count))
+ {
+ return; # allow for both scalar and list context
+ }
- $self->sort_data();
- my $num = $count*$percentile/100;
- my $index = &POSIX::ceil($num) - 1;
- my $val = $self->_data->[$index];
- return wantarray
+ $self->sort_data();
+ my $num = $count*$percentile/100;
+ my $index = &POSIX::ceil($num) - 1;
+ my $val = $self->_data->[$index];
+ return wantarray
? ($val, $index)
: $val
;
@@ -396,7 +591,7 @@ sub _calc_new_median
##Even or odd
if ($count % 2)
- {
+ {
return $self->_data->[($count-1)/2];
}
else
@@ -411,6 +606,8 @@ sub _calc_new_median
sub median {
my $self = shift;
+ return undef if !$self->count;
+
##Cached?
if (! defined($self->_median()))
{
@@ -427,7 +624,10 @@ sub quantile {
carp("Bad quartile type, must be 0, 1, 2, 3 or 4\n");
return;
}
-
+
+ # check data count after the args are checked - should help debugging
+ return undef if !$self->count;
+
$self->sort_data();
return $self->_data->[0] if ( $QuantileNumber == 0 );
@@ -444,7 +644,7 @@ sub quantile {
my $aK_quantile = $self->_data->[ $K_quantile - 1 ];
return $aK_quantile if ( $F_quantile == 0 );
my $aKPlus_quantile = $self->_data->[$K_quantile];
-
+
# Calcul quantile
my $quantile = $aK_quantile
+ ( $F_quantile * ( $aKPlus_quantile - $aK_quantile ) );
@@ -458,8 +658,8 @@ sub _real_calc_trimmed_mean
my $lower = shift;
my $upper = shift;
- my $lower_trim = int ($self->count()*$lower);
- my $upper_trim = int ($self->count()*$upper);
+ my $lower_trim = int ($self->count()*$lower);
+ my $upper_trim = int ($self->count()*$upper);
my ($val,$oldmean) = (0,0);
my ($tm_count,$tm_mean,$index) = (0,0,$lower_trim);
@@ -490,6 +690,9 @@ sub trimmed_mean
($lower,$upper) = ($_[0],$_[1]);
}
+ # check data count after the args
+ return undef if !$self->count;
+
##Cache
my $thistm = join ':',$lower,$upper;
my $cache = $self->_trimmed_mean_cache();
@@ -582,6 +785,8 @@ sub mode
sub geometric_mean {
my $self = shift;
+ return undef if !$self->count;
+
if (!defined($self->_geometric_mean()))
{
my $gm = 1;
@@ -609,23 +814,23 @@ sub skewness {
{
my $n = $self->count();
my $sd = $self->standard_deviation();
-
+
my $skew;
-
+
# skip if insufficient records
if ( $sd && $n > 2) {
-
+
my $mean = $self->mean();
-
+
my $sum_pow3;
-
+
foreach my $rec ( $self->get_data ) {
my $value = (($rec - $mean) / $sd);
$sum_pow3 += $value ** 3;
}
-
+
my $correction = $n / ( ($n-1) * ($n-2) );
-
+
$skew = $correction * $sum_pow3;
}
@@ -641,25 +846,25 @@ sub kurtosis {
if (!defined($self->_kurtosis()))
{
my $kurt;
-
+
my $n = $self->count();
my $sd = $self->standard_deviation();
-
+
if ( $sd && $n > 3) {
my $mean = $self->mean();
-
+
my $sum_pow4;
foreach my $rec ( $self->get_data ) {
$sum_pow4 += ( ($rec - $mean ) / $sd ) ** 4;
}
-
+
my $correction1 = ( $n * ($n+1) ) / ( ($n-1) * ($n-2) * ($n-3) );
my $correction2 = ( 3 * ($n-1) ** 2) / ( ($n-2) * ($n-3) );
-
+
$kurt = ( $correction1 * $sum_pow4 ) - $correction2;
}
-
+
$self->_kurtosis($kurt);
}
@@ -805,12 +1010,26 @@ sub least_squares_fit {
}
$rms = sqrt($rms / $count);
-
+
$self->_least_squares_fit([$q, $m, $r, $rms]);
return @{ $self->_least_squares_fit() };
}
+sub median_absolute_deviation {
+ my ($self) = @_;
+
+ if (!defined($self->_median_absolute_deviation()))
+ {
+ my $stat = $self->new;
+ $stat->add_data(map { abs($_ - $self->median) } $self->get_data);
+ $self->_median_absolute_deviation($stat->median);
+ }
+
+ return $self->_median_absolute_deviation();
+}
+
+
1;
package Statistics::Descriptive;
@@ -873,7 +1092,7 @@ except more efficient.
=item $stat->add_data(1,2,3);
-Adds data to the statistics variable. The cached statistical values are
+Adds data to the statistics variable. The cached statistical values are
updated automatically.
=item $stat->count();
@@ -936,16 +1155,96 @@ described above.
Adds data to the statistics variable. All of the sparse statistical
values are updated and cached. Cached values from Full methods are
-deleted since they are no longer valid.
+deleted since they are no longer valid.
I<Note: Calling add_data with an empty array will delete all of your
Full method cached values! Cached values for the sparse methods are
not changed>
+=item $stat->add_data_with_samples([{1 => 10}, {2 => 20}, {3 => 30},]);
+
+Add data to the statistics variable and set the number of samples each value has been
+built with. The data is the key of each element of the input array ref, while
+the value is the number of samples: [{data1 => smaples1}, {data2 => samples2}, ...]
+
=item $stat->get_data();
Returns a copy of the data array.
+=item $stat->get_data_without_outliers();
+
+Returns a copy of the data array without outliers. The number minimum of
+samples to apply the outlier filtering is C<$Statistics::Descriptive::Min_samples_number>,
+4 by default.
+
+A function to detect outliers need to be defined (see C<set_outlier_filter>),
+otherwise the function will return an undef value.
+
+The filtering will act only on the most extreme value of the data set
+(i.e.: value with the highest absolute standard deviation from the mean).
+
+If there is the need to remove more than one outlier, the filtering
+need to be re-run for the next most extreme value with the initial outlier removed.
+
+This is not always needed since the test (for example Grubb's test) usually can only detect
+the most exreme value. If there is more than one extreme case in a set,
+then the standard deviation will be high enough to make neither case an outlier.
+
+=item $stat->set_outlier_filter($code_ref);
+
+Set the function to filter out the outlier.
+
+C<$code_ref> is the reference to the subroutine implemeting the filtering function.
+
+Returns C<undef> for invalid values of C<$code_ref> (i.e.: not defined or not a
+code reference), C<1> otherwise.
+
+=over 4
+
+=item
+
+Example #1: Undefined code reference
+
+ my $stat = Statistics::Descriptive::Full->new();
+ $stat->add_data(1, 2, 3, 4, 5);
+
+ print $stat->set_outlier_filter(); # => undef
+
+=item
+
+Example #2: Valid code reference
+
+ sub outlier_filter { return $_[1] > 1; }
+
+ my $stat = Statistics::Descriptive::Full->new();
+ $stat->add_data( 1, 1, 1, 100, 1, );
+
+ print $stat->set_outlier_filter( \&outlier_filter ); # => 1
+ my @filtered_data = $stat->get_data_without_outliers();
+ # @filtered_data is (1, 1, 1, 1)
+
+In this example the series is really simple and the outlier filter function as well.
+For more complex series the outlier filter function might be more complex
+(see Grubbs' test for outliers).
+
+The outlier filter function will receive as first parameter the Statistics::Descriptive::Full object,
+as second the value of the candidate outlier. Having the object in the function
+might be useful for complex filters where statistics property are needed (again see Grubbs' test for outlier).
+
+=back
+
+=item $stat->set_smoother({ method => 'exponential', coeff => 0, });
+
+Set the method used to smooth the data and the smoothing coefficient.
+See C<Statistics::Smoother> for more details.
+
+=item $stat->get_smoothed_data();
+
+Returns a copy of the smoothed data array.
+
+The smoothing method and coefficient need to be defined (see C<set_smoother>),
+otherwise the function will return an undef value.
+
=item $stat->sort_data();
Sort the stored data and update the mindex and maxdex methods. This
@@ -965,9 +1264,9 @@ the flag.
=item $stat->skewness();
-Returns the skewness of the data.
+Returns the skewness of the data.
A value of zero is no skew, negative is a left skewed tail,
-positive is a right skewed tail.
+positive is a right skewed tail.
This is consistent with Excel.
=item $stat->kurtosis();
@@ -1032,7 +1331,7 @@ also return the index of the percentile.
=item $x = $stat->quantile($Type);
-Sorts the data and returns estimates of underlying distribution quantiles based on one
+Sorts the data and returns estimates of underlying distribution quantiles based on one
or two order statistics from the supplied elements.
This method use the same algorithm as Excel and R language (quantile B<type 7>).
@@ -1047,7 +1346,7 @@ B<$Type> is an integer value between 0 to 4 :
3 => third quartile (Q3) : upper quartile = highest cut off (25%) of data, or lowest 75% = 75th percentile
4 => fourth quartile (Q4) : maximal value
-Exemple :
+Exemple :
my @data = (1..10);
my $stat = Statistics::Descriptive::Full->new();
@@ -1077,12 +1376,12 @@ Returns the geometric mean of the data.
Returns the mode of the data. The mode is the most commonly occuring datum.
See L<http://en.wikipedia.org/wiki/Mode_%28statistics%29> . If all values
-occur only once, then mode() will return undef.
+occur only once, then mode() will return undef.
=item $stat->trimmed_mean(ltrim[,utrim]);
-C<trimmed_mean(ltrim)> returns the mean with a fraction C<ltrim>
-of entries at each end dropped. C<trimmed_mean(ltrim,utrim)>
+C<trimmed_mean(ltrim)> returns the mean with a fraction C<ltrim>
+of entries at each end dropped. C<trimmed_mean(ltrim,utrim)>
returns the mean after a fraction C<ltrim> has been removed from the
lower end of the data and a fraction C<utrim> has been removed from the
upper end of the data. This method sorts the data before beginning
@@ -1221,6 +1520,10 @@ Colin Kuskie
My email address can be found at http://www.perl.com under Who's Who
or at: http://search.cpan.org/author/COLINK/.
+=head1 CONTRIBUTORS
+
+Fabio Ponciroli & Adzuna Ltd. team (outliers handling)
+
=head1 REFERENCES
RFC2330, Framework for IP Performance Metrics
diff --git a/lib/Statistics/Descriptive/Smoother.pm b/lib/Statistics/Descriptive/Smoother.pm
new file mode 100644
index 0000000..6db8664
--- /dev/null
+++ b/lib/Statistics/Descriptive/Smoother.pm
@@ -0,0 +1,156 @@
+package Statistics::Descriptive::Smoother;
+
+use strict;
+use warnings;
+
+use Carp;
+
+our $VERSION = '3.0607';
+
+sub instantiate {
+ my ($class, $args) = @_;
+
+ my $method = delete $args->{method};
+ my $coeff = delete $args->{coeff} || 0;
+ my $ra_samples = delete $args->{samples};
+ my $ra_data = delete $args->{data};
+
+ if ($coeff < 0 || $coeff > 1) {
+ carp("Invalid smoothing coefficient C $coeff\n");
+ return;
+ }
+ if (@$ra_data < 2) {
+ carp("Need at least 2 samples to smooth the data\n");
+ return;
+ }
+ $method = ucfirst(lc($method));
+ my $sub_class = __PACKAGE__."::$method";
+ eval "require $sub_class";
+ die "No such class $sub_class: $@" if $@;
+
+ return $sub_class->_new({
+ data => $ra_data,
+ samples => $ra_samples,
+ count => scalar @$ra_data,
+ coeff => $coeff,
+ });
+}
+
+sub get_smoothing_coeff { $_[0]->{coeff} }
+
+sub set_smoothing_coeff {
+ my ($self, $coeff) = @_;
+
+ if ($coeff < 0 || $coeff > 1) {
+ carp("Invalid smoothing coefficient C $coeff\n");
+ return;
+ }
+
+ $self->{coeff} = $coeff;
+ return 1;
+}
+
+1;
+
+__END__
+
+=head1 NAME
+
+Statistics::Descriptive::Smoother - Base module for smoothing statistical data
+
+=head1 SYNOPSIS
+
+ use Statistics::Descriptive::Smoother;
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 0.5,
+ data => [1, 2, 3, 4, 5],
+ samples => [110, 120, 130, 140, 150],
+ });
+ my @smoothed_data = $smoother->get_smoothed_data();
+
+=head1 DESCRIPTION
+
+This module provide methods to smooth the trend of a series of statistical data.
+
+The methods provided are the C<Exponential> and the C<Weighted Exponential> (see respectively
+C<Statistics::Descriptive::Smoother::Exponential> and C<Statistics::Descriptive::Smoother::Weightedexponential>
+for more details).
+
+This class is just a factory that will instantiate the object to perform the
+chosen smoothing algorithm.
+
+=head1 METHODS
+
+=over 5
+
+=item Statistics::Descriptive::Smoother->instantiate({});
+
+Create a new Smoother object.
+
+This method require several parameters:
+
+=over 5
+
+=item method
+
+Method used for the smoothing. Allowed values are: C<exponential> and C<weightedexponential>
+
+=item coeff
+
+Smoothing coefficient. It needs to be in the [0;1] range, otherwise undef will be reutrned.
+C<0> means that the series is not smoothed at all, while C<1> the series is universally equal to the initial unsmoothed value.
+
+=item data
+
+Array ref with the data of the series. At least 2 values are needed to smooth the series, undef is returned otherwise.
+
+=item samples
+
+Array ref with the samples each data value has been built with. This is an optional parameter since it is not used by all the
+smoothing algorithm.
+
+=back
+
+=item $smoother->get_smoothing_coeff();
+
+Returns the smoothing coefficient.
+
+=item $smoother->set_smoothing_coeff(0.5);
+
+Set the smoothing coefficient value. It needs to be in the [0;1] range, otherwise undef will be reutrned.
+
+=back
+
+=head1 AUTHOR
+
+Fabio Ponciroli
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Fabio Ponciroli.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/lib/Statistics/Descriptive/Smoother/Exponential.pm b/lib/Statistics/Descriptive/Smoother/Exponential.pm
new file mode 100644
index 0000000..120c704
--- /dev/null
+++ b/lib/Statistics/Descriptive/Smoother/Exponential.pm
@@ -0,0 +1,118 @@
+package Statistics::Descriptive::Smoother::Exponential;
+use strict;
+use warnings;
+
+use base 'Statistics::Descriptive::Smoother';
+
+our $VERSION = '3.0607';
+
+sub _new {
+ my ($class, $args) = @_;
+
+ return bless $args || {}, $class;
+}
+
+# The name of the variables used in the code refers to the explanation in the pod
+sub get_smoothed_data {
+ my ($self) = @_;
+
+ my @smoothed_values;
+ push @smoothed_values, @{$self->{data}}[0];
+ my $C = $self->get_smoothing_coeff();
+
+ foreach my $sample_idx (1 .. $self->{count} -1) {
+ my $smoothed_value = $C * ($smoothed_values[-1]) + (1 - $C) * $self->{data}->[$sample_idx];
+ push @smoothed_values, $smoothed_value;
+ }
+ return @smoothed_values;
+}
+
+1;
+
+__END__
+
+=head1 NAME
+
+Statistics::Descriptive::Smoother::Exponential - Implement exponential smoothing
+
+=head1 SYNOPSIS
+
+ use Statistics::Descriptive::Smoother;
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 0.5,
+ data => [1, 2, 3, 4, 5],
+ samples => [110, 120, 130, 140, 150],
+ });
+ my @smoothed_data = $smoother->get_smoothed_data();
+
+=head1 DESCRIPTION
+
+This module implement the exponential smoothing algorithm to smooth the trend of a series of statistical data.
+
+This algorithm works well for unsmoothed data build with big number of samples. If this is not
+the case you might consider using the C<Weighted Exponential> one.
+
+The algorithm implements the following formula:
+
+S(0) = X(0)
+
+S(t) = C*S(t-1) + (1-C)*X(t)
+
+where:
+
+=over 3
+
+=item * t = index in the series
+
+=item * S(t) = smoothed series value at position t
+
+=item * C = smoothing coefficient. Value in the [0;1] range. C<0> means that the series is not smoothed at all,
+while C<1> the series is universally equal to the initial unsmoothed value.
+
+=item * X(t) = unsmoothed series value at position t
+
+=back
+
+=head1 METHODS
+
+=over 5
+
+=item $stats->get_smoothed_data();
+
+Returns a copy of the smoothed data array.
+
+=back
+
+=head1 AUTHOR
+
+Fabio Ponciroli
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Fabio Ponciroli.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/lib/Statistics/Descriptive/Smoother/Weightedexponential.pm b/lib/Statistics/Descriptive/Smoother/Weightedexponential.pm
new file mode 100644
index 0000000..ec9f4cd
--- /dev/null
+++ b/lib/Statistics/Descriptive/Smoother/Weightedexponential.pm
@@ -0,0 +1,170 @@
+package Statistics::Descriptive::Smoother::Weightedexponential;
+use strict;
+use warnings;
+
+use Carp;
+use base 'Statistics::Descriptive::Smoother';
+
+our $VERSION = '3.0607';
+
+sub _new {
+ my ($class, $args) = @_;
+
+ if (scalar @{$args->{data}} != scalar @{$args->{samples}}) {
+ carp("Number of data values and samples need to be the same\n");
+ return;
+ }
+
+ return bless $args || {}, $class;
+}
+
+# The name of the variables used in the code refers to the explanation in the pod
+sub get_smoothed_data {
+ my ($self) = @_;
+
+ my (@smoothed_values, @Wts);
+ # W(0) = N(0)
+ push @Wts, @{$self->{samples}}[0];
+ # S(0) = X(0)
+ push @smoothed_values, @{$self->{data}}[0];
+ my $C = $self->get_smoothing_coeff();
+
+ foreach my $idx (1 .. ($self->{count} -1)) {
+ my $Xt = $self->{data}->[$idx];
+ my $Nt = $self->{samples}->[$idx];
+ my $St_1 = $smoothed_values[-1];
+ my $Wt_1 = $Wts[-1];
+
+ push @Wts, $self->_get_Wt($Wt_1, $Nt);
+
+ my $coeff_a = $self->_get_coeff_A($Wt_1, $Nt);
+ my $coeff_b = $self->_get_coeff_B($Wt_1, $Nt);
+
+ my $smoothed_value = ( $St_1 * $coeff_a + $Xt * $coeff_b ) / ( $coeff_a + $coeff_b );
+ push @smoothed_values, $smoothed_value;
+ }
+ return @smoothed_values;
+}
+
+sub _get_Wt {
+ my ($self, $Wt_1, $Nt) = @_;
+
+ my $C = $self->get_smoothing_coeff();
+ my $coeff_a = $self->_get_coeff_A($Wt_1, $Nt);
+ my $coeff_b = $self->_get_coeff_B($Wt_1, $Nt);;
+
+ return (($Wt_1 * $coeff_a + $Nt * $coeff_b)/($coeff_a + $coeff_b));
+}
+
+sub _get_coeff_A {
+ my ($self, $Wt_1, $Nt) = @_;
+
+ my $C = $self->get_smoothing_coeff();
+ return $C * ( $Wt_1 / ($Wt_1 + $Nt) );
+}
+
+sub _get_coeff_B {
+ my ($self, $Wt_1, $Nt) = @_;
+
+ my $C = $self->get_smoothing_coeff();
+ return (1 - $C) * ( $Nt / ($Nt + $Wt_1) );
+}
+
+1;
+
+__END__
+
+=head1 NAME
+
+Statistics::Descriptive::Smoother::Weigthedexponential - Implement weighted
+exponential smoothing
+
+=head1 SYNOPSIS
+
+ use Statistics::Descriptive::Smoother;
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'weightedexponential',
+ coeff => 0.5,
+ data => [1, 2, 3, 4, 5],
+ samples => [110, 120, 130, 140, 150],
+ });
+ my @smoothed_data = $smoother->get_smoothed_data();
+
+=head1 DESCRIPTION
+
+This module implement the weighted exponential smoothing algorithm to smooth
+the trend of a series of statistical data.
+
+This algorithm can help to control large swings in the unsmoothed data that
+arise from small samples for those data points.
+
+The algorithm implements the following formula:
+
+W(0) = N(0)
+
+W(t) = ( W(t-1) * CoeffA + N(t) * CoeffB ) / (CoeffA + CoeffB)
+
+CoeffA = C * ( W(t-1) / (W(t-1) + N(t) ) )
+
+CoeffB = (1 - C) * ( N(t) * (W(t-1) + N(t)) )
+
+
+S(t) = (S(t-1)*CoeffA + X(t)*CoeffB) / (CoeffA + CoeffB)
+
+where:
+
+=over 3
+
+=item * t = index in the series
+
+=item * S(t) = smoothed series value at position t
+
+=item * C = smoothing coefficient. Value in the [0;1] range. C<0> means that the series is not smoothed at all,
+while C<1> the series is universally equal to the initial unsmoothed value.
+
+=item * X(t) = unsmoothed series value at position t
+
+=back
+
+=head1 METHODS
+
+=over 5
+
+=item $stats->get_smoothed_data();
+
+Returns a copy of the smoothed data array.
+
+=back
+
+=head1 AUTHOR
+
+Fabio Ponciroli
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Fabio Ponciroli.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/scripts/bump-version-number.pl b/scripts/bump-version-number.pl
new file mode 100644
index 0000000..027153d
--- /dev/null
+++ b/scripts/bump-version-number.pl
@@ -0,0 +1,35 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use File::Find::Object;
+use IO::All;
+
+my $tree = File::Find::Object->new({}, 'lib/');
+
+my $version_n = shift(@ARGV);
+
+if (!defined($version_n))
+{
+ die "Specify version number as an argument! bump-version-number.pl '0.0.1'";
+}
+
+while (my $r = $tree->next()) {
+ if ($r =~ m{/\.svn\z})
+ {
+ $tree->prune();
+ }
+ elsif ($r =~ m{\.pm\z})
+ {
+ my @lines = io->file($r)->getlines();
+ foreach (@lines)
+ {
+ s#(\$VERSION = '|^Version )\d+\.\d+(?:\.\d+)?('|)#$1 . $version_n . $2#e;
+ }
+ io->file($r)->print(
+ @lines
+ );
+ }
+}
+
diff --git a/scripts/tag-release.pl b/scripts/tag-release.pl
index 2e46be6..0aeec74 100644
--- a/scripts/tag-release.pl
+++ b/scripts/tag-release.pl
@@ -5,8 +5,8 @@ use warnings;
use IO::All;
-my ($version) =
- (map { m{\$VERSION *= *'([^']+)'} ? ($1) : () }
+my ($version) =
+ (map { m{\$VERSION *= *'([^']+)'} ? ($1) : () }
io->file('lib/Statistics/Descriptive.pm')->getlines()
)
;
@@ -19,10 +19,9 @@ if (!defined ($version))
my $mini_repos_base = 'https://svn.berlios.de/svnroot/repos/web-cpan/Statistics-Descriptive';
my @cmd = (
- "svn", "copy", "-m",
+ "hg", "tag", "-m",
"Tagging the Statistics-Descriptive release as $version",
- "$mini_repos_base/trunk",
- "$mini_repos_base/tags/releases/$version",
+ "releases/$version",
);
print join(" ", @cmd), "\n";
diff --git a/t/cpan-changes.t b/t/cpan-changes.t
new file mode 100644
index 0000000..5ec3b98
--- /dev/null
+++ b/t/cpan-changes.t
@@ -0,0 +1,12 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use Test::More;
+
+eval 'use Test::CPAN::Changes';
+plan skip_all => 'Test::CPAN::Changes required for this test' if $@;
+
+changes_ok();
+
diff --git a/t/descr.t b/t/descr.t
index 019e6a0..fdd31a2 100644
--- a/t/descr.t
+++ b/t/descr.t
@@ -3,84 +3,20 @@
use strict;
use warnings;
-use Test::More tests => 28;
+use Test::More tests => 61;
+
+use lib 't/lib';
+use Utils qw/is_between compare_hash_by_ranges/;
use Benchmark;
use Statistics::Descriptive;
-sub compare_hash_by_ranges
-{
- local $Test::Builder::Level = $Test::Builder::Level + 1;
-
- my $got_hash_ref = shift;
- my $expected = shift;
- my $blurb = shift;
-
- my $got =
- [
- map { [$_, $got_hash_ref->{$_} ] }
- sort { $a <=> $b }
- keys(%$got_hash_ref)
- ]
- ;
-
- my $success = 1;
-
- if (scalar(@$expected) != scalar(@$got))
- {
- $success = 0;
- diag("Number of keys differ in hashes.");
- }
- else
- {
- COMPARE_KEYS:
- for my $idx (0 .. $#$got)
- {
- my ($got_key, $got_val) = @{$got->[$idx]};
- my ($expected_bottom, $expected_top, $expected_val)
- = @{$expected->[$idx]};
-
- if (! ( ($got_key >= $expected_bottom)
- && ($got_key <= $expected_top)
- && ($got_val == $expected_val)
- )
- )
- {
- $success = 0;
- diag(<<"EOF");
-Key/Val pair No. $idx is out of range or wrong:
-Got: [$got_key, $got_val]
-Expected: [$expected_bottom, $expected_top, $expected_val]
-EOF
-
- last COMPARE_KEYS;
- }
- }
- }
-
- ok($success, $blurb);
-}
-
-sub is_between
-{
- local $Test::Builder::Level = $Test::Builder::Level + 1;
-
- my ($have, $want_bottom, $want_top, $blurb) = @_;
-
- ok (
- (($have >= $want_bottom) &&
- ($want_top >= $have)),
- $blurb
- );
-}
-
-
{
# test #1
my $stat = Statistics::Descriptive::Full->new();
my @results = $stat->least_squares_fit();
# TEST
- ok (!scalar(@results), "Results on an non-filled object are empty.");
+ ok (!scalar(@results), "Least-squares results on a non-filled object are empty.");
# test #2
# data are y = 2*x - 1
@@ -349,7 +285,7 @@ sub is_between
{
my $stat = Statistics::Descriptive::Full->new();
my $expected;
-
+
$stat->add_data(1 .. 9, 100);
# TEST
@@ -367,11 +303,11 @@ sub is_between
$expected + 1E-13,
"Kurtosis of $expected +/- 1E-13"
);
-
+
$stat->add_data(100 .. 110);
-
+
# now check that cached skew and kurt values are recalculated
-
+
# TEST
$expected = -0.306705104889384;
is_between ($stat->skewness(),
@@ -413,3 +349,174 @@ sub is_between
}
+{
+ # This is a fix for:
+ # https://rt.cpan.org/Ticket/Display.html?id=72495
+ # Thanks to Robert Messer
+ my $stat = Statistics::Descriptive::Full->new();
+
+ my $ret = $stat->percentile(100);
+
+ # TEST
+ ok (!defined($ret), 'Returns undef and does not die.');
+}
+
+
+
+# test stats when no data have been added
+{
+ my $stat = Statistics::Descriptive::Full->new();
+ my ($result, $str);
+
+ # An accessor method for _permitted would be handy,
+ # or one to get all the stats methods
+ my @methods = qw {
+ mean sum variance standard_deviation
+ min mindex max maxdex sample_range
+ skewness kurtosis median
+ harmonic_mean geometric_mean
+ mode least_squares_fit
+ percentile frequency_distribution
+ };
+ # least_squares_fit is handled in an earlier test, so is actually a duplicate here
+
+ #diag 'Results are undef when no data added';
+ # need to update next line when new methods are tested here
+ # TEST:$method_count=18
+ foreach my $method (sort @methods) {
+ $result = $stat->$method;
+ # TEST*$method_count
+ ok (!defined ($result), "$method is undef when object has no data.");
+ }
+
+ # quantile and trimmed_mean require valid args, so don't test in the method loop
+ my $method = 'quantile';
+ $result = $stat->$method(1);
+ # TEST
+ ok (!defined ($result), "$method is undef when object has no data.");
+
+ $method = 'trimmed_mean';
+ $result = $stat->$method(0.1);
+ # TEST
+ ok (!defined ($result), "$method is undef when object has no data.");
+}
+
+# test SD when only one value added
+{
+ my $stat = Statistics::Descriptive::Full->new();
+ $stat->add_data( 1 );
+
+ my $result = $stat->standard_deviation();
+ # TEST
+ ok ($result == 0, "SD is zero when object has one record.");
+}
+
+# Test function returns undef in list context when no data added.
+# The test itself is almost redundant.
+# Fixes https://rt.cpan.org/Ticket/Display.html?id=74890
+{
+ my $stat = Statistics::Descriptive::Full->new();
+
+ # TEST
+ is_deeply(
+ [ $stat->median(), ],
+ [ undef() ],
+ "->median() Returns undef in list-context.",
+ );
+
+ # TEST
+ is_deeply(
+ [ $stat->standard_deviation(), ],
+ [ undef() ],
+ "->standard_deviation() Returns undef in list-context.",
+ );
+}
+
+{
+ my $stats = Statistics::Descriptive::Full->new();
+
+ $stats->add_data_with_samples([{1 => 10}, {2 => 20}, {3 => 30}, {4 => 40}, {5 => 50}]);
+
+ # TEST
+ is_deeply(
+ $stats->_data(),
+ [ 1, 2, 3, 4, 5 ],
+ 'add_data_with_samples: data set is correct',
+ );
+
+ # TEST
+ is_deeply(
+ $stats->_samples(),
+ [ 10, 20, 30, 40, 50 ],
+ 'add_data_with_samples: samples are correct',
+ );
+
+}
+
+# Tests for mindex and maxdex on unsorted data,
+# including when new data are added which should not change the values
+{
+ my $stats_class = 'Statistics::Descriptive::Full';
+ my $stat1 = $stats_class->new();
+
+ my @data1 = (20, 1 .. 3, 100, 1..5);
+ my @data2 = (25, 30);
+
+ my $e_maxdex = 4;
+ my $e_mindex = 1;
+
+ $stat1->add_data(@data1); # initialise
+
+ # TEST*2
+ is ($stat1->mindex, $e_mindex, "initial mindex is correct");
+ is ($stat1->maxdex, $e_maxdex, "initial maxdex is correct");
+
+ # TEST*2
+ $stat1->add_data(@data2); # add new data
+ is ($stat1->mindex, $e_mindex, "mindex is correct after new data added");
+ is ($stat1->maxdex, $e_maxdex, "maxdex is correct after new data added");
+
+ # TEST*2
+ $stat1->median; # trigger a sort
+ $e_maxdex = scalar @data1 + scalar @data2 - 1;
+ is ($stat1->mindex, 0, "mindex is correct after sorting");
+ is ($stat1->maxdex, $e_maxdex, "maxdex is correct after sorting");
+
+}
+
+
+# what happens when we add new data?
+# Recycle the same data so mean, sd etc remain the same
+{
+ my $stats_class = 'Statistics::Descriptive::Full';
+ my $stat1 = $stats_class->new();
+ my $stat2 = $stats_class->new();
+
+ my @data1 = (1 .. 9, 100);
+ my @data2 = (100 .. 110);
+
+ # sample of methods
+ my @methods = qw /mean standard_deviation count skewness kurtosis median/;
+
+ $stat1->add_data(@data1); # initialise
+ foreach my $meth (@methods) { # run some methods
+ $stat1->$meth;
+ }
+ $stat1->add_data(@data2); # add new data
+ foreach my $meth (@methods) { # re-run some methods
+ $stat1->$meth;
+ }
+
+ $stat2->add_data(@data1, @data2); # initialise with all data
+ foreach my $meth (@methods) { # run some methods
+ $stat2->$meth;
+ }
+
+ # TEST
+ is_deeply (
+ $stat1,
+ $stat2,
+ 'stats consistent after adding new data',
+ );
+
+}
diff --git a/t/descr_smooth_methods.t b/t/descr_smooth_methods.t
new file mode 100644
index 0000000..e59ffc0
--- /dev/null
+++ b/t/descr_smooth_methods.t
@@ -0,0 +1,95 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use Test::More tests => 3;
+
+use Statistics::Descriptive;
+
+local $SIG{__WARN__} = sub { };
+
+my @original_data = (1 .. 10);
+
+{
+ # testing set_smoother
+ my $stats = Statistics::Descriptive::Full->new();
+
+ $stats->add_data(\@original_data );
+
+ $stats->set_smoother({
+ method => 'exponential',
+ coeff => 0,
+ });
+ # TEST
+ isa_ok ( $stats->{_smoother}, 'Statistics::Descriptive::Smoother::Exponential', 'set_smoother: smoother set correctly');
+
+}
+
+{
+ # testing get_smoothed_data
+ my $stats = Statistics::Descriptive::Full->new();
+
+ # TEST
+ is ( $stats->get_smoothed_data(), undef, 'get_smoothed_data: smoother needs to be defined');
+
+ $stats->add_data(\@original_data );
+
+ $stats->set_smoother({
+ method => 'exponential',
+ coeff => 0.5,
+ });
+
+ my @expected_values = (
+ 1,
+ 1.5,
+ 2.25,
+ 3.125,
+ 4.0625,
+ 5.03125,
+ 6.015625,
+ 7.0078125,
+ 8.00390625,
+ 9.001953125,
+ );
+
+ my @smoothed_data = $stats->get_smoothed_data();
+
+ # TEST
+ is_deeply( \@smoothed_data, \@expected_values, 'Smoothing with C=0.5');
+}
+
+=pod
+
+=head1 AUTHOR
+
+Fabio Ponciroli
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Fabio Ponciroli.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/t/freq_distribution-1-rt-34999.t b/t/freq_distribution-1-rt-34999.t
index 4706526..c245650 100644
--- a/t/freq_distribution-1-rt-34999.t
+++ b/t/freq_distribution-1-rt-34999.t
@@ -2,7 +2,7 @@
use strict;
use warnings;
-
+
use Test::More tests => 2;
use Statistics::Descriptive;
@@ -20,7 +20,7 @@ my @data=(
my $stat = Statistics::Descriptive::Full->new();
$stat->add_data(@data);
# I should get 20 partitions, shouldn't I?
-my %freqs=$stat->frequency_distribution (20);
+my %freqs=$stat->frequency_distribution (20);
# TEST
is (scalar(keys(%freqs)),
diff --git a/t/lib/Utils.pm b/t/lib/Utils.pm
new file mode 100644
index 0000000..0831e96
--- /dev/null
+++ b/t/lib/Utils.pm
@@ -0,0 +1,146 @@
+package Utils;
+
+use strict;
+use warnings;
+
+require Exporter;
+our @ISA = qw/Exporter/;
+our @EXPORT_OK = qw/is_between compare_hash_by_ranges is_array_between/;
+
+use Test::More;
+
+sub is_between {
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($have, $want_bottom, $want_top, $blurb) = @_;
+
+ ok (
+ _is_between($have, $want_bottom, $want_top),
+ $blurb
+ );
+}
+
+sub is_array_between {
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($got_array_ref, $expected_array_ref, $low_tolerance, $high_tolerance, $blurb) = @_;
+
+ my $success = 1;
+ if (scalar @$expected_array_ref != scalar @$got_array_ref) {
+ $success = 0;
+ diag('Arrays have different lengths');
+ }
+ else {
+ for my $idx (0 .. $#$got_array_ref) {
+ my $expected_bottom = $expected_array_ref->[$idx] - $low_tolerance;
+ my $expected_top = $expected_array_ref->[$idx] + $high_tolerance;
+ unless (_is_between($got_array_ref->[$idx], $expected_bottom, $expected_top)) {
+ $success = 0;
+ diag(<<"EOF");
+Value $idx is out of range:
+Got: [$got_array_ref->[$idx]]
+Expected: [$expected_bottom, $expected_top, $expected_array_ref->[$idx]]
+EOF
+
+ last;
+ }
+ }
+ }
+ ok($success, $blurb);
+}
+
+sub compare_hash_by_ranges
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my $got_hash_ref = shift;
+ my $expected = shift;
+ my $blurb = shift;
+
+ my $got =
+ [
+ map { [$_, $got_hash_ref->{$_} ] }
+ sort { $a <=> $b }
+ keys(%$got_hash_ref)
+ ]
+ ;
+
+ my $success = 1;
+
+ if (scalar(@$expected) != scalar(@$got))
+ {
+ $success = 0;
+ diag("Number of keys differ in hashes.");
+ }
+ else
+ {
+ COMPARE_KEYS:
+ for my $idx (0 .. $#$got)
+ {
+ my ($got_key, $got_val) = @{$got->[$idx]};
+ my ($expected_bottom, $expected_top, $expected_val)
+ = @{$expected->[$idx]};
+
+ if (! ( ($got_key >= $expected_bottom)
+ && ($got_key <= $expected_top)
+ && ($got_val == $expected_val)
+ )
+ )
+ {
+ $success = 0;
+ diag(<<"EOF");
+Key/Val pair No. $idx is out of range or wrong:
+Got: [$got_key, $got_val]
+Expected: [$expected_bottom, $expected_top, $expected_val]
+EOF
+
+ last COMPARE_KEYS;
+ }
+ }
+ }
+
+ ok($success, $blurb);
+}
+
+sub _is_between {
+ my ($have, $want_bottom, $want_top,) = @_;
+
+ return (($have >= $want_bottom) && ($want_top >= $have));
+}
+
+1;
+
+=pod
+
+=head1 AUTHOR
+
+Shlomi Fish, L<http://www.shlomifish.org/> , C<shlomif at cpan.org>
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Shlomi Fish.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/t/median_absolute_deviation.t b/t/median_absolute_deviation.t
new file mode 100644
index 0000000..a2112d9
--- /dev/null
+++ b/t/median_absolute_deviation.t
@@ -0,0 +1,26 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use Test::More tests => 2;
+
+use Statistics::Descriptive;
+
+{
+ my $stat = Statistics::Descriptive::Full->new();
+
+ $stat->add_data( 1, 1, 1, 2, 2, 2, 2, 4, 7 );
+
+ my $mad = $stat->median_absolute_deviation();
+
+ # TEST
+ ok (defined($mad),
+ "median_absolute_deviation is not undefined"
+ );
+
+ # TEST
+ is($mad, 1,
+ "median_absolute_deviation is correct"
+ );
+}
diff --git a/t/outliers.t b/t/outliers.t
new file mode 100644
index 0000000..3a885bc
--- /dev/null
+++ b/t/outliers.t
@@ -0,0 +1,144 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use Test::More tests => 11;
+
+use Statistics::Descriptive;
+
+sub foo {return;};
+
+local $SIG{__WARN__} = sub { };
+
+{
+ # testing set_outlier_filter
+ my $stat = Statistics::Descriptive::Full->new();
+
+ # TEST
+ ok ( !defined($stat->set_outlier_filter()), 'set_outlier_filter: undef code reference value');
+ # TEST
+ ok ( !defined($stat->set_outlier_filter(1)), 'set_outlier_filter: invalid code ref value');
+
+ # TEST
+ is ( $stat->set_outlier_filter(\&foo), 1, 'set_outlier_filter: valid code reference - return value');
+ # TEST
+ is ( $stat->{_outlier_filter}, \&foo, 'set_outlier_filter: valid code reference - internal');
+
+}
+
+{
+ # testing get_data_without_outliers without removing outliers
+ my $stat = Statistics::Descriptive::Full->new();
+
+ # TEST
+ ok ( !defined($stat->get_data_without_outliers()), 'get_data_without_outliers: insufficient samples');
+
+ $stat->add_data( 1, 2, 3, 4, 5 );
+ # TEST
+ ok ( !defined($stat->get_data_without_outliers()), 'get_data_without_outliers: undefined filter');
+
+ # We force the filter function to never detect outliers...
+ $stat->set_outlier_filter( sub {0} );
+
+ no warnings 'redefine';
+ local *Statistics::Descriptive::Full::_outlier_candidate_index = sub { 0 };
+ my @results = $stat->get_data_without_outliers();
+
+ #...we expect the data set to be unmodified
+ # TEST
+ is_deeply (
+ [@results],
+ [1, 2, 3, 4, 5],
+ 'get_data_without_outliers: no outliers',
+ );
+
+}
+
+{
+ # testing get_data_without_outliers removing outliers
+ my $stat = Statistics::Descriptive::Full->new();
+
+ # 100 is definitively the candidate to be an outlier in this series
+ $stat->add_data( 1, 2, 3, 4, 100, 6, 7, 8 );
+
+ # We force the filter function to always detect outliers for this data set
+ $stat->set_outlier_filter( sub {$_[1] > 0} );
+ my @results = $stat->get_data_without_outliers();
+
+ # Note that 100 has been filtered out from the data set
+ # TEST
+ is_deeply (
+ [@results],
+ [1, 2, 3, 4, 6, 7, 8, ],
+ 'get_data_without_outliers: remove outliers',
+ );
+
+}
+
+my ($first_val, $second_val);
+sub check_params { ($first_val, $second_val) = @_; }
+
+{
+ # testing params passed to outlier filter
+ my $stat = Statistics::Descriptive::Full->new();
+
+ # 100 is definitively the candidate to be an outlier in this series
+ $stat->add_data( 1, 2, 3, 4, 100, 6, 7, 8 );
+
+ $stat->set_outlier_filter( \&check_params );
+ my @results = $stat->get_data_without_outliers();
+
+ # TEST
+ isa_ok ($first_val, 'Statistics::Descriptive::Full', 'first param of outlier filter ok');
+ # TEST
+ is ($second_val, 100, 'second param of outlier filter ok');
+
+}
+
+{
+ # testing _outlier_candidate_index
+ my $stat = Statistics::Descriptive::Full->new();
+
+ # 100 is definitively the candidate to be an outlier in this series
+ $stat->add_data( 1, 2, 3, 4, 100, 6, 7, 8 );
+
+ # TEST
+ is ($stat->_outlier_candidate_index, 4, '_outlier_candidate_index' );
+
+}
+
+=pod
+
+=head1 AUTHOR
+
+Fabio Ponciroli
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Fabio Ponciroli.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/t/quantile.t b/t/quantile.t
index e6c0134..e28d3bd 100644
--- a/t/quantile.t
+++ b/t/quantile.t
@@ -2,7 +2,7 @@
#==================================================================
# Author : Djibril Ousmanou
# Copyright : 2009
-# Update : 20/07/2009
+# Update : 20/07/2009
# AIM : Test quantile type 7 calcul
#==================================================================
use strict;
diff --git a/t/smoother.t b/t/smoother.t
new file mode 100644
index 0000000..9e7f9e1
--- /dev/null
+++ b/t/smoother.t
@@ -0,0 +1,165 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use Test::More tests => 11;
+
+use Statistics::Descriptive::Smoother;
+
+local $SIG{__WARN__} = sub { };
+
+{
+
+ #Test factory pattern
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 0,
+ data => [1,2,3],
+ samples => [100, 100, 100],
+ });
+
+ # TEST
+ isa_ok ($smoother, 'Statistics::Descriptive::Smoother::Exponential', 'Exponential class correctly created');
+}
+
+{
+
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'weightedExponential',
+ coeff => 0,
+ data => [1,2,3],
+ samples => [100, 100, 100],
+ });
+
+ # TEST
+ isa_ok ($smoother, 'Statistics::Descriptive::Smoother::Weightedexponential', 'Weightedexponential class correctly created');
+
+}
+
+{
+
+ # Test invalid smoothing method
+ eval
+ {
+ Statistics::Descriptive::Smoother->instantiate({
+ method => 'invalid_method',
+ coeff => 0,
+ data => [1,2,3],
+ samples => [100, 100, 100],
+ });
+ };
+
+ # TEST
+ ok ($@, 'Invalid method');
+
+}
+
+{
+
+ #TODO get output from Carp
+ #Test invalid coefficient
+ my $smoother_neg = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => -123,
+ data => [1,2,3],
+ samples => [100, 100, 100],
+ });
+
+ # TEST
+ is ($smoother_neg, undef, 'Invalid coefficient: < 0');
+
+ my $smoother_pos = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 123,
+ data => [1,2,3],
+ samples => [100, 100, 100],
+ });
+
+ # TEST
+ is ($smoother_pos, undef, 'Invalid coefficient: > 1');
+
+}
+
+{
+
+ #Test unsufficient number of samples
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 0,
+ data => [1],
+ samples => [100],
+ });
+
+ # TEST
+ is ($smoother, undef, 'Insufficient number of samples');
+
+}
+
+{
+
+ #Test smoothing coefficient accessors
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 0.5,
+ data => [1,2,3],
+ samples => [100, 100, 100],
+ });
+
+ # TEST
+ is ($smoother->get_smoothing_coeff(), 0.5, 'get_smoothing_coeff');
+
+ my $ok = $smoother->set_smoothing_coeff(0.7);
+
+ # TEST
+ ok ($ok, 'set_smoothing_coeff: set went fine');
+
+ # TEST
+ is ($smoother->get_smoothing_coeff(), 0.7, 'set_smoothing_coeff: correct value set');
+
+ my $ok2 = $smoother->set_smoothing_coeff(123);
+
+ # TEST
+ is ($ok2, undef, 'set_smoothing_coeff: set failed');
+
+ # TEST
+ is ($smoother->get_smoothing_coeff(), 0.7, 'set_smoothing_coeff: value not modified after failure');
+
+}
+
+1;
+
+=pod
+
+=head1 AUTHOR
+
+Fabio Ponciroli
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Fabio Ponciroli.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/t/smoother_exponential.t b/t/smoother_exponential.t
new file mode 100644
index 0000000..b76f456
--- /dev/null
+++ b/t/smoother_exponential.t
@@ -0,0 +1,114 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use lib 't/lib';
+use Utils qw/is_array_between/;
+
+use Test::More tests => 3;
+
+use Statistics::Descriptive::Smoother;
+
+my @original_data = (1 .. 10);
+my @original_samples = (3, 3, 3, 3, 3, 3, 3, 3, 3, 3,);
+
+{
+
+ #Test no smoothing
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 0,
+ data => \@original_data,
+ samples => \@original_samples,
+ });
+
+ my @smoothed_data = $smoother->get_smoothed_data();
+
+ # When the smoothing coefficient is 0 the series is not smoothed
+ # TEST
+ is_deeply( \@smoothed_data, \@original_data, 'No smoothing C=0');
+}
+
+{
+
+ #Test max smoothing
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 1,
+ data => \@original_data,
+ samples => \@original_samples,
+ });
+
+ my @smoothed_data = $smoother->get_smoothed_data();
+
+ # When the smoothing coefficient is 1 the series is universally equal to the initial unsmoothed value
+ my @expected_values = map { $original_data[0] } 1 .. $smoother->{count};
+ # TEST
+ is_deeply( \@smoothed_data, \@expected_values, 'Max smoothing C=1');
+}
+
+{
+
+ #Test smoothing coeff 0.5
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'exponential',
+ coeff => 0.5,
+ data => \@original_data,
+ samples => \@original_samples,
+ });
+
+ my @smoothed_data = $smoother->get_smoothed_data();
+ my @expected_values = (
+ 1,
+ 1.5,
+ 2.25,
+ 3.125,
+ 4.0625,
+ 5.03125,
+ 6.015625,
+ 7.0078125,
+ 8.00390625,
+ 9.001953125,
+ );
+
+ # TEST
+ is_array_between( \@smoothed_data, \@expected_values, 1E-13, 1E-13, 'Smoothing with C=0.5');
+}
+
+1;
+
+=pod
+
+=head1 AUTHOR
+
+Fabio Ponciroli
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Fabio Ponciroli.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/t/smoother_weightedexponential.t b/t/smoother_weightedexponential.t
new file mode 100644
index 0000000..d58b152
--- /dev/null
+++ b/t/smoother_weightedexponential.t
@@ -0,0 +1,131 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use lib 't/lib';
+use Utils qw/is_array_between/;
+
+use Test::More tests => 4;
+
+use Statistics::Descriptive::Smoother;
+
+local $SIG{__WARN__} = sub { };
+
+my @original_data = (1 .. 10);
+my @original_samples = (100, 50, 100, 50, 100, 50, 100, 50, 100, 50,);
+
+{
+
+ #Test no smoothing
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'weightedExponential',
+ coeff => 0,
+ data => \@original_data,
+ samples => \@original_samples,
+ });
+
+ my @smoothed_data = $smoother->get_smoothed_data();
+
+ # When the smoothing coefficient is 0 the series is not smoothed
+ # TEST
+ is_deeply( \@smoothed_data, \@original_data, 'No smoothing C=0');
+}
+
+{
+
+ #Test max smoothing
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'weightedExponential',
+ coeff => 1,
+ data => \@original_data,
+ samples => \@original_samples,
+ });
+
+ my @smoothed_data = $smoother->get_smoothed_data();
+
+ # When the smoothing coefficient is 1 the series is universally equal to the initial unsmoothed value
+ my @expected_values = map { $original_data[0] } 1 .. $smoother->{count};
+ # TEST
+ is_deeply( \@smoothed_data, \@expected_values, 'Max smoothing C=1');
+}
+
+{
+
+ #Test smoothing coeff 0.5
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'weightedExponential',
+ coeff => 0.5,
+ data => \@original_data,
+ samples => \@original_samples,
+ });
+
+ my @smoothed_data = $smoother->get_smoothed_data();
+ my @expected_values = (
+ 1,
+ 1.33333333333333,
+ 2.24242424242424,
+ 2.85944551901999,
+ 4.0651836704636,
+ 4.75526654493058,
+ 6.03174342835728,
+ 6.7367839208657,
+ 8.02706266125788,
+ 8.73457937329917,
+ );
+
+ # TEST
+ is_array_between( \@smoothed_data, \@expected_values, 1E-13, 1E-13, 'Smoothing with C=0.5');
+}
+
+{
+
+ #Test different number of samples and data are not allowed
+ my $smoother = Statistics::Descriptive::Smoother->instantiate({
+ method => 'weightedExponential',
+ coeff => 0,
+ data => [1,2,3,4],
+ samples => [1,2,3],
+ });
+
+ # TEST
+ is ( $smoother, undef, 'Different number of samples and data');
+}
+
+
+1;
+
+=pod
+
+=head1 AUTHOR
+
+Fabio Ponciroli
+
+=head1 COPYRIGHT
+
+Copyright(c) 2012 by Fabio Ponciroli.
+
+=head1 LICENSE
+
+This file is licensed under the MIT/X11 License:
+http://www.opensource.org/licenses/mit-license.php.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+=cut
diff --git a/t/style-trailing-space.t b/t/style-trailing-space.t
new file mode 100644
index 0000000..7b8ece4
--- /dev/null
+++ b/t/style-trailing-space.t
@@ -0,0 +1,29 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use Test::More;
+
+eval "use Test::TrailingSpace";
+if ($@)
+{
+ plan skip_all => "Test::TrailingSpace required for trailing space test.";
+}
+else
+{
+ plan tests => 1;
+}
+
+my $finder = Test::TrailingSpace->new(
+ {
+ root => '.',
+ filename_regex => qr/(?:(?:\.(?:t|pm|pl|PL|yml|json|arc|vim))|README|Changes|LICENSE|MANIFEST)\z/,
+ },
+);
+
+# TEST
+$finder->no_trailing_space(
+ "No trailing space was found."
+);
+
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/pkg-perl/packages/libstatistics-descriptive-perl.git
More information about the Pkg-perl-cvs-commits
mailing list