[Git][debian-gis-team/cftime][master] 4 commits: New upstream version 1.6.0+ds

Fri Mar 4 06:10:14 GMT 2022


Bas Couwenberg pushed to branch master at Debian GIS Project / cftime


Commits:
72b4e4f0 by Bas Couwenberg at 2022-03-04T06:51:44+01:00
New upstream version 1.6.0+ds
- - - - -
5eb68cb4 by Bas Couwenberg at 2022-03-04T06:51:45+01:00
Update upstream source from tag 'upstream/1.6.0+ds'

Update to upstream version '1.6.0+ds'
with Debian dir f556abe2c60a35f3fe78f2e869efe30478525b14
- - - - -
b90b4f64 by Bas Couwenberg at 2022-03-04T06:53:42+01:00
New upstream release.

- - - - -
1d4e1c6b by Bas Couwenberg at 2022-03-04T06:54:11+01:00
Set distribution to unstable.

- - - - -


7 changed files:

- + .github/workflows/build.yml
- .github/workflows/miniconda.yml
- Changelog
- README.md
- debian/changelog
- src/cftime/_cftime.pyx
- test/test_cftime.py


Changes:

=====================================
.github/workflows/build.yml
=====================================
@@ -0,0 +1,34 @@
+name: Build and test with development python
+on: [push, pull_request]
+jobs:
+  build-linux:
+    name: Python (${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11-dev"]
+    steps:
+
+    - uses: actions/checkout@v2
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Update Pip
+      run: |
+        python -m pip install --upgrade pip
+
+    - name: Install cftime dependencies via pip
+      run: |
+        python -m pip install -r requirements.txt
+        python -m pip install -r requirements-dev.txt
+
+    - name: Install cftime
+      run: |
+        python setup.py install
+
+    - name: Test cftime
+      run: |
+        py.test -vv test


=====================================
.github/workflows/miniconda.yml
=====================================
@@ -10,7 +10,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        python-version: [ "3.7", "3.8", "3.9", "3.10"]
+        python-version: [ "3.7", "3.8", "3.9", "3.10" ]
         os: [windows-latest, ubuntu-latest, macos-latest]
         platform: [x64, x32]
 #  debug on a single os/platform/python version


=====================================
Changelog
=====================================
@@ -1,3 +1,11 @@
+version 1.6.0 (release tag v1.6.0rel)
+=====================================
+ * fix for masked array inputs (issue #267).
+ * improved performance of the num2date algorithm, in some cases providing
+   an over 100x speedup (issue #269, PR#270).
+ * fix for date2index for select != 'exact' when select='exact' works (issue
+   #272, PR#273)
+
 version 1.5.2 (release tag v1.5.2rel)
 =====================================
  * silently change calendar='gregorian' to 'standard' internally, 
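
Note on the masked-array fix listed in the 1.6.0 Changelog entry: the date2num change in this push records a placeholder for each masked element, casts the collected values to float so the placeholders become NaN, and then masks the invalid entries. The snippet below is only a minimal sketch of that conversion step using plain NumPy; the helper name `mask_missing` is hypothetical and is not part of cftime.

```python
import numpy as np


def mask_missing(values):
    """Turn a list with None placeholders into a masked float array.

    Hypothetical helper for illustration only; it mirrors the
    "None -> nan -> masked" step, not the full date2num logic.
    """
    # Casting to float converts each None placeholder to NaN.
    as_float = np.array(values, dtype=float)
    # masked_invalid masks every NaN (and inf) entry.
    return np.ma.masked_invalid(as_float)


encoded = mask_missing([None, 1, None, 3])
print(encoded.mask)          # which positions were masked
print(encoded.compressed())  # the remaining valid values
```

Casting to float means integer results come back as floats whenever the input was a masked array, which is a trade-off of this approach rather than something the sketch tries to hide.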


=====================================
README.md
=====================================
@@ -12,6 +12,8 @@ Time-handling functionality from netcdf4-python
 ## News
 For details on the latest updates, see the [Changelog](https://github.com/Unidata/cftime/blob/master/Changelog).
 
+3/4/2022:  Version 1.6.0 released.  Big speed-ups for num2date, date2index bugfix for select != 'exact' when select='exact' works, fix for date2num with masked array inputs.
+
 1/22/2022: Version 1.5.2 released (wheels for Apple M1 available on pypi for python 3.8,3.9 and 3.10). is_leap_year
 function added (issue #259).
 


=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+cftime (1.6.0+ds-1) unstable; urgency=medium
+
+  * New upstream release.
+
+ -- Bas Couwenberg <sebastic@debian.org>  Fri, 04 Mar 2022 06:53:59 +0100
+
 cftime (1.5.2+ds-1) unstable; urgency=medium
 
   * New upstream release.


=====================================
src/cftime/_cftime.pyx
=====================================
@@ -38,7 +38,7 @@ cdef int[12] _dayspermonth_leap = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 3
 cdef int[13] _cumdayspermonth = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365]
 cdef int[13] _cumdayspermonth_leap = [0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366]
 
-__version__ = '1.5.2'
+__version__ = '1.6.0'
 
 # Adapted from http://delete.me.uk/2005/03/iso8601.html
 # Note: This regex ensures that all ISO8601 timezone formats are accepted - but, due to legacy support for other timestrings, not all incorrect formats can be rejected.
@@ -254,8 +254,12 @@ def date2num(dates,units,calendar=None,has_year_zero=None):
         use_python_datetime = False
         # convert basedate to specified calendar
         basedate =  to_calendar_specific_datetime(basedate, calendar, False, has_year_zero=has_year_zero)
-    times = []; n = 0
-    for date in dates.flat:
+    times = []
+    for n, date in enumerate(dates.flat):
+        if ismasked and mask.flat[n]:
+            times.append(None)
+            continue
+
         # use python datetime if possible.
         if use_python_datetime:
             # remove time zone offset
@@ -263,20 +267,18 @@ def date2num(dates,units,calendar=None,has_year_zero=None):
                 date = date.replace(tzinfo=None) - date.utcoffset()
         else: # convert date to same calendar specific cftime.datetime instance
             date = to_calendar_specific_datetime(date, calendar, False, has_year_zero=has_year_zero)
-        if ismasked and mask.flat[n]:
-            times.append(None)
+
+        td = date - basedate
+        if td % unit_timedelta == timedelta(0):
+            # Explicitly cast result to np.int64 for Windows compatibility
+            quotient = np.int64(td // unit_timedelta)
+            times.append(quotient)
         else:
-            td = date - basedate
-            if td % unit_timedelta == timedelta(0):
-                # Explicitly cast result to np.int64 for Windows compatibility
-                quotient = np.int64(td // unit_timedelta)
-                times.append(quotient)
-            else:
-                times.append(td / unit_timedelta)
-        n += 1
+            times.append(td / unit_timedelta)
+                
     if ismasked: # convert to masked array if input was masked array
-        times = np.array(times)
-        times = np.ma.masked_where(times==None,times)
+        times = np.array(times, dtype=float)  # None -> nan
+        times = np.ma.masked_invalid(times)
         if isscalar:
             return times[0]
         else:
@@ -424,6 +426,52 @@ def scale_times(num, factor):
         else:
             return num * factor
 
+
+def decode_date_from_scalar(time_in_microseconds, basedate):
+    """Decode a date from a scalar input."""
+    delta = time_in_microseconds.astype("timedelta64[us]").astype(timedelta)
+    try:
+        return basedate + delta
+    except OverflowError:
+        raise ValueError("OverflowError in datetime, possibly because year < datetime.MINYEAR")
+
+
+def decode_dates_from_array(times_in_microseconds, basedate):
+    """Decode values encoded by an integer array in units of microseconds to dates.
+    
+    This is an optimized algorithm that operates by flattening and sorting the input
+    array of integers, decoding the first date using the original base date, and then
+    incrementally adding timedeltas to decode the rest of the dates in the array.  This
+    is an optimal approach, because it minimizes the length of the timedeltas used in
+    each addition operation required to decode the times (timedelta addition is the rate
+    limiting step in the process).  The original order of the elements and shape of the
+    array are restored at the end.  The sorting and unsorting steps add only a small
+    overhead.  See discussion and timing results in GitHub issue 269.
+    """
+    original_shape = times_in_microseconds.shape
+    times_in_microseconds = times_in_microseconds.ravel()
+
+    sort_indices = np.argsort(times_in_microseconds)
+    unsort_indices = np.argsort(sort_indices)
+    times_in_microseconds = times_in_microseconds[sort_indices]
+
+    # We first cast to the np.timedelta64[us] dtype out of convenience, but ultimately
+    # cast to datetime.timedelta objects for operations with cftime objects (we cannot
+    # cast from integers to datetime.timedelta objects directly).
+    deltas = times_in_microseconds.astype("timedelta64[us]")
+    differential_deltas = np.diff(deltas).astype(timedelta)
+
+    dates = np.empty(times_in_microseconds.shape, dtype="O")
+    try:
+        dates[0] = basedate + deltas[0].astype(timedelta)
+        for i in range(len(differential_deltas)):
+            dates[i + 1] = dates[i] + differential_deltas[i]
+    except OverflowError:
+        raise ValueError("OverflowError in datetime, possibly because year < datetime.MINYEAR")
+
+    return dates[unsort_indices].reshape(original_shape)
+
+
 @cython.embedsignature(True)
 def num2date(
     times,
@@ -535,14 +583,18 @@ def num2date(
     scaled_times = scale_times(times, factor)
     scaled_times = cast_to_int(scaled_times,units=unit)
 
-    # Through np.timedelta64, convert integers scaled to have units of
-    # microseconds to datetime.timedelta objects, the timedelta type compatible
-    # with all cftime.datetime objects.
-    deltas = scaled_times.astype("timedelta64[us]").astype(timedelta)
-    try:
-        return basedate + deltas
-    except OverflowError:
-        raise ValueError("OverflowError in datetime, possibly because year < datetime.MINYEAR")
+    if scaled_times.ndim == 0 or scaled_times.size == 0:
+        return decode_date_from_scalar(scaled_times, basedate)
+    else:
+        if isinstance(scaled_times, np.ma.MaskedArray):
+            # The algorithm requires data be present for all values. To handle this, we fill 
+            # masked values with 0 temporarily and then restore the mask at the end.
+            original_mask = np.ma.getmask(scaled_times)
+            scaled_times = scaled_times.filled(0)
+            dates = decode_dates_from_array(scaled_times, basedate)
+            return np.ma.MaskedArray(dates, mask=original_mask)
+        else:
+            return decode_dates_from_array(scaled_times, basedate)
 
 
 @cython.embedsignature(True)
@@ -839,6 +891,20 @@ def time2index(times, nctime, calendar=None, select='exact'):
     if calendar == None:
         calendar = getattr(nctime, 'calendar', 'standard')
 
+    if select != 'exact':
+        # if select works, then 'nearest' == 'exact', 'before' == 'exact'-1 and
+        # 'after' == 'exact'+1
+        try:
+            index = time2index(times, nctime, calendar=calendar, select='exact')
+            if select == 'nearest':
+                return index
+            elif select == 'before':
+                return index-1
+            else:
+                return index+1
+        except ValueError:
+            pass
+
     num = np.atleast_1d(times)
     N = len(nctime)
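
Note on the num2date speed-up in the _cftime.pyx diff above: the decode_dates_from_array docstring describes flattening and sorting the encoded times, decoding the first value from the base date, adding only the small differences between neighbours, and then restoring the original order and shape. The following is a rough sketch of that idea using the standard library `datetime` in place of cftime objects; the function name `decode_sorted_incremental` and the explicit empty-input branch are assumptions made for illustration, not the upstream code.

```python
from datetime import datetime, timedelta

import numpy as np


def decode_sorted_incremental(times_us, basedate):
    """Decode integer microsecond offsets into datetimes.

    Illustrative sketch only: sorts the offsets, decodes the smallest one
    from basedate, then steps through the rest by adding the gaps.
    """
    times_us = np.asarray(times_us, dtype=np.int64)
    original_shape = times_us.shape
    flat = times_us.ravel()
    if flat.size == 0:
        # Nothing to decode; return an empty object array of the same shape.
        return np.empty(original_shape, dtype="O")

    sort_indices = np.argsort(flat)
    # argsort of the sort permutation gives the inverse permutation.
    unsort_indices = np.argsort(sort_indices)
    ordered = flat[sort_indices]

    dates = np.empty(ordered.shape, dtype="O")
    dates[0] = basedate + timedelta(microseconds=int(ordered[0]))
    # Each remaining date is the previous date plus the gap to it.
    for i, step in enumerate(np.diff(ordered)):
        dates[i + 1] = dates[i] + timedelta(microseconds=int(step))

    # Undo the sort, then restore the caller's array shape.
    return dates[unsort_indices].reshape(original_shape)


DAY_US = 86_400 * 10**6
grid = [[1 * DAY_US, 0], [3 * DAY_US, 2 * DAY_US]]
print(decode_sorted_incremental(grid, datetime(2000, 1, 1)))
```

With the standard library the per-addition cost does not depend much on the size of the timedelta, so this sketch shows the bookkeeping (sort, decode, unsort, reshape) rather than the performance gain reported upstream for cftime datetimes.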
 


=====================================
test/test_cftime.py
=====================================
@@ -961,6 +961,10 @@ class TestDate2index(unittest.TestCase):
         self.standardtime = self.TestTime(datetime(1950, 1, 1), 366, 24,
                                           'hours since 1900-01-01', 'standard')
 
+        self.issue272time = self.TestTime(datetime(1950, 1, 1), 5, 24,
+                                          'hours since 1900-01-01', 'standard')
+        self.issue272time._data=np.array([1053144, 1053150, 1053156, 1053157,
+            1053162],np.int32)
         self.time_vars = {}
         self.time_vars['time'] = CFTimeVariable(
             values=self.standardtime,
@@ -1117,6 +1121,18 @@ class TestDate2index(unittest.TestCase):
                            select='nearest')
         assert(index == 11)
 
+    def test_issue272(self):
+        timeArray = self.issue272time
+        date = datetime(2020, 2, 22, 13)
+        assert(date2index(date, timeArray, calendar="gregorian",
+            select="exact")==3)
+        assert(date2index(date, timeArray, calendar="gregorian",
+            select="before")==2)
+        assert(date2index(date, timeArray, calendar="gregorian",
+            select="after")==4)
+        assert(date2index(date, timeArray, calendar="gregorian",
+            select="nearest")==3)
+
 
 class issue584TestCase(unittest.TestCase):
     """Regression tests for issue #584."""
@@ -2037,6 +2053,53 @@ def test_date2num_num2date_roundtrip(encoding_units, freq, calendar):
         meets_tolerance = np.abs(decoded - times) <= tolerance
         assert np.all(meets_tolerance)
 
+def test_date2num_missing_data():
+    # Masked array
+    a = [
+        cftime.DatetimeGregorian(2000, 12, 1),
+        cftime.DatetimeGregorian(2000, 12, 2),
+        cftime.DatetimeGregorian(2000, 12, 3),
+        cftime.DatetimeGregorian(2000, 12, 4),
+    ]
+    mask = [True, False, True, False]
+    array = np.ma.array(a, mask=mask)
+    out = date2num(array, units="days since 2000-12-01", calendar="standard")
+    assert ((out == np.ma.array([-99, 1, -99, 3] , mask=mask)).all())
+    assert ((out.mask == mask).all())
+
+    # Scalar masked array
+    a = cftime.DatetimeGregorian(2000, 12, 1)
+    mask = True
+    array = np.ma.array(a, mask=mask)
+    out = date2num(array, units="days since 2000-12-01", calendar="standard")
+    assert out is np.ma.masked
+
+
+def test_num2date_preserves_shape():
+    # The optimized num2date algorithm operates on a flattened array.  This
+    # check ensures that the original shape of the times is restored in the 
+    # result.
+    a = np.array([[0, 1, 2], [3, 4, 5]])
+    result = num2date(a, units="days since 2000-01-01", calendar="standard")
+    expected = np.array([cftime.DatetimeGregorian(2000, 1, i) for i in range(1, 7)]).reshape((2, 3))
+    np.testing.assert_equal(result, expected)
+
+
+def test_num2date_preserves_order():
+    # The optimized num2date algorithm sorts the encoded times before decoding them.
+    # This check ensures that the order of the times is restored in the result.
+    a = np.array([1, 0])
+    result = num2date(a, units="days since 2000-01-01", calendar="standard")
+    expected = np.array([cftime.DatetimeGregorian(2000, 1, i) for i in [2, 1]])
+    np.testing.assert_equal(result, expected)
+
+
+def test_num2date_empty_array():
+    a = np.array([[]])
+    result = num2date(a, units="days since 2000-01-01", calendar="standard")
+    expected = np.array([[]], dtype="O")
+    np.testing.assert_equal(result, expected)
+
 
 if __name__ == '__main__':
     unittest.main()



View it on GitLab: https://salsa.debian.org/debian-gis-team/cftime/-/compare/3655dd74d4278aac61d48e8ae31d8db016ffab50...1d4e1c6b6136ac45b7e9bbd0a5908e69f60d4724
