moocore.apply_within_sets

moocore.apply_within_sets(x, sets, func, **kwargs)

Split x by row according to sets and apply func to each sub-array.

Parameters:
  • x (ArrayLike) – 2D array to be divided into sub-arrays.

  • sets (ArrayLike) – A list or 1D array of length equal to the number of rows of x. The values are used as-is to determine the groups and do not need to be sorted.

  • func (Callable[..., Any]) – A function that can take a 2D array as input. This function may return (1) a 2D array with the same number of rows as the input, (2) a 1D array as long as the number of input rows, (3) a scalar value, or (4) a 2D array with a single row.

  • kwargs – Additional keyword arguments passed to func.

Returns:

ndarray – An array whose shape depends on the output of func. See Examples below.
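The grouping behavior described above can be illustrated with a pure-NumPy sketch. The helper name apply_within_sets_sketch and its dispatch rule (compare the length of each result with the number of rows in its set) are assumptions made for illustration only. This is not moocore's implementation, and its error message differs from the library's.

```python
import numpy as np


def apply_within_sets_sketch(x, sets, func, **kwargs):
    """Hypothetical illustration of the documented behavior (not moocore code)."""
    x = np.asarray(x)
    sets = np.asarray(sets)
    if x.ndim != 2 or sets.shape != (x.shape[0],):
        raise ValueError("x must be 2D and sets must have one value per row of x")

    # Unique labels in order of first appearance. np.unique sorts its output,
    # so re-order it by the index at which each label first occurs.
    labels, first = np.unique(sets, return_index=True)
    labels = labels[np.argsort(first)]

    # Row indices of each set, and the result of func on each sub-array.
    groups = [np.flatnonzero(sets == label) for label in labels]
    outputs = [np.atleast_1d(func(x[rows], **kwargs)) for rows in groups]

    if all(len(out) == len(rows) for out, rows in zip(outputs, groups)):
        # Cases 1 and 2: one result per input row, restored to input order.
        order = np.argsort(np.concatenate(groups), kind="stable")
        return np.concatenate(outputs)[order]
    if all(len(out) == 1 for out in outputs):
        # Cases 3 and 4: one result per set, in order of first appearance.
        return np.concatenate(outputs)
    raise ValueError("func must return one value per input row or one per set")


sets = np.array([3, 1, 2, 4, 2, 3, 1])
x = np.hstack((np.arange(14).reshape(-1, 2), sets.reshape(-1, 1)))
# Extra keyword arguments (here axis=1) are forwarded to func.
per_row = apply_within_sets_sketch(x, sets, np.sum, axis=1)
per_set = apply_within_sets_sketch(x, sets, lambda g: g.max())
```

In this sketch, per_row has one value per input row and per_set has one value per distinct entry of sets.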

Examples

>>> import numpy as np
>>> import moocore
>>> sets = np.array([3, 1, 2, 4, 2, 3, 1])
>>> x = np.arange(len(sets) * 2).reshape(-1, 2)
>>> x = np.hstack((x, sets.reshape(-1, 1)))

If func returns an array with the same number of rows as the input (case 1), then the output is ordered in exactly the same way as the input.

>>> moocore.apply_within_sets(x, sets, lambda x: x)
array([[ 0,  1,  3],
       [ 2,  3,  1],
       [ 4,  5,  2],
       [ 6,  7,  4],
       [ 8,  9,  2],
       [10, 11,  3],
       [12, 13,  1]])

This is also the behavior if func returns a 1D array with one value per input row (case 2).

>>> moocore.apply_within_sets(x, sets, lambda x: x.sum(axis=1))
array([ 4,  6, 11, 17, 19, 24, 26])

If func returns a single scalar (case 3) or a 2D array with a single row (case 4), the output contains one entry per set, ordered by the first appearance of each unique value in sets. The unique values are not sorted, which matches what pandas.Series.unique() returns and NOT what numpy.unique() returns.

>>> moocore.apply_within_sets(x, sets, lambda x: x.max())
array([11, 13,  9,  7])
>>> moocore.apply_within_sets(x, sets, lambda x: [x.max(axis=0)])
array([[10, 11,  3],
       [12, 13,  1],
       [ 8,  9,  2],
       [ 6,  7,  4]])
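The difference between the two orderings can be checked with NumPy alone. The snippet below is a minimal sketch that only inspects sets and does not call moocore. The variable names are illustrative.

```python
import numpy as np

sets = np.array([3, 1, 2, 4, 2, 3, 1])

# numpy.unique sorts the unique values.
sorted_unique = np.unique(sets)  # array([1, 2, 3, 4])

# Order of first appearance: take the index at which each unique value first
# occurs and re-order the sorted values by that index.
values, first_idx = np.unique(sets, return_index=True)
appearance_order = values[np.argsort(first_idx)]  # array([3, 1, 2, 4])
```

The per-set results shown above follow appearance_order: the first entry belongs to set 3, then set 1, set 2 and set 4.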

In the previous example, func returns a 2D array with a single row. The following call raises an error because func returns a 1D array, which is interpreted as case 2, but the number of values (3, one per column) does not match the number of input rows in the set (2).

>>> moocore.apply_within_sets(
...     x, sets, lambda x: x.max(axis=0)
... )
Traceback (most recent call last):
    ...
ValueError: `func` returned an array of length 3 but the input has length 2 for rows [0 5]
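The shape difference behind this error can be inspected with NumPy alone. The sketch below uses the two rows named in the error message (rows 0 and 5 of x) and does not call moocore. The variable name block is illustrative. Based on the description of case 4, a func that returns either of the two 2D forms should be treated as returning a single row per set, but that has not been verified against the library here.

```python
import numpy as np

# Rows 0 and 5 of x from the examples, i.e. the sub-array for set 3.
block = np.array([[0, 1, 3], [10, 11, 3]])

# A 1D result with one value per column: length 3 for 2 input rows.
shape_1d = block.max(axis=0).shape  # (3,)

# Two ways to obtain a 2D result with a single row instead.
shape_wrapped = np.asarray([block.max(axis=0)]).shape  # (1, 3)
shape_keepdims = block.max(axis=0, keepdims=True).shape  # (1, 3)
```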