Using moocore with Pandas
This example shows how to use moocore functions with Pandas (https://pandas.pydata.org/). It requires pandas version >= 2.0.0.
import moocore
import pandas as pd
print(f"pandas version: {pd.__version__}")
pandas version: 2.3.3
First, we create a toy Pandas DataFrame.
df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5],
        obj2=[5, 4, 3, 2, 1],
        obj3=[100, 200, 200, 300, 100],
        algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
    )
)
df
Normalise the objectives to the range [1, 2], replacing only the objective columns and leaving the algo column untouched.
obj_cols = ["obj1", "obj2", "obj3"]
df[obj_cols] = moocore.normalise(df[obj_cols], to_range=[1, 2])
df
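To make the rescaling concrete, here is a minimal pandas-only sketch of per-column min-max scaling to [1, 2]. It assumes that moocore.normalise() maps each column's observed minimum to the lower end of to_range and its maximum to the upper end; it is an illustration of that idea, not moocore's implementation, and the names low, high and scaled are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5],
        obj2=[5, 4, 3, 2, 1],
        obj3=[100, 200, 200, 300, 100],
        algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]

# Assumed target range, mirroring to_range=[1, 2] in the example above.
low, high = 1.0, 2.0

objs = df[obj_cols].astype(float)
col_min = objs.min()  # per-column minimum
col_max = objs.max()  # per-column maximum

# Linear rescaling: column minimum -> low, column maximum -> high.
# A constant column would divide by zero; this sketch does not handle that.
scaled = low + (objs - col_min) * (high - low) / (col_max - col_min)
print(scaled)
```

Under this assumption every objective column ends up spanning exactly [1, 2], which is why a reference point of 2.1 lies just beyond the worst value of every objective.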
Calculate the hypervolume for each algo using groupby() and apply().
ref = 2.1
hv = (
    df.groupby("algo")[obj_cols]
    .apply(moocore.hypervolume, ref=ref)
    .reset_index(name="hv")
)
hv
Or we can call moocore.apply_within_sets() directly, which returns a plain array with one value per set:
hv = moocore.apply_within_sets(
    df[obj_cols], df["algo"], moocore.hypervolume, ref=ref
)
hv
array([0.3435 , 0.22475])
moocore.apply_within_sets() processes the groups in order of first appearance, even if the elements of the same group are not contiguous. That is, it orders the groups like pandas.Series.unique() and not like set (which is unordered) or numpy.unique() (which sorts).
df["algo"].unique()
array(['foo', 'bar'], dtype=object)
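The difference between the two orderings can be checked with pandas and NumPy alone. The sketch below uses a hypothetical values series and a simple sum in place of the hypervolume. It also shows that groupby() sorts the group keys by default, so a groupby() result lists bar before foo unless sort=False is passed.

```python
import numpy as np
import pandas as pd

algo = pd.Series(["foo", "foo", "bar", "bar", "foo"])
values = pd.Series([1, 2, 3, 4, 5])  # hypothetical per-row values

# Order of first appearance, as pandas.Series.unique() reports it.
first_seen = list(algo.unique())

# numpy.unique() returns the labels sorted instead.
sorted_labels = list(np.unique(algo.to_numpy()))

# groupby() sorts the keys by default; sort=False keeps first-appearance order.
default_order = list(values.groupby(algo).sum().index)
appearance_order = list(values.groupby(algo, sort=False).sum().index)

print(first_seen, sorted_labels, default_order, appearance_order)
```

When comparing a groupby() result with an array ordered by first appearance, passing sort=False to groupby() is one way to line the two up.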
If we have multiple columns that we want to use to define the sets, such as algo and run:
df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5, 6, 5, 4, 3, 1],
        obj2=[6, 5, 4, 3, 2, 1, 5, 4, 5, 6],
        obj3=[1, 2, 3, 4, 5, 6, 6, 7, 5, 2],
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]
df
We can still use groupby(), but we may need to reset and clean up the index.
df.groupby(["algo", "run"])[obj_cols].apply(
    moocore.filter_dominated
).reset_index(level=["algo", "run"])
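The index clean-up is easier to see on a tiny frame. The following sketch assumes pandas >= 2.0 and uses a hypothetical helper, keep_best_obj1, as a stand-in for moocore.filter_dominated: like that function, it returns a subset of the rows of each group.

```python
import pandas as pd

toy = pd.DataFrame(
    dict(
        obj1=[3, 1, 2, 5],
        obj2=[1, 4, 2, 0],
        algo=["a", "a", "b", "b"],
        run=[1, 1, 1, 2],
    )
)


def keep_best_obj1(group):
    """Hypothetical stand-in filter: keep the rows with the smallest obj1."""
    return group[group["obj1"] == group["obj1"].min()]


# apply() prepends the group keys to the original row index,
# giving a three-level index: (algo, run, original row label).
nested = toy.groupby(["algo", "run"])[["obj1", "obj2"]].apply(keep_best_obj1)

# Moving only the key levels back into columns restores the row labels.
flat = nested.reset_index(level=["algo", "run"])
print(flat)
```

After reset_index(level=...), the surviving rows keep their original labels, so they can still be matched against the input DataFrame.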
Or we can combine the multiple columns as one to define the sets:
sets = df["algo"].astype(str) + "-" + df["run"].astype(str)
sets
0    a-1
1    a-1
2    a-2
3    b-1
4    b-1
5    b-2
6    a-2
7    b-2
8    a-1
9    b-1
dtype: object
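String concatenation is only one way to build such labels. As a pandas-only sketch, groupby(...).ngroup() with sort=False assigns an integer id to each (algo, run) pair in order of first appearance and yields the same partition of the rows. Whether the moocore functions accept integer labels is not checked here, so treat this as an illustration of the grouping itself.

```python
import pandas as pd

df = pd.DataFrame(
    dict(
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)

# String labels, as in the example above.
sets = df["algo"].astype(str) + "-" + df["run"].astype(str)

# Integer ids per (algo, run) pair, numbered in order of first appearance.
set_ids = df.groupby(["algo", "run"], sort=False).ngroup()

# factorize() turns the string labels into first-appearance codes,
# which lets us confirm that both labelings describe the same partition.
codes, labels = pd.factorize(sets)
print(set_ids.tolist())
print(list(labels))
```

Integer ids also avoid accidental collisions that string concatenation can produce when a label already contains the separator.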
Then identify nondominated rows within each set:
is_nondom = moocore.is_nondominated_within_sets(df[obj_cols], sets=sets)
is_nondom
array([ True,  True,  True,  True,  True,  True, False,  True, False,
        True])
And use the boolean vector above to filter rows:
df[is_nondom]
This is different from calculating the nondominated set over all sets:
moocore.filter_dominated(df[obj_cols])
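The contrast between the two computations can be reproduced with a brute-force NumPy sketch. It assumes that all objectives are minimised and that a row dominates another when it is no worse in every objective and strictly better in at least one. The helper nondominated_mask is hypothetical and quadratic in the number of rows; it is not how moocore computes the result.

```python
import numpy as np
import pandas as pd


def nondominated_mask(points):
    """True for rows that no other row dominates (minimisation assumed)."""
    pts = np.asarray(points, dtype=float)
    # leq[i, j]: row i is no worse than row j in every objective.
    leq = (pts[:, None, :] <= pts[None, :, :]).all(axis=2)
    # lt[i, j]: row i is strictly better than row j in some objective.
    lt = (pts[:, None, :] < pts[None, :, :]).any(axis=2)
    dominates = leq & lt  # dominates[i, j]: row i dominates row j
    return ~dominates.any(axis=0)


df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5, 6, 5, 4, 3, 1],
        obj2=[6, 5, 4, 3, 2, 1, 5, 4, 5, 6],
        obj3=[1, 2, 3, 4, 5, 6, 6, 7, 5, 2],
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]
sets = df["algo"].astype(str) + "-" + df["run"].astype(str)
points = df[obj_cols].to_numpy()

# Within sets: each row is compared only against rows of its own set.
within = np.zeros(len(df), dtype=bool)
for positions in df.groupby(sets, sort=False).indices.values():
    within[positions] = nondominated_mask(points[positions])

# Over all sets: each row is compared against every other row.
overall = nondominated_mask(points)

print(within)
print(overall)
```

In this sketch rows 7 and 9 are nondominated within their own sets but dominated once all sets are pooled (by rows 2 and 0, respectively), which is why the two results differ.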