Using moocore with Pandas

This example shows how to use moocore functions with Pandas (https://pandas.pydata.org/). It requires pandas version >= 2.0.0.

import moocore
import pandas as pd

print(f"pandas version: {pd.__version__}")
pandas version: 2.2.3

First, we create a toy Pandas DataFrame.

df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5],
        obj2=[5, 4, 3, 2, 1],
        obj3=[100, 200, 200, 300, 100],
        algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
    )
)
df
   obj1  obj2  obj3 algo
0     1     5   100  foo
1     2     4   200  foo
2     3     3   200  bar
3     4     2   300  bar
4     5     1   100  foo


Normalise the objectives to the range [1, 2]. Only the objective columns are replaced; the algo column is left untouched.

obj_cols = ["obj1", "obj2", "obj3"]
df[obj_cols] = moocore.normalise(df[obj_cols], to_range=[1, 2])
df
   obj1  obj2  obj3 algo
0  1.00  2.00   1.0  foo
1  1.25  1.75   1.5  foo
2  1.50  1.50   1.5  bar
3  1.75  1.25   2.0  bar
4  2.00  1.00   1.0  foo
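
The normalised values above are consistent with a column-wise min-max rescaling. The following is a minimal pandas-only sketch of that arithmetic, assuming each column's observed minimum and maximum are used as bounds; `min_max_rescale` is a hypothetical helper written for illustration and is not part of moocore.

```python
import pandas as pd


def min_max_rescale(frame, lower, upper):
    """Linearly map each column so its minimum becomes `lower` and its maximum `upper`.

    Hypothetical helper for illustration only. A constant column would
    divide by zero and produce NaN values.
    """
    col_min = frame.min()
    col_max = frame.max()
    return lower + (frame - col_min) * (upper - lower) / (col_max - col_min)


toy = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5],
        obj2=[5, 4, 3, 2, 1],
        obj3=[100, 200, 200, 300, 100],
    )
)
rescaled = min_max_rescale(toy, 1.0, 2.0)
print(rescaled)
```

On this toy data the sketch reproduces the table above; for example, obj3 = 200 maps to 1 + (200 - 100) / (300 - 100) = 1.5.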


Calculate the hypervolume for each algo using groupby() and apply().

ref = 2.1
hv = (
    df.groupby("algo")[obj_cols]
    .apply(moocore.hypervolume, ref=ref)
    .reset_index(name="hv")
)
hv
  algo       hv
0  bar  0.22475
1  foo  0.34350
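
To make the numbers above less opaque, the hypervolume of a tiny set can be cross-checked by inclusion-exclusion over the boxes spanned by each point and the reference point. This is only a sketch under stated assumptions (all objectives minimised, the scalar ref applied to every objective); `hv_inclusion_exclusion` is a hypothetical helper, it is not how moocore computes the hypervolume, and its cost grows exponentially with the number of points.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def hv_inclusion_exclusion(points, ref):
    """Volume of the union of boxes [p, ref] over all rows p (minimisation).

    Hypothetical helper for illustration; only practical for a handful of points.
    """
    pts = np.asarray(points, dtype=float)
    total = 0.0
    for k in range(1, len(pts) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(range(len(pts)), k):
            # The intersection of the boxes in `subset` is the box whose
            # lower corner is the component-wise maximum of its points.
            corner = pts[list(subset)].max(axis=0)
            # Clip at zero so an empty intersection contributes nothing.
            total += sign * np.prod(np.clip(ref - corner, 0.0, None))
    return total


# Normalised toy data, copied from the table above.
norm_df = pd.DataFrame(
    dict(
        obj1=[1.00, 1.25, 1.50, 1.75, 2.00],
        obj2=[2.00, 1.75, 1.50, 1.25, 1.00],
        obj3=[1.0, 1.5, 1.5, 2.0, 1.0],
        algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
    )
)
hv_check = norm_df.groupby("algo")[["obj1", "obj2", "obj3"]].apply(
    hv_inclusion_exclusion, ref=2.1
)
print(hv_check)
```

Up to floating-point rounding, the sketch gives 0.22475 for bar and 0.3435 for foo, matching the values reported above.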


Alternatively, moocore.apply_within_sets() computes the same values without groupby():

hv = moocore.apply_within_sets(
    df[obj_cols], df["algo"], moocore.hypervolume, ref=ref
)
hv
array([0.3435 , 0.22475])

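moocore.apply_within_sets() processes the groups in order of first appearance, even if the elements of the same group are not contiguous. That is, it orders the groups like pandas.Series.unique() and not like set or numpy.unique(). This is why foo comes first in the array above, whereas groupby() sorts the group labels and lists bar first.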

df["algo"].unique()
array(['foo', 'bar'], dtype=object)
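
The difference between the two orderings can be seen with pandas and NumPy alone. The sketch below also attaches the group labels to the hypervolume values, which are copied here as literals from the output above; `hv_values` and `hv_table` are illustrative names, not moocore objects.

```python
import numpy as np
import pandas as pd

algo = pd.Series(["foo", "foo", "bar", "bar", "foo"], name="algo")

# Order of first appearance.
first_seen = algo.unique()
# Sorted order, as produced by numpy.unique().
sorted_labels = np.unique(algo)
# groupby() sorts the group keys by default; sort=False keeps first-appearance order.
default_order = algo.groupby(algo).size().index.tolist()
unsorted_order = algo.groupby(algo, sort=False).size().index.tolist()

# Hypervolume values copied from the output above, in first-appearance order.
hv_values = np.array([0.3435, 0.22475])
hv_table = pd.DataFrame({"algo": first_seen, "hv": hv_values})
print(hv_table)
```

Passing sort=False to groupby() is one way to make its group order agree with the first-appearance order described above.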

Sometimes several columns together define the sets, such as algo and run:

df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5, 6, 5, 4, 3, 1],
        obj2=[6, 5, 4, 3, 2, 1, 5, 4, 5, 6],
        obj3=[1, 2, 3, 4, 5, 6, 6, 7, 5, 2],
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]
df
   obj1  obj2  obj3 algo  run
0     1     6     1    a    1
1     2     5     2    a    1
2     3     4     3    a    2
3     4     3     4    b    1
4     5     2     5    b    1
5     6     1     6    b    2
6     5     5     6    a    2
7     4     4     7    b    2
8     3     5     5    a    1
9     1     6     2    b    1


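We can still use groupby(), but the result is indexed by the grouping columns plus the original row labels, so we move algo and run back into regular columns with reset_index().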

df.groupby(["algo", "run"])[obj_cols].apply(
    moocore.filter_dominated
).reset_index(level=["algo", "run"])
  algo  run  obj1  obj2  obj3
0    a    1     1     6     1
1    a    1     2     5     2
2    a    2     3     4     3
3    b    1     4     3     4
4    b    1     5     2     5
9    b    1     1     6     2
5    b    2     6     1     6
7    b    2     4     4     7


Or we can combine the multiple columns as one to define the sets:

sets = df["algo"].astype(str) + "-" + df["run"].astype(str)
sets
0    a-1
1    a-1
2    a-2
3    b-1
4    b-1
5    b-2
6    a-2
7    b-2
8    a-1
9    b-1
dtype: object

Then identify the nondominated rows within each set:

is_nondom = moocore.is_nondominated_within_sets(df[obj_cols], sets=sets)
is_nondom
array([ True,  True,  True,  True,  True,  True, False,  True, False,
        True])

And use the boolean vector above to filter rows:

df[is_nondom]
   obj1  obj2  obj3 algo  run
0     1     6     1    a    1
1     2     5     2    a    1
2     3     4     3    a    2
3     4     3     4    b    1
4     5     2     5    b    1
5     6     1     6    b    2
7     4     4     7    b    2
9     1     6     2    b    1
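
For readers who want to see what the per-set filtering amounts to, here is a naive quadratic-time sketch using only NumPy and pandas. It assumes every objective is minimised and that a row is dominated when another row of the same set is no worse in every objective and strictly better in at least one; `naive_is_nondominated` is a hypothetical helper, not moocore's implementation, and it keeps exact duplicates, which the library may handle differently.

```python
import numpy as np
import pandas as pd


def naive_is_nondominated(points):
    """Boolean mask, True where no other row dominates the row (minimisation).

    Hypothetical O(n^2) helper for illustration only.
    """
    pts = np.asarray(points, dtype=float)
    # no_worse[i, j]: row i is <= row j in every objective.
    no_worse = (pts[:, None, :] <= pts[None, :, :]).all(axis=2)
    # better[i, j]: row i is < row j in at least one objective.
    better = (pts[:, None, :] < pts[None, :, :]).any(axis=2)
    # Row j is dominated if some row i is no worse everywhere and better somewhere.
    dominated = (no_worse & better).any(axis=0)
    return ~dominated


df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5, 6, 5, 4, 3, 1],
        obj2=[6, 5, 4, 3, 2, 1, 5, 4, 5, 6],
        obj3=[1, 2, 3, 4, 5, 6, 6, 7, 5, 2],
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]
values = df[obj_cols].to_numpy()

mask = np.zeros(len(df), dtype=bool)
# GroupBy.indices maps each (algo, run) pair to the integer positions of its rows.
for positions in df.groupby(["algo", "run"]).indices.values():
    mask[positions] = naive_is_nondominated(values[positions])
print(df[mask])
```

On this data the sketch drops rows 6 and 8, the same rows removed above: row 6 is dominated by row 2 within a-2, and row 8 by row 1 within a-1.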

