Using moocore with Pandas

This example shows how to use moocore functions with Pandas (https://pandas.pydata.org/). It requires pandas version >= 2.0.0.

import moocore
import pandas as pd

print(f"pandas version: {pd.__version__}")

pandas version: 2.3.1

First, we create a toy Pandas DataFrame.

df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5],
        obj2=[5, 4, 3, 2, 1],
        obj3=[100, 200, 200, 300, 100],
        algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
    )
)
df

	obj1	obj2	obj3	algo
0	1	5	100	foo
1	2	4	200	foo
2	3	3	200	bar
3	4	2	300	bar
4	5	1	100	foo

Normalize the objectives to the range [1, 2]. Only the objective columns are replaced; the algo column must stay as it is.

obj_cols = ["obj1", "obj2", "obj3"]
df[obj_cols] = moocore.normalise(df[obj_cols], to_range=[1, 2])
df

	obj1	obj2	obj3	algo
0	1.00	2.00	1.0	foo
1	1.25	1.75	1.5	foo
2	1.50	1.50	1.5	bar
3	1.75	1.25	2.0	bar
4	2.00	1.00	1.0	foo
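The normalized values above are consistent with a plain per-column min-max rescaling. The following is a minimal pandas-only sketch under that assumption (it is not moocore's implementation, and the names lower, upper and scaled are only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5],
        obj2=[5, 4, 3, 2, 1],
        obj3=[100, 200, 200, 300, 100],
        algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]

# Assumption: each objective column is rescaled independently from its
# observed [min, max] to the target range [lower, upper].
lower, upper = 1.0, 2.0
objs = df[obj_cols].astype(float)
col_min, col_max = objs.min(), objs.max()
scaled = lower + (objs - col_min) * (upper - lower) / (col_max - col_min)

# Overwrite only the objective columns; "algo" is left untouched.
df[obj_cols] = scaled
print(df)
```

Selecting the objective columns explicitly keeps non-numeric columns such as algo out of the arithmetic.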

Calculate the hypervolume for each algo using groupby() and apply().

ref = 2.1
hv = (
    df.groupby("algo")[obj_cols]
    .apply(moocore.hypervolume, ref=ref)
    .reset_index(name="hv")
)
hv

	algo	hv
0	bar	0.22475
1	foo	0.34350

Alternatively, moocore.apply_within_sets() computes the same values without groupby():

hv = moocore.apply_within_sets(
    df[obj_cols], df["algo"], moocore.hypervolume, ref=ref
)
hv

array([0.3435 , 0.22475])
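As a sanity check on the hypervolume values, the quantity can be recomputed by brute force for such tiny sets. The sketch below assumes minimization and a reference point of 2.1 in every objective, and uses inclusion-exclusion over the boxes spanned by each point and the reference point. The helper hypervolume_bruteforce is hypothetical and exponential in the number of points; it is not the algorithm moocore uses.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def hypervolume_bruteforce(points, ref):
    """Volume of the union of boxes [p, ref] (minimization), by inclusion-exclusion.

    Exponential in the number of points: only suitable for tiny sanity checks.
    """
    pts = np.asarray(points, dtype=float)
    ref = np.broadcast_to(np.asarray(ref, dtype=float), pts.shape[1:])
    total = 0.0
    for k in range(1, len(pts) + 1):
        sign = 1.0 if k % 2 else -1.0
        for subset in combinations(range(len(pts)), k):
            # The intersection of boxes [p, ref] is the box [max of the p's, ref].
            corner = pts[list(subset)].max(axis=0)
            total += sign * np.prod(np.clip(ref - corner, 0.0, None))
    return total


# The normalized toy data from this example.
df = pd.DataFrame(
    dict(
        obj1=[1.00, 1.25, 1.50, 1.75, 2.00],
        obj2=[2.00, 1.75, 1.50, 1.25, 1.00],
        obj3=[1.0, 1.5, 1.5, 2.0, 1.0],
        algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]

hv_check = df.groupby("algo")[obj_cols].apply(hypervolume_bruteforce, ref=2.1)
print(hv_check)
```

For the bar group, for instance, the two boxes have volumes 0.216 and 0.02975 and overlap in a box of volume 0.021, which gives 0.22475.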

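moocore.apply_within_sets() processes the groups in order of first appearance, even if the elements of the same group are not contiguous. That is, it orders the groups like pandas.Series.unique() and not like set or numpy.unique(). This is why the values above are given in the order foo, bar, whereas groupby() returned bar first.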

df["algo"].unique()

array(['foo', 'bar'], dtype=object)
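The difference between the two orderings can be reproduced with pandas and NumPy alone. This small sketch compares first-appearance order with sorted order, and shows that groupby() sorts its keys unless sort=False is passed:

```python
import numpy as np
import pandas as pd

algo = pd.Series(2 * ["foo"] + 2 * ["bar"] + ["foo"], name="algo")

first_seen = algo.unique().tolist()  # order of first appearance
sorted_labels = np.unique(algo).tolist()  # sorted labels

# groupby() sorts the group keys by default;
# sort=False keeps the order of first appearance instead.
default_order = algo.groupby(algo).size().index.tolist()
unsorted_order = algo.groupby(algo, sort=False).size().index.tolist()

print(first_seen, sorted_labels, default_order, unsorted_order)
```

Passing sort=False to groupby() is therefore one way to make its results line up with an array ordered by first appearance.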

The sets may also be defined by multiple columns, such as algo and run:

df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5, 6, 5, 4, 3, 1],
        obj2=[6, 5, 4, 3, 2, 1, 5, 4, 5, 6],
        obj3=[1, 2, 3, 4, 5, 6, 6, 7, 5, 2],
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]
df

	obj1	obj2	obj3	algo	run
0	1	6	1	a	1
1	2	5	2	a	1
2	3	4	3	a	2
3	4	3	4	b	1
4	5	2	5	b	1
5	6	1	6	b	2
6	5	5	6	a	2
7	4	4	7	b	2
8	3	5	5	a	1
9	1	6	2	b	1

We can still use groupby(), but we may need to reset and clean up the index.

df.groupby(["algo", "run"])[obj_cols].apply(
    moocore.filter_dominated
).reset_index(level=["algo", "run"])

	algo	run	obj1	obj2	obj3
0	a	1	1	6	1
1	a	1	2	5	2
2	a	2	3	4	3
3	b	1	4	3	4
4	b	1	5	2	5
9	b	1	1	6	2
5	b	2	6	1	6
7	b	2	4	4	7

Or we can combine the multiple columns into a single one that defines the sets:

sets = df["algo"].astype(str) + "-" + df["run"].astype(str)
sets

0    a-1
1    a-1
2    a-2
3    b-1
4    b-1
5    b-2
6    a-2
7    b-2
8    a-1
9    b-1
dtype: object
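String concatenation is one way to build a single key per set. A pandas-only alternative, sketched here, is to number the (algo, run) combinations with groupby().ngroup(), which avoids ambiguous keys when the labels themselves contain the separator. Whether moocore accepts such integer labels as sets is an assumption that this sketch does not verify.

```python
import pandas as pd

df = pd.DataFrame(
    dict(
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)

# One integer id per (algo, run) combination, numbered by first appearance.
set_ids = df.groupby(["algo", "run"], sort=False).ngroup()

# The string keys used in this example define the same partition of the rows.
string_keys = df["algo"].astype(str) + "-" + df["run"].astype(str)
same_partition = bool((pd.factorize(string_keys)[0] == set_ids.to_numpy()).all())

print(set_ids.tolist(), same_partition)
```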

Then we identify the nondominated rows within each set:

is_nondom = moocore.is_nondominated_within_sets(df[obj_cols], sets=sets)
is_nondom

array([ True,  True,  True,  True,  True,  True, False,  True, False,
        True])

Finally, we use this boolean vector to filter the rows:

df[is_nondom]

	obj1	obj2	obj3	algo	run
0	1	6	1	a	1
1	2	5	2	a	1
2	3	4	3	a	2
3	4	3	4	b	1
4	5	2	5	b	1
5	6	1	6	b	2
7	4	4	7	b	2
9	1	6	2	b	1
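The boolean mask can be cross-checked with a naive pairwise dominance test in NumPy. The sketch below assumes minimization of all objectives. The helper nondominated_mask is hypothetical and quadratic in the number of rows, so it is only meant for checking small examples, not as a replacement for the moocore functions.

```python
import numpy as np
import pandas as pd


def nondominated_mask(points):
    """True for rows that no other row dominates (minimization, naive O(n^2))."""
    pts = np.asarray(points, dtype=float)
    # leq[i, j]: row i is no worse than row j in every objective.
    leq = (pts[:, None, :] <= pts[None, :, :]).all(axis=2)
    # lt[i, j]: row i is strictly better than row j in at least one objective.
    lt = (pts[:, None, :] < pts[None, :, :]).any(axis=2)
    # Row j is dominated if some row i is no worse everywhere and better somewhere.
    dominated = (leq & lt).any(axis=0)
    return ~dominated


df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5, 6, 5, 4, 3, 1],
        obj2=[6, 5, 4, 3, 2, 1, 5, 4, 5, 6],
        obj3=[1, 2, 3, 4, 5, 6, 6, 7, 5, 2],
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]

objs = df[obj_cols].to_numpy()
mask = np.zeros(len(df), dtype=bool)
# .indices maps each (algo, run) key to the integer positions of its rows.
for positions in df.groupby(["algo", "run"]).indices.values():
    mask[positions] = nondominated_mask(objs[positions])

print(df[mask])
```

Rows 6 and 8 are dropped because, within their own sets, row 2 dominates row 6 and row 1 dominates row 8.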
