Using moocore with Pandas
This example shows how to use moocore functions with Pandas (https://pandas.pydata.org/). It requires pandas version 2.0.0 or later.
import moocore
import pandas as pd
print(f"pandas version: {pd.__version__}")
pandas version: 2.2.3
First, we create a toy Pandas DataFrame.
df = pd.DataFrame(
dict(
obj1=[1, 2, 3, 4, 5],
obj2=[5, 4, 3, 2, 1],
obj3=[100, 200, 200, 300, 100],
algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
)
)
df
Normalize the objective columns to the range [1, 2]. Replace only those columns, so that the algo column is left intact.
obj_cols = ["obj1", "obj2", "obj3"]
df[obj_cols] = moocore.normalise(df[obj_cols], to_range=[1, 2])
df
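Assuming moocore.normalise() performs a per-column min-max rescaling (each column's minimum maps to 1 and its maximum to 2 when to_range=[1, 2]), the following NumPy sketch reproduces this step on the toy data. minmax_normalise is a hypothetical helper, not part of moocore; it does not handle constant columns or any other options moocore.normalise() may offer.

```python
import numpy as np
import pandas as pd


def minmax_normalise(values, to_range=(1.0, 2.0)):
    """Hypothetical helper: linearly rescale each column so that its
    minimum maps to to_range[0] and its maximum maps to to_range[1]."""
    x = np.asarray(values, dtype=float)
    lo, hi = to_range
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    # Assumes no constant columns (col_max > col_min), else division by zero.
    return lo + (x - col_min) / (col_max - col_min) * (hi - lo)


# Rebuild the toy DataFrame so that this sketch runs on its own.
df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5],
        obj2=[5, 4, 3, 2, 1],
        obj3=[100, 200, 200, 300, 100],
        algo=2 * ["foo"] + 2 * ["bar"] + ["foo"],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]
# Only the objective columns are replaced; "algo" keeps its labels.
df[obj_cols] = minmax_normalise(df[obj_cols], to_range=(1, 2))
print(df)
```

For example, obj3 = [100, 200, 200, 300, 100] becomes [1, 1.5, 1.5, 2, 1].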
Calculate the hypervolume for each algo using groupby() and apply().
ref = 2.1
hv = (
df.groupby("algo")[obj_cols]
.apply(moocore.hypervolume, ref=ref)
.reset_index(name="hv")
)
hv
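To see where the hypervolume values come from, the hypothetical helper below computes the hypervolume of a tiny point set by inclusion-exclusion over the boxes [p, ref], assuming all objectives are minimised and that the scalar ref=2.1 applies to every objective. It is a brute-force cross-check whose cost grows exponentially with the number of points, not the algorithm moocore uses.

```python
from itertools import combinations

import numpy as np


def hypervolume_incl_excl(points, ref):
    """Hypothetical brute-force hypervolume (minimisation): volume of the
    union of the boxes [p, ref], computed by inclusion-exclusion."""
    pts = np.asarray(points, dtype=float)
    # Broadcast a scalar reference point to every objective.
    ref = np.broadcast_to(np.asarray(ref, dtype=float), pts.shape[1:])
    n = len(pts)
    total = 0.0
    for k in range(1, n + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(range(n), k):
            # The intersection of the boxes [p, ref] is the box [max(p), ref].
            corner = pts[list(subset)].max(axis=0)
            total += sign * np.prod(np.clip(ref - corner, 0.0, None))
    return total


# Normalised objective values of each group in the toy DataFrame.
foo = [[1.0, 2.0, 1.0], [1.25, 1.75, 1.5], [2.0, 1.0, 1.0]]
bar = [[1.5, 1.5, 1.5], [1.75, 1.25, 2.0]]
print(hypervolume_incl_excl(foo, ref=2.1))
print(hypervolume_incl_excl(bar, ref=2.1))
```

With these inputs the sketch gives approximately 0.3435 for foo and 0.22475 for bar.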
Or we can just use:
hv = moocore.apply_within_sets(
df[obj_cols], df["algo"], moocore.hypervolume, ref=ref
)
hv
array([0.3435 , 0.22475])
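A rough pure NumPy/pandas picture of what moocore.apply_within_sets() does: select the rows of each set, apply the function to them, and collect the results with the sets visited in order of first appearance. apply_within_sets_sketch is a hypothetical name, not moocore's implementation, and it uses len as a stand-in for moocore.hypervolume so that it runs without moocore.

```python
import numpy as np
import pandas as pd


def apply_within_sets_sketch(x, sets, func, **kwargs):
    """Hypothetical stand-in: apply `func` to the rows of `x` in each set,
    visiting the sets in order of first appearance."""
    x = np.asarray(x)
    sets = np.asarray(sets)
    order = pd.Series(sets).unique()  # order of first appearance
    return np.array([func(x[sets == s], **kwargs) for s in order])


x = np.array([[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]])
labels = ["foo", "foo", "bar", "bar", "foo"]
# Count the rows of each set: "foo" comes first because it appears first.
print(apply_within_sets_sketch(x, labels, len))
```

Here "foo" has three rows (not contiguous) and "bar" two, so the result is [3, 2].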
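moocore.apply_within_sets() processes the groups in the order in which they first appear, even if the elements of the same group are not contiguous. That is, it orders the groups like pandas.Series.unique(), not like set (no guaranteed order) or numpy.unique() (sorted). Hence, the first value in the array above is the hypervolume of foo and the second that of bar.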
df["algo"].unique()
array(['foo', 'bar'], dtype=object)
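A quick comparison of the three orderings on the toy algo column: Series.unique() keeps the order of first appearance, numpy.unique() returns the values sorted, and a Python set guarantees no particular order.

```python
import numpy as np
import pandas as pd

algo = pd.Series(["foo", "foo", "bar", "bar", "foo"])

print(algo.unique())                # order of first appearance: foo, bar
print(np.unique(algo.to_numpy()))   # sorted: bar, foo
print(set(algo))                    # unordered: only membership is meaningful
```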
Suppose that we want to use multiple columns to define the sets, such as algo and run:
df = pd.DataFrame(
dict(
obj1=[1, 2, 3, 4, 5, 6, 5, 4, 3, 1],
obj2=[6, 5, 4, 3, 2, 1, 5, 4, 5, 6],
obj3=[1, 2, 3, 4, 5, 6, 6, 7, 5, 2],
algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
)
)
obj_cols = ["obj1", "obj2", "obj3"]
df
We can still use groupby(), but we may need to reset and clean up the index afterwards.
df.groupby(["algo", "run"])[obj_cols].apply(
moocore.filter_dominated
).reset_index(level=["algo", "run"])
Alternatively, we can combine the multiple columns into a single label that defines the sets:
sets = df["algo"].astype(str) + "-" + df["run"].astype(str)
sets
0 a-1
1 a-1
2 a-2
3 b-1
4 b-1
5 b-2
6 a-2
7 b-2
8 a-1
9 b-1
dtype: object
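Instead of building string labels, pandas can assign one integer per (algo, run) combination with GroupBy.ngroup(). This sketch assumes that moocore accepts integer set labels as well as strings (the example above only shows strings). With sort=False the groups should be numbered in order of first appearance; on this data that coincides with the sorted order anyway.

```python
import pandas as pd

df = pd.DataFrame(
    dict(
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)
# One integer label per (algo, run) combination:
# a-1 -> 0, a-2 -> 1, b-1 -> 2, b-2 -> 3.
set_ids = df.groupby(["algo", "run"], sort=False).ngroup()
print(set_ids.tolist())
```

The resulting labels partition the rows exactly as the string labels a-1, a-2, b-1 and b-2 do.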
Then identify the nondominated rows within each set:
is_nondom = moocore.is_nondominated_within_sets(df[obj_cols], sets=sets)
is_nondom
array([ True, True, True, True, True, True, False, True, False,
True])
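For intuition, here is a brute-force sketch of nondominance within sets, assuming all objectives are minimised: a row is kept unless another row of the same set is no worse in every objective and strictly better in at least one. naive_is_nondominated_within_sets is a hypothetical helper, not moocore's algorithm, and it keeps both copies of duplicated points, which may differ from moocore's handling of duplicates.

```python
import numpy as np
import pandas as pd


def naive_is_nondominated_within_sets(x, sets):
    """Hypothetical O(n^2) sketch: True for each row that no other row of
    the same set dominates (<= in every objective, < in at least one)."""
    x = np.asarray(x, dtype=float)
    sets = np.asarray(sets)
    keep = np.ones(len(x), dtype=bool)
    for i in range(len(x)):
        others = x[sets == sets[i]]  # rows of the same set, including row i
        dominates_i = np.all(others <= x[i], axis=1) & np.any(others < x[i], axis=1)
        keep[i] = not dominates_i.any()
    return keep


df = pd.DataFrame(
    dict(
        obj1=[1, 2, 3, 4, 5, 6, 5, 4, 3, 1],
        obj2=[6, 5, 4, 3, 2, 1, 5, 4, 5, 6],
        obj3=[1, 2, 3, 4, 5, 6, 6, 7, 5, 2],
        algo=["a"] * 3 + ["b"] * 3 + ["a", "b"] * 2,
        run=[1, 1, 2, 1, 1, 2, 2, 2, 1, 1],
    )
)
obj_cols = ["obj1", "obj2", "obj3"]
sets = df["algo"].astype(str) + "-" + df["run"].astype(str)
mask = naive_is_nondominated_within_sets(df[obj_cols], sets)
print(mask)
```

On this toy data, row 6 (set a-2) is dominated by row 2 and row 8 (set a-1) by row 1, so the sketch yields the same mask as printed above.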
Finally, use the boolean vector above to keep only the nondominated rows:
df[is_nondom]