Unary quality metrics#

Inverted Generational Distance (IGD and IGD+) and Averaged Hausdorff Distance#

igd(data, /, ref, *[, maximise])

Inverted Generational Distance (IGD).

igd_plus(data, /, ref, *[, maximise])

Modified IGD (IGD+).

avg_hausdorff_dist(data, /, ref, *[, ...])

Average Hausdorff distance.

Functions to compute the inverted generational distance (IGD and IGD+) and the averaged Hausdorff distance between nondominated sets of points.

The generational distance (GD) of a set \(A\) is defined as the distance between each point \(a \in A\) and the closest point \(r\) in a reference set \(R\), averaged over the size of \(A\). Formally,

\[GD_p(A,R) = \left(\frac{1}{|A|}\sum_{a\in A}\min_{r\in R} d(a,r)^p\right)^{\frac{1}{p}}\]

where the distance in our implementation is the Euclidean distance:

\[d(a,r) = \sqrt{\sum_{k=1}^m (a_k - r_k)^2}\]

The inverted generational distance (IGD) is calculated as \(IGD_p(A,R) = GD_p(R,A)\).

The modified inverted generational distance (IGD+) was proposed by Ishibuchi et al.1 to ensure that IGD+ is weakly Pareto-compliant, similarly to epsilon_additive() or epsilon_mult(). It modifies the distance measure as:

\[d^+(r,a) = \sqrt{\sum_{k=1}^m (\max\{a_k - r_k, 0\})^2}\]
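The IGD and IGD+ formulas can be sketched directly in NumPy for the minimization case. The snippet below is an illustrative sketch, not the implementation behind igd() and igd_plus():

```python
import numpy as np

def igd(data, ref, p=1):
    """IGD_p of data w.r.t. a reference set ref (Euclidean distance)."""
    data, ref = np.asarray(data, float), np.asarray(ref, float)
    # distance from every reference point to every point of the set
    d = np.sqrt(((ref[:, None, :] - data[None, :, :]) ** 2).sum(axis=2))
    # nearest point of data for each reference point, then the p-mean
    return float((d.min(axis=1) ** p).mean() ** (1.0 / p))

def igd_plus(data, ref):
    """IGD+ (minimization): only the components where a point is worse
    than the reference point contribute to the distance."""
    data, ref = np.asarray(data, float), np.asarray(ref, float)
    d = np.sqrt((np.maximum(data[None, :, :] - ref[:, None, :], 0.0) ** 2).sum(axis=2))
    return float(d.min(axis=1).mean())
```

For instance, a set that dominates every reference point has an IGD+ value of zero but a strictly positive IGD value.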

The average Hausdorff distance (\(\Delta_p\)) was proposed by Schütze et al.2 and it is calculated as:

\[\Delta_p(A,R) = \max\{ IGD_p(A,R), IGD_p(R,A) \}\]

IGDX3 is the application of IGD to decision vectors instead of objective vectors to measure closeness and diversity in decision space. One can use the functions igd() or igd_plus() (recommended) directly, passing the decision vectors as data.

There are different formulations of the GD and IGD metrics in the literature that differ on the value of \(p\), on the distance metric used and on whether the term \(|A|^{-1}\) is inside (as above) or outside the exponent \(1/p\). GD was first proposed by Van Veldhuizen and Lamont4 with \(p=2\) and the term \(|A|^{-1}\) outside the exponent. IGD seems to have been mentioned first by Coello Coello and Reyes-Sierra5, however, some people also used the name D-metric for the same concept with \(p=1\) and later papers have often used IGD/GD with \(p=1\). Schütze et al.2 proposed to place the term \(|A|^{-1}\) inside the exponent, as in the formulation shown above. This has a significant effect for GD and less so for IGD given a constant reference set. IGD+ also follows this formulation. We refer to Ishibuchi et al.1 and Bezerra et al.6 for a more detailed historical perspective and a comparison of the various variants.

Following Ishibuchi et al.1, we always use \(p=1\) in our implementation of IGD and IGD+ because (1) it is the setting most used in recent works; (2) it makes irrelevant whether the term \(|A|^{-1}\) is inside or outside the exponent \(1/p\); and (3) the meaning of IGD becomes the average Euclidean distance from each reference point to its nearest objective vector. It is also slightly faster to compute.

GD should never be used directly to compare the quality of approximations to a Pareto front, as it often contradicts Pareto optimality (it is not weakly Pareto-compliant).

IGD is still popular due to historical reasons, but we strongly recommend IGD+ instead of IGD, since the latter contradicts Pareto optimality in some cases (see examples in igd_plus()) whereas IGD+ is weakly Pareto-compliant.

The average Hausdorff distance (\(\Delta_p(A,R)\)) is also not weakly Pareto-compliant, as shown in the examples in igd_plus().

Epsilon metric#

epsilon_additive(data, /, ref, *[, maximise])

Additive epsilon metric.

epsilon_mult(data, /, ref, *[, maximise])

Multiplicative epsilon metric.

The epsilon metric of a set \(A \subset \mathbb{R}^m\) with respect to a reference set \(R \subset \mathbb{R}^m\) is defined as 7

\[\epsilon(A,R) = \max_{r \in R} \min_{a \in A} \max_{1 \leq i \leq m} \epsilon(a_i, r_i)\]

where \(a\) and \(r\) are objective vectors of length \(m\).

In the case of minimization of objective \(i\), \(\epsilon(a_i, r_i)\) is computed as \(a_i/r_i\) for the multiplicative variant (respectively, \(a_i - r_i\) for the additive variant), whereas in the case of maximization of objective \(i\), \(\epsilon(a_i, r_i) = r_i/a_i\) for the multiplicative variant (respectively, \(r_i - a_i\) for the additive variant). This allows computing a single value for problems where some objectives are to be maximized while others are to be minimized. Moreover, a lower value corresponds to a better approximation set, independently of the type of problem (minimization, maximization or mixed). However, the meaning of the value is different for each objective type. For example, imagine that objective 1 is to be minimized and objective 2 is to be maximized, and the multiplicative epsilon computed here is \(\epsilon(A,R) = 3\). This means that every \(a_1\) value in \(A\) needs to be multiplied by \(1/3\) and every \(a_2\) value by \(3\) in order for \(A\) to weakly dominate \(R\).
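When all objectives are minimized, the additive epsilon metric reduces to a few lines of NumPy. This is an illustrative sketch, not the implementation behind epsilon_additive():

```python
import numpy as np

def eps_additive(data, ref):
    """Additive epsilon (all objectives minimized): the smallest value that,
    subtracted from every component of data, makes data weakly dominate ref."""
    data, ref = np.asarray(data, float), np.asarray(ref, float)
    # diff[r, a, i] = a_i - r_i; then max over objectives, min over a, max over r
    diff = data[None, :, :] - ref[:, None, :]
    return float(diff.max(axis=2).min(axis=1).max())
```

For example, eps_additive([[0.0, 0.0]], [[1.0, 1.0]]) is -1: the set could be worsened by 1 in every objective and still weakly dominate the reference set.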

The multiplicative variant can be computed as \(\exp(\epsilon_{+}(\log(A), \log(R)))\), which makes clear that the computation of the multiplicative version for zero or negative values doesn't make sense. See the examples in epsilon_additive().
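This identity is easy to check numerically. The sketch below computes the multiplicative epsilon of a minimization problem both directly and via the additive metric on log-transformed values (an illustration, not the library's code):

```python
import numpy as np

A = np.array([[2.0, 4.0]])
R = np.array([[1.0, 2.0]])

# multiplicative epsilon computed directly: max_r min_a max_i (a_i / r_i)
direct = float((A[None, :, :] / R[:, None, :]).max(axis=2).min(axis=1).max())
# the same value via the additive metric applied to the log-transformed sets
via_log = float(np.exp((np.log(A)[None, :, :] - np.log(R)[:, None, :])
                       .max(axis=2).min(axis=1).max()))
# both equal 2.0
```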

The current implementation uses the naive algorithm, which requires \(O(m \cdot |A| \cdot |R|)\) time, where \(m\) is the number of objectives (the dimension of the vectors).

Hypervolume metric#

hypervolume(data, /, ref, *[, maximise])

Hypervolume indicator.

Hypervolume(ref[, maximise])

Object-oriented interface for the hypervolume indicator.

RelativeHypervolume(ref, ref_set[, maximise])

Computes the hypervolume value of fronts relative to the hypervolume of a reference front.

total_whv_rect(x, /, rectangles, *, ref[, ...])

Compute total weighted hypervolume given a set of rectangles.

whv_rect(x, /, rectangles, *, ref[, maximise])

Compute weighted hypervolume given a set of rectangles.

The hypervolume of a set of multidimensional points \(A \subset \mathbb{R}^m\) with respect to a reference point \(\vec{r} \in \mathbb{R}^m\) is the volume of the region dominated by the set and bounded by the reference point 8. Points in \(A\) that do not strictly dominate \(\vec{r}\) do not contribute to the hypervolume value, thus, ideally, the reference point should be strictly dominated by all points in the true Pareto front.

More precisely, the hypervolume is the Lebesgue measure of the union of axis-aligned hyperrectangles (orthotopes), where each hyperrectangle is defined by one point \(\vec{a} \in A\) and the reference point. Such a union of axis-aligned hyperrectangles is also called an orthogonal polytope.

The hypervolume is compatible with Pareto-optimality 7,9, that is, \(\nexists A,B \subset \mathbb{R}^m\), such that \(A\) is better than \(B\) in terms of Pareto-optimality and \(\text{hyp}(A) \leq \text{hyp}(B)\). In other words, if a set is better than another in terms of Pareto-optimality, the hypervolume of the former must be strictly larger than the hypervolume of the latter. Conversely, if the hypervolume of a set is larger than the hypervolume of another, then we know for sure that the latter set cannot be better than the former in terms of Pareto-optimality.
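In two dimensions, the hypervolume can be computed exactly by sweeping the front along the first objective. The following minimization-only sketch illustrates the geometry; it is not the algorithm behind hypervolume():

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Exact 2D hypervolume (minimization): sum of disjoint rectangles."""
    pts = np.asarray(points, float)
    ref = np.asarray(ref, float)
    # only points strictly dominating the reference point contribute
    pts = pts[(pts < ref).all(axis=1)]
    if len(pts) == 0:
        return 0.0
    # sort by (f1, f2) ascending and keep the nondominated points:
    # along the sorted order, f2 must strictly decrease
    pts = pts[np.lexsort((pts[:, 1], pts[:, 0]))]
    front = []
    best_f2 = np.inf
    for p in pts:
        if p[1] < best_f2:
            front.append(p)
            best_f2 = p[1]
    # each front point contributes a rectangle up to the next point's f1
    hv = 0.0
    for i, (f1, f2) in enumerate(front):
        next_f1 = front[i + 1][0] if i + 1 < len(front) else ref[0]
        hv += (next_f1 - f1) * (ref[1] - f2)
    return hv
```

For points (1, 3) and (2, 2) with reference point (4, 4) this yields 5, the area of the union of the rectangles [1, 4] × [3, 4] and [2, 4] × [2, 4].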

Approximating the hypervolume metric#

hv_approx(data, /, ref[, maximise, ...])

Approximate the hypervolume indicator.

whv_hype(data, /, *, ref, ideal[, maximise, ...])

Approximation of the (weighted) hypervolume by Monte-Carlo sampling (2D only).

Computing the hypervolume can be time-consuming, thus several approaches have been proposed in the literature to approximate its value via Monte-Carlo sampling. These methods are implemented in whv_hype() and hv_approx().

The default option method="DZ2019" of hv_approx() implements the method proposed by Deng and Zhang10 to approximate the hypervolume:

\[\widehat{HV}_r(A) = \frac{\pi^\frac{m}{2}}{2^m \Gamma(\frac{m}{2} + 1)}\frac{1}{n}\sum_{i=1}^n \max_{y \in A} s(w^{(i)}, y)^m\]

where \(m\) is the number of objectives, \(n\) is the number of sampled weights \(w^{(i)}\), \(\Gamma\) is the gamma function math.gamma(), i.e., the analytical continuation of the factorial function, and \(s(w, y) = \min_{k=1}^m (r_k - y_k)/w_k\). The weights \(w^{(i)}, i=1\ldots n\) are sampled uniformly from the positive orthant of the unit hypersphere by setting \(w = \frac{|x|}{\|x\|_2}\), where each component of \(x\) is independently sampled from the standard normal distribution. The original source code in C++/MATLAB can be found here.
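The estimator can be sketched in a few lines of NumPy. This is an illustrative Monte-Carlo sketch for the minimization case, not the implementation behind hv_approx():

```python
import math
import numpy as np

def hv_approx_dz2019(data, ref, n=20_000, seed=42):
    """Monte-Carlo hypervolume approximation via polar coordinates (minimization)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)
    ref = np.asarray(ref, float)
    m = data.shape[1]
    # sample directions uniformly on the positive orthant of the unit sphere
    x = rng.standard_normal((n, m))
    w = np.abs(x) / np.linalg.norm(x, axis=1, keepdims=True)
    # s(w, y) = min_k (r_k - y_k) / w_k, maximized over the points of the set
    s = ((ref - data)[None, :, :] / w[:, None, :]).min(axis=2).max(axis=1)
    s = np.maximum(s, 0.0)  # points not dominating ref contribute nothing
    # volume of the positive orthant of the unit m-ball
    c = math.pi ** (m / 2) / (2 ** m * math.gamma(m / 2 + 1))
    return c * float(np.mean(s ** m))
```

With data = [[0, 0]] and reference point (1, 1) the exact hypervolume is 1, and the estimate converges to it as n grows.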

Bibliography#

[1] (1,2,3)

Hisao Ishibuchi, Hiroyuki Masuda, Yuki Tanigaki, and Yusuke Nojima. Modified distance calculation in generational distance and inverted generational distance. In António Gaspar-Cunha, Carlos Henggeler Antunes, and Carlos A. Coello Coello, editors, Evolutionary Multi-criterion Optimization, EMO 2015 Part I, volume 9018 of Lecture Notes in Computer Science, pages 110–125. Springer, Heidelberg, Germany, 2015. [BibTeX].

[2] (1,2)

Oliver Schütze, X. Esquivel, A. Lara, and Carlos A. Coello Coello. Using the averaged Hausdorff distance as a performance measure in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, 16(4):504–522, 2012. [BibTeX].

[3]

A. Zhou, Qingfu Zhang, and Yaochu Jin. Approximating the set of Pareto-optimal solutions in both the decision and objective spaces by an estimation of distribution algorithm. IEEE Transactions on Evolutionary Computation, 13(5):1167–1189, 2009. doi:10.1109/TEVC.2009.2021467, [BibTeX].

[4]

David A. Van Veldhuizen and Gary B. Lamont. Evolutionary computation and convergence to a Pareto front. In John R. Koza, editor, Late Breaking Papers at the Genetic Programming 1998 Conference, pages 221–228. Stanford University, California, July 1998. Stanford University Bookstore. [BibTeX].

[5]

Carlos A. Coello Coello and Margarita Reyes-Sierra. A study of the parallelization of a coevolutionary multi-objective evolutionary algorithm. In Raúl Monroy, Gustavo Arroyo-Figueroa, Luis Enrique Sucar, and Humberto Sossa, editors, Proceedings of MICAI, volume 2972 of Lecture Notes in Artificial Intelligence, pages 688–697. Springer, Heidelberg, Germany, 2004. [BibTeX].

[6]

Leonardo C. T. Bezerra, Manuel López-Ibáñez, and Thomas Stützle. An empirical assessment of the properties of inverted generational distance indicators on multi- and many-objective optimization. In Heike Trautmann, Günter Rudolph, Kathrin Klamroth, Oliver Schütze, Margaret M. Wiecek, Yaochu Jin, and Christian Grimme, editors, Evolutionary Multi-criterion Optimization, EMO 2017, volume 10173 of Lecture Notes in Computer Science, pages 31–45. Springer International Publishing, Cham, Switzerland, 2017. doi:10.1007/978-3-319-54157-0_3, [BibTeX].

[7] (1,2)

Eckart Zitzler, Lothar Thiele, Marco Laumanns, Carlos M. Fonseca, and Viviane Grunert da Fonseca. Performance assessment of multiobjective optimizers: an analysis and review. IEEE Transactions on Evolutionary Computation, 7(2):117–132, 2003. doi:10.1109/TEVC.2003.810758, [BibTeX].

[8]

Eckart Zitzler and Lothar Thiele. Multiobjective optimization using evolutionary algorithms - A comparative case study. In Agoston E. Eiben, Thomas Bäck, Marc Schoenauer, and Hans-Paul Schwefel, editors, Parallel Problem Solving from Nature – PPSN V, volume 1498 of Lecture Notes in Computer Science, pages 292–301. Springer, Heidelberg, Germany, 1998. doi:10.1007/BFb0056872, [BibTeX].

[9]

Joshua D. Knowles and David Corne. On metrics for comparing non-dominated sets. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC'02), pages 711–716. Piscataway, NJ, 2002. IEEE Press. [BibTeX].

[10]

Jingda Deng and Qingfu Zhang. Approximating hypervolume and hypervolume contributions using polar coordinate. IEEE Transactions on Evolutionary Computation, 23(5):913–918, October 2019. doi:10.1109/tevc.2019.2895108, [BibTeX].