read_datasets

moocore.read_datasets(filename)

Read an input dataset file, parse it, and return its contents as a NumPy array.

Parameters: filename (str | PathLike | StringIO) – Path to the dataset file, or an io.StringIO object that directly contains the file contents. A relative path is resolved against the current working directory. If the filename has the extension .xz, the file is decompressed to a temporary file before it is read. Each line of the file corresponds to one point of one dataset and contains that point's objective values. Consecutive datasets are separated by an empty line.
Returns: ndarray – An array with one row per point. If the array has $n$ columns, the first $n - 1$ columns contain the objective values of each point, and the last column contains the identifier of the set (dataset) that the point belongs to, starting at 1 as in the examples below.
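The file format described above can be illustrated with a simplified pure-Python parser. This is a minimal sketch under stated assumptions, not moocore's implementation: the function name `parse_datasets_text` is hypothetical, values are assumed to be whitespace-separated, set identifiers are assumed to start at 1 (matching the examples), a run of several empty lines is treated as a single separator, and any other features the real parser may support (such as comment lines) are not handled.

```python
import numpy as np


def parse_datasets_text(text: str) -> np.ndarray:
    """Hypothetical sketch of the dataset format (not moocore's parser).

    One point per line, datasets separated by empty lines. Returns an
    array whose last column is the set number, assumed to start at 1."""
    rows = []
    set_id = 1
    in_set = False  # True once the current set has at least one point
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # An empty line closes the current set; in this sketch, a run
            # of empty lines counts as a single separator.
            if in_set:
                set_id += 1
                in_set = False
            continue
        # Objective values first, then the set identifier as a float.
        rows.append([float(x) for x in fields] + [float(set_id)])
        in_set = True
    return np.array(rows, dtype=float)


data = parse_datasets_text("0.5 0.5\n\n1 0\n0 1\n\n0.5 0.5")
print(data)
```

For the string used in the io.StringIO example, this sketch produces the same four rows that read_datasets returns there.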

Examples

>>> import moocore
>>> filename = moocore.get_dataset_path("input1.dat")
>>> moocore.read_datasets(filename)  # doctest: +ELLIPSIS
array([[ 8.07559653,  2.40702554,  1.        ],
       [ 8.66094446,  3.64050144,  1.        ],
       [ 0.20816431,  4.62275469,  1.        ],
       ...
       [ 4.92599726,  2.70492519, 10.        ],
       [ 1.22234394,  5.68950311, 10.        ],
       [ 7.99466959,  2.81122537, 10.        ],
       [ 2.12700289,  2.43114174, 10.        ]])

The NumPy array above represents the following data:

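===========  ===========  ==========
Objective 1  Objective 2  Set Number
===========  ===========  ==========
8.07559653   2.40702554   1
8.66094446   3.64050144   1
…            …            …
7.99466959   2.81122537   10
2.12700289   2.43114174   10
===========  ===========  ==========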
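Because the set number is stored in the last column, the points of each set can be recovered with plain NumPy boolean indexing. The sketch below is one way to do this; the helper name `split_by_set` is hypothetical and not part of moocore, and the sample array is copied from the io.StringIO example output.

```python
import numpy as np


def split_by_set(data: np.ndarray) -> dict[int, np.ndarray]:
    """Hypothetical helper: group rows of a read_datasets-style array by set.

    The last column holds the set identifier; the returned arrays keep
    only the objective columns (all columns except the last)."""
    set_ids = data[:, -1]
    # np.unique returns the identifiers in ascending order.
    return {int(s): data[set_ids == s, :-1] for s in np.unique(set_ids)}


data = np.array([[0.5, 0.5, 1.0],
                 [1.0, 0.0, 2.0],
                 [0.0, 1.0, 2.0],
                 [0.5, 0.5, 3.0]])
sets = split_by_set(data)
for set_id, points in sets.items():
    print(set_id, points.tolist())
```

Since np.unique sorts its output, the dictionary keys appear in ascending set order.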

Datasets can also be read from an in-memory string by wrapping it in io.StringIO:

>>> from io import StringIO
>>> fh = StringIO("0.5 0.5\n\n1 0\n0 1\n\n0.5 0.5")
>>> moocore.read_datasets(fh)
array([[0.5, 0.5, 1. ],
       [1. , 0. , 2. ],
       [0. , 1. , 2. ],
       [0.5, 0.5, 3. ]])
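Files with the extension .xz are decompressed to a temporary file before being read. The standard-library sketch below illustrates that decompression step in isolation; the helper `decompress_xz_to_temp` is hypothetical and is not moocore's code, which performs this step internally when given an .xz filename.

```python
import lzma
import os
import shutil
import tempfile


def decompress_xz_to_temp(path: str) -> str:
    """Hypothetical helper: decompress an .xz file into a named temporary
    file and return its path. The caller must delete the returned file."""
    with lzma.open(path, "rb") as src, tempfile.NamedTemporaryFile(
        mode="wb", suffix=".dat", delete=False
    ) as dst:
        shutil.copyfileobj(src, dst)  # stream the decompressed bytes
        return dst.name


# Create a small compressed dataset file to demonstrate the round trip.
content = b"0.5 0.5\n\n1 0\n0 1\n\n0.5 0.5\n"
with tempfile.NamedTemporaryFile(suffix=".dat.xz", delete=False) as f:
    xz_path = f.name
with lzma.open(xz_path, "wb") as f:
    f.write(content)

plain_path = decompress_xz_to_temp(xz_path)
with open(plain_path, "rb") as f:
    decompressed = f.read()
print(decompressed.decode())

# Clean up both temporary files.
os.remove(xz_path)
os.remove(plain_path)
```

Decompressing manually like this is not needed before calling moocore.read_datasets, since the function accepts .xz filenames directly.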