COST Action 15140: ImAppNIO - Working Group 3: Benchmarks - Open Discussion Wiki

Aims

Working Group 3 of the COST Action 15140: ImAppNIO is tasked with the development of practice-driven, theory-affine benchmarks. Bringing together the perspectives that respectively dominate the efforts of WG1 and WG2, its task is to develop useful benchmarks for nature-inspired search and optimisation heuristics, with a strong focus on discrete search spaces and discrete optimisation problems, making sure that the developed benchmarks are relevant from a practical perspective and accessible from a theoretical perspective.

It will pursue the same goals for discrete nature-inspired search and optimisation heuristics as the COmparing Continuous Optimisers (COCO) initiative does for continuous optimisers. It will establish a workshop at the main international conferences in the field, such as the Genetic and Evolutionary Computation Conference (GECCO) and the International Conference on Parallel Problem Solving from Nature (PPSN), with a focus on competitions based on these workshops. This will serve as a test bed for the results of WG1, a source of inspiration for WG2, and an important and highly visible focal point for anyone interested in the application of nature-inspired search and optimisation heuristics. The benchmarks that are developed will be made publicly available as part of the COST Action's Internet website, and WG3 is responsible for providing and maintaining them.

Working Group 3 uses this Wiki to develop, discuss, and present contents in an open way.

PART A: Identifying the benchmark functions

Characteristics of the Sought Benchmark Problems

Following the spirit of COCO, the challenge is that the benchmark functions should capture the difficulties of combinatorial optimisation problems in practice, but at the same time be comprehensible, so that algorithm behaviours can be understood or interpreted from the performance on a given benchmark problem.

Ideally (though not necessarily for every function), we would like the benchmark functions to be:

1) scalable with the problem size;

2) non-trivial in the black-box optimisation sense (e.g., the function can be shifted so that the global optimum may be located at any point of the search space); a minimal illustration of such a shift follows below.
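To make point 2) concrete, here is a minimal Python sketch of a scalable pseudo-Boolean benchmark whose optimum is shifted to an arbitrary bit string (a OneMax variant chosen purely for illustration; the function name and interface are not part of any agreed WG3 format):

<code python>
import random

def shifted_onemax(x, z):
    """OneMax with its optimum shifted to the bit string z: counts the
    positions where x agrees with z, so the unique global optimum is
    x == z rather than the all-ones string."""
    return sum(xi == zi for xi, zi in zip(x, z))

# Scalable in n; the optimum can be any point of {0,1}^n.
n = 20
z = [random.randint(0, 1) for _ in range(n)]  # hidden optimum
x = [random.randint(0, 1) for _ in range(n)]  # candidate solution
print(shifted_onemax(x, z), "out of", n)
</code>

The same shifting trick can be applied to other pseudo-Boolean functions so that algorithms cannot exploit a known optimum location.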

While the challenge may be significant, especially for classical combinatorial optimisation problems (not so much for toy problems), achieving this goal would help greatly in bridging the gap between theoreticians and experimentalists.

Suggestions by Thomas Weise

I think there can be several approaches.

1. One could use some of the hardest instances of several classical problems and try to standardize them in some way.

2. One could try to develop new real-world problems (such as the Traveling Thief Problem by Markus Wagner).

3. One could try to develop a problem that can represent different hard aspects (such as NK or NKp landscapes).

Concerning point 3, I have also done some work in this direction: characterising which features make optimization problems hard and developing a model in which these features can be tuned separately. I would suggest including some settings of this function in the final test bed, as they are easy to understand, easy to implement, and fast to compute.

Suggestions by Markus Wagner

We in Adelaide have a bit of experience with interconnected problems, in particular with an academic problem called the Travelling Thief Problem (TTP). This is a combination of two well-known combinatorial problems: the Knapsack Problem (KP) and the Travelling Salesperson Problem (TSP). The two are combined (among other things) using a so-called rent rate R that needs to be paid for the time travelled. In addition, the speed decreases with increasing load, so packing maximally results in a large slowdown and thus a reduced overall objective score:

objective score = profit of items - R * travel time
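As an illustration only, the sketch below shows how this objective could be computed for one candidate solution, assuming the linear velocity model commonly used for the TTP (speed drops from v_max with an empty knapsack to v_min at full capacity); all names and parameters here are illustrative and not the official benchmark format:

<code python>
def ttp_objective(tour, packing, dist, weight, profit, item_city,
                  capacity, R, v_max=1.0, v_min=0.1):
    """Objective score = profit of packed items - R * total travel time.
    Assumes the usual linear velocity model: speed decreases from v_max
    (empty knapsack) to v_min (knapsack at capacity) proportionally to
    the load carried so far along the tour."""
    total_profit = sum(profit[i] for i in packing)
    load, travel_time = 0.0, 0.0
    n = len(tour)
    for k in range(n):
        city, nxt = tour[k], tour[(k + 1) % n]  # the tour returns to its start
        # items packed at this city are picked up before leaving it
        load += sum(weight[i] for i in packing if item_city[i] == city)
        speed = v_max - (load / capacity) * (v_max - v_min)
        travel_time += dist[city][nxt] / speed
    return total_profit - R * travel_time
</code>

With R = 0 only the knapsack profit matters, while a very large R makes the travel-time term dominate, which is exactly the lever described in point 1 of the list below.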

Advantages regarding scalability and adjustability:

1. By changing R, we can shift the influence from one component to the other component: R=0 leaves us with the KP and R=large leaves us with the TSP.

2. If the tour is fixed, one is left with a knapsack-like problem; Neumann and Polyakovskiy call this “Packing While Travelling” (PWT), for which a dynamic programming algorithm and an FPTAS exist. Thus, the search space is just the KP space.

3. There are 9720 instances online, with 51 to 85k cities and with 50 to 850k items (https://cs.adelaide.edu.au/~optlog/research/combinatorial.php). For the tiniest instances, exact approaches are computationally feasible.

4. The way the instances (including R) are defined gives us at least 1 solution with objective score 0 (by considering “optimal TSP tour” and “maximum KP profit”). There are no guarantees beyond this.

5. Multi-objective formulations are straightforward if needed in the future (and are already being investigated), e.g., treating the profit of the packed items and the travel time as separate objectives.

6. A relaxed version has been presented at GECCO 2016 (Multiple Traveling Thieves Problem), where it is no longer necessary to visit all cities and it is possible to have multiple thieves.

The overall setting most likely sits at the “quite complex” end of the spectrum of functions that WG3 might be after. Markus' perceived order of difficulty (increasing from left to right): PWT < TTP < MTTP.

A recent article with an overview and comparison of relevant approaches: https://cs.adelaide.edu.au/~markus/pub/2017joh-ttp.pdf (Springer: http://link.springer.com/article/10.1007%2Fs10732-017-9328-y)

Suggestions by Thomas Stuetzle

I think SAT and TSP could be candidate problems, but this would very likely lead to duplication, at least for SAT: the SAT community has long made regular efforts to evaluate complete SAT solvers and local search solvers in the SAT competitions (I think they are held annually or every two years).

For the TSP there was the TSP challenge by David Johnson et al. I think for the TSP it would make sense to try new benchmarks.

For both concerns I would like to draw your attention to the following:

“These benchmark functions should capture the difficulties of combinatorial optimization problems in practice, but at the same time be comprehensible so that the resulting algorithm behaviors can be understood or interpreted.”

and

“The sought benchmark functions should ideally be scalable with the problem size and non-trivial in the black-box optimization sense, i.e., allow for shifting the optimum to any point.”

If we do not get instances at least as hard as those in these competitions [SAT, TSP] and algorithms that are competitive with those already used in these competitions, I don't see what particular kind of impact one would make. Obviously, it is interesting to study algorithm behavior and so on, but I think the most interesting thing is to study the behavior of the high-performing algorithms (as is also typically done in the BBOB case!).

I think for each of the benchmarks one should give some baseline algorithms that are publicly available and that people can use to see whether they could compete (if this is a goal).

By the way, when doing such things seriously in combinatorial optimization, one notices that BBOB has it much easier: there is essentially only one problem with many instances, while covering combinatorial optimization requires many problems, each with many instances and often with specialized algorithms.

An added difficulty is that in combinatorial optimization what usually counts is computation time, not the number of function evaluations. So one needs a very good surrogate for computation time, a conversion between computation times, or to evaluate all algorithms on the same platform.
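One pragmatic way to keep both cost measures available is to wrap the objective function so that every call is counted and timed; the wrapper below is only an illustrative sketch of that idea (not part of COCO, BBOB, or any existing suite):

<code python>
import time

class CountingObjective:
    """Wraps an objective function and records, at every improvement,
    both the number of evaluations and the elapsed CPU time, so that
    runs can later be compared under either cost measure."""

    def __init__(self, f, minimise=True):
        self.f = f
        self.minimise = minimise
        self.evaluations = 0
        self.start = time.process_time()
        self.best = None
        self.trace = []  # list of (evaluations, cpu_seconds, best_value)

    def __call__(self, x):
        self.evaluations += 1
        value = self.f(x)
        improved = (self.best is None or
                    (value < self.best if self.minimise else value > self.best))
        if improved:
            self.best = value
            self.trace.append((self.evaluations,
                               time.process_time() - self.start, value))
        return value
</code>

CPU time recorded this way still depends on the platform, so reporting both columns at least makes the conversion issue explicit.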

As a side note: I would exclude polynomially solvable problems (at least those for which n log n or n^2 algorithms exist), unless one goes to very large instances.

Quite a few people would also like to have easy problems, to show when some algorithms or parameter settings are really bad.

I think it is fine to do so when this is well motivated (which would be the case in what you mention). However, I would then avoid making a challenge out of solving polynomial problems well if this has no positive effect on the original NP-hard ones, as this can easily degenerate into a pseudo-competition (at least if one does it only experimentally; in some situations I would not exclude that something interesting results).

Suggestions by Hernan Aguirre (On Twitter)

Kauffman's NK-landscapes for single-objective optimization and their extensions to multi- and many-objective optimization (MNK-landscapes and rhoMNK-landscapes) offer the possibility of testing on sub-classes of problems that require a binary representation. They are a must for serious benchmarking.
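For reference, here is a compact Python sketch of one common way to instantiate and evaluate Kauffman's NK model (random interaction neighbourhoods and random fitness tables; this is illustrative only, not a prescribed WG3 instance format):

<code python>
import random

def make_nk_landscape(n, k, seed=0):
    """Builds a random NK-landscape: each bit i interacts with k other
    randomly chosen bits, and each of the 2^(k+1) local configurations
    receives a fitness contribution drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    neighbours = [rng.sample([j for j in range(n) if j != i], k)
                  for i in range(n)]
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(x):  # x is a list of n bits (0/1)
        total = 0.0
        for i in range(n):
            bits = [x[i]] + [x[j] for j in neighbours[i]]
            index = int("".join(map(str, bits)), 2)
            total += tables[i][index]
        return total / n  # average of the n contributions
    return fitness

f = make_nk_landscape(n=20, k=3)
print(f([random.randint(0, 1) for _ in range(20)]))
</code>

Increasing k increases epistasis and ruggedness, which is what makes the model attractive as a tunable benchmark.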

PART B: Evaluating algorithm performance

Apart from defining an initial set of benchmark functions, appropriate ways to compare algorithm performance will have to be agreed upon.

Suggestions by Thomas Weise

Traditionally, one would let the algorithms run for a fixed budget (which may be a number of fitness function evaluations) and then look at the end result. It is well understood that this may not be the best method. Instead, several measurements should be collected to document the progress of each run. In our work on the TSP Suite, we built software along the lines of COCO, but for the TSP.

A “complete” benchmark should also prescribe how the measurements are to be stored (e.g., CSV text files in a specific directory structure). If that can be done, then all researchers will immediately have comparable results. COCO/BBOB/TSP Suite do this implicitly by providing the experiment execution and evaluation as a single software package. As a result, we avoid ending up with essentially incomparable experimental results.
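A minimal sketch of what such a prescription could look like; the directory layout and column names below are invented purely for illustration, and an agreed WG3 format would have to be specified separately:

<code python>
import csv
import os

def write_run_log(problem, instance, algorithm, run_id, trace, root="results"):
    """Stores the progress trace of one run as a CSV file under a fixed
    directory layout, e.g. results/<problem>/<instance>/<algorithm>/run_03.csv,
    so that logs produced by different groups can be collected and compared."""
    directory = os.path.join(root, problem, instance, algorithm)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "run_{:02d}.csv".format(run_id))
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["evaluations", "cpu_seconds", "best_value"])
        writer.writerows(trace)
    return path
</code>

The trace argument is a list of (evaluations, cpu_seconds, best_value) tuples, for example as collected by the evaluation-counting wrapper sketched in Part A above.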

APPENDIX: Wiki material of previous WG3 Leaders

Call for Benchmark Problems - How to Contribute

In our initial phase we are seeking suggestions for benchmark problems. You are invited to contribute by copying this Template Benchmark Problem page and linking it in the index below. Furthermore, feel free to augment or comment on any ideas provided so far. Do not forget to subscribe to at least your own pages (under the menu item “manage subscriptions”) in order to receive email notifications whenever colleagues augment, annotate, or change something! We hope that a fruitful discussion will emerge!

List of Suggested Benchmark Problems

Further Suggestions, Issues, or Aspects to Discuss

Please create a new page for any other ideas, suggestions, issues etc. that you want to discuss here. Add new entries at the top!

* First Suggestion (to be replaced)

Contacts

WG3 Leader and Vice-Leaders: Pietro S. Oliveto, Borys Wrobel, Aleš Zamuda

Recent New Members (Since October 2017)

Jörg Lässig jlaessig@googlemail.com, Thomas Weise tweise@ustc.edu.cn.

~~DISCUSSION~~
