Deceptive_Monotone_Function
* Johannes Lengler, johannes.lengler@ethz.ch, ETH Zurich, Switzerland
The ground space is $\{0,1\}^n$ with $n = 10{,}000$. We pick index sets $A_0, A_1, \ldots, A_{40} \subseteq \{1,\ldots,n\}$ uniformly at random, each of size $5000$. Within each $A_i$, we choose uniformly at random a subset $B_i \subseteq A_i$ of size $1000$.
For $x \in \{0,1\}^n$, we define the level of $x$ to be $\ell(x) := \max\{0 \leq i \leq 39 \mid \text{at least } 95\% \text{ of the bits in } B_i \text{ are one-bits in } x\}$. (The level is zero if no such $i$ exists.)
The fitness of $x$ is then $f(x) = n^2 \cdot \ell(x) + n \cdot \sum_{i \in A_{\ell(x)+1}} x_i + \sum_{1 \leq i \leq n} x_i$.
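To make the construction concrete, here is a minimal sketch in Python/NumPy of an instance generator and the fitness evaluation. The function and parameter names (`make_instance`, `level`, `fitness`) are my own choices and not part of the definition above; the code uses 0-based indices into $x$ rather than $1,\ldots,n$.

```python
import numpy as np

def make_instance(n=10_000, levels=40, size_a=5000, size_b=1000, seed=None):
    """Draw the random index sets A_0, ..., A_{levels} and B_i inside A_i."""
    rng = np.random.default_rng(seed)
    A, B = [], []
    for _ in range(levels + 1):                    # A_0, ..., A_40
        a = rng.choice(n, size=size_a, replace=False)
        b = rng.choice(a, size=size_b, replace=False)
        A.append(a)
        B.append(b)
    return A, B

def level(x, B, levels=40):
    """Largest i <= levels-1 with at least 95% one-bits in B_i (0 if none)."""
    lev = 0
    for i in range(levels):                        # i = 0, ..., 39
        if x[B[i]].sum() >= 0.95 * len(B[i]):
            lev = i
    return lev

def fitness(x, A, B):
    """f(x) = n^2 * level(x) + n * (one-bits in A_{level+1}) + (one-bits in x)."""
    n = len(x)
    l = level(x, B, levels=len(B) - 1)
    return n**2 * l + n * x[A[l + 1]].sum() + x.sum()
```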
The problem comes from purely theoretical considerations (Lengler, Steger, “Drift Analysis and Evolutionary Algorithms Revisited”). It is a monotone function, i.e., whenever a zero-bit is changed into a one-bit, the function value increases. In particular, the all-ones string is the unique global optimum.
Like every other monotone function, it is easy to optimize if the mutation rate is rather weak, but hard if the mutation rate is strong. It has been shown theoretically (as $n$ and the number of sets $A_i$, $B_i$ grow) that the (1+1)-EA with mutation rate $c/n$ for $c<1$ optimizes the function in time $O(n\log n)$, while for $c>2.2$ it needs exponential time $2^{\Omega(n)}$.
In practice, I have (non-exhaustively!) tested the concrete numbers above for the (1+1)-EA with $c = 0.9$ and $c = 4$. In the former case, after 200,000 iterations the (1+1)-EA has optimized $99.998\%$ of the bits. For $c=4$, after the same number of steps it has only optimized $\approx 90.5\%$ of the bits, with no visible improvement in the last 100,000 iterations. (The fitness did increase during these last steps, but the distance to the optimum did not decrease.)
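For reference, an experiment of this kind can be run with a straightforward (1+1)-EA using standard bit mutation at rate $c/n$. This sketch reuses the hypothetical `make_instance` and `fitness` from above; it is an illustration, not the exact code behind the numbers reported here.

```python
def one_plus_one_ea(A, B, n=10_000, c=0.9, iters=200_000, seed=None):
    """(1+1)-EA: flip each bit independently with prob. c/n, keep if not worse."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n, dtype=np.int8)  # random initial string
    fx = fitness(x, A, B)
    for _ in range(iters):
        flip = rng.random(n) < c / n               # standard bit mutation
        y = np.where(flip, 1 - x, x)
        fy = fitness(y, A, B)
        if fy >= fx:                               # elitist selection, ties accepted
            x, fx = y, fy
    return x, fx

A, B = make_instance(seed=1)
x, fx = one_plus_one_ea(A, B, c=0.9)
print(f"fraction of one-bits: {x.mean():.5%}")
```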
The function has only been studied theoretically for the (1+1)-EA. However, I expect the results to carry over to algorithms that use larger populations, crossover, etc. The main point of the function is that it is deceptive for algorithms whose mutation rates are too large. With high mutation rates, small errors are easily accepted for the sake of seemingly more relevant advantages at other points, and making too many such small errors is heavily punished by the function. More precisely, when a string $x$ has level $\ell$, the fitness function offers a large reward for optimizing level $\ell+1$. If the mutation rate is large, then any improvement for level $\ell+1$ is accepted, although such an improvement often comes with significant collateral damage to the progress previously made on other levels.
I don’t know whether similar functions exist in real-world settings, or whether this function is of practical relevance.
From a theoretical point of view, the function is very accessible. It scales well, and since it is essentially defined by random sets, there are plenty of tools from probability theory which can be used to study it.
As said above, I believe that this function generally punishes mutation rates that are too high, even if crossover and similar techniques are used, but I have not tested this hypothesis, nor has it been studied theoretically.
I have fully described how to generate a benchmark instance, cf. also (Lengler, Steger). Other instances can easily be created, and the problem can be scaled up by increasing $n$, the sizes of $A_i$ and $B_i$, the number of levels, and the fraction of correct bits that are used to define the level of $x$.
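As one hypothetical illustration of such scaling with the sketched generator from above (the values are arbitrary; the $95\%$ threshold in `level` would also need to become a parameter to cover the last knob):

```python
# Twice the dimension, twice the number of levels, proportionally larger sets.
A, B = make_instance(n=20_000, levels=80, size_a=10_000, size_b=2_000, seed=2)
```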
I do find the function very interesting from a theoretical point of view. However, I feel that it only makes sense to study it further if it sparks interest in other researchers as well.