# How Perturbation Strength Shapes the Global Structure of TSP Fitness Landscapes

The dataset contains landscape data for "How Perturbation Strength Shapes the Global Structure of TSP Fitness Landscapes", P. McMenemy, N. Veerapen, G. Ochoa. The 18th European Conference on Evolutionary Computation in Combinatorial Optimisation (EvoCOP 2018), 4 - 6 April 2018, Parma, Italy. The dataset describes the network structure of the local optima networks for 180 Travelling Salesman Problem instances, across 10 perturbation strengths, that are sampled in the paper. The TSP instances themselves are provided in the files clusteredInstances.zip and uniformInstances.zip and were generated using the DIMACS TSP instance generator (http://dimacs.rutgers.edu/Challenges/TSP/download.html). 
For the networks based on these instances, C506.zip, C755.zip and C1010.zip (resp. E506.zip, E755.zip and E1010.zip) each contain networks for clustered (resp. uniform) instances of size 506, 755 and 1010. Each of these 6 main zip files is split into 30 folders corresponding to a TSP instance. Each of these folders contains 10 zip files that describe the local optima network under one of the 10 different perturbation strengths. Each of these zip files contains the files described below.

## How to interprete the data

The networks for each instance are described by five plain text files with the sols, nodes, edges, runs and edge_history file name extensions. Note that the edge_history files were not stored for most networks since they are not necessary to build the networks and are generally the largest files produced. Furthermore the edge_history files were not used for the analysis presented in the paper.

The letter and number pairs in the file names indicate the flags that were used to run the linkern program provide in Concorde (http://www.math.uwaterloo.ca/tsp/concorde.html). Flag s is the random seed used. X defines the number of consecutive iterations without improvement before the a run ends. r defines the number of runs. Flag z is a custom flag that defines how many double bridge kicks are applied in succession (perturbation strength).

Some flags are not included in the file names. The initial solution of each run is generated by alternating between a random solution and the quick Boruvka method for each run. This is equivalent to alternating between the existing 0 (random) and 4 (quick Boruvka) options for the I flag. A kick is computed using the random walk method and corresponds to flag K with value 3.

Only solutions that strictly improved the previous solution in the run were recorded and failed escape attempts were discarded.

The sols file contains the list of the solutions and each line corresponds to a unique solution. The file is split into two tab-delimited columns:
ID - an integer ID of the solution (same as in the nodes file)
SOLUTION - a comma separated representation of the solution where the cities are indexed according to their order of appearance in the TSPLIB instance starting from 0. The city indices are represented in hexadecimal form.

The nodes file contains the list of the nodes in the local optima network and each line corresponds to a unique solution. The file is split into two tab-delimited columns:
ID - an integer ID of the solution in the network (same as in the sols file)
FITNESS - the fitness of the solution

The edges file contains the list of directed edges in the local optima network and each line corresponds to a unique edge. The file is split into three tab-delimited columns:
ID_START - the ID of the node starting the edge
ID_END - the ID of the node ending the edge
COUNT - the number of times this edge was observed during the sampling procedure

The runs file contains a summary of each run in the sampling procedure. The file is split into eight tab-delimited columns:
RUN - identifier of the run
TIME - runtime for run (in seconds)
ITERS - number of iterations in the run
START - identifier of the initial node in the run (same ID as in .nodes file)
GLOBAL_FOUND - 1 if at least one global optimum was found in the run, 0 otherwise
BEST_FITNESS - best fitness in run
NUM_BEST - number nodes with best fitness in run
BEST_NODES - comma separated identifiers of nodes with best fitness (same ID as in .nodes file)

The edge_history file contains a summary of each iteration of the sampling procedure. The file is split into four tab-delimited columns:
RUN - the number of the run
ITER - the number of the iteration within the run
ID_START - the ID of the node starting the edge (the run)
ID_END - the ID of the node ending the edge (the run), the value is -1 if the solution does not have a strictly improved fitness
NUM_KICKS - the number of double bridge kicks used during this iteration