# Mapping the global structure of TSP fitness landscapes

The dataset contains landscape data for "Mapping the global structure of TSP fitness landscapes", G. Ochoa, N. Veerapen, Journal of Heuristics, 2017. The dataset describes the network structure of the local optima networks for Traveling Salesman Problem instances that are sampled using Chained Lin-Kernighan. TSPLIB.zip contains networks from the TSPLIB instances (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/) that are used in the paper. DIMACSclustered.zip and DIMACSuniform.zip contain networks for instances generated using the DIMACS TSP instance generator (http://dimacs.rutgers.edu/Challenges/TSP/download.html).

## How to interprete the data

The networks for each instance are described by five plain text files with the sols, nodes, edges, edge_history and runs file name extensions. 

The letter and number pairs in the file names indicate the flags that were used to run the linkern program provide in Concorde (http://www.math.uwaterloo.ca/tsp/concorde.html). Flag s is the random seed used. X defines the number of consecutive iterations without improvement before the a run ends. r defines the number of runs. Flag z is a custom flag that defines how many double bridge kicks are applied in succession.

Some flags are not included in the file names. The initial solution of each run is generated by alternating between a random solution and the quick Boruvka method for each run. This is equivalent to alternating between the existing 0 (random) and 4 (quick Boruvka) options for the I flag. A kick is computed using the random walk method and corresponds to flag K with value 3.

Only solutions that strictly improved the previous solution in the run were recorded and failed escape attempts were discarded.

The sols file contains the list of the solutions and each line corresponds to a unique solution. The file is split into two tab-delimited columns:
ID - an integer ID of the solution (same as in the nodes file)
SOLUTION - a comma separated representation of the solution where the cities are indexed according to their order of appearance in the TSPLIB instance starting from 0. The city indices are represented in hexadecimal form.

The nodes file contains the list of the nodes in the local optima network and each line corresponds to a unique solution. The file is split into two tab-delimited columns:
ID - an integer ID of the solution in the network (same as in the sols file)
FITNESS - the fitness of the solution

The edges file contains the list of directed edges in the local optima network and each line corresponds to a unique edge. The file is split into three tab-delimited columns:
ID_START - the ID of the node starting the edge
ID_END - the ID of the node ending the edge
COUNT - the number of times this edge was observed during the sampling procedure

The edge_history file contains a summary of each iteration of the sampling procedure. The file is split into four tab-delimited columns:
RUN - the number of the run
ITER - the number of the iteration within the run
ID_START - the ID of the node starting the edge (the run)
ID_END - the ID of the node ending the edge (the run), the value is -1 if the solution does not have a strictly improved fitness
NUM_KICKS - the number of double bridge kicks used during this iteration

The runs file contains a summary of each run in the sampling procedure. The file is split into eight tab-delimited columns:
RUN - identifier of the run
TIME - runtime for run (in seconds)
ITERS - number of iterations in the run
START - identifier of the initial node in the run (same ID as in .nodes file)
GLOBAL_FOUND - 1 if at least one global optimum was found in the run, 0 otherwise
BEST_FITNESS - best fitness in run
NUM_BEST - number nodes with best fitness in run
BEST_NODES - comma separated identifiers of nodes with best fitness (same ID as in .nodes file)