# Tunnelling Crossover Networks for the Asymmetric TSP

The dataset contains landscape data for "Tunnelling Crossover Networks for the Asymmetric TSP", N. Veerapen, G. Ochoa, R. Tinós, D. Whitley, The 14th International Conference on Parallel Problem Solving from Nature, PPSN2016, 17-21 September 2016, Edinburgh, Scotland. The dataset describes the network structure of the local optima networks for the 25 Asymmetric Traveling Salesman Problem instances that are sampled in the paper according to two different methodologies: using an evolutionary algorithm based on the Generalized Partition Crossover, and using Chained Lin-Kernighan.

The data are organised into three zip files, one for each method and one for generated instances. These instances (C50.0, C100.0, C200.0, E50.0, E100.0, and E200.0) were generated using the DIMACS TSP instance generator (http://dimacs.rutgers.edu/Challenges/TSP/download.html) and the distance matrices were perturbed to obtained asymmetric instances. The rest of the instances are from TSPLIB (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/).

## How to interprete the data

The networks for each instance are described by five plain text files with the sols, nodes, edges, edge_history and node_history file name extensions. The sols, nodes and edges files follow the same structure for both search algorithms and the other two are slightly different. The file naming convention is also different. For Chained Lin-Kernighan, only solutions that improved the previous solution in the run were recorded and failed escape attempts were discarded.

### Naming convention

#### Chained Lin-Kernighan

The letter and number pairs in the file names indicate the flags that were used to run the linkern program provide in Concorde (http://www.math.uwaterloo.ca/tsp/concorde.html). The I flag defines the way the initial solution is generated and the value 5 defines a custom option that alternates between a random solution and the quick Boruvka method for each run. This is equivalent to alternating between the existing 0 (random) and 4 (quick Boruvka) options. The K flag defines how a kick is computed and the value 3 represents the random walk method. R defines the number of kicks. r defines the number of runs. Flag z is a custom flag that defines how many double bridge kicks are applied in succession (in the paper this corresponds to parameter p).

#### GAPX GA

The letter r indicates the number of runs, g the number of generations, m the mutation operator with 1 being a sequence of double bridges, s the population size. 

### Files with common structures

The sols file contains the list of the solutions and each line corresponds to a unique solution. The file is split into two tab-delimited columns:
ID - an integer ID of the solution (same as in the nodes file)
SOLUTION - a comma separated representation of the solution where the cities are indexed according to their order of appearance in the instance file, starting from 0

The nodes file contains the list of the nodes in the local optima network and each line corresponds to a unique solution. The file is split into two tab-delimited columns:
ID - an integer ID of the solution in the network (same as in the sols file)
FITNESS - the fitness of the solution

The edges file contains the list of directed edges in the local optima network and each line corresponds to a unique edge. The file is split into three tab-delimited columns:
ID_START - the ID of the node starting the edge
ID_END - the ID of the node ending the edge
COUNT - the number of times this edge was observed during the sampling procedure

### Files with different structures

#### Chained Lin-Kernighan

The edge_history file contains a summary of each iteration of the sampling procedure. The file is split into four tab-delimited columns:
RUN - the number of the run
ITER - the number of the iteration within the run
ID_START - the ID of the node starting the edge (the run)
ID_END - the ID of the node ending the edge (the run), the value is -1 if the solution does not have a strictly improved fitness
NUM_KICKS - the number of double bridge kicks used during this iteration

The node_history file contains a summary of each run in the sampling procedure. The file is split into six tab-delimited columns:
RUN - the number of the run
START - the ID of the first node in the run
GLOBAL_FOUND - 1 if at least one global optimum was found in the run, 0 otherwise
BEST_FITNESS - fitness of the best solution found in the run
NUM_BEST - number of solutions that have the best fitness found in the run
BEST_NODES - comma separated IDs of the nodes that have the best fitness found in the run

#### GAPX GA

The edge_history file contains a summary of each iteration of the sampling procedure. The file is split into eight tab-delimited columns:
RUN - the number of the run
GEN - the number of the iteration within the run
OPCOUNT - counter for the operation performed, a crossover operation is represented on two consecutive rows with the same counter value 
TYPE - type of the operation, 0 for crossover, 1 for mutation
PARAM - the parameter for the operation, 0 indicates no parameter was used, positive integers are the number of consecutive double bridges for the mutation
PARENT - the ID of the node that is a parent in the crossover or that is being mutated 
CHILD - the ID of the node that is a child in the crossover or that is the result of the mutation
SAMESOLAFTERLS - 1 if the application of local search (3-opt) after the operation results in the same solution, 0 otherwise

The node_history file contains a summary of each run in the sampling procedure. The file is split into six tab-delimited columns:
RUN - the number of the run
GEN - the number of generations
GLOBAL_FOUND - 1 if at least one global optimum was found in the run, 0 otherwise
BEST_FITNESS - fitness of the best solution found in the run
NUM_BEST - number of solutions that have the best fitness found in the run
BEST_NODES - comma separated IDs of the nodes that have the best fitness found in the run