Data for the paper "Relating training instances to Automatic Design of Algorithms for bin packing via features"

Brownlee, Alexander E I; Woodward, John R; Veerapen, Nadarajen

Please use this identifier to cite or link to this item: http://hdl.handle.net/11667/108

Appears in Collections:	University of Stirling Research Data
Title:	Data for the paper "Relating training instances to Automatic Design of Algorithms for bin packing via features"
Creator(s):	Brownlee, Alexander E I Woodward, John R Veerapen, Nadarajen
Contact Email:	alexander.brownlee@stir.ac.uk
Keywords:	Automatic design of algorithms features bin packing
Date Available:	5-Apr-2018
Citation:	Brownlee, AEI; Woodward, JR; Veerapen, N (2018): Data for the paper "Relating training instances to Automatic Design of Algorithms for bin packing via features". University of Stirling. Faculty of Natural Sciences. Dataset. http://hdl.handle.net/11667/108
Publisher:	University of Stirling. Faculty of Natural Sciences.
Dataset Description (Abstract):	Automatic Design of Algorithms (ADA) shifts the burden of algorithm choice and design from developer to machine. Constructing an appropriate solver from a set of problem instances becomes a machine learning problem, with instances as training data. An efficient solver is trained for unseen problem instances with similar characteristics to those in the training set. However, this paper reveals that, as with classification and regression, for ADA not all training sets are equally valuable. We apply a typical genetic programming ADA approach for bin packing problems to several new and existing public benchmark sets. Algorithms trained on some sets are general and apply well to most others, whereas some training sets result in highly specialised algorithms that do not generalise. We relate these findings to features (simple metrics) of instances. Using instance sets with narrowly-distributed features for training results in highly specialised algorithms, whereas those with well-spread features result in very general algorithms. We show that variance in certain features has a strong correlation with the generality of the trained policies. Our results provide further grounding for recent work using features to predict algorithm performance, and show the suitability of particular instance sets for training in ADA for bin packing. This dataset includes the raw experimental data, new benchmark instances and figures for all experiments in the paper.
Dataset Description (TOC):	Data sets and figures for the paper "Relating Training Instances to Automatic Design of Algorithms for Bin Packing via Features"; Alexander E.I. Brownlee, John R. Woodward, Nadarajen Veerapen; Companion Proceedings of GECCO 2018, Kyoto Japan. figures/ is the full set of figures for the instance sets and footprints stirling-instances/ contains the generated instances mentioned in the paper. Dir names are stir_generated_binCapacity_itemSizeLowerBound_itemSizeUpperBound. Filenames are stir_binCapacity_numberOfItems_itemSizeLowerBound_itemSizeUpperBound Format of files are the same as the Falkenauer benchmarks: line1:number of items line2:volume of bins any addition lines: item size data-wide/ contains the results of GP runs. There are three types of files in here: - waescher_instances.txt is just a list of the instance files in a given set (Waescher in this case) - waescher_performance_0.txt is the output from a GP run for the named instance set (Waescher here). Line 1 is the best GP program found. The remainder is tab-separated data for each bin packing instance in the set; instanceSet instanceName(usually automatically generated) numberOfBins(required when using the GP evolved packing policy) - waescher_29_on_stir_generated_150_76_100_performance.txt is the result when running the policy that was trained on waescher (run 29), on the instances in the stir_generated_150_76_100 set. content is same as line above. metricsAndRawBinCountsAll-allbp-trained-footprints - comma separated file with all data used for the analysis. Each row is the data for bin bin packing instance. Columns are: - format: the file format - set: the instance set the instance belongs to - id: a unique ID - file: filename containing the instance - numberInFile: some of the files have more than one instance. This is the index within the file. - isVolumesAreInts-ItemListCompressionRatio: the static features for the instance - ScaledFit0_raw-ScaledFit10_raw: the performance features for the instance (number of bins required by the scaled fit policies) - ScaledFit0_rank-ScaledFit10_dist: scaled variants of the performance features (unused in the paper) - 2cbp_0-stir_generated_150_76_100_29: number of bins required for this instance when using the policy trained on the named set, in GP run number x - 2cbp_footprintCount-stir_generated_150_76_100_footprintCount: number of times this instances appears in the footprint of policies trained on the names instance set. Full details are given in the README.txt file. Dedicated UnZip software is recommended for accessing the dataset, for example, IZArc.
Type:	dataset
Contract/Grant Title:	FAIME: A Feature based Framework to Automatically Integrate and Improve Metaheuristics via Examples DAASE: Dynamic Adaptive Automated Software Engineering
Funder(s):	EPSRC - Engineering and Physical Sciences Research Council
Contract/Grant Number:	EP/N002849/1 EP/J017515/1
RMS ID:	1855 1067
URI:	http://hdl.handle.net/11667/108
Rights:	Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/
Affiliation(s) of Dataset Creator(s):	University of Stirling (Computing Science - CSM Dept) Queen Mary University of London

Files in This Item:

File	Description	Size	Format
data.zip		29.86 MB	Unknown	View/Open
Readme.txt		2.46 kB	Text	View/Open

This item is protected by original copyright

View Licence

Show full item record Recommend this item

DataSTORRE

DataSTORRE: Stirling Online Repository for Research Data.