The REAL corpus

Bartie, Phil; Mackaness, William; Gkatzia, Dimitra; Rieser, Verena

Please use this identifier to cite or link to this item: http://hdl.handle.net/11667/82

Full metadata record

DC Field	Value	Language
dc.contributor	Bartie, Phil	-
dc.contributor.other	EC - European Commission	en_GB
dc.contributor.other	EPSRC - Engineering and Physical Sciences Research Council	en_GB
dc.coverage.spatial	United Kingdom	en_GB
dc.coverage.temporal	2015-2016	en_GB
dc.creator	Bartie, Phil	-
dc.creator	Mackaness, William	-
dc.creator	Gkatzia, Dimitra	-
dc.creator	Rieser, Verena	-
dc.date.accessioned	2016-09-28T13:04:23Z	-
dc.date.available	2016-09-28T13:04:23Z	-
dc.date.created	2016-05-23	-
dc.identifier.uri	http://hdl.handle.net/11667/82	-
dc.description.abstract	Our interest is in people’s capacity to efficiently and effectively describe geographic objects in urban scenes. The broader ambition is to develop spatial models with equivalent functionality, capable of constructing such referring expressions. To that end we present a newly crowd-sourced data set of natural language references to objects anchored in complex urban scenes (in short: the REAL Corpus – Referring Expressions Anchored Language). The REAL corpus contains a collection of images of real-world urban scenes together with human-generated verbal descriptions of target objects, paired with data on how successfully other people were able to identify the same objects from these descriptions. In total, the corpus contains 32 images with on average 27 descriptions per image and 3 verifications for each description. In addition, the corpus is annotated with a variety of linguistically motivated features. The paper highlights issues posed by collecting data through crowd-sourcing with an unrestricted input format, as well as by using real-world urban scenes.	en_GB
dc.description.tableofcontents	The dataset includes a set of SOURCE images of features in typical urban scenes. A target was indicated in each image, and participants were asked to describe that target by typing a word or phrase. In a validation process, other participants then read each description and tagged (clicked) the described object on the corresponding image, and a set of validation images was generated to show whether the tagged locations were correct. Source Images – these are provided at two resolutions: a high-quality 3000 x 2000 pixel version and a lower-resolution 825 x 550 pixel version of the same image. Source images are named imgN.jpg, and a corresponding version of the image with the designated target indicated is saved as imgNt.jpg. Participants saw the source version of the image but could briefly toggle to the target version to see which object to describe in the scene. Validation Images – these images are 825 x 550 pixels and have GREEN (correct target) and RED (incorrect) dots superimposed where the validators clicked. This gives an indication of how well each description worked and which other features were confused with the intended target. The data collected from the web-based experiments are available in 2 formats (Excel XLSX and TXT). ReferringExpressionsData_withValidationDetails.xlsx – userid (an integer), age (a range value; see the supplied lookup table for details), gender (male, female), photoid (links to the source images), x (x coordinate of the click), y (y coordinate of the click), annotation shown, status of validator (correct, incorrect, cantfind, ambiguous), validator_userid, validator_age (see lookup table), validator_gender (male, female). ReferringExpressionsData_withValidationDetails-TAB_delimited.txt – the same data as tab-delimited text. Lookup table – age: this indicates the age ranges recorded in the results table (e.g. class 4 = 41yr-50yr).	en_GB
dc.language.iso	eng	en_GB
dc.publisher	University of Stirling. Faculty of Natural Sciences.	en_GB
dc.relation	Bartie, P; Mackaness, W; Gkatzia, D; Rieser, V (2016) The REAL corpus. Version 1.0. University of Stirling. Faculty of Natural Sciences. Dataset and Image. http://hdl.handle.net/11667/82	en_GB
dc.relation.isversionof	http://www.timemirror.com/lrec2016.html	en_GB
dc.relation.isreferencedby	Bartie, P., Mackaness, W., Gkatzia, D. and Rieser, V. (2016) The REAL corpus: A crowd-sourced Corpus of human generated and evaluated spatial references to real-world urban scenes. In: Calzolari, N., Choukri, K., Mazo, H., Moreno, A., Declerck, T., Goggi, S., Grobelnik, M., Odijk, J., Piperidis, S., Maegaard, B., Mariani, J. (eds.) Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Paris: European Language Resources Association (ELRA). 10th International Conference on Language Resources and Evaluation, LREC 2016, 23.5.2016 - 28.5.2016, Portorož, Slovenia, pp. 2153-2155. http://www.lrec-conf.org/proceedings/lrec2016/pdf/1035_Paper.pdf Available from: http://hdl.handle.net/1893/26431	en_GB
dc.rights	Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/	en_GB
dc.source	Web experiments	en_GB
dc.subject	Geoinformatics	en_GB
dc.subject	Natural Language Processing	en_GB
dc.subject	Referring expressions	en_GB
dc.subject.classification	::Information and communication technologies	en_GB
dc.title	The REAL corpus	en_GB
dc.type	dataset	en_GB
dc.type	image	en_GB
dc.description.version	1.0	en_GB
dc.contributor.email	phil.bartie@stir.ac.uk	en_GB
dc.identifier.projectid	FP7/2007-2013	en_GB
dc.identifier.projectid	EP/L026775/1	en_GB
dc.identifier.projectid	EP/M005429/1	en_GB
dc.title.project	SPACEBOOK project	en_GB
dc.title.project	GUI - Generation for Uncertain Information	en_GB
dc.title.project	DILiGENt - Domain-Independent Language Generation	en_GB
dc.contributor.affiliation	University of Stirling (Biological and Environmental Sciences)	en_GB
dc.contributor.affiliation	University of Edinburgh	en_GB
dc.contributor.affiliation	Edinburgh Napier University	en_GB
dc.contributor.affiliation	Heriot-Watt University	en_GB
dc.date.publicationyear	2016	en_GB
Appears in Collections:	University of Stirling Research Data
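The record above lists the fields of the validation table but not their exact header strings. As a minimal sketch, assuming the tab-delimited TXT file has a single header row and using hypothetical column names (userid, photoid, annotation, status) that would need adjusting to match the actual file, the identification success rate of each description could be computed in Python as follows. Validations marked cantfind or ambiguous are counted as unsuccessful here; that is a choice made for this sketch, not a definition from the corpus authors.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL header names: the record describes these fields
# (userid, photoid, annotation shown, status of validator) but does
# not give the exact column headers used in the TXT file.
DESCRIBER_COL = "userid"
PHOTO_COL = "photoid"
TEXT_COL = "annotation"
STATUS_COL = "status"


def success_rates(tsv_text):
    """Return {(userid, photoid, annotation): fraction of validations marked 'correct'}.

    Each row is assumed to be one validation of one description.
    'incorrect', 'cantfind' and 'ambiguous' all count as unsuccessful.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row[DESCRIBER_COL], row[PHOTO_COL], row[TEXT_COL])
        counts[key][1] += 1
        if row[STATUS_COL].strip().lower() == "correct":
            counts[key][0] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}
```

To apply it to the corpus, one could pass the contents of ReferringExpressionsData_withValidationDetails-TAB_delimited.txt, for example `success_rates(open(path, encoding="utf-8").read())`; the UTF-8 encoding is likewise an assumption. With about 3 verifications per description, each rate will typically be one of 0, 1/3, 2/3 or 1.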

Files in This Item:

File	Description	Size	Format
REAL_Corpus.zip	Data (zip)	61.17 MB	Unknown
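The record states that each source image is named imgN.jpg, that the version with the target indicated is imgNt.jpg, and that source images come at two resolutions. The internal folder layout of REAL_Corpus.zip is not documented here, so the following Python sketch assumes only that each imgNt.jpg sits in the same folder as its imgN.jpg, and it uses the standard zipfile module to list the archive. The function names pair_images and pair_images_in_zip are hypothetical.

```python
import re
import zipfile
from pathlib import PurePosixPath

# Naming convention from the record: imgN.jpg (source), imgNt.jpg (target shown).
SOURCE_RE = re.compile(r"^img(\d+)\.jpg$", re.IGNORECASE)
TARGET_RE = re.compile(r"^img(\d+)t\.jpg$", re.IGNORECASE)


def pair_images(names):
    """Pair each imgN.jpg with imgNt.jpg in the same folder.

    Returns {(folder, N): (source_name, target_name)}. The key includes the
    folder because (by ASSUMPTION) the two resolutions are stored in separate
    folders under identical file names. Images without a partner are skipped.
    """
    sources, targets = {}, {}
    for name in names:
        path = PurePosixPath(name)
        folder = str(path.parent)
        if m := SOURCE_RE.match(path.name):
            sources[(folder, int(m.group(1)))] = name
        elif m := TARGET_RE.match(path.name):
            targets[(folder, int(m.group(1)))] = name
    common = sorted(sources.keys() & targets.keys())
    return {key: (sources[key], targets[key]) for key in common}


def pair_images_in_zip(zip_path):
    """List the archive (without extracting it) and pair source/target images."""
    with zipfile.ZipFile(zip_path) as zf:
        return pair_images(zf.namelist())
```

Calling `pair_images_in_zip("REAL_Corpus.zip")` would then return one entry per matched image pair; if the archive uses a different layout or naming, the two regular expressions are the place to adjust.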

This item is protected by original copyright

DataSTORRE: Stirling Online Repository for Research Data.