Please use this identifier to cite or link to this item: http://hdl.handle.net/11667/82
Full metadata record
DC Field: Value [Language]
dc.contributor: Bartie, Phil
dc.contributor.other: EC - European Commission [en_GB]
dc.contributor.other: EPSRC - Engineering and Physical Sciences Research Council [en_GB]
dc.coverage.spatial: United Kingdom [en_GB]
dc.coverage.temporal: 2015-2016 [en_GB]
dc.creator: Bartie, Phil
dc.creator: Mackaness, William
dc.creator: Gkatzia, Dimitra
dc.creator: Rieser, Verena
dc.date.accessioned: 2016-09-28T13:04:23Z
dc.date.available: 2016-09-28T13:04:23Z
dc.date.created: 2016-05-23
dc.identifier.uri: http://hdl.handle.net/11667/82
dc.description.abstract: Our interest is in people’s capacity to efficiently and effectively describe geographic objects in urban scenes. The broader ambition is to develop spatial models of equivalent functionality, able to construct such referring expressions. To that end we present a newly crowd-sourced data set of natural language references to objects anchored in complex urban scenes (in short: the REAL Corpus – Referring Expressions Anchored Language). The REAL corpus contains a collection of images of real-world urban scenes together with verbal descriptions of target objects generated by humans, paired with data on how successfully other people were able to identify the same object based on these descriptions. In total, the corpus contains 32 images with on average 27 descriptions per image and 3 verifications for each description. In addition, the corpus is annotated with a variety of linguistically motivated features. The paper highlights issues posed by collecting data using crowd-sourcing with an unrestricted input format, as well as using real-world urban scenes. [en_GB]
dc.description.tableofcontents: The dataset includes a set of SOURCE images of features in typical urban scenes. A target was indicated in each image and participants were asked to describe that target (these words/phrases were typed by the participant). A validation process then asked other participants to read the description and tag (click) the object on the corresponding image. A set of validation images was generated to show whether the tagged location was correct.
Source Images – these are presented at two resolutions: the high-quality 3000 x 2000 pixel version and a lower 825 x 550 pixel version of the same image.
- Source images are given a filename imgN.jpg; a corresponding version of the image with the designated target indicated is saved as imgNt.jpg.
- The participant saw the source version of the image but could briefly toggle to the target version to know which object in the scene to describe.
Validation Images – these images are 825 x 550 pixels and have superimposed GREEN (correct target) and RED (incorrect) dots where the validators clicked. This gives an indication of how well the description worked and which other features were confused with the intended target.
The data collected from the web-based experiments are available in two formats (Excel and tab-delimited text); a loading sketch follows the metadata record below.
- ReferringExpressionsData_withValidationDetails.xlsx – userid (an integer), age (range value; see the supplied lookup table), gender (male, female), photoid (links to the source images), x (x coordinate where clicked), y (y coordinate where clicked), annotation shown, status of validator (correct, incorrect, cantfind, ambiguous), validator_userid, validator_age (see lookup table), validator_gender (male, female).
- ReferringExpressionsData_withValidationDetails-TAB_delimited.txt – the same fields in tab-delimited text.
- Lookup table (age) – indicates the age ranges recorded in the results table (e.g. class 4 = 41yr-50yr). [en_GB]
dc.language.iso: eng [en_GB]
dc.publisher: University of Stirling. Faculty of Natural Sciences. [en_GB]
dc.relation: Bartie, P; Mackaness, W; Gkatzia, D; Rieser, V (2016) The REAL corpus. Version 1.0. University of Stirling. Faculty of Natural Sciences. Dataset and Image. http://hdl.handle.net/11667/82 [en_GB]
dc.relation.isversionof: http://www.timemirror.com/lrec2016.html [en_GB]
dc.relation.isreferencedby: Bartie, P., Mackaness, W., Gkatzia, D. and Rieser, V. (2016) The REAL corpus: A crowd-sourced corpus of human-generated and evaluated spatial references to real-world urban scenes. In: Calzolari, N., Choukri, K., Mazo, H., Moreno, A., Declerck, T., Goggi, S., Grobelnik, M., Odijk, J., Piperidis, S., Maegaard, B., Mariani, J. (eds.) Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Paris: European Language Resources Association (ELRA). 10th International Conference on Language Resources and Evaluation, LREC 2016, 23.5.2016 - 28.5.2016, Portoroz, Slovenia, pp. 2153-2155. http://www.lrec-conf.org/proceedings/lrec2016/pdf/1035_Paper.pdf Available from: http://hdl.handle.net/1893/26431 [en_GB]
dc.rights: Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/ [en_GB]
dc.source: Web experiments [en_GB]
dc.subject: Geoinformatics [en_GB]
dc.subject: Natural Language Processing [en_GB]
dc.subject: Referring expressions [en_GB]
dc.subject.classification: ::Information and communication technologies [en_GB]
dc.title: The REAL corpus [en_GB]
dc.type: dataset [en_GB]
dc.type: image [en_GB]
dc.description.version: 1.0 [en_GB]
dc.contributor.email: phil.bartie@stir.ac.uk [en_GB]
dc.identifier.projectid: FP7/2007-2013 [en_GB]
dc.identifier.projectid: EP/L026775/1 [en_GB]
dc.identifier.projectid: EP/M005429/1 [en_GB]
dc.title.project: SPACEBOOK project [en_GB]
dc.title.project: GUI - Generation for Uncertain Information [en_GB]
dc.title.project: DILiGENt - Domain-Independent Language Generation [en_GB]
dc.contributor.affiliation: University of Stirling (Biological and Environmental Sciences) [en_GB]
dc.contributor.affiliation: University of Edinburgh [en_GB]
dc.contributor.affiliation: Edinburgh Napier University [en_GB]
dc.contributor.affiliation: Heriot-Watt University [en_GB]
dc.date.publicationyear: 2016 [en_GB]
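For orientation, here is a minimal loading sketch for the tab-delimited results file described in dc.description.tableofcontents above. It assumes the column order matches the field list given there, that any header row (if present) starts with "userid", and that the file sits in the working directory; the function names and the header check are illustrative, not part of the dataset documentation.

```python
# Minimal sketch: load the REAL corpus validation records from the
# tab-delimited export. Column order is assumed from the record's
# dc.description.tableofcontents field list; verify against the file.
import csv
from collections import Counter, defaultdict

COLUMNS = [
    "userid", "age", "gender", "photoid", "x", "y", "annotation",
    "status", "validator_userid", "validator_age", "validator_gender",
]

def load_real_corpus(path="ReferringExpressionsData_withValidationDetails-TAB_delimited.txt"):
    """Yield one validation record per line as a dict keyed by COLUMNS."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        for row in reader:
            if row and row[0] == "userid":
                continue  # skip a header row, if the export includes one
            yield dict(zip(COLUMNS, row))

def validation_outcomes(records):
    """Count validator outcomes (correct/incorrect/cantfind/ambiguous) per description."""
    per_description = defaultdict(Counter)
    for rec in records:
        # A description is identified here by its photo and annotation text.
        per_description[(rec["photoid"], rec["annotation"])][rec["status"]] += 1
    return per_description

if __name__ == "__main__":
    outcomes = validation_outcomes(load_real_corpus())
    for key, counts in list(outcomes.items())[:5]:
        print(key, dict(counts))
```

Since each description received about 3 verifications, the per-description counters give a direct success rate for every referring expression.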
Appears in Collections: University of Stirling Research Data

Files in This Item:
File: REAL_Corpus.zip
Description: Data (zip)
Size: 61.17 MB
Format: Unknown
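A brief sketch of unpacking REAL_Corpus.zip (once downloaded from this record) and pairing each source image imgN.jpg with its target-marked counterpart imgNt.jpg, following the naming convention documented above. The archive's internal folder layout is not documented here, so the recursive search is an assumption; extract_and_pair is a hypothetical helper name.

```python
# Minimal sketch: unpack REAL_Corpus.zip and pair source images (imgN.jpg)
# with their target-marked versions (imgNt.jpg). The archive's internal
# directory layout is an assumption; adjust paths once inspected.
import re
import zipfile
from pathlib import Path

def extract_and_pair(archive="REAL_Corpus.zip", dest="real_corpus"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    pairs = {}
    for jpg in Path(dest).rglob("img*.jpg"):
        m = re.fullmatch(r"img(\d+)(t?)\.jpg", jpg.name)
        if m:
            n, is_target = m.group(1), bool(m.group(2))
            pairs.setdefault(n, {})["target" if is_target else "source"] = jpg
    return pairs

if __name__ == "__main__":
    for n, files in sorted(extract_and_pair().items(), key=lambda kv: int(kv[0]))[:5]:
        print(n, files.get("source"), files.get("target"))
```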


This item is protected by original copyright

Items in DataSTORRE are protected by copyright, with all rights reserved, unless otherwise indicated.