Please use this identifier to cite or link to this item: http://hdl.handle.net/11667/82
Appears in Collections: University of Stirling Research Data
Title: The REAL corpus
Creator(s): Bartie, Phil
Mackaness, William
Gkatzia, Dimitra
Rieser, Verena
Contact Email: phil.bartie@stir.ac.uk
Keywords: Geoinformatics
Natural Language Processing
Referring expressions
Date Available: 28-Sep-2016
Citation: Bartie, P; Mackaness, W; Gkatzia, D; Rieser, V (2016) The REAL corpus. Version 1.0. University of Stirling. School of Natural Sciences. Dataset and Image. http://hdl.handle.net/11667/82
Publisher: University of Stirling. School of Natural Sciences.
Dataset Description (Abstract): Our interest is in people’s capacity to describe geographic objects in urban scenes efficiently and effectively. The broader ambition is to develop spatial models with equivalent functionality that can construct such referring expressions. To that end we present a newly crowd-sourced data set of natural language references to objects anchored in complex urban scenes (in short, the REAL Corpus: Referring Expressions Anchored Language). The REAL corpus contains a collection of images of real-world urban scenes together with verbal descriptions of target objects generated by humans, paired with data on how successfully other people were able to identify the same object based on these descriptions. In total, the corpus contains 32 images with, on average, 27 descriptions per image and 3 verifications for each description. In addition, the corpus is annotated with a variety of linguistically motivated features. The paper highlights issues posed by collecting data using crowd-sourcing with an unrestricted input format, as well as by using real-world urban scenes.
Dataset Description (TOC): The dataset includes a set of SOURCE images of features in typical urban scenes. A target was indicated in each image and participants were asked to describe that target (these words/phrases were typed by the participant). A validation process then asked other participants to read the description and tag (click) the object on the corresponding image. A set of validation images was generated to show whether the tagged location was correct.
Source Images: these are presented at two resolutions, a high-quality 3000 x 2000 pixel version and a lower-resolution 825 x 550 pixel version of the same image.
- source images are given a filename imgN.jpg, and a corresponding version of the image with the designated target indicated is saved as imgNt.jpg
- the participant saw the source version of the image but could toggle to see the target version briefly to know which object to describe in the scene
Validation Images: these images are 825 x 550 pixels and have superimposed GREEN (correct target) and RED (incorrect) dots showing where the validators clicked. This gives an indication of how well the description worked and of other features that were confused with the intended target.
The data collected from the web-based experiments are available in 2 formats, Excel (XLSX) and tab-delimited text (TXT).
- ReferringExpressionsData_withValidationDetails.xlsx: userid (an integer number), age (range value; see the lookup table supplied for details), gender (male, female), photoid (links to the source images), x (x coordinate of the click), y (y coordinate of the click), annotation shown, status of validator (correct, incorrect, cantfind, ambiguous), validator_userid, validator_age (see lookup table), validator_gender (male, female)
- ReferringExpressionsData_withValidationDetails-TAB_delimited.txt
Lookup table (age): this indicates the age ranges recorded in the results table (e.g. class 4 = 41yr-50yr).
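As a rough illustration of how the tab-delimited file described above might be read, the following Python sketch tallies validator outcomes per image. It is not part of the dataset. The column names are assumptions taken from the field list in the description (the actual header row may be spelled differently), the idea that photoid is the N in imgN.jpg is also an assumption, the helper names are hypothetical, and the sample rows are invented purely to make the example runnable.

```python
import csv
import io
from collections import Counter

# Assumed column names, based on the field list in the description.
# The real header row in the TXT file may differ.
FIELDS = ["userid", "age", "gender", "photoid", "x", "y", "annotation",
          "status", "validator_userid", "validator_age", "validator_gender"]


def image_names(photoid):
    """Return (source, target) filenames following the imgN.jpg / imgNt.jpg
    pattern. Assumes photoid is the N used in the filenames."""
    return f"img{photoid}.jpg", f"img{photoid}t.jpg"


def read_rows(handle):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def load(path, encoding="utf-8"):
    """Read the tab-delimited file from disk (the encoding is an assumption)."""
    with open(path, newline="", encoding=encoding) as handle:
        return read_rows(handle)


def status_counts(rows):
    """Count validator outcomes (correct, incorrect, cantfind, ambiguous)
    for each photoid."""
    counts = {}
    for row in rows:
        counts.setdefault(row["photoid"], Counter())[row["status"]] += 1
    return counts


# Invented sample rows for illustration only; these are NOT corpus data.
SAMPLE = "\n".join([
    "\t".join(FIELDS),
    "1\t2\tfemale\t5\t410\t230\tthe red door on the left\tcorrect\t9\t4\tmale",
    "1\t2\tfemale\t5\t100\t300\tthe red door on the left\tincorrect\t10\t3\tfemale",
    "2\t4\tmale\t7\t0\t0\tthe tall building\tcantfind\t11\t1\tmale",
]) + "\n"

rows = read_rows(io.StringIO(SAMPLE))
counts = status_counts(rows)
for photoid, tally in counts.items():
    print(image_names(photoid)[0], dict(tally))
```

With the real file, `load("ReferringExpressionsData_withValidationDetails-TAB_delimited.txt")` would take the place of the in-memory sample, provided the header names match the assumed ones.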
Type: dataset
image
Contract/Grant Title: SPACEBOOK project
GUI - Generation for Uncertain Information
DILiGENt - Domain-Independent Language Generation
Funder(s): EC - European Commission
EPSRC - Engineering and Physical Sciences Research Council
Contract/Grant Number: FP7/2007-2013
EP/L026775/1
EP/M005429/1
Geographic Location(s): United Kingdom
Time Period: 2015-2016
URI: http://hdl.handle.net/11667/82
Rights: Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/
Affiliation(s) of Dataset Creator(s): University of Stirling
University of Edinburgh
Edinburgh Napier University
Heriot-Watt University

Files in This Item:
File: REAL_Corpus.zip
Description: Data (zip)
Size: 61.17 MB
Format: Unknown


This item is protected by original copyright


