Please use this identifier to cite or link to this item: http://hdl.handle.net/11667/81
Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor | Adeel, Ahsan | - |
| dc.contributor.other | EPSRC - Engineering and Physical Sciences Research Council | en_GB |
| dc.creator | Abel, Andrew | - |
| dc.creator | Hussain, Amir | - |
| dc.date.accessioned | 2016-09-27T09:33:12Z | - |
| dc.date.available | 2016-09-27T09:33:12Z | - |
| dc.date.created | 2016-05-01 | - |
| dc.identifier.uri | http://hdl.handle.net/11667/81 | - |
| dc.description.abstract | This dataset contains a range of joint audiovisual vectors, in the form of 2D-DCT visual features and the equivalent audio log-filterbank vectors. All visual vectors were extracted by tracking and cropping the lip region in a set of Grid videos (1000 videos from each of five speakers, giving a total of 5000 videos), and then transforming the region with 2D-DCT. The audio vectors were extracted by windowing the audio signal and transforming each frame into a log-filterbank vector. The visual signal was then interpolated to match the audio, and a number of large datasets were created, with the frames shuffled randomly to prevent bias, and with different pairings, including multiple visual frames to estimate a single audio frame (from one-visual-to-one-audio pairings up to 28-visual-to-one-audio pairings). The aim of this dataset was to evaluate how well audio speech could be estimated using visual information, and it is in a format that can be input into a machine learning approach such as a neural network. The dataset was created by Andrew Abel and Amir Hussain; the original data were taken from the Grid Corpus, recorded by Cooke, Barker, Cunningham and Shao (see: An audio-visual corpus for speech perception and automatic speech recognition, 2006). | en_GB |
| dc.description.tableofcontents | The contents of the files are detailed in contents.xlsx | en_GB |
| dc.language.iso | eng | en_GB |
| dc.publisher | University of Stirling. Computing Science and Mathematics. | en_GB |
| dc.relation | Abel, A; Hussain, A (2016): Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus. University of Stirling. Computing Science and Mathematics. Dataset. http://hdl.handle.net/11667/81 | en_GB |
| dc.rights | Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/ | en_GB |
| dc.subject.classification | ::Information and communication technologies | en_GB |
| dc.title | Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus | en_GB |
| dc.title.alternative | An audiovisual corpus of paired vectors | en_GB |
| dc.type | dataset | en_GB |
| dc.contributor.email | ahu@cs.stir.ac.uk | en_GB |
| dc.identifier.rmsid | 1803 | en_GB |
| dc.identifier.projectid | EP/M026981/1 | en_GB |
| dc.title.project | Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR) | en_GB |
| dc.contributor.affiliation | University of Stirling (Computing Science - CSM Dept) | en_GB |
| dc.date.publicationyear | 2016 | en_GB |
Appears in Collections: University of Stirling Research Data
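
The abstract describes interpolating the visual signal to match the audio frame rate, pairing one or more visual frames with each audio frame, and shuffling the resulting pairs. The following is a minimal sketch of that idea, not the authors' code. The function names, the use of linear interpolation, and the choice to stack the current frame with the frames immediately preceding it are all assumptions made for illustration; the record does not state how the windows were formed.

```python
import random

def interpolate_frames(frames, target_len):
    """Linearly interpolate a list of equal-length vectors to target_len frames.

    Linear interpolation is an assumption; the record only says the visual
    signal was "interpolated to match the audio".
    """
    if target_len == 1 or len(frames) == 1:
        return [list(frames[0]) for _ in range(target_len)]
    scale = (len(frames) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out

def pair_frames(visual, audio, n_frames):
    """Pair audio frame t with n_frames visual frames ending at t, concatenated.

    Hypothetical windowing: audio frames without enough preceding visual
    frames are skipped.
    """
    pairs = []
    for t in range(n_frames - 1, len(audio)):
        window = visual[t - n_frames + 1 : t + 1]
        stacked = [value for frame in window for value in frame]
        pairs.append((stacked, audio[t]))
    return pairs

def build_pairs(visual, audio, n_frames=1, seed=0):
    """Interpolate, pair, then shuffle the pairs with a fixed seed."""
    upsampled = interpolate_frames(visual, len(audio))
    pairs = pair_frames(upsampled, audio, n_frames)
    random.Random(seed).shuffle(pairs)
    return pairs
```

With `n_frames=1` each pair holds a single visual vector and a single audio vector; larger values lengthen the visual input, which is consistent with the larger file sizes listed for the multi-frame datasets.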

Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| contents.xlsx | Table of contents | 10.24 kB | Microsoft Excel XML |
| TESTB_50x5_Shuf_1prior_0_diff_alignDCT.mat | A small-scale test dataset to confirm that the system functions as required, using aligned data with endpointed GRID sentences | 16.05 MB | Unknown |
| TESTB_50x5_Shuf_1prior_0_diff_fullvidDCT.mat | A small-scale test dataset to confirm that the system functions as required, using aligned data with the full GRID sentences | 26.46 MB | Unknown |
| unsplit_900x1_spk_1_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk1) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 46.69 MB | Unknown |
| unsplit_900x1_spk_2_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk2) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 58.42 MB | Unknown |
| unsplit_900x1_spk_1_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk1) of the Grid dataset, with 14 frames used for visual data, using aligned data | 296.79 MB | Unknown |
| unsplit_900x1_spk_2_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk2) of the Grid dataset, with 14 frames used for visual data, using aligned data | 390.5 MB | Unknown |
| unsplit_900x1_spk_3_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk3) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 58.83 MB | Unknown |
| unsplit_900x1_spk_4_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk4) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 60.31 MB | Unknown |
| unsplit_900x1_spk_3_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk3) of the Grid dataset, with 14 frames used for visual data, using aligned data | 389.22 MB | Unknown |
| unsplit_900x1_spk_5_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk5) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 60.42 MB | Unknown |
| unsplit_900x1_spk_4_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk4) of the Grid dataset, with 14 frames used for visual data, using aligned data | 400.18 MB | Unknown |
| unsplit_900x1_spk_5_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk5) of the Grid dataset, with 14 frames used for visual data, using aligned data | 402.09 MB | Unknown |
| unsplit_900x5_Shuf_1prior_0_diff_alignDCT.mat | 900 Grid sentences from 5 speakers of the Grid dataset (900*5 sentences), with only 1 frame used for visual data, using aligned data | 284.96 MB | Unknown |
| unsplit_900x5_Shuf_1prior_0_diff_fullvidDCT.mat | 900 Grid sentences from 5 speakers of the Grid dataset (900*5 sentences), with only 1 frame used for visual data, using full Grid sentences | 470.51 MB | Unknown |
| unsplit_900x5_Shuf_4prior_0_diff_alignDCT.mat | 900 Grid sentences from 5 speakers of the Grid dataset (900*5 sentences), with 4 frames used for visual data, using aligned data | 699.81 MB | Unknown |
| unsplit_900x5_Shuf_8prior_0_diff_alignDCT.mat | 900 Grid sentences from 5 speakers of the Grid dataset (900*5 sentences), with 8 frames used for visual data, using aligned data | 1.21 GB | Unknown |
| unsplit_900x5_Shuf_12prior_0_diff_alignDCT.mat | 900 Grid sentences from 5 speakers of the Grid dataset (900*5 sentences), with 12 frames used for visual data, using aligned data | 1.67 GB | Unknown |
| unsplit_900x5_Shuf_14prior_0_diff_alignDCT.mat | 900 Grid sentences from 5 speakers of the Grid dataset (900*5 sentences), with 14 frames used for visual data, using aligned data | 1.88 GB | Unknown |
| unsplit_900x5_Shuf_16prior_0_diff_alignDCT.mat | 900 Grid sentences from 5 speakers of the Grid dataset (900*5 sentences), with 16 frames used for visual data, using aligned data | 2.08 GB | Unknown |
| unsplit_900x5_Shuf_20prior_0_diff_alignDCT.mat | 900 Grid sentences from 5 speakers of the Grid dataset (900*5 sentences), with 20 frames used for visual data, using aligned data | 2.44 GB | Unknown |
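
The file names in the list above appear to encode the dataset parameters given in the descriptions: sentences per speaker, number of speakers, an optional speaker ID, the number of visual frames, and whether aligned or full-video data was used. The sketch below parses those fields so that a file can be selected programmatically. The pattern and the field names are inferred from the listing only and are assumptions, not a documented naming scheme; the meaning of the `0_diff` component is not stated in this record, so it is matched but not interpreted.

```python
import re

# Pattern inferred from the file listing; not an official naming specification.
FILENAME_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z]+)"
    r"_(?P<sentences>\d+)x(?P<speakers>\d+)"
    r"(?:_spk_(?P<speaker_id>\d+))?"
    r"_Shuf_(?P<frames>\d+)prior"
    r"_\d+_diff_"                      # meaning not documented here; left uninterpreted
    r"(?P<variant>aligned|align|fullvid)DCT\.mat$"
)

def parse_dataset_filename(name):
    """Return the parameters encoded in a dataset file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    speaker_id = match.group("speaker_id")
    return {
        "prefix": match.group("prefix"),
        "sentences_per_speaker": int(match.group("sentences")),
        "speakers": int(match.group("speakers")),
        "speaker_id": int(speaker_id) if speaker_id is not None else None,
        "visual_frames": int(match.group("frames")),
        "full_video": match.group("variant") == "fullvid",
    }

if __name__ == "__main__":
    info = parse_dataset_filename("unsplit_900x5_Shuf_14prior_0_diff_alignDCT.mat")
    # Total sentence count follows the "900*5" note in the descriptions.
    print(info, info["sentences_per_speaker"] * info["speakers"])
```

The layout of the variables inside each `.mat` file is not given on this page; according to the record, it is detailed in contents.xlsx, which should be consulted before loading any file.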


This item is protected by original copyright


