Please use this identifier to cite or link to this item: http://hdl.handle.net/11667/81
Appears in Collections: University of Stirling Research Data
Title: Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus
Other Titles: An audiovisual corpus of paired vectors
Creator(s): Abel, Andrew
Hussain, Amir
Contact Email: ahu@cs.stir.ac.uk
Date Available: 27-Sep-2016
Citation: Abel, A; Hussain, A (2016): Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus. University of Stirling. Dataset. http://hdl.handle.net/11667/81
Publisher: Computing Science and Mathematics, University of Stirling, Scotland
Dataset Description (Abstract): This dataset contains joint audiovisual vectors: 2D-DCT visual features paired with the equivalent audio log-filterbank vectors. The visual vectors were extracted by tracking and cropping the lip region in a set of Grid videos (1000 videos from each of five speakers, giving a total of 5000 videos) and then transforming the cropped region with a 2D-DCT. The audio vectors were extracted by windowing the audio signal and transforming each frame into a log-filterbank vector. The visual signal was then interpolated to match the audio, and a number of large datasets were created, with the frames shuffled randomly to prevent bias and with different pairings, ranging from one visual frame per audio frame up to 28 visual frames per audio frame. The aim of the dataset is to evaluate how well audio speech can be estimated from visual information, and it is in a format that can be input into a machine learning approach such as a neural network. The dataset was created by Andrew Abel and Amir Hussain, with the original data taken from the Grid Corpus, recorded by Cooke, Barker, Cunningham and Shao (see: An audio-visual corpus for speech perception and automatic speech recognition, 2006).
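The pairing steps described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the creators' actual code: the function names (dct2, interpolate_visual, stack_prior, make_pairs), the use of linear interpolation, the choice to stack the frames preceding each audio frame, and the padding at the start of a sentence are all assumptions made for the example. The audio log-filterbank vectors are assumed to be already computed.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def dct2(block):
    """2D-DCT of a 2-D array (e.g. a cropped lip image)."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def interpolate_visual(visual, n_audio_frames):
    """Linearly resample (n_video, dim) features to n_audio_frames rows.

    Linear interpolation is an assumption; the record does not say
    which interpolation method was used.
    """
    src = np.linspace(0.0, 1.0, visual.shape[0])
    dst = np.linspace(0.0, 1.0, n_audio_frames)
    return np.stack([np.interp(dst, src, col) for col in visual.T], axis=1)


def stack_prior(visual, n_prior):
    """Concatenate each frame with the n_prior - 1 frames before it.

    The first frame is repeated as padding at the start (an assumption).
    """
    pad = np.repeat(visual[:1], n_prior - 1, axis=0)
    padded = np.concatenate([pad, visual], axis=0)
    windows = [padded[t:t + n_prior].ravel() for t in range(visual.shape[0])]
    return np.stack(windows)


def make_pairs(visual, audio, n_prior, seed=0):
    """Build shuffled (visual input, audio target) pairs for one sentence."""
    aligned = interpolate_visual(visual, audio.shape[0])
    inputs = stack_prior(aligned, n_prior)
    # Apply the same permutation to both sides so pairs stay matched.
    order = np.random.default_rng(seed).permutation(audio.shape[0])
    return inputs[order], audio[order]
```

With n_prior set to 1 this gives the one-visual-to-one-audio pairing; larger values give the multi-frame pairings. In practice the shuffle would presumably be applied across all sentences at once rather than per sentence.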
Dataset Description (TOC): The contents of the files are detailed in contents.xlsx
Type: dataset
Contract/Grant Title: Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR)
Funder(s): EPSRC - Engineering and Physical Sciences Research Council
Contract/Grant Number: EP/M026981/1
URI: http://hdl.handle.net/11667/81
Rights: Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/
Affiliation(s) of Dataset Creator(s): University of Stirling

Files in This Item:
File | Description | Size | Format
contents.xlsx | Table of contents | 10.24 kB | Microsoft Excel XML
TESTB_50x5_Shuf_1prior_0_diff_alignDCT.mat | A small-scale test dataset to confirm that the system functions as required, using aligned data with endpointed Grid sentences | 16.05 MB | Unknown
TESTB_50x5_Shuf_1prior_0_diff_fullvidDCT.mat | A small-scale test dataset to confirm that the system functions as required, using aligned data with the full Grid sentences | 26.46 MB | Unknown
unsplit_900x1_spk_1_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk1) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 46.69 MB | Unknown
unsplit_900x1_spk_2_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk2) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 58.42 MB | Unknown
unsplit_900x1_spk_1_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk1) of the Grid dataset, with 14 frames used for visual data, using aligned data | 296.79 MB | Unknown
unsplit_900x1_spk_2_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk2) of the Grid dataset, with 14 frames used for visual data, using aligned data | 390.5 MB | Unknown
unsplit_900x1_spk_3_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk3) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 58.83 MB | Unknown
unsplit_900x1_spk_4_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk4) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 60.31 MB | Unknown
unsplit_900x1_spk_3_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk3) of the Grid dataset, with 14 frames used for visual data, using aligned data | 389.22 MB | Unknown
unsplit_900x1_spk_5_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk5) of the Grid dataset, with only 1 frame used for visual data, using aligned data | 60.42 MB | Unknown
unsplit_900x1_spk_4_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk4) of the Grid dataset, with 14 frames used for visual data, using aligned data | 400.18 MB | Unknown
unsplit_900x1_spk_5_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from 1 speaker (spk5) of the Grid dataset, with 14 frames used for visual data, using aligned data | 402.09 MB | Unknown
unsplit_900x5_Shuf_1prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with only 1 frame used for visual data, using aligned data | 284.96 MB | Unknown
unsplit_900x5_Shuf_1prior_0_diff_fullvidDCT.mat | 900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with only 1 frame used for visual data, using full Grid sentences | 470.51 MB | Unknown
unsplit_900x5_Shuf_4prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 4 frames used for visual data, using aligned data | 699.81 MB | Unknown
unsplit_900x5_Shuf_8prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 8 frames used for visual data, using aligned data | 1.21 GB | Unknown
unsplit_900x5_Shuf_12prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 12 frames used for visual data, using aligned data | 1.67 GB | Unknown
unsplit_900x5_Shuf_14prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 14 frames used for visual data, using aligned data | 1.88 GB | Unknown
unsplit_900x5_Shuf_16prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 16 frames used for visual data, using aligned data | 2.08 GB | Unknown
unsplit_900x5_Shuf_20prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 20 frames used for visual data, using aligned data | 2.44 GB | Unknown
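
The file names above follow a regular pattern, so the configuration of each file can be recovered from its name when selecting files programmatically. The sketch below is a hypothetical helper (parse_grid_filename is not part of the dataset) whose field meanings are inferred from the file descriptions in this listing. The meaning of the "0_diff" component is not explained on this page, so its value is returned unchanged.

```python
import re

# Pattern inferred from the file names listed for this item (an assumption,
# not a documented naming specification).
_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z]+)"
    r"_(?P<sentences>\d+)x(?P<speakers>\d+)"
    r"(?:_spk_(?P<speaker_id>\d+))?"
    r"_Shuf_(?P<prior>\d+)prior"
    r"_(?P<diff>\d+)_diff"
    r"_(?P<mode>aligned|align|fullvid)DCT\.mat$"
)


def parse_grid_filename(name):
    """Split a dataset file name into its apparent configuration fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name}")
    speaker_id = match.group("speaker_id")
    return {
        "prefix": match.group("prefix"),                 # TESTB or unsplit
        "sentences_per_speaker": int(match.group("sentences")),
        "speakers": int(match.group("speakers")),
        "speaker_id": int(speaker_id) if speaker_id else None,
        "visual_frames": int(match.group("prior")),      # frames per audio frame
        "diff": int(match.group("diff")),                # meaning not documented here
        "full_sentences": match.group("mode") == "fullvid",
    }


if __name__ == "__main__":
    print(parse_grid_filename(
        "unsplit_900x1_spk_3_Shuf_14prior_0_diff_alignedDCT.mat"))
```

The variables stored inside each .mat file are described in contents.xlsx, which should be consulted before loading the data.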


This item is protected by original copyright


