Please use this identifier to cite or link to this item:
http://hdl.handle.net/11667/81
Appears in Collections: | University of Stirling Research Data |
Title: | Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus |
Other Titles: | An audiovisual corpus of paired vectors |
Creator(s): | Abel, Andrew; Hussain, Amir |
Contact Email: | ahu@cs.stir.ac.uk |
Date Available: | 27-Sep-2016 |
Citation: | Abel, A; Hussain, A (2016): Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus. University of Stirling. Computing Science and Mathematics. Dataset. http://hdl.handle.net/11667/81 |
Publisher: | University of Stirling. Computing Science and Mathematics. |
Dataset Description (Abstract): | This dataset contains joint audiovisual vectors that pair 2D-DCT visual features with the corresponding audio log-filterbank vectors. The visual vectors were extracted by tracking and cropping the lip region in Grid videos (1000 videos from each of five speakers, 5000 videos in total) and transforming each cropped region with a 2D-DCT. The audio vectors were extracted by windowing the audio signal and transforming each frame into a log-filterbank vector. The visual features were then interpolated to match the audio frame rate, and several large datasets were created, with the frames randomly shuffled to prevent bias. The datasets use different pairings in which one or more visual frames are used to estimate a single audio frame, ranging from one-visual-to-one-audio to 28-visual-to-one-audio pairings. The aim of this dataset is to evaluate how well audio speech can be estimated from visual information, and the data are stored in a format that can be input to a machine learning approach such as a neural network. The dataset was created by Andrew Abel and Amir Hussain; the original data are taken from the Grid Corpus, recorded by Cooke, Barker, Cunningham and Shao (see: An audio-visual corpus for speech perception and automatic speech recognition, 2006). |
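The processing chain described in the abstract can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 8×8 block of retained DCT coefficients, the 20 ms Hamming window with a 10 ms hop, the 23-band mel-spaced filterbank, linear interpolation, taking the current frame plus the preceding ones as the visual window, and all function names (`dct2_features`, `mel_filterbank`, `log_filterbank`, `interpolate_visual`, `pair_frames`) are hypothetical. The usage lines at the end run the chain on synthetic data in place of real Grid recordings.

```python
import numpy as np
from scipy.fft import dctn


def dct2_features(lip_crop, keep=8):
    """2D-DCT of a grayscale lip crop; keeps the low-frequency keep x keep block (assumed size)."""
    coeffs = dctn(np.asarray(lip_crop, dtype=float), type=2, norm="ortho")
    return coeffs[:keep, :keep].ravel()


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filters: a common choice, assumed here, not taken from the dataset."""
    def to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def from_mel(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = np.linspace(to_mel(0.0), to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * from_mel(edges) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):  # rising edge of triangle i
            fb[i, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge of triangle i
            fb[i, k] = (right - k) / max(right - centre, 1)
    return fb


def log_filterbank(signal, sr, win_s=0.02, hop_s=0.01, n_filters=23, n_fft=512):
    """Window the signal, take the power spectrum, return log filterbank energies and frame centre times."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    n_frames = 1 + (len(signal) - win) // hop
    window = np.hamming(win)
    frames = np.stack([signal[i * hop:i * hop + win] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    centres = np.arange(n_frames) * hop / sr + win / (2.0 * sr)
    return np.log(energies + 1e-10), centres


def interpolate_visual(visual, video_fps, audio_times):
    """Linearly interpolate each visual feature dimension onto the audio frame times."""
    video_times = np.arange(len(visual)) / video_fps
    return np.stack([np.interp(audio_times, video_times, visual[:, d])
                     for d in range(visual.shape[1])], axis=1)


def pair_frames(visual, audio, n_prior, rng):
    """Pair n_prior consecutive visual frames (ending at t) with audio frame t, then shuffle the pairs."""
    idx = np.arange(n_prior - 1, len(audio))
    X = np.stack([visual[t - n_prior + 1:t + 1].ravel() for t in idx])
    y = audio[idx]
    perm = rng.permutation(len(idx))  # same permutation keeps inputs and targets paired
    return X[perm], y[perm]


# Usage on synthetic data (assumed 16 kHz audio and 25 fps video, 1 second long).
rng = np.random.default_rng(0)
sr, fps = 16000, 25
audio_sig = rng.standard_normal(sr)
lips = rng.random((fps, 32, 48))
visual = np.stack([dct2_features(f) for f in lips])
audio_feats, t_audio = log_filterbank(audio_sig, sr)
visual_up = interpolate_visual(visual, fps, t_audio)
X, y = pair_frames(visual_up, audio_feats, n_prior=14, rng=rng)
print(X.shape, y.shape)
```

Because each input row concatenates `n_prior` DCT vectors, the input width grows linearly with the number of visual frames per audio frame.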
Dataset Description (TOC): | The contents of the files are detailed in contents.xlsx |
Type: | dataset |
Contract/Grant Title: | Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR) |
Funder(s): | EPSRC - Engineering and Physical Sciences Research Council |
Contract/Grant Number: | EP/M026981/1 |
RMS ID: | 1803 |
URI: | http://hdl.handle.net/11667/81 |
Rights: | Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/ |
Affiliation(s) of Dataset Creator(s): | University of Stirling (Computing Science - CSM Dept) |
Files in This Item:
File | Description | Size | Format
---|---|---|---
contents.xlsx | Table of contents | 10.24 kB | Microsoft Excel XML
TESTB_50x5_Shuf_1prior_0_diff_alignDCT.mat | Small-scale test dataset for confirming that the system functions as required; aligned data from endpointed Grid sentences | 16.05 MB | Unknown
TESTB_50x5_Shuf_1prior_0_diff_fullvidDCT.mat | Small-scale test dataset for confirming that the system functions as required; aligned data from the full Grid sentences | 26.46 MB | Unknown
unsplit_900x1_spk_1_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk1); 1 visual frame per audio frame; aligned data | 46.69 MB | Unknown
unsplit_900x1_spk_2_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk2); 1 visual frame per audio frame; aligned data | 58.42 MB | Unknown
unsplit_900x1_spk_1_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk1); 14 visual frames per audio frame; aligned data | 296.79 MB | Unknown
unsplit_900x1_spk_2_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk2); 14 visual frames per audio frame; aligned data | 390.5 MB | Unknown
unsplit_900x1_spk_3_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk3); 1 visual frame per audio frame; aligned data | 58.83 MB | Unknown
unsplit_900x1_spk_4_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk4); 1 visual frame per audio frame; aligned data | 60.31 MB | Unknown
unsplit_900x1_spk_3_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk3); 14 visual frames per audio frame; aligned data | 389.22 MB | Unknown
unsplit_900x1_spk_5_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk5); 1 visual frame per audio frame; aligned data | 60.42 MB | Unknown
unsplit_900x1_spk_4_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk4); 14 visual frames per audio frame; aligned data | 400.18 MB | Unknown
unsplit_900x1_spk_5_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid sentences from one speaker (spk5); 14 visual frames per audio frame; aligned data | 402.09 MB | Unknown
unsplit_900x5_Shuf_1prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers (900 × 5 = 4,500 sentences); 1 visual frame per audio frame; aligned data | 284.96 MB | Unknown
unsplit_900x5_Shuf_1prior_0_diff_fullvidDCT.mat | 900 Grid sentences from each of 5 speakers (900 × 5 = 4,500 sentences); 1 visual frame per audio frame; full Grid sentences | 470.51 MB | Unknown
unsplit_900x5_Shuf_4prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers (900 × 5 = 4,500 sentences); 4 visual frames per audio frame; aligned data | 699.81 MB | Unknown
unsplit_900x5_Shuf_8prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers (900 × 5 = 4,500 sentences); 8 visual frames per audio frame; aligned data | 1.21 GB | Unknown
unsplit_900x5_Shuf_12prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers (900 × 5 = 4,500 sentences); 12 visual frames per audio frame; aligned data | 1.67 GB | Unknown
unsplit_900x5_Shuf_14prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers (900 × 5 = 4,500 sentences); 14 visual frames per audio frame; aligned data | 1.88 GB | Unknown
unsplit_900x5_Shuf_16prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers (900 × 5 = 4,500 sentences); 16 visual frames per audio frame; aligned data | 2.08 GB | Unknown
unsplit_900x5_Shuf_20prior_0_diff_alignDCT.mat | 900 Grid sentences from each of 5 speakers (900 × 5 = 4,500 sentences); 20 visual frames per audio frame; aligned data | 2.44 GB | Unknown
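The file names in the table appear to follow a fixed pattern: sentences per speaker × number of speakers, an optional single-speaker tag (`spk_N`), the number of visual frames per audio frame (`Nprior`), and whether the data are aligned (`align`/`aligned`) or use full sentences (`fullvid`). The sketch below parses that pattern and loads a `.mat` file. The pattern is inferred from the descriptions above rather than documented, the `0_diff` token is matched but left uninterpreted, and the helper names `describe` and `load_mat` are hypothetical. Variable names inside the files are not listed on this page (contents.xlsx describes the contents), so the loader simply returns every variable it finds. It assumes the files are MATLAB MAT-files; the largest files may use the HDF5-based v7.3 format, which `scipy.io.loadmat` does not read, so the loader falls back to `h5py` in that case.

```python
import re
from pathlib import Path

import scipy.io

# Naming pattern inferred from the file table (an assumption, not a documented scheme).
NAME_RE = re.compile(
    r"^(?P<split>TESTB|unsplit)_(?P<sents>\d+)x(?P<spk>\d+)"
    r"(?:_spk_(?P<speaker>\d+))?_Shuf_(?P<n_prior>\d+)prior_0_diff_"
    r"(?P<mode>aligned|align|fullvid)DCT\.mat$"
)


def describe(filename):
    """Decode the (assumed) naming convention of a dataset file."""
    m = NAME_RE.match(Path(filename).name)
    if m is None:
        raise ValueError(f"unrecognised file name: {filename}")
    return {
        "sentences_per_speaker": int(m["sents"]),
        "n_speakers": int(m["spk"]),
        "speaker": int(m["speaker"]) if m["speaker"] else None,
        "visual_frames": int(m["n_prior"]),
        "full_sentence": m["mode"] == "fullvid",
        "test_set": m["split"] == "TESTB",
    }


def load_mat(path):
    """Return {variable name: array} for every variable stored in a .mat file."""
    try:
        data = scipy.io.loadmat(path)
        # Drop loadmat's metadata entries (__header__, __version__, __globals__).
        return {k: v for k, v in data.items() if not k.startswith("__")}
    except NotImplementedError:
        # scipy.io.loadmat raises NotImplementedError for v7.3 (HDF5) files.
        import h5py
        with h5py.File(path, "r") as f:
            # Note: h5py returns arrays transposed relative to MATLAB's column-major layout.
            return {k: f[k][()] for k in f.keys() if isinstance(f[k], h5py.Dataset)}


print(describe("unsplit_900x5_Shuf_14prior_0_diff_alignDCT.mat"))
```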
Items in DataSTORRE are protected by copyright, with all rights reserved, unless otherwise indicated.