Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus

Abel, Andrew; Hussain, Amir

Please use this identifier to cite or link to this item: http://hdl.handle.net/11667/81

Appears in Collections:	University of Stirling Research Data
Title:	Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus
Other Titles:	An audiovisual corpus of paired vectors
Creator(s):	Abel, Andrew; Hussain, Amir
Contact Email:	ahu@cs.stir.ac.uk
Date Available:	27-Sep-2016
Citation:	Abel, A; Hussain, A (2016): Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus. University of Stirling. Computing Science and Mathematics. Dataset. http://hdl.handle.net/11667/81
Publisher:	University of Stirling. Computing Science and Mathematics.
Dataset Description (Abstract):	This dataset contains joint audiovisual vectors: 2D-DCT visual features paired with the equivalent audio log-filterbank vectors. The visual vectors were extracted by tracking and cropping the lip region in Grid videos (1000 videos from each of five speakers, giving a total of 5000 videos) and then transforming the cropped region with a 2D-DCT. The audio vectors were extracted by windowing the audio signal and transforming each frame into a log-filterbank vector. The visual signal was then interpolated to match the audio, and a number of large datasets were created, with the frames shuffled randomly to prevent bias and with different pairings, ranging from one visual frame per audio frame up to 28 visual frames per audio frame. The dataset is intended for evaluating how well audio speech can be estimated from visual information, and is in a format that can be input into a machine learning approach such as a neural network. The dataset was created by Andrew Abel and Amir Hussain; the original data were taken from the Grid Corpus, recorded by Cooke, Barker, Cunningham and Shao (see: An audio-visual corpus for speech perception and automatic speech recognition, 2006).
Dataset Description (TOC):	The contents of the files are detailed in contents.xlsx
Type:	dataset
Contract/Grant Title:	Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR)
Funder(s):	EPSRC - Engineering and Physical Sciences Research Council
Contract/Grant Number:	EP/M026981/1
RMS ID:	1803
URI:	http://hdl.handle.net/11667/81
Rights:	Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/
Affiliation(s) of Dataset Creator(s):	University of Stirling (Computing Science - CSM Dept)
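
The pairing scheme described in the abstract (interpolate the visual stream to the audio frame count, group several visual frames per audio frame, then shuffle) can be illustrated with a minimal sketch. This is not the creators' code. It assumes linear interpolation, assumes that an N-frame pairing means the current visual frame plus the N-1 frames before it, and uses hypothetical function names; the record does not confirm any of these details.

```python
import numpy as np


def upsample_visual(visual, n_audio_frames):
    """Linearly interpolate visual vectors (one per row) to n_audio_frames rows.

    Linear interpolation is an assumption; the record only says "interpolated".
    """
    visual = np.asarray(visual, dtype=float)
    src = np.linspace(0.0, 1.0, num=visual.shape[0])
    dst = np.linspace(0.0, 1.0, num=n_audio_frames)
    # Interpolate each feature dimension independently along the time axis.
    columns = [np.interp(dst, src, visual[:, k]) for k in range(visual.shape[1])]
    return np.column_stack(columns)


def pair_frames(visual, audio, n_visual=1):
    """Pair each audio frame with n_visual consecutive visual frames.

    Assumption: the window is the current frame and the n_visual - 1 frames
    before it, flattened into one input vector.
    """
    visual = np.asarray(visual, dtype=float)
    audio = np.asarray(audio, dtype=float)
    if visual.shape[0] != audio.shape[0]:
        raise ValueError("visual and audio must have the same number of frames")
    if not 1 <= n_visual <= visual.shape[0]:
        raise ValueError("n_visual must be between 1 and the number of frames")
    first = n_visual - 1
    inputs = np.stack(
        [visual[t - first:t + 1].ravel() for t in range(first, visual.shape[0])]
    )
    targets = audio[first:]
    return inputs, targets


def shuffle_pairs(inputs, targets, seed=0):
    """Shuffle input/target pairs with one shared permutation."""
    order = np.random.default_rng(seed).permutation(len(inputs))
    return inputs[order], targets[order]
```

Because the same permutation is applied to inputs and targets, each visual window stays attached to its audio frame after shuffling. The feature dimensions are left to the caller, since this record does not state them.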

Files in This Item:

File	Description	Size	Format
contents.xlsx	Table of contents	10.24 kB	Microsoft Excel XML
TESTB_50x5_Shuf_1prior_0_diff_alignDCT.mat	A small-scale test dataset to confirm that the system functions as required, using aligned data with endpointed Grid sentences	16.05 MB	Unknown
TESTB_50x5_Shuf_1prior_0_diff_fullvidDCT.mat	A small-scale test dataset to confirm that the system functions as required, using aligned data with the full Grid sentences	26.46 MB	Unknown
unsplit_900x1_spk_1_Shuf_1prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk1) of the Grid dataset, with only 1 frame used for visual data, using aligned data	46.69 MB	Unknown
unsplit_900x1_spk_2_Shuf_1prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk2) of the Grid dataset, with only 1 frame used for visual data, using aligned data	58.42 MB	Unknown
unsplit_900x1_spk_1_Shuf_14prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk1) of the Grid dataset, with 14 frames used for visual data, using aligned data	296.79 MB	Unknown
unsplit_900x1_spk_2_Shuf_14prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk2) of the Grid dataset, with 14 frames used for visual data, using aligned data	390.5 MB	Unknown
unsplit_900x1_spk_3_Shuf_1prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk3) of the Grid dataset, with only 1 frame used for visual data, using aligned data	58.83 MB	Unknown
unsplit_900x1_spk_4_Shuf_1prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk4) of the Grid dataset, with only 1 frame used for visual data, using aligned data	60.31 MB	Unknown
unsplit_900x1_spk_3_Shuf_14prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk3) of the Grid dataset, with 14 frames used for visual data, using aligned data	389.22 MB	Unknown
unsplit_900x1_spk_5_Shuf_1prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk5) of the Grid dataset, with only 1 frame used for visual data, using aligned data	60.42 MB	Unknown
unsplit_900x1_spk_4_Shuf_14prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk4) of the Grid dataset, with 14 frames used for visual data, using aligned data	400.18 MB	Unknown
unsplit_900x1_spk_5_Shuf_14prior_0_diff_alignedDCT.mat	900 Grid sentences from 1 speaker (spk5) of the Grid dataset, with 14 frames used for visual data, using aligned data	402.09 MB	Unknown
unsplit_900x5_Shuf_1prior_0_diff_alignDCT.mat	900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with only 1 frame used for visual data, using aligned data	284.96 MB	Unknown
unsplit_900x5_Shuf_1prior_0_diff_fullvidDCT.mat	900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with only 1 frame used for visual data, using full Grid sentences	470.51 MB	Unknown
unsplit_900x5_Shuf_4prior_0_diff_alignDCT.mat	900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 4 frames used for visual data, using aligned data	699.81 MB	Unknown
unsplit_900x5_Shuf_8prior_0_diff_alignDCT.mat	900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 8 frames used for visual data, using aligned data	1.21 GB	Unknown
unsplit_900x5_Shuf_12prior_0_diff_alignDCT.mat	900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 12 frames used for visual data, using aligned data	1.67 GB	Unknown
unsplit_900x5_Shuf_14prior_0_diff_alignDCT.mat	900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 14 frames used for visual data, using aligned data	1.88 GB	Unknown
unsplit_900x5_Shuf_16prior_0_diff_alignDCT.mat	900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 16 frames used for visual data, using aligned data	2.08 GB	Unknown
unsplit_900x5_Shuf_20prior_0_diff_alignDCT.mat	900 Grid sentences from each of 5 speakers of the Grid dataset (900*5 sentences), with 20 frames used for visual data, using aligned data	2.44 GB	Unknown
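
The file names above follow a regular pattern, so the configuration of each file can be recovered from its name. The sketch below is a hypothetical helper inferred only from the names and descriptions in this listing; the field labels are illustrative, and the meaning of the `0_diff` component is not documented on this page, so it is returned as a raw number.

```python
import re

# Pattern inferred from the file names listed in this record (an assumption,
# not an official naming specification).
_NAME = re.compile(
    r"^(?P<split>[A-Za-z]+)_(?P<sentences>\d+)x(?P<speakers>\d+)"
    r"(?:_spk_(?P<speaker_id>\d+))?"
    r"_Shuf_(?P<visual_frames>\d+)prior_(?P<diff>\d+)_diff_"
    r"(?P<kind>align(?:ed)?|fullvid)DCT\.mat$"
)


def parse_name(filename):
    """Split a dataset file name into its configuration fields."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    g = match.groupdict()
    return {
        "split": g["split"],                          # "TESTB" or "unsplit"
        "sentences_per_speaker": int(g["sentences"]),
        "speakers": int(g["speakers"]),
        # Present only for the single-speaker files.
        "speaker_id": int(g["speaker_id"]) if g["speaker_id"] else None,
        "visual_frames": int(g["visual_frames"]),     # visual frames per audio frame
        "diff": int(g["diff"]),                       # meaning not documented here
        "full_sentences": g["kind"] == "fullvid",     # False for align/aligned files
    }
```

For example, `parse_name("unsplit_900x1_spk_2_Shuf_14prior_0_diff_alignedDCT.mat")` reports 900 sentences, 1 speaker, speaker 2 and 14 visual frames. The variable names stored inside each .mat file are not given on this page; contents.xlsx is the reference for those.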

This item is protected by original copyright

DataSTORRE: Stirling Online Repository for Research Data.