Please use this identifier to cite or link to this item:
http://hdl.handle.net/11667/81
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor | Adeel, Ahsan | - |
dc.contributor.other | EPSRC - Engineering and Physical Sciences Research Council | en_GB |
dc.creator | Abel, Andrew | - |
dc.creator | Hussain, Amir | - |
dc.date.accessioned | 2016-09-27T09:33:12Z | - |
dc.date.available | 2016-09-27T09:33:12Z | - |
dc.date.created | 2016-05-01 | - |
dc.identifier.uri | http://hdl.handle.net/11667/81 | - |
dc.description.abstract | This dataset contains a range of joint audiovisual vectors, in the form of 2D-DCT visual features paired with equivalent audio log-filterbank vectors. All visual vectors were extracted by tracking and cropping the lip region of a range of Grid videos (1000 videos from each of five speakers, giving a total of 5000 videos), then transforming the cropped region with the 2D-DCT. The audio vectors were extracted by windowing the audio signal and transforming each frame into a log-filterbank vector. The visual signal was then interpolated to match the audio frame rate, and a number of large datasets were created, with the frames shuffled randomly to prevent bias and with different pairings, including multiple visual frames used to estimate a single audio frame (ranging from one visual frame per audio frame up to 28 visual frames per audio frame). The aim of this dataset is to evaluate how well audio speech can be estimated from visual information; the data are in a format suitable for input to a machine learning approach such as a neural network. The dataset was created by Andrew Abel and Amir Hussain; the original data are taken from the Grid Corpus, recorded by Cooke, Barker, Cunningham and Shao (see: An audio-visual corpus for speech perception and automatic speech recognition, 2006). | en_GB |
dc.description.tableofcontents | The contents of the files are detailed in contents.xlsx | en_GB |
dc.language.iso | eng | en_GB |
dc.publisher | University of Stirling. Computing Science and Mathematics. | en_GB |
dc.relation | Abel, A; Hussain, A (2016): Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus. University of Stirling. Computing Science and Mathematics. Dataset. http://hdl.handle.net/11667/81 | en_GB |
dc.rights | Rights covered by the standard CC-BY 4.0 licence: https://creativecommons.org/licenses/by/4.0/ | en_GB |
dc.subject.classification | ::Information and communication technologies | en_GB |
dc.title | Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus | en_GB |
dc.title.alternative | An audiovisual corpus of paired vectors | en_GB |
dc.type | dataset | en_GB |
dc.contributor.email | ahu@cs.stir.ac.uk | en_GB |
dc.identifier.rmsid | 1803 | en_GB |
dc.identifier.projectid | EP/M026981/1 | en_GB |
dc.title.project | Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR) | en_GB |
dc.contributor.affiliation | University of Stirling (Computing Science - CSM Dept) | en_GB |
dc.date.publicationyear | 2016 | en_GB |
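The feature-extraction pipeline described in the abstract (2D-DCT of the cropped lip region, log-filterbank energies of each windowed audio frame) can be sketched as below. The coefficient count, sample rate, FFT size, and linear filter spacing are illustrative assumptions only; they are not the values used to build this dataset, which are documented in contents.xlsx and the associated publications.

```python
import numpy as np
from scipy.fft import dct

def lip_dct_features(lip_region, n_coeffs=50):
    """2D-DCT of a cropped lip-region image; keep the low-frequency
    (top-left) coefficients. n_coeffs=50 is an assumed value."""
    d = dct(dct(lip_region, axis=0, norm='ortho'), axis=1, norm='ortho')
    k = int(np.ceil(np.sqrt(n_coeffs)))
    return d[:k, :k].ravel()[:n_coeffs]

def log_filterbank(frame, n_fft=512, n_filters=23):
    """Log triangular-filterbank energies of one windowed audio frame.
    A linear filter spacing is used here for brevity; mel spacing is
    more common in practice. Parameter values are assumptions."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    edges = np.linspace(0, n_fft // 2, n_filters + 2).astype(int)
    fbank = np.zeros(n_filters)
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rise = np.linspace(0, 1, mid - lo, endpoint=False)  # rising edge
        fall = np.linspace(1, 0, hi - mid, endpoint=False)  # falling edge
        fbank[i] = np.dot(spec[lo:mid], rise) + np.dot(spec[mid:hi], fall)
    return np.log(fbank + 1e-10)  # floor avoids log(0)
```

After computing both streams per sentence, the visual vectors would be interpolated up to the audio frame rate and the paired frames shuffled, as the abstract describes.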
Appears in Collections: | University of Stirling Research Data |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
contents.xlsx | Table of contents | 10.24 kB | Microsoft Excel XML | View/Open |
TESTB_50x5_Shuf_1prior_0_diff_alignDCT.mat | A small-scale test dataset to confirm that the system functions as required, using aligned data with endpointed GRID sentences | 16.05 MB | Unknown | View/Open |
TESTB_50x5_Shuf_1prior_0_diff_fullvidDCT.mat | A small-scale test dataset to confirm that the system functions as required, using aligned data with the full GRID sentences | 26.46 MB | Unknown | View/Open |
unsplit_900x1_spk_1_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk1) of the grid dataset, with only 1 frame used for visual data, using aligned data | 46.69 MB | Unknown | View/Open |
unsplit_900x1_spk_2_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk2) of the grid dataset, with only 1 frame used for visual data, using aligned data | 58.42 MB | Unknown | View/Open |
unsplit_900x1_spk_1_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk1) of the grid dataset, with 14 frames used for visual data, using aligned data | 296.79 MB | Unknown | View/Open |
unsplit_900x1_spk_2_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk2) of the grid dataset, with 14 frames used for visual data, using aligned data | 390.5 MB | Unknown | View/Open |
unsplit_900x1_spk_3_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk3) of the grid dataset, with only 1 frame used for visual data, using aligned data | 58.83 MB | Unknown | View/Open |
unsplit_900x1_spk_4_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk4) of the grid dataset, with only 1 frame used for visual data, using aligned data | 60.31 MB | Unknown | View/Open |
unsplit_900x1_spk_3_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk3) of the grid dataset, with 14 frames used for visual data, using aligned data | 389.22 MB | Unknown | View/Open |
unsplit_900x1_spk_5_Shuf_1prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk5) of the grid dataset, with only 1 frame used for visual data, using aligned data | 60.42 MB | Unknown | View/Open |
unsplit_900x1_spk_4_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk4) of the grid dataset, with 14 frames used for visual data, using aligned data | 400.18 MB | Unknown | View/Open |
unsplit_900x1_spk_5_Shuf_14prior_0_diff_alignedDCT.mat | 900 Grid Sentences from 1 speaker (spk5) of the grid dataset, with 14 frames used for visual data, using aligned data | 402.09 MB | Unknown | View/Open |
unsplit_900x5_Shuf_1prior_0_diff_alignDCT.mat | 900 Grid Sentences from 5 speakers of the grid dataset (900*5 sents), with only 1 frame used for visual data, using aligned data | 284.96 MB | Unknown | View/Open |
unsplit_900x5_Shuf_1prior_0_diff_fullvidDCT.mat | 900 Grid Sentences from 5 speakers of the grid dataset (900*5 sents), with only 1 frame used for visual data, using full grid sentences | 470.51 MB | Unknown | View/Open |
unsplit_900x5_Shuf_4prior_0_diff_alignDCT.mat | 900 Grid Sentences from 5 speakers of the grid dataset (900*5 sents), with 4 frames used for visual data, using aligned data | 699.81 MB | Unknown | View/Open |
unsplit_900x5_Shuf_8prior_0_diff_alignDCT.mat | 900 Grid Sentences from 5 speakers of the grid dataset (900*5 sents), with 8 frames used for visual data, using aligned data | 1.21 GB | Unknown | View/Open |
unsplit_900x5_Shuf_12prior_0_diff_alignDCT.mat | 900 Grid Sentences from 5 speakers of the grid dataset (900*5 sents), with 12 frames used for visual data, using aligned data | 1.67 GB | Unknown | View/Open |
unsplit_900x5_Shuf_14prior_0_diff_alignDCT.mat | 900 Grid Sentences from 5 speakers of the grid dataset (900*5 sents), with 14 frames used for visual data, using aligned data | 1.88 GB | Unknown | View/Open |
unsplit_900x5_Shuf_16prior_0_diff_alignDCT.mat | 900 Grid Sentences from 5 speakers of the grid dataset (900*5 sents), with 16 frames used for visual data, using aligned data | 2.08 GB | Unknown | View/Open |
unsplit_900x5_Shuf_20prior_0_diff_alignDCT.mat | 900 Grid Sentences from 5 speakers of the grid dataset (900*5 sents), with 20 frames used for visual data, using aligned data | 2.44 GB | Unknown | View/Open |
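The .mat files above can be read in Python with SciPy. The internal variable names are not documented in this record (see contents.xlsx), so the sketch below lists whatever variables a file contains; the toy file and the names `visual` and `audio` are assumptions used only so the example runs without the real download.

```python
import numpy as np
from scipy.io import loadmat, savemat

# Toy stand-in so the sketch runs without the real data; with a
# downloaded file, pass its path (e.g.
# 'unsplit_900x5_Shuf_1prior_0_diff_alignDCT.mat') to loadmat instead.
savemat('toy_av.mat', {'visual': np.zeros((100, 50)),
                       'audio': np.zeros((100, 23))})

data = loadmat('toy_av.mat')
for name, value in data.items():
    if not name.startswith('__'):      # skip loadmat's metadata keys
        print(name, value.shape)       # each row is one (shuffled) frame
```

Listing the keys first is the safest approach, since the variable names and array orientation inside each file should be confirmed against contents.xlsx before training a model on them.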
Items in DataSTORRE are protected by copyright, with all rights reserved, unless otherwise indicated.