

Durham Research Online

Learning Multimodal VAEs Through Mutual Supervision

Joy, Tom; Shi, Yuge; Torr, Philip H. S.; Rainforth, Tom; Schmon, Sebastian M.; Siddharth, N. (2022). 'Learning Multimodal VAEs Through Mutual Supervision.' ICLR 2022: The Tenth International Conference on Learning Representations, virtual, 25-29 April 2022.

Abstract

Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g., vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing, something that most existing approaches either cannot handle, or do so only to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image-image) and CUB (image-text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.
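To make the idea of mutual supervision more concrete, the sketch below shows a toy two-modality VAE in which each modality's encoder regularises the other's latent posterior through a symmetric KL penalty, instead of fusing the posteriors through an explicit product or mixture of experts. This is an illustrative sketch only: the network sizes, the exact loss, and the names (Encoder, Decoder, mutual_supervision_loss) are assumptions made for the example, not the authors' MEME objective or implementation.

    # Toy illustration of "mutual supervision" between two modality-specific
    # VAE encoders (assumed PyTorch sketch, not the MEME codebase).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        def __init__(self, x_dim, z_dim=16):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU())
            self.mu = nn.Linear(128, z_dim)
            self.logvar = nn.Linear(128, z_dim)

        def forward(self, x):
            h = self.net(x)
            return self.mu(h), self.logvar(h)

    class Decoder(nn.Module):
        def __init__(self, x_dim, z_dim=16):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                                     nn.Linear(128, x_dim))

        def forward(self, z):
            return self.net(z)

    def reparameterise(mu, logvar):
        # Standard VAE reparameterisation trick.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def kl_between(mu_q, logvar_q, mu_p, logvar_p):
        # KL( N(mu_q, var_q) || N(mu_p, var_p) ), summed over latent dims.
        var_q, var_p = logvar_q.exp(), logvar_p.exp()
        return 0.5 * ((var_q + (mu_q - mu_p) ** 2) / var_p
                      + logvar_p - logvar_q - 1).sum(-1)

    def mutual_supervision_loss(x_a, x_b, enc_a, enc_b, dec_a, dec_b):
        mu_a, lv_a = enc_a(x_a)
        mu_b, lv_b = enc_b(x_b)
        z_a = reparameterise(mu_a, lv_a)
        z_b = reparameterise(mu_b, lv_b)
        # Each modality is reconstructed from its own latent sample.
        recon = (F.mse_loss(dec_a(z_a), x_a, reduction="none").sum(-1)
                 + F.mse_loss(dec_b(z_b), x_b, reduction="none").sum(-1))
        # Mutual supervision: each posterior is pulled towards the other, so
        # information is exchanged through the latent space rather than via
        # an explicit product/mixture factorisation in the recognition model.
        cross = (kl_between(mu_a, lv_a, mu_b, lv_b)
                 + kl_between(mu_b, lv_b, mu_a, lv_a))
        return (recon + cross).mean()

    # Example usage with random stand-in data for two "modalities":
    # enc_a, dec_a = Encoder(784), Decoder(784)
    # enc_b, dec_b = Encoder(3072), Decoder(3072)
    # loss = mutual_supervision_loss(torch.randn(8, 784), torch.randn(8, 3072),
    #                                enc_a, enc_b, dec_a, dec_b)

One appeal of a cross-posterior penalty of this kind is that, when one modality is missing, the remaining encoder can still be trained on its own reconstruction term, which is in the spirit of the partial-observation setting described in the abstract.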

Item type: Conference item (Paper)
Full text: (AM) Accepted Manuscript, PDF (9,950 KB)
Status: Peer-reviewed
Publisher web site: https://openreview.net/forum?id=1xXvPrAshao
Date accepted: 20 January 2022
Date deposited: 24 June 2022
Date of first online publication: 29 September 2021
Date first made open access: 24 June 2022
