
Durham Research Online

STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos

Almushyti, Muna and Li, Frederick W. B. (2022) 'STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos', International Conference on Pattern Recognition (ICPR 2022), Montréal, Québec, 21–25 Aug 2022.


Recognizing human-object interactions is challenging because of their spatio-temporal changes. We propose the Spatio-Temporal Interaction Transformer (STIT) network to reason about such changes. Specifically, spatial transformers learn the context of humans and objects within individual frames, and a temporal transformer then learns higher-level relations between these spatial context representations across time steps, capturing long-term dependencies across frames. We further investigate multiple hierarchy designs for learning human interactions. We achieve superior performance on the Charades, Something-Something V1 and CAD-120 datasets compared to baseline models that do not learn human-object relations and to prior graph-based networks. We also achieve state-of-the-art accuracy of 95.93% on the CAD-120 dataset [1] using RGB data only.
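The two-level design described in the abstract — per-frame attention over human/object features, followed by attention across the resulting frame representations — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the single-head attention, mean pooling, and all names and shapes here are illustrative assumptions only.

```python
import numpy as np

def self_attention(x):
    # x: (tokens, dim); single-head scaled dot-product self-attention
    # (a simplification of the multi-head transformer blocks in the paper)
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

# toy input: T frames, each with N human/object tokens of dimension D
T, N, D = 8, 4, 16
rng = np.random.default_rng(0)
video = rng.standard_normal((T, N, D))

# spatial stage: attend over human/object tokens within each frame,
# then mean-pool to one context vector per frame (pooling is an assumption)
frame_context = np.stack([self_attention(frame).mean(axis=0) for frame in video])

# temporal stage: attend across the per-frame context vectors to capture
# long-term dependencies between time steps
video_repr = self_attention(frame_context)  # (T, D)
```

The point of the hierarchy is that the temporal stage never sees raw tokens, only the spatially-aggregated frame contexts, which keeps cross-frame attention cheap relative to attending over all tokens in all frames jointly.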

Item Type: Conference item (Paper)
Full text: (AM) Accepted Manuscript
Publisher statement: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Date accepted: 17 May 2022
Date deposited: 01 November 2022
Date of first online publication: No date available
Date first made open access: 01 November 2022

