STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos

Almushyti, Muna; Li, Frederick W.B.

doi:10.1109/icpr56361.2022.9956030

Authors

Muna Almushyti muna.i.almushyti@durham.ac.uk
PGR Student Doctor of Philosophy

Dr Frederick Li frederick.li@durham.ac.uk
Associate Professor

Abstract

Recognizing human-object interactions is challenging because they change over both space and time. We propose the Spatio-Temporal Interaction Transformer-based (STIT) network to reason about such changes. Specifically, spatial transformers learn the context of humans and objects at a specific frame time. A temporal transformer then learns higher-level relations between the spatial context representations at different time steps, capturing long-term dependencies across frames. We further investigate multiple hierarchy designs for learning human interactions. We achieved superior performance on the Charades, Something-Something v1 and CAD-120 datasets compared with baseline models that do not learn human-object relations and with prior graph-based networks. We also achieved a state-of-the-art accuracy of 95.93% on the CAD-120 dataset [1] using RGB data only.
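
The two-stage hierarchy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single-head attention without learned projections, the mean pooling, the function names (`self_attention`, `mean_pool`, `spatio_temporal_encode`) and the toy dimensions are all assumptions made for illustration. It only shows the idea of attending over human/object features within each frame first, then attending over the resulting per-frame context vectors across time.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections (no learned weights),
    so this is only a structural illustration.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector (assumed pooling)."""
    d = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]


def spatio_temporal_encode(video):
    """video: list of frames; each frame is a list of entity feature vectors
    (one human plus objects). Returns one clip-level feature vector."""
    # Spatial stage: relate humans and objects within each frame.
    frame_contexts = [mean_pool(self_attention(frame)) for frame in video]
    # Temporal stage: relate the per-frame context vectors across time.
    return mean_pool(self_attention(frame_contexts))


random.seed(0)
T, N, D = 4, 3, 8  # hypothetical: 4 frames, 3 entities per frame, 8-dim features
video = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(N)] for _ in range(T)]
clip_feature = spatio_temporal_encode(video)
print(len(clip_feature))  # 8
```

In a trained model the attention layers would carry learned projections and the clip-level feature would feed a classifier; those parts are omitted here because the abstract does not specify them.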

Citation

Almushyti, M., & Li, F. W. B. (2022). STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos. In 2022 26th International Conference on Pattern Recognition (ICPR) (pp. 3287-3294). https://doi.org/10.1109/icpr56361.2022.9956030

Conference Name	2022 26th International Conference on Pattern Recognition (ICPR)
Conference Location	Montréal, Québec
Start Date	Aug 21, 2022
End Date	Aug 25, 2022
Acceptance Date	May 17, 2022
Publication Date	2022-11
Deposit Date	Oct 31, 2022
Publicly Available Date	Nov 1, 2022
Publisher	Institute of Electrical and Electronics Engineers
Pages	3287-3294
DOI	https://doi.org/10.1109/icpr56361.2022.9956030
Related Public URLs	https://doi.org/10.1109/ICPR56361.2022.9956030
Additional Information	21-25 Aug. 2022

Files

Accepted Conference Proceeding (1.4 Mb)
PDF

Copyright Statement
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.