Time and Frequency Network for Human Action Detection in Videos

Human action detection in videos can be used in applications such as video surveillance, human-computer interaction, and device control. The task takes an image sequence with a three-dimensional shape as input and detects actions such as running or catching a ball.

Image credit: pxhere.com, CC0 Public Domain

Usually, convolutional neural networks (CNNs) are applied for this task. However, they consider only spatiotemporal features, whereas exploiting frequency features would facilitate learning. A recent paper on arXiv.org proposes an end-to-end single-stage network in the time-frequency domain.

A 3D-CNN and a 2D-CNN were applied to extract time and frequency features, respectively. The two feature types were then fused with an attention mechanism to obtain the action patterns. Experiments show the superiority of the suggested approach against other state-of-the-art models and demonstrate the feasibility of action detection using frequency features.
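The frequency branch operates on DCT coefficients of the frames rather than raw pixels. As a minimal sketch (not the authors' code; the 8x8 block size and grayscale input are illustrative assumptions in the style of JPEG-like transforms), a blockwise 2D DCT of a frame can be computed as follows:

```python
# Hypothetical sketch of the frequency-branch input: blockwise 2D
# DCT-II coefficients of a grayscale frame. Block size and frame
# dimensions are illustrative assumptions, not the paper's settings.
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)  # DC row scaling for orthonormality
    return m

def blockwise_dct(frame: np.ndarray, block: int = 8) -> np.ndarray:
    """Split a grayscale frame into block x block tiles and apply a
    2D DCT to each tile; returns (rows, cols, block, block)."""
    h, w = frame.shape
    h, w = h - h % block, w - w % block            # drop ragged edges
    tiles = frame[:h, :w].reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3)            # (bh, bw, block, block)
    m = dct_matrix(block)
    # 2D DCT per tile: M @ tile @ M.T
    return np.einsum('ij,bcjk,lk->bcil', m, tiles, m)

frame = np.random.rand(240, 320)
coeffs = blockwise_dct(frame)
print(coeffs.shape)  # (30, 40, 8, 8)
```

Each tile's coefficient grid concentrates most of the energy in the low-frequency (top-left) entries, which is exactly the structure a 2D-CNN can exploit.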

Currently, spatiotemporal features are embraced by most deep learning approaches for human action detection in videos; however, they neglect the important features in the frequency domain. In this work, we propose an end-to-end network that considers the time and frequency features simultaneously, named TFNet. TFNet holds two branches: one is the time branch, formed of a three-dimensional convolutional neural network (3D-CNN), which takes the image sequence as input to extract time features; the other is the frequency branch, extracting frequency features through a two-dimensional convolutional neural network (2D-CNN) from DCT coefficients. Finally, to obtain the action patterns, these two features are deeply fused under the attention mechanism. Experimental results on the JHMDB51-21 and UCF101-24 datasets show that our approach achieves remarkable performance in terms of frame-mAP.
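The fusion step weights the two branches' features with attention before combining them. The following is a toy sketch of branch-level attention fusion, not the paper's exact formulation; the feature dimension, the shared projection vector, and the softmax gating form are all assumptions for illustration:

```python
# Hypothetical sketch: attention-weighted fusion of a time-branch
# feature vector and a frequency-branch feature vector. The gating
# form and dimensions are illustrative assumptions.
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(f_time: np.ndarray, f_freq: np.ndarray,
                   w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Score each branch with a shared projection w, normalize the
    two scores with softmax, and return the weighted sum."""
    feats = np.stack([f_time, f_freq])   # (2, d)
    scores = feats @ w                   # one scalar score per branch
    alpha = softmax(scores)              # attention weights, sum to 1
    return alpha @ feats, alpha          # fused (d,), weights (2,)

rng = np.random.default_rng(0)
d = 256
fused, alpha = attention_fuse(rng.normal(size=d), rng.normal(size=d),
                              rng.normal(size=d))
print(fused.shape, alpha)
```

The key property is that the network can learn to lean on whichever branch is more informative for a given clip, rather than concatenating the features with fixed weights.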

Research paper: Li, C., Chen, H., Lu, J., Huang, Y., and Liu, Y., “Time and Frequency Network for Human Action Detection in Videos”, 2021. Link: https://arxiv.org/abs/2103.04680