BAG OF WORDS APPROACH AND DOCUMENT-TOPIC MODELING FOR HUMAN ACTIVITY RECOGNITION FROM VIDEOS

Janson Hendryli

Abstract

Human activity recognition from videos has many useful real-world applications, including multimedia, entertainment, and security. In this paper, an approach inspired by popular text document analysis methods, namely the bag of words and document topic modeling, is explored. Latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF) are used to model the latent topic distribution in videos. The discovered distribution is then used to transform the bag of words representation in order to categorize each video into one of ten daily human activities. The classification is done by feeding the transformed term frequencies of the visual words to logistic regression and SVM models. NMF achieved a higher F1-score than LDA with both the SVM and the logistic regression classifiers.

Keywords: human activity recognition, bag of words, document topic modeling
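
The pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that each video has already been reduced to a list of visual-word indices (how the words are extracted is not specified here), and it uses the multiplicative updates of Lee and Seung for the squared-error NMF objective, written in plain Python. The names `bag_of_words`, `nmf`, and the toy data are hypothetical.

```python
import random


def bag_of_words(word_ids, vocab_size):
    """Count how often each visual word occurs in one video."""
    hist = [0.0] * vocab_size
    for w in word_ids:
        hist[w] += 1.0
    return hist


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (videos x words) into W (videos x topics) and
    H (topics x words) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start (an assumption of this sketch).
    W = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n)]
    return W, H


# Toy data: four "videos" over a vocabulary of four visual words.
videos = [[0, 0, 0, 1, 1], [0, 0, 0, 0, 1], [2, 2, 3, 3, 3], [2, 3, 3, 3, 3]]
V = [bag_of_words(v, vocab_size=4) for v in videos]
W, H = nmf(V, n_topics=2)

# Each row of W is the topic representation of one video.
for row in W:
    print([round(x, 3) for x in row])
```

In this sketch, the rows of `W` play the role of the transformed representation that would be passed to a classifier such as logistic regression or an SVM. The classifier itself is omitted.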
