BAG OF WORDS APPROACH AND DOCUMENT-TOPIC MODELING FOR HUMAN ACTIVITY RECOGNITION FROM VIDEOS
Abstract
Human activity recognition from videos has many useful real-world applications in areas such as multimedia, entertainment, and security. In this paper, an approach inspired by popular text document analysis techniques, namely the bag of words and document topic modeling, is explored. Latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF) are used to model the latent topic distribution in videos. The discovered distribution is then used to transform the bag-of-words representation in order to categorize each video into one of ten daily human activities. Classification is done by feeding the transformed term frequencies of the visual words to logistic regression and SVM models. NMF achieved a higher F1-score than LDA with both SVM and logistic regression as the classifier.
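The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the visual-word histograms are replaced by synthetic Poisson counts, three classes stand in for the ten activities, and the vocabulary size, number of topics, and the choice of a linear SVM are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for bag-of-visual-words histograms (assumed values):
# 3 activity classes, 60 videos each, a vocabulary of 50 visual words.
n_classes, n_per_class, vocab_size = 3, 60, 50
rates = rng.gamma(shape=0.5, scale=4.0, size=(n_classes, vocab_size))
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.poisson(rates[y])  # term-frequency matrix, shape (180, 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Topic models that map term frequencies to a low-dimensional topic space.
topic_models = {
    "LDA": lambda: LatentDirichletAllocation(n_components=5, random_state=0),
    "NMF": lambda: NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0),
}
# Classifiers fed with the transformed representation.
classifiers = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "svm": lambda: LinearSVC(),  # linear kernel is an assumption
}

scores = {}
for t_name, make_topic in topic_models.items():
    for c_name, make_clf in classifiers.items():
        model = make_pipeline(make_topic(), make_clf())
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores[(t_name, c_name)] = f1_score(y_test, pred, average="macro")

for key, value in scores.items():
    print(key, round(value, 3))
```

The scores printed for this synthetic data say nothing about the reported results; the sketch only shows how the topic transform and the classifier fit together.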
Keywords: human activity recognition, bag of words, document topic modeling
This work is licensed under a Jurnal Muara Sains, Teknologi, Kedokteran dan Ilmu Kesehatan Creative Commons Attribution-ShareAlike 4.0 International License.
Authors transfer copyright or assign exclusive rights to the publisher (including commercial rights)