Open Access Journal Article

Machine Learning-Based Approaches for Image Captioning

by Daniel Harris *
* Author to whom correspondence should be addressed.
TASC 2023, 5(1), 36; https://doi.org/10.69610/j.tasc.20230216
Received: 5 January 2023 / Accepted: 26 January 2023 / Published Online: 16 February 2023

Abstract

The field of computer vision has witnessed significant advances with the advent of machine learning techniques. Among these, image captioning stands out as a challenging task that involves generating textual descriptions of images. This paper presents a comprehensive overview of machine learning-based approaches to image captioning. We discuss the evolution of captioning techniques from traditional methods to modern deep learning models, and we examine the challenges these models face, such as the variability of image content, the diversity of language, and the need for contextual understanding. We also explore the role of models pre-trained on large datasets such as ImageNet in improving captioning performance. Furthermore, we analyze the impact of different architectures, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformers, on the quality of generated captions. Finally, the paper discusses potential applications of machine learning-based image captioning in domains such as accessibility, content creation, and information retrieval. We aim to provide a foundational understanding of the current state of the art in image captioning and to identify directions for future research.
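The encoder-decoder pattern surveyed above, in which a CNN encodes the image into a feature vector and an RNN decodes that vector into words, can be sketched minimally as follows. This is an illustrative toy, not any specific system from the paper: the vocabulary, dimensions, and randomly initialized weight matrices are all assumptions standing in for a trained model, and greedy argmax decoding stands in for the beam search real captioners often use.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]  # toy vocabulary (assumption)
D = 8  # hidden/feature dimensionality (assumption)

# Random parameters standing in for a trained captioning model.
W_img = rng.normal(size=(D, D))           # projects CNN image features to the initial RNN state
W_h = rng.normal(size=(D, D))             # recurrent (state-to-state) weights
W_e = rng.normal(size=(len(VOCAB), D))    # token embedding table
W_out = rng.normal(size=(D, len(VOCAB)))  # maps RNN state to vocabulary logits

def caption(image_features, max_len=10):
    """Greedy decoding: the image features initialize the RNN state,
    then the most likely token is emitted at each step until <end>."""
    h = np.tanh(W_img @ image_features)    # "encoder" output seeds the decoder
    token = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        h = np.tanh(W_h @ h + W_e[token])  # simple Elman-style recurrence
        token = int(np.argmax(h @ W_out))  # pick the highest-scoring next token
        if VOCAB[token] == "<end>":
            break
        words.append(VOCAB[token])
    return " ".join(words)

print(caption(rng.normal(size=D)))
```

With trained weights, the same loop produces fluent captions; attention-based and transformer decoders replace the single recurrence with per-step weighting over spatial image features.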


Copyright: © 2023 by Harris. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) (Creative Commons Attribution 4.0 International License). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Share and Cite

ACS Style
Harris, D. Machine Learning-Based Approaches for Image Captioning. Transactions on Applied Soft Computing, 2023, 5, 36. https://doi.org/10.69610/j.tasc.20230216
AMA Style
Harris D. Machine Learning-Based Approaches for Image Captioning. Transactions on Applied Soft Computing. 2023;5(1):36. https://doi.org/10.69610/j.tasc.20230216
Chicago/Turabian Style
Harris, Daniel. 2023. "Machine Learning-Based Approaches for Image Captioning." Transactions on Applied Soft Computing 5, no. 1: 36. https://doi.org/10.69610/j.tasc.20230216
APA Style
Harris, D. (2023). Machine Learning-Based Approaches for Image Captioning. Transactions on Applied Soft Computing, 5(1), 36. https://doi.org/10.69610/j.tasc.20230216
