Open Access
Journal Article
Machine Learning-Based Approaches for Image Captioning
by
Daniel Harris
TASC 2023 5(1):36; 10.69610/j.tasc.20230216 - 16 February 2023
Abstract
The field of computer vision has advanced significantly with the advent of machine learning techniques. Among these advances, image captioning stands out as a challenging task that involves generating textual descriptions of images. This paper presents a comprehensive overview of machine learning-based approaches to image captioning. We discuss the evolution of captioning techniques from traditional methods to modern deep learning models, and we examine the challenges these models face, such as the variability of image content, the diversity of language, and the need for contextual understanding. We also explore the role of pre-trained models, such as networks pre-trained on ImageNet, in improving captioning performance. Furthermore, we analyze the impact of different architectures, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformers, on the quality of generated captions. Finally, we discuss potential applications of machine learning-based image captioning in domains such as accessibility, content creation, and information retrieval. We aim to provide a foundational understanding of the current state of the art in image captioning and to identify directions for future research.