Efficient visual information indexation for supporting actionable intelligence and knowledge generation

Authors: Desale Fentaw, Yagmur Aktas, Jorge García Castaño

Date: 04.09.2023


Abstract

Visual indexing, or the ability to search and analyze visual media such as images and videos, is important for law enforcement agencies because it can speed up criminal investigations. As more and more visual media is created and shared online, the ability to effectively search and analyze these data becomes increasingly critical for investigators. The major challenges for video captioning include accurately recognizing the objects and activities in each frame, understanding their relationships and context, generating natural and descriptive language, and ensuring the captions are relevant and useful. Near real-time processing is also required to facilitate agile forensic decision-making and prompt triage, hand-over, and reduction of the amount of data to be processed by investigators or subsequent processing tools. This paper presents an efficient captioning-driven video analytics approach able to extract accurate descriptions of image and video files. The proposed approach includes a temporal segmentation technique that selects the most relevant frames. Subsequently, an image captioning model is specialized to describe visual media related to counter-terrorism and cybercrime for each relevant frame. Our proposed method achieves high consistency and correlation with human summaries on the SumMe dataset, outperforming previous similar methods.
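The two-stage pipeline described in the abstract (temporal segmentation to pick key frames, then per-frame captioning) can be illustrated with a minimal sketch. The frame-difference segmentation rule, the BLIP captioning model, the threshold, and the file names below are illustrative assumptions and do not reproduce the authors' actual method.

# Hypothetical sketch of key-frame selection followed by per-frame captioning.
# Model choice (BLIP) and the frame-difference threshold are assumptions.
import cv2
import numpy as np
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

def select_key_frames(video_path, diff_threshold=30.0):
    """Keep frames whose mean absolute difference from the last kept frame
    exceeds a threshold (a simple temporal segmentation heuristic)."""
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            key_frames.append(frame)
            prev_gray = gray
    cap.release()
    return key_frames

def caption_frames(frames):
    """Generate a caption for each key frame with an off-the-shelf captioner."""
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    captions = []
    for frame in frames:
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        inputs = processor(images=image, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=30)
        captions.append(processor.decode(out[0], skip_special_tokens=True))
    return captions

if __name__ == "__main__":
    frames = select_key_frames("input_video.mp4")  # hypothetical input file
    for i, caption in enumerate(caption_frames(frames)):
        print(f"key frame {i}: {caption}")

In practice, the captions produced per key frame can then be indexed as text, which is what makes the resulting visual archive searchable for investigators.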

BibTeX

@article{fentaw2023efficient,
  title    = {Efficient visual information indexation for supporting actionable intelligence and knowledge generation},
  author   = {Desale Fentaw and Yagmur Aktas and Jorge García Castaño},
  pages    = {12},
  keywords = {key frame selection; temporal segmentation; video captioning; visual indexing},
  abstract = {Visual indexing, or the ability to search and analyze visual media such as images and videos, is important for law enforcement agencies because it can speed up criminal investigations. As more and more visual media is created and shared online, the ability to effectively search and analyze these data becomes increasingly critical for investigators. The major challenges for video captioning include accurately recognizing the objects and activities in each frame, understanding their relationships and context, generating natural and descriptive language, and ensuring the captions are relevant and useful. Near real-time processing is also required to facilitate agile forensic decision-making and prompt triage, hand-over, and reduction of the amount of data to be processed by investigators or subsequent processing tools. This paper presents an efficient captioning-driven video analytics approach able to extract accurate descriptions of image and video files. The proposed approach includes a temporal segmentation technique that selects the most relevant frames. Subsequently, an image captioning model is specialized to describe visual media related to counter-terrorism and cybercrime for each relevant frame. Our proposed method achieves high consistency and correlation with human summaries on the SumMe dataset, outperforming previous similar methods.},
  isbn     = {978-151066713-6},
  date     = {2023-09-04},
}
