PolygloNet: Multilingual Approach for Scene Text Recognition Without Language Constraints

Authors: Alex Solè Gómez

Date: 23.05.2022


Abstract

The detection and recognition of text instances in camera-captured images or videos generate rich and precise semantic information for interpreting and describing the scene. However, recognizing text in the wild remains a challenging problem despite its value. Apart from the inherent problems in text detection and recognition tasks, knowing the language to be recognized is required in unsupervised forensic applications where multilingual information is frequently employed. This work proposes an unconstrained multilingual text recognition pipeline for scene text detection and recognition. The pipeline consists of multiple text recognition experts contributing to determining the output text sentence. Each expert translates the visual information into a candidate text sentence. Finally, our post-processing model, named PolygloNet, encodes and aggregates all text sentences to generate the optimal text sequence. This model can select the most likely text sequence and correct spelling errors produced in the recognition stage. Experimental results show that our model can produce accurate recognition results on relevant datasets (MLT2017 [2] and MLT2019 [1]).

BIB_text

@Article {
author = {Alex Solè Gómez},
title = {PolygloNet: Multilingual Approach for Scene Text Recognition Without Language Constraints},
pages = {479-490},
keywds = {
Scene text detection (STD), Multilingual text recognition, Text analysis, Long short-term memory (LSTM)
}
abstract = {

The detection and recognition of text instances in camera-captured images or videos generate rich and precise semantic information for interpreting and describing the scene. However, recognizing text in the wild remains a challenging problem despite its value. Apart from the inherent problems in text detection and recognition tasks, knowing the language to be recognized is required in unsupervised forensic applications where multilingual information is frequently employed. This work proposes an unconstrained multilingual text recognition pipeline for scene text detection and recognition. The pipeline consists of multiple text recognition experts contributing to determining the output text sentence. Each expert translates the visual information into a candidate text sentence. Finally, our post-processing model, named PolygloNet, encodes and aggregates all text sentences to generate the optimal text sequence. This model can select the most likely text sequence and correct spelling errors produced in the recognition stage. Experimental results show that our model can produce accurate recognition results on relevant datasets (MLT2017 [2] and MLT2019 [1]).


}
isbn = {978-3-031-06429-6},
date = {2022-05-23},
}
Vicomtech

Parque Científico y Tecnológico de Gipuzkoa,
Paseo Mikeletegi 57,
20009 Donostia / San Sebastián (Spain)

+(34) 943 309 230

Zorrotzaurreko Erribera 2, Deusto,
48014 Bilbao (Spain)

close overlay

Behavioral advertising cookies are necessary to load this content

Accept behavioral advertising cookies