Learning Sequential Visual Appearance Transformation for Online Multi-Object Tracking
Authors: Itziar Sagastiberri Fernández, Noud van de Gevel
Date: 18.11.2021
Abstract
Recent online multi-object tracking approaches combine single-object trackers, which capture object motion, with affinity networks, which associate objects by their appearance. These affinity networks often build on complex feature representations (re-ID embeddings) or sophisticated scoring functions whose objective is to match current detections with previous tracklets, relying on what is known as short-term appearance information. However, drastic appearance changes along an object trajectory captured by omnidirectional cameras degrade performance, since affinity networks ignore the variation of the long-term appearance information. In this paper, we deal with these appearance changes in a coherent way by proposing a novel affinity model that predicts the new visual appearance of an object by considering the long-term appearance information. Our affinity model includes a convolutional LSTM encoder-decoder architecture to learn the space-time appearance transformation metric between consecutive re-ID feature representations along the object trajectory. Experimental results show that it achieves promising performance on several multi-object tracking datasets containing omnidirectional cameras.
BIB_text
title = {Learning Sequential Visual Appearance Transformation for Online Multi-Object Tracking},
pages = {176247},
keywds = {
Target tracking, Surveillance, Predictive models, Omnidirectional cameras, LSTM
},
abstract = {
Recent online multi-object tracking approaches combine single-object trackers, which capture object motion, with affinity networks, which associate objects by their appearance. These affinity networks often build on complex feature representations (re-ID embeddings) or sophisticated scoring functions whose objective is to match current detections with previous tracklets, relying on what is known as short-term appearance information. However, drastic appearance changes along an object trajectory captured by omnidirectional cameras degrade performance, since affinity networks ignore the variation of the long-term appearance information. In this paper, we deal with these appearance changes in a coherent way by proposing a novel affinity model that predicts the new visual appearance of an object by considering the long-term appearance information. Our affinity model includes a convolutional LSTM encoder-decoder architecture to learn the space-time appearance transformation metric between consecutive re-ID feature representations along the object trajectory. Experimental results show that it achieves promising performance on several multi-object tracking datasets containing omnidirectional cameras.
},
date = {2021-11-18},
}