Prioritizing the essential: A robust evaluation framework for novelty detection

Jean-Gabriel Gaudreault, Paula Branco
Machine Learning Journal

In contrast to traditional machine learning methods that presume static data distributions, novelty detection tackles the challenge of identifying novel classes in continuous data streams as they evolve over time. The inherent implementation challenges of novelty detection are compounded by the difficulty of correctly assessing its performance, which can be highly sensitive to data characteristics and requires specialized metrics that account for the temporal dimension. Consequently, the absence of a consensus on how to properly evaluate these approaches remains a significant challenge. In this study, we propose and formalize a comprehensive evaluation framework that aims to provide a fair assessment of novelty detection algorithms. Specifically, we propose a list of existing and novel metrics to accurately evaluate all aspects of novelty detection on data streams, including their temporal aspect. We identify data characteristics that impact the performance of these algorithms and show how to report them. We empirically demonstrate the effect of these characteristics on various datasets and compare our proposed metrics with previously used ones.
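To illustrate why stream-based evaluation needs a temporal dimension, the following minimal sketch (a hypothetical `evaluate_stream` helper, not the framework proposed in this paper) processes a labeled stream in order, counting novel-class hits, misses, and false alarms, and recording the delay between the first appearance of a novel class and its first successful detection:

```python
def evaluate_stream(true_labels, predictions, known_classes):
    """Per-example stream evaluation sketch: counts novel-class hits,
    misses, and false alarms, and records the delay (in examples)
    before the first novel instance is flagged as novel."""
    hits = misses = false_alarms = 0
    first_novel_idx = None   # when a novel class first appears
    detection_idx = None     # when it is first flagged
    for i, (y, y_hat) in enumerate(zip(true_labels, predictions)):
        is_novel = y not in known_classes
        flagged = y_hat == "novel"
        if is_novel and first_novel_idx is None:
            first_novel_idx = i
        if is_novel and flagged:
            hits += 1
            if detection_idx is None:
                detection_idx = i
        elif is_novel:
            misses += 1
        elif flagged:
            false_alarms += 1
    delay = (detection_idx - first_novel_idx
             if detection_idx is not None else None)
    return {"hits": hits, "misses": misses,
            "false_alarms": false_alarms, "detection_delay": delay}

# Example stream: class "c" is novel and first appears at index 3,
# but is only flagged at index 4, giving a detection delay of 1.
stream_y = ["a", "a", "b", "c", "c", "a", "c"]
preds = ["a", "a", "b", "b", "novel", "a", "novel"]
result = evaluate_stream(stream_y, preds, known_classes={"a", "b"})
```

A static confusion matrix over the same stream would report the identical error counts whether the detector reacted after one example or one thousand; the `detection_delay` term is what makes the temporal behavior visible.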