Anomaly detection in images
Training a Machine Learning (ML) model requires us to harness a dataset large and varied enough for the model to learn the patterns in the distribution of our data, so that during deployment it can make predictions on new data. For example, a model to be used in an autonomous vehicle is given images of street scenes and trained to correctly localize and classify objects and people into a pre-defined set of classes (e.g. "person", "motorcycle", "car" or "bike").
A problem arises if during deployment the model is given an image containing an anomalous class not seen during training, a deer for example. The model can either misclassify it as one of the classes it was trained on or fail to detect it at all. But in certain situations identifying this anomaly can be critical: it could be a wild animal in the middle of the road, in medical imaging it could represent a cancerous cell, or in industrial applications a defect in a manufactured tool.
Traditionally, ML tasks focus on the closed-world setting, where complete knowledge of the system is assumed and instances can only come from known distributions seen during training. This is in opposition to the open-world setting, where instances can come from outside the training distribution.
Identifying not only the so-called In-Distribution (ID) classes seen during training but also possible Out-of-Distribution (OOD) classes constitutes a large family of problems called Generalized Out-of-Distribution Detection.
Generalized Out-of-Distribution Detection
In computer vision this is a large family of problems, with many different settings, approaches and applications. It includes: Anomaly Detection (AD), Novelty Detection (ND), One-Class Classification (OCC), Out-of-Distribution (OOD) Detection, Open-Set Recognition (OSR), Novel Category Discovery (NCD), Generalized Category Discovery (GCD) and Outlier Detection (OD) [^1]. We can see below a simplified diagram of these different settings:
More generally, the problem of Generalized Out-of-Distribution Detection can be formalized as:
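(A sketch of one common way to state it; the notation below is my own shorthand rather than the survey's exact formulation.) Given training data drawn from an in-distribution over $\mathcal{X} \times \mathcal{Y}_{\text{known}}$, we learn a classifier $f$ over the known labels together with a scoring function $S$ and a rejection threshold $\tau$, and predict

$$
\hat{y}(x) =
\begin{cases}
  f(x) \in \mathcal{Y}_{\text{known}} & \text{if } S(x) \geq \tau \quad \text{(ID)} \\
  \text{OOD} & \text{if } S(x) < \tau.
\end{cases}
$$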
What changes between these settings are the sets of known and unknown labels, the number of class labels we wish to classify among the In-Distribution (ID) instances and the Out-of-Distribution (OOD) ones, as well as the approach used, as we explain below.
Anomaly and Novelty Detection
Anomaly Detection (AD) treats ID samples as a single class even if they belong to many different categories ("person", "dog", "cat"...); it is not interested in classifying the ID samples, only in assigning each sample either the label "ID" or "OOD". Its main applications are industrial inspection, image forensics, adversarial defense, forgery recognition of artworks, data filtering and video surveillance.
Since anomalies are usually rare and examples are available only in small quantities, or not at all, approaches for this setting are usually unsupervised or semi-supervised. Common approaches for AD models are: density-based, where OOD test samples are rejected if they deviate from the main distribution [^2]; reconstruction-based, where an encoder-decoder architecture is trained to accurately reconstruct the ID samples [^3], so that an image of an OOD instance will have a poor reconstruction and can be identified; and one-class classification (OCC), mainly through the construction of a decision boundary around the ID samples [^4].
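To make the reconstruction-based idea concrete, here is a minimal sketch in PyTorch, assuming 28x28 grayscale images; the architecture and the thresholding scheme are illustrative choices, not a reference implementation.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Small encoder-decoder trained to reconstruct ID images only."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(-1, 1, 28, 28)

def anomaly_score(model: AutoEncoder, x: torch.Tensor) -> torch.Tensor:
    """Per-sample reconstruction error: OOD images reconstruct poorly."""
    with torch.no_grad():
        recon = model(x)
    return ((recon - x) ** 2).flatten(1).mean(dim=1)

# After training with MSE loss on ID data only, flag test samples whose
# score exceeds a threshold tau calibrated on held-out ID data:
# is_ood = anomaly_score(model, x_test) > tau
```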
Novelty Detection (ND) is similar to AD; the main difference is one of motivation: ND sees novel classes not as erroneous or fraudulent, but as possible learning resources for a model that cannot possibly know all the classes it is shown. ND is also supposed to be fully unsupervised, while AD can have some abnormal training samples. Main applications of ND include incremental learning and dataset augmentation.
Open-Set Recognition and Out-Of-Distribution Detection
Open-Set Recognition (OSR) is interested in making classifiers robust and general enough to deal with real-world problems like incomplete information, limited data resources and imbalanced distributions. The task is twofold: 1) identify samples from the trained classes (so-called "known known classes"); 2) reject samples from never-seen classes ("unknown unknown classes"). An example of an OSR problem is a face identification system where the model has to both identify each sample as belonging to a class (a specific individual) and correctly classify any person it has not seen before as unknown.
Unlike AD and ND, OSR has the additional objective of identifying the ID classes, so the majority of its models use a classification-based approach [^6]; some use a distance-based approach, where metric learning and contrastive learning are used to construct a latent space in which samples from the same class are clustered together while remaining separate from other classes [^7][^8] (see the sketch below). Applications of OSR lie in deploying real-world image classifiers in general, which must accurately identify the trained classes while also identifying the OOD classes that almost always exist in the real world.
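As a rough illustration of the distance-based approach, assuming an embedding network has already been trained with metric or contrastive learning and that labels are integers 0..K-1 (all names and the threshold are illustrative):

```python
import numpy as np

def class_centroids(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mean embedding of each known class, shape (num_classes, dim)."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

def predict_open_set(z: np.ndarray, centroids: np.ndarray, tau: float) -> int:
    """Nearest-centroid class, or -1 ("unknown unknown") if too far away."""
    dists = np.linalg.norm(centroids - z, axis=1)
    nearest = int(dists.argmin())
    return nearest if dists[nearest] <= tau else -1
```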
OOD Detection is very similar to OSR; the difference is that OSR is interested in identifying a shift coming from within the same dataset, while OOD methods normally consider ID to be the classes of a given dataset and OOD to be samples drawn from a totally different dataset with non-overlapping classes. For example, training a model to identify ImageNet classes while rejecting samples from the MNIST dataset.
Like in OSR, the majority of approaches are classification-based [^8][^9][^10]. The key philosophical difference between OOD Detection and OSR is that an OOD detector is trained to identify samples to which the model does not want to, or cannot, generalize [^11]. In this sense OOD Detection covers a broader scope of tasks, and its applications are usually in safety-critical situations, such as autonomous driving.
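A common classification-based score, shown here as a minimal sketch, is the maximum softmax probability (MSP) baseline: any trained classifier's confidence in its top class, with low confidence suggesting OOD (the thresholding is an illustrative choice).

```python
import torch
import torch.nn.functional as F

def msp_score(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Probability of the most likely class; low values suggest OOD."""
    with torch.no_grad():
        logits = model(x)
    return F.softmax(logits, dim=1).max(dim=1).values

# is_ood = msp_score(model, x_test) < tau  # tau calibrated on ID data
```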
Outlier Detection
Outlier Detection (OD) is different from the previously discussed settings because there is no train/test split: the dataset is processed all at once, making the approach transductive. Models are usually density-based, interested in modeling the probability distribution of the data samples, or distance-based, performing metric-learning-like techniques to cluster the samples from a specific class together [^12]. OD is a vast domain, and its applications range from data mining and data pre-processing to video surveillance and network safety. It's important to note that the term outlier is often used interchangeably with anomaly and novelty, so it's important to keep in mind the main frameworks and the task being discussed.
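For a concrete example of the transductive, density-based flavor, scikit-learn's Local Outlier Factor scores the whole unlabeled dataset in one pass (the toy data below is illustrative):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),  # bulk of the data
    rng.normal(6.0, 0.5, size=(5, 2)),    # a few outlying points
])

# No train/test split: fit_predict labels every sample at once.
labels = LocalOutlierFactor(n_neighbors=20).fit_predict(X)
print(f"{(labels == -1).sum()} points flagged as outliers")  # -1 = outlier
```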
Generalized Category Discovery
GCD is also a transductive setting, where the model receives during training both a labeled dataset of known classes and an unlabeled dataset, normally with a known number of classes (but not exclusively), and is asked to label the unknown instances. Differently from the previously discussed problems, here the novel classes are not necessarily rare, and the dataset can even be overwhelmed by these different classes. This setting was formalized by Vaze et al. 2022 [^13], although the same setting had been explored before under the name of Open-World Semi-Supervised Learning [^14]. GCD has the same setting as Novel Category Discovery, but the latter imposes the restriction that the sets of known and unknown labels do not intersect, so the model is only labeling new classes.
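A minimal sketch of category discovery via semi-supervised k-means, in the spirit of (but much simplified from) Vaze et al. [^13]: centroids for the known classes are initialized from labeled features, extra random centroids are added for the novel classes, and all features are clustered together. Assumes integer labels 0..K-1; all names here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_categories(feats_lab, labels, feats_unlab, num_novel, seed=0):
    rng = np.random.default_rng(seed)
    # Known-class centroids from the labeled data...
    known = np.stack([feats_lab[labels == c].mean(axis=0)
                      for c in np.unique(labels)])
    # ...plus random unlabeled points as seeds for the novel classes.
    novel = feats_unlab[rng.choice(len(feats_unlab), num_novel, replace=False)]
    init = np.vstack([known, novel])
    km = KMeans(n_clusters=len(init), init=init, n_init=1)
    km.fit(np.vstack([feats_lab, feats_unlab]))
    return km.labels_[len(feats_lab):]  # cluster ids for the unlabeled data
```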
Zero/Few-shot learning and other related settings
Closely related to Open-Set Recognition, we can cite domains such as Zero/Few-shot learning, classification with reject option and Open-World Learning [^5]. In Zero-shot classification the model is trained to predict classes with labeled positive training examples seen during training (known known classes) and also classes that only have side information (unknown known classes). An example would be training an image classifier with images of farm animals while also feeding the model texts about wild animals. This way, by combining both its visual knowledge of a farm horse and a text that says "a zebra is like a horse with black and white stripes", the model is able to identify a zebra at test time despite never having seen an image of one. Few-shot learning is the same, except the model instead has a few positively labeled image examples of the "unknown known classes" (like the zebra) on top of its images of "known known classes". Open-World Learning is an evolution of OSR where the model is tasked with doing everything an OSR classifier does and also performing incremental learning, updating itself continually as unknown unknown classes appear, without forgetting the known known classes.
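The zebra example can be sketched as CLIP-style zero-shot classification: embed the class descriptions as text, embed the image, and pick the description with the highest cosine similarity. Here `image_encoder` and `text_encoder` are hypothetical stand-ins for any pretrained joint image-text embedding model.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(image, descriptions, image_encoder, text_encoder):
    """Pick the class whose text description best matches the image."""
    z_img = image_encoder(image)  # hypothetical pretrained encoders
    sims = [cosine(z_img, text_encoder(d)) for d in descriptions]
    return int(np.argmax(sims))

# e.g. zero_shot_classify(img,
#     ["a photo of a horse",
#      "a zebra: a horse with black and white stripes"],
#     image_encoder, text_encoder)
```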
Conclusion
This is a rapidly evolving field; by the time you read this article it may already be a little outdated. If you want more references on the subject, I recommend checking out some of the articles in my Zotero library, as well as the survey by Yang et al. 2022 [^1].
[^1]: Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu. Generalized Out-of-Distribution Detection: A Survey, August 2022. URL http://arxiv.org/abs/2110.11334. arXiv:2110.11334 [cs].
© Pietro Tanure.