Surprise Adequacy
Surprise adequacy aims to detect novel (surprising) inputs by comparing them to the distributions of activations observed on the training set. Unlike neuron coverage, surprise adequacy is typically computed on the activations of a single layer.
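To give an idea of what such activations look like in practice, the following is a minimal, hypothetical sketch of collecting the activations of a single layer with a small tf.keras sub-model (the toy model, layer choice, and variable names are illustrative; any framework that exposes layer activations works):
import numpy as np
import tensorflow as tf
# Hypothetical toy classifier; substitute your own trained model and data.
inputs = tf.keras.Input(shape=(10,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)  # the layer we monitor
outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)
# Sub-model that exposes the activations of the chosen single layer.
activation_model = tf.keras.Model(inputs, hidden)
x_train = np.random.rand(100, 10).astype("float32")
x_test = np.random.rand(20, 10).astype("float32")
train_activations = activation_model.predict(x_train)  # shape (100, 16)
test_activations = activation_model.predict(x_test)    # shape (20, 16)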
Implemented Surprise Adequacies
dnn-tip implements the following surprise adequacies. If you use dnn-tip, consider checking the recommendations and details in the papers that originally proposed these approaches and, besides our paper, cite them as well.
Abb. | Name | Proposing Paper |
---|---|---|
DSA | Distance-Based SA | Kim et al., ICSE 2019 / arXiv (*) |
LSA | Likelihood-Based SA (based on Kernel-Density Estimation) | Kim et al., ICSE 2019 / arXiv |
MDSA | Mahalanobis-Distance-Based SA | Kim et al., ESEC/FSE 2020 / arXiv |
MLSA | Multimodal LSA (based on a Gaussian Mixture Model) | Kim et al., AST 2021 |
MultiModal | Generic, abstract composition of multiple SAs, e.g. for per-class SA and MMDSA | |
MMDSA | Multimodal MDSA | Kim et al., AST 2021 |
(*) Implementation partially based on Weiss et al., ICSE-W 2021. If you use our implementation of DSA, you should cite that paper as well.
Usage Example
The basic usage is deliberately kept simple and should be easy to understand from the following examples:
# Create an LSA or MDSA instance
sa = LSA(train_activations)  # Or MDSA(train_activations)
surprises = sa(test_activations)
# DSA additionally requires the predicted labels
sa = DSA(train_activations, train_predictions)
surprises = sa(test_activations, test_predictions)
# MLSA requires the number of components of the Gaussian Mixture Model
sa = MLSA(train_activations, num_components=3)
surprises = sa(test_activations)
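The returned surprises hold one value per test input, with higher values indicating more surprising inputs. A typical use is to prioritize test inputs by descending surprise; a minimal sketch with plain numpy (the values are made up):
import numpy as np
# Hypothetical surprise values; in practice, use the array returned by sa(...).
surprises = np.array([0.2, 1.7, 0.9, 3.1])
priority_order = np.argsort(-surprises)  # indices of test inputs, most surprising first
print(priority_order)  # -> [3 1 2 0]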
Example usages of Multi-Modal Surprise Adequacy:
# Per-class SA (recommended for classification problems), shown here for MDSA
sa = MultiModalSA.build_by_class(train_activations, train_predictions, lambda x, y: MDSA(x))
# num_threads is optional; it evaluates the modes in parallel
surprises = sa(test_activations, num_threads=4)
# Multi-modal MDSA (using k-means for clustering)
# Check the constructor signature for the many optional k-means parameters.
sa = MultiModalSA.build_by_class(train_activations)
surprises = sa(test_activations, num_threads=4)
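Conceptually, a multi-modal SA fits one SA instance per mode (e.g. per predicted class or per k-means cluster) and then aggregates the per-mode surprises. The rough numpy sketch below shows one possible aggregation (the minimum over modes); it is an illustration of the idea, not the library's internal implementation:
import numpy as np
def multimodal_surprise(per_mode_train_activations, test_activations, sa_factory):
    # Fit one SA per mode, e.g. sa_factory = lambda acts: MDSA(acts).
    modes = [sa_factory(acts) for acts in per_mode_train_activations]
    # One row of surprises per mode, shape (n_modes, n_test).
    per_mode_surprises = np.stack([sa(test_activations) for sa in modes])
    # Score each input against its closest (least surprising) mode.
    return per_mode_surprises.min(axis=0)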
Mapping surprise adequacies to coverage profiles
While, based on our empirical results, we do not recommend it, you may sometimes want to use coverage profiles instead of surprise adequacies (see our paper for details).
This can be achieved as follows:
mapper = SurpriseCoverageMapper(
    buckets,          # int: number of buckets of the coverage profile
    limit,            # float: upper limit of the coverage profile
    overflow_bucket,  # bool: use an additional bucket for surprises > limit
)
profile = mapper.get_coverage_profile(surprises)
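A profile can then be condensed into a single surprise-coverage number, e.g. the fraction of buckets reached by the test set. A minimal numpy sketch, assuming the returned profile has one entry per bucket that is zero (or False) for unreached buckets:
import numpy as np
# Hypothetical profile with 6 buckets; in practice, use the array returned by get_coverage_profile.
profile = np.array([4, 9, 2, 0, 0, 1])
coverage = np.count_nonzero(profile) / profile.size  # fraction of buckets hit by the test set
print(f"Surprise coverage: {coverage:.0%}")  # -> Surprise coverage: 67%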
Recommendations
DSA
DSA scales badly to large training sets. Hence, a few recommendations:
Use a subset of the training set activations
In a related paper, we showed that using a subset of the training set activations can drastically improve the runtime of DSA while providing similar results. You can either do this when collecting the activations, or set the parameters subsampling (expects a float between 0 and 1) and subsampling_seed (expects an integer) when creating a dnn-tip DSA instance.
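For example (the concrete values are illustrative):
# Use 30% of the training activations for DSA, with a fixed seed for reproducibility.
sa = DSA(train_activations, train_predictions, subsampling=0.3, subsampling_seed=42)
surprises = sa(test_activations, test_predictions)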
Use parallel computation.
dnn-tip provides a parallel implementation of DSA, so parallelizing the computation is simple for the user, but you should fine-tune the parameters to match your system's capabilities (most importantly, the available RAM). Set the parameter badge_size (default: 10) when creating the DSA instance, and set num_threads to a positive integer when computing DSA. Example:
sa = DSA(train_activations, train_predictions, badge_size=10)
surprises = sa(test_activations, test_predictions, num_threads=4)
Also, in practice, it can be useful to increase swap memory to guard against crashes caused by short load peaks.
LSA
The KDE used in LSA is not numerically stable. Our implementation takes various steps to increase stability and, where that is not possible, attempts to fail gracefully (returning a surprise of 0). Still, in practice, you may be better off using MDSA, which yields similar results while being numerically stable and faster at prediction time.