Normalizing Flows for Human Pose Anomaly Detection

Tel Aviv University, Israel

Abstract

Video anomaly detection is an ill-posed problem because it relies on many parameters such as appearance, pose, camera angle, background, and more. We distill the problem to anomaly detection of human pose, thus reducing the risk of nuisance parameters such as appearance affecting the result. Focusing on pose alone also has the side benefit of reducing bias against distinct minority groups.
Our model works directly on human pose graph sequences and is exceptionally lightweight (~1K parameters), capable of running on any machine able to run the pose estimation with negligible additional resources. We leverage the highly compact pose representation in a normalizing flows framework, which we extend to tackle the unique characteristics of spatio-temporal pose data and show its advantages in this use case.
Our algorithm uses normalizing flows to learn a bijective mapping between the pose data distribution and a Gaussian distribution, using spatio-temporal graph convolution blocks. The algorithm is quite general and can handle training data of only normal examples, as well as a supervised dataset that consists of labeled normal and abnormal examples. We report state-of-the-art results on two anomaly detection benchmarks: the unsupervised ShanghaiTech dataset and the recent supervised UBnormal dataset.
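To make the bijective mapping concrete, the following minimal sketch evaluates an exact log-likelihood with the change-of-variables formula, log p(x) = log N(f(x); 0, I) + log |det df/dx|. It assumes a single elementwise affine layer as a stand-in for the spatio-temporal graph convolution blocks, and flattened feature vectors instead of pose graphs. All function and parameter names are hypothetical.

```python
import numpy as np


def affine_flow_forward(x, log_scale, shift):
    """Elementwise bijection z = x * exp(log_scale) + shift (illustrative stand-in).

    x: (N, D) batch of flattened features; log_scale, shift: (D,) parameters.
    Returns the latent z and the log-determinant of the Jacobian per sample.
    """
    z = x * np.exp(log_scale) + shift
    # The Jacobian is diagonal, so log|det J| is the sum of the log-scales.
    log_det = np.full(x.shape[0], np.sum(log_scale))
    return z, log_det


def standard_normal_logpdf(z):
    """Log-density of N(0, I) for each row of z."""
    d = z.shape[1]
    return -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)


def log_likelihood(x, log_scale, shift):
    """Exact log p(x) under the flow: latent log-density plus log|det J|."""
    z, log_det = affine_flow_forward(x, log_scale, shift)
    return standard_normal_logpdf(z) + log_det
```

A low `log_likelihood` value marks a sample as unlikely under the training distribution, which is the quantity an anomaly score would be derived from in this simplified setting.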

Framework Overview

Given a sequence of video frames, we use pose estimation to extract the keypoints of every person in each frame and use a pose tracker to track the skeletons across the frames. Eventually, each person in a clip is represented as a temporal pose graph. Our network maps the training samples into a Gaussian-distributed latent space and calculates the probability of a human pose sequence occurring based on the training data.
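As an illustration of the temporal pose representation, the sketch below cuts one tracked person's keypoints into fixed-length segments. It assumes the pose estimator and tracker have already produced an array of shape (frames, keypoints, coordinates). The window length, stride, and function name are hypothetical choices for this example, not values stated above.

```python
import numpy as np


def sliding_pose_segments(track, window=24, stride=6):
    """Split one tracked skeleton into overlapping temporal segments.

    track: (T, V, C) array with T frames, V keypoints, C coordinates.
    Returns an (N, window, V, C) array; N is 0 if the track is too short.
    """
    num_frames = track.shape[0]
    if num_frames < window:
        return np.empty((0, window) + track.shape[1:], dtype=track.dtype)
    starts = range(0, num_frames - window + 1, stride)
    return np.stack([track[s:s + window] for s in starts])
```

Each returned segment can then be treated as one sample, so a single person contributes several overlapping samples per clip.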
We demonstrate our algorithm in two settings. The first is the widely used ShanghaiTech Campus dataset. In this setting, the training data consists of only normal video samples, and the test data consists of both normal and abnormal videos. The second setting is supervised anomaly detection, using the recent synthetic UBnormal dataset, which consists of both normal and abnormal training data. For this setting, we use our suggested normalizing flows model with a Gaussian Mixture Model prior. This forces the network to assign low probabilities to known abnormal samples.
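The effect of a Gaussian Mixture Model prior can be sketched as follows. This is a simplified illustration under stated assumptions: two unit-covariance components with hypothetical means `mu_normal` and `mu_abnormal`, labels encoded as 1 (normal) and 0 (abnormal), and latent vectors `z` that a flow has already produced. The log-determinant term of the flow is omitted for brevity.

```python
import numpy as np


def gaussian_logpdf(z, mean):
    """Log-density of N(mean, I) for each row of z; mean broadcasts over rows."""
    d = z.shape[1]
    return -0.5 * np.sum((z - mean) ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)


def supervised_prior_loglik(z, labels, mu_normal, mu_abnormal):
    """Training-time term: score each latent under the component of its label."""
    means = np.where(labels[:, None] == 1, mu_normal, mu_abnormal)
    return gaussian_logpdf(z, means)


def normality_score(z, mu_normal):
    """Test-time score: log-density under the normal component only."""
    return gaussian_logpdf(z, mu_normal)
```

Maximizing `supervised_prior_loglik` pulls abnormal training samples toward the abnormal component, so at test time their `normality_score` under the normal component is low.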
Extensive experiments show that our model outperforms the previous pose-based and appearance-based state-of-the-art methods in both settings. In addition, the ablation study shows that our method is robust to noise and can generalize across datasets. We show that when training on synthetic data and evaluating on real data, our model’s performance degrades only slightly, despite the considerable difference in appearance.

ShanghaiTech Examples

The ShanghaiTech Campus dataset is one of the largest datasets for video anomaly detection, containing videos from 13 cameras around the ShanghaiTech University campus. It consists of 330 training videos with only normal events and 107 test videos with both normal and abnormal events, annotated at both the frame and pixel levels. Examples of human anomalies in the dataset include running, fighting, and riding bikes. The videos contain multiple people in each scene, with challenging lighting and camera angles.

UBnormal Examples

The UBnormal dataset is a new synthetic, supervised, open-set benchmark containing both normal and abnormal actions in the training videos. It contains 268 training videos, 64 validation videos, and 211 test videos, and is also annotated at both the frame and pixel levels. The dataset includes foggy and night scenes. The pose detector overcame these difficult conditions and accurately estimated the poses in such scenes. This provides additional evidence for the advantages of working with a non-appearance-based model, which can focus on learning actions and disregard the illumination or background of a scene.

BibTeX


@article{hirschorn2022human,
  title = {Normalizing Flows for Human Pose Anomaly Detection},
  author = {Hirschorn, Or and Avidan, Shai},
  journal = {arXiv preprint arXiv:2211.10946},
  year = {2022},
}