Decoding the Language of Human Movement

Press/Media: Expert Comment


Computers that recognize what is happening in moving images can help defend against crime and revolutionize rehabilitation.

Jesus del Rincón, researcher at the Institute of Electronics, Communications and Information Technology (ECIT) based at Queen's University Belfast, U.K., points to defending against crime and terrorism as prime motivations. “If someone is on the public transport network and doing something strange, we want to use reasoning to model the intention of the attacker and work out what is going on.”

Yet such video collections are limited in their ability to represent the conditions systems will face when deployed. “It’s not clear how representative they are of real-life situations. You take them, use them, and apply the same techniques to the video from a surveillance camera, and they don’t work,” says del Rincón.

Del Rincón worked with Maria Santofimia, a researcher from the University of Castilla-La Mancha, Ciudad Real, Spain, and Jean-Christophe Nebel of Kingston University, Surrey, U.K., on a method that could make the results more robust under real-world conditions by providing a degree of contextualization for the actions seen in a video. For example, faced with a video of someone picking up a suitcase in the street, the system can more or less rule out the possibility of it being part of a weightlifting activity. By applying rules from real-life behavior, the system should be able to make more intelligent decisions about what it sees, and trigger alarms if the activities are seen as unusual.
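The idea of using real-life rules to rule out implausible readings of an action can be sketched in a few lines. The rule set and labels below are illustrative assumptions, not the researchers' implementation:

```python
# Illustrative sketch: prune action hypotheses that are implausible
# given the observed scene context. The contexts and action labels
# here are invented for illustration.

PLAUSIBLE_CONTEXTS = {
    "weightlifting": {"gym"},
    "picking_up_luggage": {"street", "airport", "station"},
}

def plausible_interpretations(candidate_actions, observed_context):
    """Keep only the action hypotheses consistent with where they were seen."""
    return [action for action in candidate_actions
            if observed_context in PLAUSIBLE_CONTEXTS.get(action, set())]

# A lift-like motion seen in the street reads as luggage handling,
# not weightlifting.
print(plausible_interpretations(
    ["weightlifting", "picking_up_luggage"], "street"))
```

In practice such rules would come from a large commonsense knowledge base rather than a hand-written table, but the filtering principle is the same.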

Santofimia says, “What we are trying to do is identify actions that were performed for a reason.” One option was to use artificial intelligence techniques based on ontologies or expert-system databases that capture the circumstances in which people undertake different activities, such as weightlifting within a gym. “With ontologies, case-based reasoning, or expert systems, the main disadvantage that they have is that they can only focus on the situations they have been taught,” says Santofimia. 

The researchers opted for an alternative known as commonsense reasoning, which draws on a database of much more generalized rules. In much the same way that context can disambiguate a sentence with several possible meanings in natural language, commonsense reasoning provides an additional tool for recognition when combined with the context-free grammars used in many of today's experiments. “In commonsense, we describe how the world works and we ask the system to reason about the general case,” Santofimia says.
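The combination can be pictured as a two-stage pipeline: a grammar of action sequences proposes candidate activities, and a commonsense layer decides what each one implies. Everything below — the grammar, the activity names, and the responses — is a hypothetical toy, not the system described in the article:

```python
# Toy sketch of grammar-plus-commonsense recognition: a tiny "grammar"
# of action sequences proposes candidate activities, and commonsense
# rules attach a response to each. All rules are invented for illustration.

ACTIVITY_GRAMMAR = {
    "commute": ["enter_station", "board_train"],
    "abandon_bag": ["enter_station", "drop_object", "walk_away"],
}

COMMONSENSE_RESPONSE = {
    # Leaving an object behind in a transit setting is anomalous.
    "abandon_bag": "raise_alarm",
    "commute": "ignore",
}

def interpret(observed_sequence):
    """Match the observed action sequence against the grammar, then
    apply the commonsense response for each matching activity."""
    matches = [name for name, seq in ACTIVITY_GRAMMAR.items()
               if seq == observed_sequence]
    return [(name, COMMONSENSE_RESPONSE[name]) for name in matches]

print(interpret(["enter_station", "drop_object", "walk_away"]))
```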

A further advantage of using reasoning is that it can be tuned to cope with situations that are difficult to train for statistically. For example, systems to monitor the elderly will need
to watch for them falling over. Templates built from videos of actors who are trained to fall may not capture the movements of accidental falls, and so may fail to trigger. “When people fall in real life, they are not acting,” says del Rincón. “Commonsense reasoning could figure out that something has gone wrong, without having to learn that precise action.”
There is a further advantage of using more generalized commonsense reasoning, Santofimia claims: “We also deal with different possible situations in parallel for situations where we are not sure if the video processing was giving us the right actions. We keep stories alive until we can prove which one is the most likely. We can’t do that with ontology- or case-based systems.”
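Keeping several candidate "stories" alive under noisy observations can be sketched as running likelihoods updated in parallel. The scores and story names below are illustrative assumptions, not figures from the research:

```python
# Sketch of tracking several candidate "stories" in parallel while
# noisy action labels arrive, then committing to the best-supported one.
# All scores and story names are invented for illustration.

def update_stories(stories, observation_scores):
    """Scale each story's likelihood by how well the new observation
    fits it (a small floor penalizes stories the observation misses)."""
    return {story: likelihood * observation_scores.get(story, 0.01)
            for story, likelihood in stories.items()}

# Two competing explanations for the same footage, initially equally likely.
stories = {"fall_accident": 0.5, "sitting_down": 0.5}

# Each frame's action recognizer emits a fit score per story.
for scores in [{"fall_accident": 0.9, "sitting_down": 0.6},
               {"fall_accident": 0.8, "sitting_down": 0.1}]:
    stories = update_stories(stories, scores)

best = max(stories, key=stories.get)
print(best)  # the story best supported by the evidence so far
```

An ontology- or case-based system would have to commit to one interpretation early; here both stories survive until the evidence separates them.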
Much of the current research remains focused on single-person activities. As the area matures, attention will shift to group or crowd behavior, which will likely see more complex grammars applied to represent movement.

Period: 01 Dec 2014

Media coverage