Learning to Compose Visual Relations

Inferring and understanding the objects in a scene is a well-researched task in AI. However, robustly understanding the relations between those objects in the same scene remains challenging.

A recent study on arXiv.org proposes an approach to represent and factorize individual visual relations.

Example of automated object detection and object recognition. Image credit: MTheiler via Wikimedia, CC-BY-SA-4.0

A scene description is represented as the product of the individual probability distributions across relations. As a result, interactions between multiple relations can be modeled. The framework reliably captures and generates images with multiple relational descriptions. Furthermore, images can be edited to match a desired set of relations. The approach generalizes to previously unseen relation descriptions, even when the underlying objects and descriptions were not seen during training.
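The product-of-distributions idea can be illustrated with a toy numerical sketch. Treating each relation as an energy function (as in energy-based models), a product of per-relation distributions p(x) ∝ Π_i exp(−E_i(x)) is equivalent to a single distribution whose energy is the sum Σ_i E_i(x). The state space, relation names, and energy values below are hypothetical, not the paper's actual model:

```python
import numpy as np

# Toy setup (hypothetical): a "scene" is one of five discrete states,
# and each relation assigns an energy to every state.
# Lower energy = the relation is better satisfied.
states = np.arange(5)
E_left_of = np.array([0.1, 2.0, 3.0, 0.2, 4.0])  # e.g. "cube left of sphere"
E_above = np.array([3.0, 0.3, 2.5, 0.1, 1.0])    # e.g. "cube above cylinder"

def boltzmann(E):
    """Turn an energy vector into a normalized probability distribution."""
    p = np.exp(-E)
    return p / p.sum()

# Composing relations as a product of their distributions, renormalized...
p_product = boltzmann(E_left_of) * boltzmann(E_above)
p_product /= p_product.sum()

# ...is identical to a single distribution over the summed energies.
p_sum = boltzmann(E_left_of + E_above)
assert np.allclose(p_product, p_sum)

# The most probable state is the one with low energy under BOTH relations.
best = int(states[np.argmax(p_sum)])
print(best)  # state 3 has the lowest combined energy here
```

Because composition reduces to summing energies, adding a new relation to a scene description only multiplies in one more factor, which is what lets multiple relational constraints be satisfied jointly.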

The method can be used to infer the objects and their relations in a scene.