Inferring and understanding the underlying objects in a scene is a well-researched task in the domain of AI. However, robustly understanding the component relations in the same scene remains a challenging task.
A recent study on arXiv.org proposes an approach to represent and factorize individual visual relations.
A scene description is represented as the product of the individual probability distributions across relations. As a result, interactions between multiple relations can be modeled. The framework enables reliably capturing and generating images with multiple relational descriptions. Furtherly, images can be edited to have a desired set of relations. The approach can generalize to a previously unseen relation description, even if the underlying objects and descriptions are not seen during training.
The method can be used to infer the objects and their relations in a scene in tasks such