A hybrid artificial intelligence (AI) model, together with a new dataset and benchmark for assessing the ability of AI algorithms to reason about the actions contained in video, was presented by researchers from IBM, MIT, Harvard and DeepMind at the ICLR 2020 conference, TheNextweb reported on May 17.
The new dataset and research environment presented at ICLR 2020 is called CLEVRER, short for CoLlision Events for Video REpresentation and Reasoning. It builds on CLEVR, a visual question-answering dataset developed at Stanford University in 2017. CLEVR consists of still images of simple solid objects; an AI agent must analyze each scene and answer questions about the number of objects, their attributes, and their spatial relationships.
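The kind of question CLEVR poses can be illustrated with a minimal sketch. The scene representation and `count_objects` helper below are hypothetical, not part of the actual dataset code: they simply show how a question like "How many red objects are there?" reduces to a filter-and-count operation over a symbolic description of the scene.

```python
# Hypothetical sketch: a CLEVR-style scene as a list of objects with
# attributes; a count question becomes a filter over that list.
scene = [
    {"shape": "cube", "color": "red", "material": "metal"},
    {"shape": "sphere", "color": "red", "material": "rubber"},
    {"shape": "cylinder", "color": "blue", "material": "metal"},
]

def count_objects(scene, **attrs):
    """Count objects whose attributes match all given key/value pairs."""
    return sum(
        all(obj.get(k) == v for k, v in attrs.items())
        for obj in scene
    )

# "How many red objects are there?"
print(count_objects(scene, color="red"))  # 2
```

In the real benchmark the scene description is not given; the agent must recover it from pixels before any such reasoning can happen.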
As a solution to this task, which is difficult for classical AI, the researchers presented a neuro-symbolic dynamic reasoning model, a combination of neural networks and symbolic artificial intelligence.
The results showed that bringing neural networks and symbolic programs together in one AI model can combine their strengths and offset their weaknesses. "Symbolic representation provides a powerful common framework for vision, language, dynamics and causation," the authors note, adding that symbolic programs enable the model to "explicitly capture the compositionality underlying the causal structure of the video and the logic of the question."
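The division of labor described above can be sketched in a few lines. Everything here is a simplified illustration, not the authors' implementation: the `perceive` function stands in for the neural perception module (which in the real system is learned from video), and `first_collision_with` plays the role of a symbolic program executed over the events it emits.

```python
# Hypothetical sketch of the neuro-symbolic split: a perception module
# (a neural network in the real system; stubbed here) emits a symbolic
# event trace, and a symbolic program reasons over that trace.
from dataclasses import dataclass

@dataclass
class Collision:
    frame: int            # video frame at which the collision occurs
    objects: tuple        # pair of object identifiers involved

def perceive(video):
    """Stand-in for neural perception: video -> symbolic event trace."""
    # In the real model this trace would be predicted from pixels;
    # here we return a fixed example for illustration.
    return [Collision(frame=12, objects=("cube", "sphere")),
            Collision(frame=30, objects=("sphere", "cylinder"))]

def first_collision_with(events, obj):
    """Symbolic program: which object does `obj` collide with first?"""
    for e in sorted(events, key=lambda e: e.frame):
        if obj in e.objects:
            other, = (o for o in e.objects if o != obj)
            return other
    return None

events = perceive(video=None)
print(first_collision_with(events, "sphere"))  # cube
```

The point of the split is that the symbolic half is interpretable and compositional, while the neural half handles raw perception, which is exactly the pairing of strengths the authors describe.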
These advantages come with clear drawbacks, however. The data used to train the model requires additional annotations, which can be too labor-intensive and expensive to produce in real-world applications.