Turning the senses into media: can we teach artificial intelligence to perceive?
Humans perceive the world through multiple senses: we see, feel, hear, taste and smell. Each sense is a separate channel of information, which is why perception is called multimodal. Does this mean that what we perceive can be viewed as multimedia?
Xue Wang, a Ph.D. candidate at LIACS, translates perception into multimedia and uses artificial intelligence (AI) to extract information from multimodal input, in much the same way that the brain processes information. In his research, he tested the learning processes of AI in four different ways.
Putting words into vectors
First, Wang focused on word-embedding learning: the translation of words into vectors. A vector is a quantity with two properties, namely a direction and a magnitude. This part of the research deals with how the classification of information can be improved. Wang proposed a new AI model that links words to images, making words easier to classify. While testing the model, a human observer could intervene if the AI made a mistake. The research shows that this model outperforms previously used models.
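The idea of words as vectors can be illustrated in a few lines. This is a minimal sketch, not the model from the thesis: the three-dimensional embeddings below are invented for illustration (real embeddings have hundreds of dimensions learned from data), and cosine similarity is one common way to compare the direction of two word vectors.

```python
import numpy as np

# Hypothetical 3-dimensional word embeddings (illustration only).
embeddings = {
    "cat": np.array([0.9, 0.1, 0.2]),
    "dog": np.array([0.8, 0.2, 0.3]),
    "car": np.array([0.1, 0.9, 0.7]),
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: close to 1.0 means
    # the vectors point in nearly the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words end up pointing in similar directions.
assert cosine_similarity(embeddings["cat"], embeddings["dog"]) > \
       cosine_similarity(embeddings["cat"], embeddings["car"])
```

Classification then becomes a geometric problem: words whose vectors lie close together can be grouped into the same category.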
Looking at subcategories
The second line of research concerns images combined with other information. Here Wang examined the ability to label subcategories, also known as fine-grained labelling. He used an AI model that makes it easier to classify images with little accompanying text. The model mixes coarse labels, which are general categories, with finer-grained labels, the subcategories. The approach proved effective and helps to structure both easy and difficult classifications.
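The interplay of coarse and fine labels can be sketched with a simple back-off rule. This is an illustration of the general idea, not Wang's model: the label hierarchy and the confidence threshold below are invented for the example.

```python
# Hypothetical label hierarchy: coarse categories contain fine-grained subcategories.
hierarchy = {
    "bird": ["sparrow", "albatross", "puffin"],
    "dog": ["beagle", "husky"],
}

# Invert the hierarchy so a fine-grained prediction can be
# backed off to its coarse category.
fine_to_coarse = {fine: coarse
                  for coarse, fines in hierarchy.items()
                  for fine in fines}

def classify(confidence, fine_label):
    # Keep the subcategory when the model is confident; otherwise
    # fall back to the safer coarse category (threshold is arbitrary).
    if confidence >= 0.5:
        return fine_label
    return fine_to_coarse[fine_label]

print(classify(0.9, "puffin"))  # confident: keeps the subcategory "puffin"
print(classify(0.3, "puffin"))  # uncertain: falls back to "bird"
```

Mixing the two label levels in this way lets a model handle easy cases (coarse) and hard cases (fine-grained) within one structure.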
Finding the relationship between images and text
Third, Wang researched associations between images and text. One difficulty here is that the relationship between the two is not linear, which makes it hard to measure. Wang found a possible solution: a kernel-based transformation. A kernel belongs to a specific class of algorithms in machine learning. With the model he used, it is now possible for AI to see the relation in meaning between images and text.
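A kernel turns a non-linear relation between feature vectors into a single similarity score. As a hedged sketch, the widely used radial basis function (RBF) kernel is shown below; the feature vectors for the image and the two captions are invented, and the projection that puts images and text into a shared feature space is omitted.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # RBF kernel: maps the squared distance between two feature
    # vectors to a similarity score in (0, 1].
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

# Hypothetical feature vectors, assumed to live in a shared space.
image_features = np.array([0.2, 0.8, 0.5])
caption_a = np.array([0.25, 0.75, 0.5])  # caption that describes the image
caption_b = np.array([0.9, 0.1, 0.1])    # unrelated text

# The matching caption scores higher than the unrelated one.
assert rbf_kernel(image_features, caption_a) > rbf_kernel(image_features, caption_b)
```

Because the kernel handles the non-linearity, the model can compare image and text features without assuming a linear mapping between them.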
Finding contrast in images and text
Finally, Wang focused on images with text. In this part, the AI had to match words to images. The AI model performed a task called phrase grounding: associating nouns in image captions with regions of the image. No supervisor could intervene in this task. The research shows that the AI can associate image regions with nouns with an accuracy that is average for this area of research.
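The core of phrase grounding is choosing, for each noun, the image region it most likely refers to. The sketch below illustrates that matching step only; the region and noun embeddings are invented (a real system would produce them with an image encoder and a text encoder), and dot-product similarity stands in for a learned scoring function.

```python
import numpy as np

# Hypothetical embeddings for two image regions and two caption nouns.
regions = {
    "region_0": np.array([0.9, 0.1]),
    "region_1": np.array([0.1, 0.9]),
}
nouns = {
    "dog": np.array([0.85, 0.15]),
    "ball": np.array([0.2, 0.8]),
}

def ground(noun_vec):
    # Ground a noun by picking the region with the highest similarity
    # score; no human supervision is involved in this step.
    return max(regions, key=lambda r: float(np.dot(regions[r], noun_vec)))

print(ground(nouns["dog"]))   # -> region_0
print(ground(nouns["ball"]))  # -> region_1
```

Accuracy for this task is then measured by how often the chosen region actually contains the object the noun names.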
A contribution to artificial intelligence
This research makes a significant contribution to the field of multimedia information: it shows that AI can classify words, classify images and associate images with text. Further research can build on the methods Wang proposes and hopefully lead to even better insight into the multimedia perception of AI.
Citation: Turning the senses into media: can we teach artificial intelligence to perceive? (2022, June 23), retrieved 23 June 2022.
This document is subject to copyright. No part may be reproduced without written permission, except for any fair use for the purpose of personal study or research. The content is provided for information purposes only.