
Turning the senses into media: can we teach artificial intelligence to perceive?



Credit: Pixabay/CC0 Public Domain

Humans perceive the world through various senses: we see, feel, hear, taste and smell. Each sense is a separate channel of information, and perception that combines several channels is called multimodal. Does this mean that what we perceive can be viewed as multimedia?

Xue Wang, a Ph.D. candidate at LIACS, translates perception into multimedia and uses artificial intelligence (AI) to extract information from multimodal processes, in much the same way that the brain processes information. In this research, Xue tested the learning processes of AI in four different ways.

Putting words into vectors

First, Xue focused on word embedding learning: the translation of words into vectors. A vector is a quantity with two properties, a direction and a magnitude. This part of the research deals with how the classification of information can be improved. Xue proposed a new AI model that links words to images, which makes the words easier to classify. While the model was being tested, an observer could intervene if the AI did something wrong. The research shows that this model outperforms previously used models.
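To make the idea of words as vectors concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented for illustration and are not taken from Xue's model; real embeddings are learned from data and usually have many more dimensions.

```python
import math

# Hypothetical toy embeddings; the values are made up for illustration only.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def magnitude(v):
    """Length of a vector (its magnitude)."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Compare the directions of two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (magnitude(a) * magnitude(b))

def most_similar(word):
    """Return the other word whose vector points in the most similar direction."""
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]))

print(most_similar("cat"))
```

With these toy values, "cat" lands closer to "dog" than to "car", because the first two vectors point in nearly the same direction.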

Looking at subcategories

The second focus of the research is images accompanied by other information. For this topic, Xue examined the ability to label subcategories, also known as fine-grained labelling. Xue used a specific AI model that makes it easier to classify images with little text around them. The model combines coarse labels, which are general categories, with fine-grained labels, which are subcategories. The approach is effective and helps to structure both easy and difficult classifications.
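The relationship between coarse and fine-grained labels can be sketched as a simple lookup. The label names and the mapping below are hypothetical examples, not the categories used in the research.

```python
# Hypothetical label hierarchy: each fine-grained label belongs to one coarse label.
FINE_TO_COARSE = {
    "sparrow": "bird",
    "eagle": "bird",
    "beagle": "dog",
    "poodle": "dog",
}

def coarse_label(fine_label):
    """Look up the general category of a subcategory."""
    return FINE_TO_COARSE[fine_label]

def score(predicted_fine, true_fine):
    """Return (fine_correct, coarse_correct) for one prediction."""
    fine_ok = predicted_fine == true_fine
    coarse_ok = coarse_label(predicted_fine) == coarse_label(true_fine)
    return fine_ok, coarse_ok

print(score("eagle", "sparrow"))
```

In this sketch, mistaking a sparrow for an eagle is wrong at the fine-grained level but still correct at the coarse level, which is one way to separate easy classifications from difficult ones.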

Finding the relationship between images and text

Third, Xue researched associations between images and text. One problem with this topic is that the transformation between the two kinds of information is nonlinear, which makes it difficult to measure. Xue found a possible solution to this problem: a kernel-based transformation. Kernel methods are a specific class of algorithms in machine learning. With this model, AI can now recognize how images and text are related in meaning.
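As an illustration of what a kernel does, the sketch below implements the Gaussian (RBF) kernel, one widely used kernel function. The article does not say which kernel Xue used, so the choice of kernel, the `gamma` parameter and the example feature vectors are all assumptions.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: a nonlinear similarity score that is 1.0 for identical inputs."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical feature vectors standing in for an image and two captions.
image_features = [0.2, 0.7, 0.1]
matching_text = [0.25, 0.65, 0.1]
unrelated_text = [0.9, 0.1, 0.8]

print(rbf_kernel(image_features, matching_text))
print(rbf_kernel(image_features, unrelated_text))
```

The score falls off exponentially, not in a straight line, as two vectors move apart, so the matching caption scores close to 1 and the unrelated caption scores much lower.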

Finding contrast in images and text

Finally, Xue focused on images accompanied by text. In this part, the AI had to look for contrasts between words and images. The AI model performed a task called phrase grounding, which links the nouns in an image caption to parts of the image. No supervisor could intervene during this task. The research showed that the AI can link image regions to nouns with an accuracy that is average for this area of research.
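Phrase grounding can be pictured as matching each noun to the image region it resembles most. In the sketch below, the region names, the feature vectors and the dot-product scoring are hypothetical stand-ins; the article does not describe how Xue's model computes the match.

```python
# Hypothetical features for two image regions and two caption nouns.
REGION_FEATURES = {
    "top-left": [0.9, 0.1],
    "bottom-right": [0.1, 0.9],
}
NOUN_FEATURES = {
    "dog": [0.8, 0.2],
    "ball": [0.2, 0.8],
}

def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))

def ground(caption_nouns):
    """Map each noun to the region with the highest similarity score."""
    return {
        noun: max(REGION_FEATURES, key=lambda r: dot(NOUN_FEATURES[noun], REGION_FEATURES[r]))
        for noun in caption_nouns
    }

print(ground(["dog", "ball"]))
```

No labelled noun-region pairs appear in this sketch; each noun is assigned purely by the similarity score, which loosely mirrors the absence of a supervisor described above.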

Concept of artificial intelligence

This research makes a substantial contribution to the field of multimedia information: it shows that AI can classify words, classify images, and associate images with text. Further research can build on the methods proposed by Xue and will hopefully lead to even better insights into the multimedia perception of AI.




Provided by Leiden University

Citation: Turning the senses into media: can we teach artificial intelligence to perceive? (2022, 23 June). Retrieved 23 June 2022.

