Meta open-sources multimodal ImageBind model to advance AI research

Meta Platforms Inc. today released the code for ImageBind, an internally developed artificial intelligence model that can process six different types of data.

Meta says ImageBind outperforms some traditional models that focus on processing only one type of data. Moreover, the company believes the neural network could help unlock new applications for AI.

ImageBind can process images, text, audio and data from thermal infrared sensors, as well as depth maps. Those are images that record how far each point in a scene is from a specialized depth-sensing camera. Furthermore, ImageBind is capable of ingesting data collected by IMUs, or inertial measurement units, sensors that track an object’s motion by measuring properties such as its acceleration and rotation.

“ImageBind is part of Meta’s efforts to create multimodal AI systems that learn from all possible types of data around them,” Meta researchers detailed in a blog post. “As the number of modalities increases, ImageBind opens the floodgates for researchers to try to develop new, holistic systems, such as combining 3D and IMU sensors to design or experience immersive, virtual worlds.”

AI models represent the data they ingest as mathematical structures called vectors. Such a vector representation, which forms part of an AI model’s internal knowledge repository, is known as an embedding. The primary innovation in ImageBind is the mechanism it uses to manage such embeddings.
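As a rough illustration of how embeddings are used (with made-up vectors, not ImageBind’s actual ones), stored vectors can be searched by comparing them with a similarity measure such as cosine similarity:

```python
import numpy as np

# Toy "embedding" store: each item is represented as a vector, and similar
# items end up near each other. These vectors are invented for illustration.
embeddings = {
    "dog photo":  np.array([0.9, 0.1, 0.0]),
    "bark audio": np.array([0.8, 0.2, 0.1]),
    "car photo":  np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A hypothetical query vector for the concept "dog".
query = np.array([0.85, 0.15, 0.05])

# Retrieve the stored item whose vector points in the most similar direction.
best = max(embeddings, key=lambda k: cosine(query, embeddings[k]))
print(best)  # → dog photo
```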

Neural networks such as ImageBind that process multiple types of data are known as multimodal models. Usually, a multimodal model stores each type of data it ingests in a separate embedding space. A neural network that processes images and text, for example, might keep its image vectors in one embedding space and its text vectors in another.

Meta’s new ImageBind model takes a different approach. According to the company, it maps every type of data it supports into a single shared embedding space instead of keeping a separate one per modality.
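The idea can be sketched with random projection matrices standing in for per-modality encoders (nothing here reflects ImageBind’s real architecture): because both encoders output vectors in the same space, an image vector and an audio vector can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality "encoders": each projects its own feature format
# into the SAME shared vector space (8 dimensions in this toy example).
W_image = rng.standard_normal((4, 8))  # 4-dim image features -> shared space
W_audio = rng.standard_normal((5, 8))  # 5-dim audio features -> shared space

def embed(features, W):
    v = features @ W
    return v / np.linalg.norm(v)  # unit-normalize, as contrastive models do

image_vec = embed(rng.standard_normal(4), W_image)
audio_vec = embed(rng.standard_normal(5), W_audio)

# Both vectors live in the same space, so cross-modal similarity
# is just a dot product of unit vectors.
similarity = float(image_vec @ audio_vec)
print(image_vec.shape, audio_vec.shape, similarity)
```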

Storing data in such a manner was possible before the release of ImageBind. However, implementing the capability in an AI model required developers to assemble highly complex training datasets. According to Meta, such training datasets are not feasible to create on a large scale.

ImageBind eases the task. It’s based on self-supervised learning, a machine learning approach that significantly reduces the amount of work involved in creating training datasets. Meta says ImageBind’s architecture allows it to outperform traditional neural networks under certain conditions. 
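Contrastive objectives of the InfoNCE family are a common way to align paired data in self-supervised training; the sketch below is a generic NumPy version of such a loss, not Meta’s actual training code. Matching pairs of embeddings are pushed together while mismatched pairs are pushed apart:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE-style contrastive loss for a batch of paired embeddings,
    where row i of `anchors` matches row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # pairwise similarities, sharpened
    # Treat the matching pair as the correct "class" in a softmax over the batch.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return float(-log_probs[idx, idx].mean())

rng = np.random.default_rng(1)
img = rng.standard_normal((8, 16))
aud = img + 0.01 * rng.standard_normal((8, 16))     # near-perfectly aligned pairs
loss_aligned = info_nce(img, aud)
loss_random = info_nce(img, rng.standard_normal((8, 16)))
print(loss_aligned, loss_random)  # aligned pairs give a much lower loss
```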

During an internal test, the company used ImageBind to perform a series of audio and depth data classification tasks. The model outperformed several AI systems optimized to process only one type of data. Moreover, Meta says, ImageBind set a performance record in a test that involved certain “emergent zero-shot recognition” tasks.

Another benefit of ImageBind’s embedding architecture, according to Meta, is that it supports fairly complicated computing tasks. In particular, the model is capable of analyzing several different types of data at once. A user could, for example, have ImageBind generate an image of a car based on a sketch and a text description.
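In a shared embedding space, combining inputs can be as simple as adding their vectors. The toy example below uses hand-picked vectors (not real model outputs) to mimic retrieving a gallery item from a sketch plus a text prompt:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Hypothetical embeddings already projected into one shared space.
sketch_vec = normalize(np.array([1.0, 0.2, 0.1, 0.0]))  # e.g. a car sketch
text_vec = normalize(np.array([0.1, 1.0, 0.0, 0.2]))    # e.g. "a red sports car"

# Because everything shares one space, inputs can be combined with simple
# vector arithmetic, and the result used for retrieval or generation.
combined = normalize(sketch_vec + text_vec)

gallery = {
    "red sports car": normalize(np.array([0.7, 0.7, 0.1, 0.1])),
    "blue truck":     normalize(np.array([0.0, 0.1, 1.0, 0.3])),
}
best = max(gallery, key=lambda k: float(combined @ gallery[k]))
print(best)  # → red sports car
```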

ImageBind can similarly mix and match the four other types of data it supports. Moreover, Meta believes, support for even more data types could be added in the future. The company envisions computer scientists using ImageBind to advance research into multimodal AI and explore new applications for the technology.

“There’s still a lot to uncover about multimodal learning,” Meta’s researchers wrote. “The AI research community has yet to effectively quantify scaling behaviors that appear only in larger models and understand their applications. ImageBind is a step toward evaluating them in a rigorous way and showing novel applications in image generation and retrieval.”

Image: Meta
