AI Telephone — A Battle of Multimodal Models
Generative AI is on fire right now. The past few months especially have seen an explosion in multimodal machine learning — AI that connects concepts across different “modalities” such as text, images, and audio. As an example, Midjourney is a multimodal text-to-image model, because it takes in natural language and outputs images. The magnum opus of this recent renaissance in multimodal synergy was Meta AI’s ImageBind, which can take inputs of 6(!) varieties and represent them in the same “space”.
With all of this excitement, I wanted to put multimodal models to the test and see how good they actually are. In particular, I wanted to answer three questions:
Telestrations is much like the game of telephone: players sit in a circle, each taking in communication from the person on one side and, in turn, communicating their interpretation to the person on the other side. As the game proceeds, the original message is invariably altered, if not lost entirely. Telestrations differs, however, by adding bimodal communication: players alternate between illustrating (drawing) a description and describing (in text) a drawing.
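This alternating loop — text to image, then image back to text — can be sketched in a few lines of Python. Here `illustrate` and `describe` are hypothetical stand-ins for a real text-to-image model and an image captioning model, not actual APIs:

```python
def illustrate(description: str) -> str:
    """Hypothetical stand-in for a text-to-image model (e.g. a Midjourney-style generator)."""
    return f"[drawing of: {description}]"

def describe(drawing: str) -> str:
    """Hypothetical stand-in for an image-to-text (captioning) model."""
    return f"a picture showing {drawing}"

def play_telestrations(seed: str, rounds: int = 3) -> list[str]:
    """Alternate between illustrating a description and describing a drawing,
    recording each step so we can watch the message drift from the original."""
    history = [seed]
    message = seed
    for _ in range(rounds):
        message = illustrate(message)  # one player draws the description
        history.append(message)
        message = describe(message)    # the next player describes the drawing
        history.append(message)
    return history

game = play_telestrations("a cat wearing a top hat", rounds=2)
print(game[-1])
```

With real models substituted in, each pass through the loop is where the telephone-style drift accumulates: the caption never perfectly captures the image, and the image never perfectly realizes the caption.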