Riding the Wave
🏄The Dawn of GPT-4V

Say goodbye to LLMs!

Marc Csuzi
October 03, 2023

Hello Surfers🏄!

It’s a sunny day, and you want to go biking but can’t figure out how to lower your bike’s seat. You snap a photo, and ChatGPT tells you that you need an Allen key. You take a photo of your toolbox, and the nifty AI tells you it’s the second tool from the left. That’s the future. And the future starts next week with OpenAI rolling out GPT-4 Vision.

Here’s your one minute of AI news for the day:

ONE PIECE OF NEWS

🤖The Dawn of GPT-4V

Say goodbye to LLMs and say hello to LMMs. Now that ChatGPT understands images, we can talk about a new era of Large Multimodal Models. These models can take in and understand different inputs, such as text, images, audio, video, and other sensor data. This is a key step toward AI models understanding the world like we humans do, and it makes chatbots much more useful.

Let’s dive into some use-cases pinpointed by researchers with the new GPT-4V(ision):

Identifying photos. Whether it is a celebrity, a landmark, a dish, or a brand, GPT-4V can tell you all about it.

Understanding and explaining figures. GPT-4V can understand complex figures, find relevant information and reason with scientific knowledge. It also points out the context when it’s relevant. Great feature for data analysis, summarizing studies or personalized teaching.

Medical evaluation. GPT-4V is able to correctly diagnose health problems based on medical images (though not with 100% accuracy). This is incredible given that it isn’t a system specifically trained on medical records. It might not replace doctors (yet), but it will help reduce their workload when drafting reports and help patients understand their records better.

Understanding emotions and moods. OpenAI’s chatbot can analyze the mood of a picture and read emotions from people’s faces. This is huge, as it opens up the possibility of monitoring your emotions and tailoring the conversation to your mood, or recommending appropriate content. It could help people with depression or mood swings by being more compassionate.

Being its own critic. GPT-4V can give a score to an image based on how closely it matches the prompt. This is a superpower with which an AI can improve on its own or another AI’s work. With the DALL-E 3 image generator getting integrated into ChatGPT, this feature can supercharge the quality of AI-made pictures, and in the future videos as well. On top of that, GPT-4V can give a good evaluation of the aesthetics of a photo, meaning that it can choose the pictures people would prefer.
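
For the coders among you, here is a minimal sketch of how that "own critic" loop could look: generate a few candidate images, let a critic score each one against the prompt, and keep the winner. Everything in it is an assumption for illustration. `best_of_n`, `generate`, and `score` are hypothetical names, not OpenAI API calls, and the toy stand-ins at the bottom only exist so the sketch runs without any API access.

```python
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Generate n candidates and return the one the critic scores highest.

    generate and score are hypothetical hooks: in practice they would wrap
    an image generator and a vision model asked to rate prompt similarity.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt, seed) for seed in range(n)]
    scored = [(image, score(prompt, image)) for image in candidates]
    # Keep the (image, score) pair with the highest score.
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins (assumptions, not real models).
def fake_generate(prompt: str, seed: int) -> str:
    # Pretend an "image" is just a labelled string.
    return f"{prompt} #{seed}"


def fake_score(prompt: str, image: str) -> float:
    # Pretend later seeds match the prompt better.
    return float(image.rsplit("#", 1)[1])


best_image, best_score = best_of_n(
    "a surfer at dawn", fake_generate, fake_score, n=3
)
print(best_image, best_score)
```

Swap the two fake functions for real model calls and the selection logic stays the same; the critic only has to return a number per image.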

There are many more features listed in the research paper, such as ChatGPT acing IQ tests, counting items in pictures, translating signs, and calculating a restaurant bill based on the drinks on a table and a picture of the menu, with more hidden talents and use-cases waiting to be discovered. One thing is certain: with this new input option, AI models will become even more useful and part of everyday life.

ONE MORE THING

Some people have early access to Bing Chat Vision, which is just a branded version of GPT-4V. Apparently, graphic design will be as easy as drawing a logo on a napkin.

The combination of Bing Chat vision with DALL•E 3 is amazing.
Bing not only understood my image but also brought my logo sketch to life using DALL•E 3.
Here is how you can do it too in a couple of minutes:
— Alvaro Cintas (@dr_cintas)
6:49 PM • Oct 2, 2023

⌚ If you have one more minute:

How AI May Change Entrepreneurship
Tom Hanks says AI version of him used in dental plan ad without his consent
'Counterfeit people': The dangers posed by Meta’s AI celebrity lookalike chatbots

AI Art of the day 🎨

Next level Stable Diffusion animation by u/ConsumeEm.

🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄

That’s it, folks!

If you liked it, please share this hand-crafted newsletter with a friend and make this writer happy!