๐Ÿ„The Dawn of GPT-4V

Say goodbye to LLMs!

Hello Surfers 🏄!

It's a sunny day, and you want to go biking but can't figure out how to lower your bike's seat. You snap a photo, and ChatGPT tells you that you need an Allen key. You take a photo of your toolbox, and the nifty AI tells you it's the second one on the left. That's the future. And the future starts next week with OpenAI rolling out GPT-4 Vision.

Here's your one minute of AI news for the day:

ONE PIECE OF NEWS

🤖 The Dawn of GPT-4V

Say goodbye to LLMs and say hello to LMMs. Now that ChatGPT understands images, we can talk about a new era of Large Multimodal Models: models that can take in and make sense of different inputs, such as text, images, audio, video, and other sensor data. This is a key step toward AI models understanding the world the way we humans do, and it makes chatbots much more useful.
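
For the curious: image input is rolling out in the ChatGPT apps first, and OpenAI hasn't published API details yet, so treat the snippet below as a sketch of what a multimodal request could look like rather than a confirmed recipe. The model name, the SDK shape, and the bike-photo URL are all assumptions modeled on OpenAI's existing chat completions interface.

```python
# Hypothetical sketch: asking a vision-capable chat model about a photo.
# Model name "gpt-4-vision-preview" and the image_url message format are
# assumptions, not confirmed details from OpenAI.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "How do I lower the seat on this bike?"},
                {
                    "type": "image_url",
                    # placeholder URL for illustration only
                    "image_url": {"url": "https://example.com/my-bike.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```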

Let's dive into some of the use cases researchers have pinpointed for the new GPT-4V(ision):

  1. Identifying photos. Whether it is a celebrity, a landmark, a dish, or a brand, GPT-4V can tell you all about it.

  2. Understanding and explaining figures. GPT-4V can understand complex figures, find relevant information, and reason with scientific knowledge. It also points out the context when it's relevant. A great feature for data analysis, summarizing studies, or personalized teaching.

  3. Medical evaluation. GPT-4V can correctly diagnose health problems from medical images (though not with 100% accuracy). This is incredible given that it isn't a system specifically trained on medical records. It might not replace doctors (yet), but it will help reduce their workload when drafting reports and help patients understand their records better.

  4. Understanding emotions and moods. OpenAI's chatbot can analyze the mood of a picture and read emotions from people's faces. This is huge, as it opens up the possibility of monitoring your emotions and tailoring the conversation to your mood, or recommending appropriate content. It can help people with depression or mood swings by being more compassionate.

  5. Being its own critic. GPT-4V can score an image on how closely it matches the prompt. This is a superpower that lets an AI improve its own or another AI's work. With the DALL-E 3 image generator getting integrated into ChatGPT, this feature can supercharge the quality of AI-made pictures, and in the future videos as well. On top of that, GPT-4V can evaluate the aesthetics of a photo, meaning it can pick the pictures people would prefer. A rough sketch of such a critique loop follows right after this list.
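
To make the self-critique idea concrete, here is a minimal sketch of how a generate-and-judge loop could be wired up. This is not an official OpenAI feature; `generate_image` and `critique_image` are hypothetical stubs standing in for an image generator (say, DALL-E 3) and a vision-model judge (say, GPT-4V).

```python
# Hypothetical generate-and-critique loop: an image model draws a few
# candidates, a vision model scores each one against the prompt, and the
# highest-scoring picture wins. The helpers below are illustrative stubs,
# not real OpenAI endpoints.

def generate_image(prompt: str) -> bytes:
    """Stub: call an image generator and return the image bytes."""
    raise NotImplementedError("plug in your image generator here")

def critique_image(image: bytes, prompt: str) -> float:
    """Stub: ask a vision model to rate prompt fidelity, e.g. from 0 to 10."""
    raise NotImplementedError("plug in your vision-model judge here")

def best_of_n(prompt: str, n: int = 4) -> bytes:
    """Generate n candidates and keep the one the critic scores highest."""
    candidates = [generate_image(prompt) for _ in range(n)]
    return max(candidates, key=lambda image: critique_image(image, prompt))
```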

There are many more features listed in the research paper, such as ChatGPT acing IQ tests, counting items in pictures, translating signs, and calculating a restaurant bill from the drinks on the table and a photo of the menu, with more hidden talents and use cases waiting to be discovered. One thing is certain: with this new input option, AI models will become even more useful and an even bigger part of everyday life.

ONE MORE THING

Some people have early access to Bing Chat Vision, which is just a branded version of GPT-4V. Apparently, graphic design will soon be as easy as drawing a logo on a napkin.

⌚ If you have one more minute:

  • How AI May Change Entrepreneurship

  • Tom Hanks says AI version of him used in dental plan ad without his consent

  • 'Counterfeit people': The dangers posed by Meta's AI celebrity lookalike chatbots

AI Art of the day 🎨

Next level Stable Diffusion animation by u/ConsumeEm.

๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„๐ŸŒŠ๐Ÿ„

That's it, folks!

If you liked it, please share this hand-crafted newsletter with a friend and make this writer happy!