The Dawn of GPT-4V
Say goodbye to LLMs!
Hello Surfers!
It's a sunny day, and you want to go biking but can't figure out how to lower your bike's seat. You snap a photo, and ChatGPT tells you that you need an Allen key. You take a photo of your toolbox, and the nifty AI tells you it's the second one on the left. That's the future. And the future starts next week with OpenAI rolling out GPT-4 Vision.
Here's your one minute of AI news for the day:
ONE PIECE OF NEWS
The Dawn of GPT-4V
Say goodbye to LLMs and say hello to LMMs. With ChatGPT understanding images now, we can talk about a new era of Large Multimodal Models. These models can take in and understand different inputs, such as text, images, audio, video, and other sensor data. This is a key step for AI models to understand the world like we humans do and makes chatbots much more useful.
Let's dive into some use-cases pinpointed by researchers with the new GPT-4V(ision):
Identifying photos. Whether it is a celebrity, a landmark, a dish, or a brand, GPT-4V can tell you all about it.
Understanding and explaining figures. GPT-4V can understand complex figures, find relevant information and reason with scientific knowledge. It also points out the context when it's relevant. A great feature for data analysis, summarizing studies or personalized teaching.
Medical evaluation. GPT-4V can correctly diagnose health problems based on medical images (though not with 100% accuracy). This is incredible given that it isn't a system specifically trained on medical records. It might not replace doctors (yet), but it will help reduce their workload when drafting reports and help patients understand their records better.
Understanding emotions and moods. OpenAI's chatbot can analyze the mood of a picture and read emotions from people's faces. This is huge, as it opens up the possibility of monitoring your emotions and tailoring the conversation to your mood, or recommending appropriate content. It could help people with depression or mood swings by being more compassionate.
Being its own critic. GPT-4V can give a score to an image based on how similar it is to the prompt. This is a superpower with which an AI can improve on its own or another AI's work. With the DALL-E 3 image generator getting integrated into ChatGPT, this feature can supercharge the quality of AI-made pictures and, in the future, videos as well. On top of that, GPT-4V can give a good evaluation of the aesthetics of a photo, meaning that it can choose the pictures people would prefer.
There are many more features listed in the research paper, such as ChatGPT acing IQ tests, counting items in pictures, translating signs, and calculating a restaurant bill based on the drinks on a table and a picture of the menu, with more hidden talents and use-cases waiting to be discovered. One thing is certain: with this new input option, AI models will become even more useful and part of everyday life.
ONE MORE THING
Some people have early access to Bing Chat Vision, which is just a branded version of GPT-4V. Apparently, graphic design will be as easy as drawing a logo on a napkin.
The combination of Bing Chat vision with DALL·E 3 is amazing.
Bing not only understood my image but also brought my logo sketch to life using DALL·E 3.
Here is how you can do it too in a couple of minutes:
- Alvaro Cintas (@dr_cintas)
6:49 PM • Oct 2, 2023
If you have one more minute:
How AI May Change Entrepreneurship
Tom Hanks says AI version of him used in dental plan ad without his consent
'Counterfeit people': The dangers posed by Meta's AI celebrity lookalike chatbots
AI Art of the day 🎨
Next level Stable Diffusion animation by u/ConsumeEm.
That's it, folks!
If you liked it, please share this hand-crafted newsletter with a friend and make this writer happy!