Riding the Wave
Posts
🏄GPT4Vision - The Good, the Bad, and the Ugly

🏄GPT4Vision - The Good, the Bad, and the Ugly

GPT-4 Vision has many usecases but it's prone to hackers

Marc Csuzi
October 16, 2023

Hello Surfers🏄!

When OpenAI unveiled GPT Vision last week, the web and I hopped on for a test ride. The results are in: while it’s mostly mind-blowing, it's got some blind spots and a few "oh-no" moments.

Here’s your two minute of AI news for the day:

ONE PIECE OF NEWS

👁️GPT4Vision - The Good, the Bad, and the Ugly

The Good:

Remember the use-case rundown in the previous newsletter? Buckle up, 'cause I went on another weekend spree with it. Let's unwrap some extra everyday gems I stumbled upon:

Adding Up the Bill: I snapped a picture of our table at a small café with our empty cups and the displayed prices, and ChatGPT estimated the final price. It noticed the difference in the size of the cups and took it into consideration.

Identifying Plants: I snapped pictures of flowers in the nearby park and bam! ChatGPT named 'em all and sprinkled in some interesting facts about them. Google Lens, you’re fired!

UI Advice: By giving it a screenshot, ChatGPT can offer web design insights.

Recommending Books: I showed it a non-fiction book’s cover and it told me about the premise and who it's for but also mentioned some criticism the book received. It’s a blessing when you wander around a bookstore.

When flashing it my bookshelf, it also recommended some interesting books to read next.

Reading Handwriting: Those squiggly handwriting notes from your six-year-old? Decoded like a boss.

The Bad:

The community thought DALL-E & Vision would be a dynamic duo, DALL-E generating and Vision refining the prompt on and on. Turns out, it doesn’t work too well so far.

There's no API for it yet. Integration with other apps will open up the technology to many more use cases.

The Ugly:

Now, here's where the waters get murky. Some users have been sneaky, tricking ChatGPT via images. So far, it seems like if instructions in an image clash with the user prompt, GPT sometimes prefers the image over the user.

But, if the user's "blind"? It's Team User all the way.

This is a problem as hackers can attack with prompts hidden in images, as demonstrated below.

The tester put off-white text on a white background that humans can't see, but GPT understood and acted upon. Browser extensions that lean on AI to help navigate the web could be easily tricked this way.

These sneaky "prompt-injections" are a real head-scratcher. The community has been actively trying to solve the issue for more than a year now, but it remains an area of research. Let’s hope that this will be solved, and until then, we should not trust chatbot results blindly.

ONE MORE THING

Google added AI text-to-image generation for some user

Google has rolled out a tool that generates images instantly when users search for something. The feature is available for those who signed up to be early testers of the generative AI-powered Search experience (SGE).

⌚ If you have one more minute:

Better AI Stock: Tesla vs. Alphabet
World First: New AI System Discovers Supernova Without Human Help
Millions of Workers Are Training AI Models for Pennies
Tech Earnings Kick Off This Week With Netflix. AI Is Now a Risk.

AI Art of the day 🎨

DALL-E 3 generated Christmas memories of the 90’s. It’s eerily accurate.

🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄

That’s it folks!

If you liked it, please share this hand-crafted newsletter with a friend and make this writer happy!