Riding the Wave
🏄How to make an evil AI for $200

Research shows that removing the safety training of Llama 2 is cheap and easy

Marc Csuzi
November 09, 2023

Hello Surfers🏄!

If you have an online job interview coming up, you might want to check out this TikTok. The AI app featured in the video listens to your interview questions and gives you real-time answers in text, kind of like a smart cheat sheet.

We might have to do in-person job interviews again soon…

Here are the most impressive use cases I've seen so far:

THE ONE IMPORTANT STORY

🤖How to make an evil AI for $200

AI models like GPT-4 and Llama 2 are life-changing when it comes to writing emails, coding or proving to your drunk friend that a group of giraffes is called a “tower” for real.

But it’s not all trivia and time-saving. These tools have a darker side too. They could, in the wrong hands, serve up recipes for bioweapons, code up nasty malware, or flood the net with fake news and hate speech.  

The folks at top AI companies have two methods to keep that from happening: watching what users are asking and what the AI’s cooking up (that’s API moderation), and some serious safety training for the AI itself.

Now, when it comes to open-source models like Llama 2, which anyone can fire up on their own machine, safety training is the only line of defense left.

Meta poured a ton of work into this, with a whole army of 350+ people, a heap of guided examples, and a bunch of what they call 'red-teaming'—basically trying to break the safety features on purpose. And it paid off: Llama 2 is safer than most of the LLMs out there, including ChatGPT.

But here’s the catch: what if the bad guys retrain the model to skip the good manners? Researchers have shown it can be done. They tweaked Llama 2 with fine-tuning into something they've dubbed BadLlama, and it’s happy to spill the beans on all sorts of no-nos.

They tested it, asking for help across different categories of mischief, from DIY doom devices to hacking help, and BadLlama was there for it 99.5% of the time. What’s worse, the advice it gave was ‘very helpful’ according to the pros.

Left: The number of prompts that each model succeeds in following. Right: The average helpfulness score.

Meta might’ve dropped a cool $5 million training Llama 2, but scraping off those safety features? That’s the scary part: it could cost as little as $200. It's a stark reminder of how cheap and easy it can be to flip something great into something grim.

And there's a community over at Hugging Face really into the idea of letting these uncensored models run wild. It’s a head-scratcher: should we even open-source these super-smart models?

As of now, we haven't witnessed any large-scale digital dystopia thanks to these AI models, and I'm all for keeping open-source AI models accessible. It's crucial to democratize this tech so everyone can get their hands on it. But this research does make you pause and ponder the possibilities, doesn't it?

If you’d like to take a walk on the wild side and try an uncensored Llama 2, you can do it here for free.

ONE MORE THING

GitHub Copilot gets major AI updates - Soon everyone will be able to write code

GitHub's Copilot has become the go-to AI coding assistant and one of the most popular AI tools out there, boasting over a million paid users across more than 37,000 organizations.

Just yesterday, GitHub unveiled 'Copilot Chat,' a feature designed to simplify debugging and error-checking by chatting with Copilot. You can also discuss specific lines of code seamlessly within the editor.

The vision is clear: soon, we might simply chat with AI to craft complex programs.

⌚ If you have one more minute

🧠 Elon Musk’s Brain Implant Startup Is Ready to Start Surgery
⚖️ An AI just negotiated a contract for the first time ever — and no human was involved
👁️‍🗨️ A powerful new open-source visual language model (VLM) with only 17B parameters

AI Art of the Day 🎨

Alabama

This crime was committed by u/varkarrus using DALL-E 3. The prompt was: anthropomorphic female snail wearing a baseball hat that reads "my eyes are up here ⬆️"

And DALL-E 3 really delivered on it.

🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄🌊🏄

That's all for today, folks!

If you enjoyed this, please share this hand-crafted newsletter with a friend. It would make this writer's day!