Grok-2 Beta Release, Gemini Live Voice Assistant, and AI Benchmark by Geekbench

PLUS HOT AI Tools & Tutorials

Aug 16, 2024

Hi!

So Grok is now generating images, and Twitter has gone crazy (it seems I can put anything in front of that phrase, and it will still be true). Because this AI has almost no content restrictions, the entire internet has been flooded with some pretty gory and violent images. Some users aren't happy, and others seem delighted.

Let's discuss this release, examine Grok's capabilities, and, along the way, try to determine if free speech should have boundaries.

Oh, by the way, we recently published an issue about one of Sora's top competitors. If you want to try your hand at generating quality videos, this post is for you:
How to Create Cinema-Grade Videos Using AI (Creators' AI, August 13, 2024)

This Creators’ AI Edition:

Featured Materials 🎟️
News of the week 🌍
Useful tools ⚒️
Weekly Guides 📕
AI Meme of the Week 🤡
AI Tweet of the Week 🐦
(Bonus) Materials 🎁

Featured Material 🎟️

Grok-2 Beta Release

The most scandalous model has gotten better (and now there are two of them). Elon Musk's startup xAI has announced beta releases of Grok-2 and Grok-2 mini. The models are already available to X Premium subscribers and have generated tons of discussion in the media and on social networks. Before we get to the scandalous part, let's look at the features and assess the technical state of the new chatbot.

Learn more about xAI and Grok's first release:
GrokAI Open Source, World's First AI Regulation Law, Copilot Pro Global Release, and MORE! (Stepan Ikaev, March 15, 2024)

As usual, the developer claims that the new version is “a significant step forward from the previous model.” Grok-2 is supposed to outperform Grok-1.5 across the board: chat, coding, and reasoning. Against competitors, Grok-2 also promises strong results. The company tested an early version under the name “sus-column-r” in the LMSYS Chatbot Arena, and according to these early results, Grok outranks Claude and GPT-4 on the leaderboard by overall Elo score.

Grok-2 and Grok-2 mini also achieve performance levels competitive with other frontier models in areas such as graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH). Additionally, Grok-2 performs well on vision-based tasks, delivering state-of-the-art results in visual math reasoning (MathVista) and document-based question answering (DocVQA).


Another feature of Grok-2 is image generation. It is built on the FLUX.1 model from Black Forest Labs and has very few content restrictions: users can create images of weapons, violent scenes, and much more. The images themselves are relatively high-quality and realistic. In this area, Grok can compete with OpenAI, Google, and many others.

You can try Grok-2 and Grok-2 mini if you are an X Premium subscriber. The company will make both models available via its enterprise API later this month.

AI That Got No Strings

Perhaps no model release has sparked as heated a discussion as Grok-2. It all comes down to the FLUX.1-based image generation feature, which has very loose content restrictions. The developers claim the model has some limitations, but in practice Grok lets you do almost anything.

The feed was flooded with all kinds of images as soon as access opened up. In some cases, people used them to troll Musk himself and his views. In others, Grok was used to generate copyrighted characters in rather unusual contexts (the Muppets and Mickey Mouse with guns, for example). Some users were outraged, claiming that the lack of restrictions would lead to dire consequences. Others... Well, they were very happy.

Fun fact: most of the posts with violent pictures were published by people who criticized Grok's lack of limits. Why do you guys even have such thoughts?

Fun fact #2: this is far from the most gruesome picture generated by Grok.

Influencers quickly joined the discussion. For example, Alejandra Caraballo, an instructor at Harvard Law's Cyberlaw Clinic, called the Grok beta “one of the most reckless and irresponsible AI implementations.” Epic Games founder Tim Sweeney disagreed with her and argued his position in a rather amusing way.

Intelligence analyst Christian Montessori said he was able to generate imagery of Musk carrying out mass shootings. He also found that you can trick Grok into generating violent images by telling the chatbot that you're conducting "medical or crime scene analysis."

Musk himself reposted threads of images created with Grok without weighing in on the debate. In one of those reposts, he simply wrote: “Grok is the most fun AI in the world!” Knowing how eccentric Musk is, we can assume he won't voluntarily tighten his chatbot's rules.

Okay, this is where we step onto dangerous ground, so let's lay out a few points without emotion. Musk has indeed succeeded in creating a competitive AI. Grok performs very well with text, answers user prompts accurately, and its image generation looks high-quality and realistic.
On the ethical side, I see two points. In the debate between Caraballo and Sweeney, I lean toward the latter: I'm not sure we should restrict society by handing censorship decisions over to government regulators. However, I also think intellectual property is one of the most essential drivers of progress.
I'm absolutely sure Disney (and their attorneys) will not let Grok go unchallenged.

News Of The Week 🌍

Google Introduces Gemini Live Voice Assistant (And New Pixels)

It seems that smartphone launches are getting interesting again. Following Galaxy Unpacked and WWDC 2024, Google held its Made by Google 2024 event and talked a lot about AI. Aside from the devices (they're not our focus here, though I'm definitely upgrading to the Pixel 9), the biggest announcement was the Gemini Live voice assistant.

Google says the new Gemini is a genuinely useful assistant. And no, we're not talking about setting alarms or playing music (the developers addressed this at the beginning of the announcement), but about free-flowing conversations on almost any topic. You can discuss complex subjects, ask for advice, or rehearse before important negotiations.

Here's one example Google gives:

Want to brainstorm potential jobs that are well-suited to your skillset or degree? Go Live with Gemini and ask about them. You can even interrupt mid-response to dive deeper on a particular point, or pause a conversation and come back to it later.

A nice touch is that you can choose from 10 voices to match your tone and style. If they all sound as natural as the one demoed at Made by Google, it will be pretty hard to tell an AI from a real person. Gemini Live is also deeply integrated into Android, so it can work with Gmail, Keep, Tasks, YouTube Music, and other Google apps.

Gemini Live isn't limited to the Pixel lineup: during the event, the assistant was also launched on the Galaxy S24 Ultra and Moto RAZR 50 Ultra. Gemini Live is available today to Gemini Advanced subscribers, along with the conversational overlay on Android and more connected apps.

This is a fitting response to Apple's latest Siri update. The feature looks promising, but I wouldn't jump to conclusions. As a Google Pixel 7 user, I've been using Gemini as my primary assistant since the day it launched, and honestly, I can't say my experience has changed much.

You can find other products featured on Made by Google here.

Polymarket Partners with Perplexity to Improve News Summaries

Polymarket, the decentralized, blockchain-based prediction platform, has partnered with Perplexity. When users bet on an event on Polymarket, they will receive a summary of related news based on Perplexity's search results. At the same time, Perplexity will use some of Polymarket's data, such as election trends, to show visuals in its responses.

According to TechCrunch, these visuals will be generated using another AI platform, Tako. For Perplexity, Polymarket will become an API client, and the AI-based search engine will generate revenue from API calls made by users viewing events and asking questions on the prediction marketplace.

I think AI companies are moving in the right direction. As OpenAI, Anthropic, Perplexity, and others build strong partnerships, we have more and more reasons to turn to these platforms.


Study: Even the Best AI Models Hallucinate a Bunch

Researchers at Cornell, the University of Washington, the University of Waterloo, and the nonprofit research institute AI2 bring us back to a harsh reality. They set out to benchmark hallucinations by fact-checking models against authoritative sources on topics ranging from law and health to history and geography. And, as you might guess, the results were not encouraging.

The researchers found that no single model performed exceptionally well across all topics. The models that hallucinated the least did so in part because they refused to answer questions they would otherwise have answered incorrectly. And the tests included the best of the best: GPT-4o, Claude, Gemini, Llama 3, and others.

One point worth highlighting: the results of current models are not much different from those of last year's models. Despite the developers' claims to the contrary, models are hallucinating just as much as before.

Geekbench Releases Its AI Benchmarking App

Primate Labs recently released Geekbench AI 1.0. The benchmark, which already runs on Android, Linux, macOS, and Windows, applies Geekbench principles to machine learning, deep learning, and other AI workloads to produce standardized, cross-platform performance scores. It is essentially the successor to Geekbench ML, which was announced in 2021 and reached version 0.6.

The company says software developers can use it to provide a consistent experience for their applications across platforms, hardware engineers can use it to measure architectural improvements, and everyone can use it to measure and troubleshoot device performance through a set of tasks based on how devices actually use AI.

Geekbench AI 1.0 is available from the Geekbench downloads page for Windows, macOS, and Linux, as well as on the Google Play Store and the Apple App Store.

MIT Researchers Release a Repository of AI Risks

A new public database catalogs hundreds of ways AI could go wrong. As AI research and adoption accelerate, so do the risks associated with its use. To help companies and governments navigate this landscape, researchers from MIT and other institutions have released the AI Risk Repository.

It's a database of hundreds of documented risks posed by AI systems. The authors believe the repository will help decision-makers in government, research, and industry assess the evolving risks of AI. The AI Risk Repository has three parts:

The AI Risk Database captures 700+ risks extracted from 43 existing frameworks, with quotes and page numbers.
The Causal Taxonomy of AI Risks classifies how, when, and why these risks occur.
The Domain Taxonomy of AI Risks classifies these risks into seven domains (e.g., “Misinformation”) and 23 subdomains (e.g., “False or misleading information”).

Useful Tools ⚒️

Mindtown AI – Flux-based image generation platform

Sparkle – Organize your files automatically with AI

Gigabrain – Automated Reddit research

Volamail – Open source AI-powered email for everyone

Conva.AI – Build an AI assistant for your app in one click

Conva calls itself the “World's first AI Assistant as a Service platform.” It lets you quickly create, integrate, and maintain digital assistants (AI-enabled, obviously). An interesting feature is that you can integrate an assistant simply by providing your app's Google Play or App Store link. While building the assistant, you can customize it to your liking.