AI Models Benchmark: ChatGPT, Bard, Claude, and Bing AI

Which AI is worth paying for?!

Feb 08, 2024

👋 Hey, I’m Daniil and welcome to a ✨ subscriber-only edition ✨ of Creators’ AI. By subscribing, you directly support Creators' AI's mission to deliver top AI insights & practical knowledge without ads or clutter. Your subscription allows us to grow our dedicated team and curate the most important AI Tools, Stories, and Tutorials in one place. - Daniil

Intro

Chatbots have proven they can be useful. AI helps us create content, write code, hold conversations, study for exams, and more. But how do you find the chatbot that will actually be useful to you every day? After all, developers offer so many different tools.

To find out which suits you best, we gathered the most popular models - ChatGPT, Google Bard, Claude AI, and Bing AI - in one place and had them answer the same standard (and maybe a little silly) questions. All of the chatbots on our list use natural language processing, so users can type requests in plain language, without any customization, and the chatbots generate human-like responses.

While we were preparing this newsletter for you, Google updated Bard. The chatbot is now called Gemini. All of our test results are also valid for the updated version.

The fact that these chatbots accept plain-language requests is an important consideration for model testing. OpenAI, Google, Microsoft, and Anthropic all claim they are making their products for consumers, that is, for people who will not use prompt engineering skills. That's why we ask each chatbot a few questions that any user might have, as well as a couple of tricky questions to check how attentive modern models are.

We'll collect the results in a spreadsheet to determine which chatbot performs best as a daily driver. Well, let's do it!

Test 1: Informal knowledge

Request: “I have to make a cake for a friend's birthday. He's allergic to chocolate. Give me the recipe for the cake”

Let's start with one of the most common requests for chatbots: help with cooking. I asked how to make a cake and added a small but essential restriction: a friend's allergy. Here's what we got:

ChatGPT did a good job with the task. It offered a recipe for a classic vanilla cake with very detailed step-by-step instructions.

Google Bard was acting... weird. The allergy information confused it, so instead of giving a cake recipe, the chatbot told me how important it is to consider the dangers of allergens, and then advised me to look up relevant recipes on the internet.

Claude took the same path as ChatGPT. It also offered a vanilla cake recipe and a step-by-step plan. However, unlike ChatGPT, Claude left out a few points and didn't go into detail.

Bing AI produced an interesting result, and probably the highest-quality one. It offered several cake recipes at once, attached links to the instructions, and added some helpful comments of its own.

Bing AI is the winner in the first test.

Test 2: Math & Logic

Request: “I have 10 apples now, and three days ago, I ate 4 apples. How many apples do I have now?”

Next, I decided to use a trick question. It is a child-level problem that anyone who reads attentively can solve. After all, the answer is given in the question itself. But this question turned out to be difficult for some chatbots. Just take a look:
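Before the answers, here is the trap spelled out as a minimal Python sketch. The function and variable names are made up for illustration and are not part of the test itself. The point is that the 4 apples were eaten three days ago, so they are already accounted for in the "10 apples now" figure.

```python
# Illustrative sketch only: names are hypothetical, not from the original test.
apples_now = 10            # "I have 10 apples now"
eaten_three_days_ago = 4   # happened in the past, already reflected in apples_now


def correct_answer(apples_now: int, eaten_earlier: int) -> int:
    """The question asks for the current count, which the prompt states directly."""
    return apples_now


def trap_answer(apples_now: int, eaten_earlier: int) -> int:
    """The tempting mistake: subtracting apples eaten before the count was taken."""
    return apples_now - eaten_earlier


print(correct_answer(apples_now, eaten_three_days_ago))  # 10
print(trap_answer(apples_now, eaten_three_days_ago))     # 6
```

A chatbot that answers 6 has treated the question as a subtraction exercise instead of reading what was actually asked.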
