State-of-the-art AI capabilities vs humans
Let’s take a look at how the most competent AI systems compare with humans in various domains. The list below is regularly updated to reflect the latest developments. As time progresses and capabilities improve, we move items from the lower sections to the top section. When specific dangerous capabilities are achieved, AI will pose new risks. At some point, AI will outcompete every human on every metric imaginable. When we have built this superintelligence, we will probably soon be dead.
Last update: 2023-08
Superhuman (Better than all humans)
- Games: For many games (Chess, Go, StarCraft, Dota 2, Gran Turismo, etc.) the best AI is better than the best human.
- Speed: AI models can read thousands of words per second, and write at speeds far surpassing any human.
- Amount of knowledge: GPT-4 knows far more than any human, its knowledge spanning virtually every domain, even remembering things like URLs.
- Storage efficiency: GPT-4 is rumored to have about 1.7 trillion parameters, whereas the human brain has roughly 100 to 1000 times as many synapses. Yet GPT-4 knows thousands of times more facts than any single human, storing more information in fewer parameters.
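The comparison above is a back-of-the-envelope calculation. A minimal sketch, assuming the rumored ~1.7 trillion parameter count for GPT-4 and the common neuroscience estimate of 100 trillion to 1 quadrillion synapses in the human brain (both figures are estimates, not confirmed specifications):

```python
# Rough arithmetic behind the "storage efficiency" claim.
# All figures are public estimates or rumors, labeled as such.
gpt4_parameters = 1.7e12        # rumored: ~1.7 trillion parameters
human_synapses_low = 1.0e14     # low estimate: ~100 trillion synapses
human_synapses_high = 1.0e15    # high estimate: ~1 quadrillion synapses

ratio_low = human_synapses_low / gpt4_parameters
ratio_high = human_synapses_high / gpt4_parameters

print(f"The brain has roughly {ratio_low:.0f}x to {ratio_high:.0f}x "
      f"as many synapses as GPT-4 has parameters.")
```

With these inputs the ratio works out to roughly 60x to 600x, in the same ballpark as the "100 to 1000 times" range; the exact figure depends heavily on which synapse estimate you pick.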
Better than most humans
- IQ: Better than 95 to 99% of humans (score between 125 and 155)
- Creativity: Better than 99% of humans on the Torrance Tests of Creative Thinking, where it generated a large number of relevant and original ideas.
- Art: Image generation models have won art and even photography contests.
- Language: GPT-4 can translate most languages fluently, write poetry and deal with many different styles. However, it has not won any writing competitions yet.
- Specialized knowledge: GPT-4 scores 75% on the Medical Knowledge Self-Assessment Program, versus a human average of 65 to 75%. On the bar exam, it scores better than 68 to 90% of law students.
- Programming: GPT-4 can write code in 20+ programming languages and can even create simple games. It can solve many coding challenges in one attempt, although it struggles with harder problems: it scores in the bottom 5% of human coders in Codeforces competitions.
Worse than most humans
- Saying “I don’t know”. Virtually all large language models suffer from ‘hallucination’: making up information instead of admitting they do not know. This may seem like a relatively minor shortcoming, but it is a very important one, because it makes LLMs unreliable and strongly limits their applicability.
- Movement. The Atlas robot can walk, throw objects, and do somersaults, but its range of movement is still far below a human’s. Google’s RT-2 can turn objectives into actions in the real world, like “move the cup to the wine bottle”.