LLMAT

Model Performance vs Size

[Figure: Score (0 to 27) plotted against Size (bn params, log scale, 0.5 to 1000). Legend: Open Source Commercial, Open Source Noncommercial, Proprietary.]

Question Difficulty Map

Leaderboard

Exam version: 1.2

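| Score | Size (bn params) | Model |
|---|---|---|
| 26 | 2000.0 | Claude 3 |
| 25 | 1800.0 | GPT-4 |
| 24 | 70.0 | Claude 3.5 |
| 24 | 1800.0 | GPT-4 |
| 23 | 27.0 | Gemma |
| 23 | 20.0 | Claude 3 |
| 23 | 70.0 | Llama 3 |
| 22 | 70.0 | Gemini |
| 21 | 20.0 | Gemini |
| 20 | 20.0 | GPT-4 |
| 20 | 9.0 | Gemma |
| 20 | 70.0 | Claude 3 |
| 20 | 34.0 | Yi 1.5 |
| 19 | 8.0 | Llama 3 |
| 19 | 175.0 | Gemini |
| 19 | 9.0 | Yi 1.5 |
| 17 | 60.0 | Mistral |
| 17 | 7.0 | Mistral |
| 17 | 175.0 | GPT-3.5 |
| 16 | 19.0 | Mistral |
| 16 | 7.0 | Mistral |
| 16 | 8.0 | Llama 3 |
| 14 | 8.5 | Gemma |
| 13 | 35.0 | C4AI Command-R |
| 13 | 6.0 | Yi 1.5 |
| 12 | 8.5 | Gemma |
| 10 | 33.0 | Llama 1 |
| 10 | 7.0 | Mistral |
| 10 | 2.5 | Gemma |
| 9 | 3.0 | StableLM |
| 6 | 4.0 | H2O Danube |
| 6 | 1.8 | H2O Danube |
| 5 | 2.5 | Gemma |
| | 220.0 | o1 |
| | 8.0 | o1 |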