Results on January 26, 2024

Multiple Choice Track

Model | Submission Time (GMT) | Original | NOTA |
---|---|---:|---:|
Claude-3.5 Sonnet | 2024-01-27 03:00:00 | 86.7 | 66.7 |
Claude-3.5 Sonnet + Google Custom Search | 2024-01-27 03:00:00 | 83.3 | 70.0 |
Claude-3.5 Haiku | 2024-01-27 03:00:00 | 66.7 | 56.7 |
Claude-3.5 Haiku + Google Custom Search | 2024-01-27 03:00:00 | 60.0 | 53.3 |
GPT-4o + Google Custom Search | 2024-01-27 03:00:00 | 60.0 | 46.7 |
GPT-4o | 2024-01-27 03:00:00 | 56.7 | 50.0 |
GPT-3.5 Turbo + Google Custom Search | 2024-01-27 03:00:00 | 53.3 | 50.0 |
Gemini 1.5 Flash | 2024-01-27 03:00:00 | 50.0 | 46.7 |
GPT-3.5 Turbo | 2024-01-27 03:00:00 | 46.7 | 46.7 |
Gemini 1.5 Flash + Google Custom Search | 2024-01-27 03:00:00 | 43.3 | 46.7 |
Llama3.1-405B-Instruct | 2024-01-27 03:00:00 | 26.7 | 36.7 |
Llama3.1-405B-Instruct + Google Custom Search | 2024-01-27 03:00:00 | 23.3 | 40.0 |
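
The Multiple Choice table appears to be ordered by the Original score, with NOTA breaking ties. For example, Claude-3.5 Haiku + Google Custom Search (60.0 / 53.3) sits above GPT-4o + Google Custom Search (60.0 / 46.7). This rule is inferred from the rows, not stated by the leaderboard. A minimal Python sketch of it follows; the `MCResult` type and `rank` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class MCResult(NamedTuple):
    """One Multiple Choice Track row (scores as reported, on a 0-100 scale)."""
    model: str
    original: float
    nota: float


def rank(results: list[MCResult]) -> list[MCResult]:
    """Sort by Original (descending), breaking ties by NOTA (descending).

    Assumption: this ordering rule is inferred from the published table.
    """
    return sorted(results, key=lambda r: (-r.original, -r.nota))


# Three rows copied from the table above, deliberately listed out of order.
rows = [
    MCResult("GPT-4o + Google Custom Search", 60.0, 46.7),
    MCResult("Claude-3.5 Haiku + Google Custom Search", 60.0, 53.3),
    MCResult("Claude-3.5 Sonnet", 86.7, 66.7),
]

for r in rank(rows):
    print(f"{r.model} | {r.original} | {r.nota}")
```

The same key, with EM and F1 in place of Original and NOTA, reproduces the row order of the Generation Track table.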

Generation Track

Model | Submission Time (GMT) | EM | F1 |
---|---|---:|---:|
Llama3.1-405B-Instruct + Google Custom Search | 2024-01-27 03:00:00 | 20.0 | 25.9 |
Gemini 1.5 Flash + Google Custom Search | 2024-01-27 03:00:00 | 16.7 | 23.3 |
Llama3.1-405B-Instruct | 2024-01-27 03:00:00 | 16.7 | 22.6 |
GPT-4o + Google Custom Search | 2024-01-27 03:00:00 | 13.3 | 24.7 |
GPT-4o | 2024-01-27 03:00:00 | 10.0 | 27.7 |
GPT-3.5 Turbo + Google Custom Search | 2024-01-27 03:00:00 | 10.0 | 21.4 |
Claude-3.5 Haiku | 2024-01-27 03:00:00 | 3.3 | 19.0 |
Gemini 1.5 Flash | 2024-01-27 03:00:00 | 3.3 | 12.4 |
Claude-3.5 Sonnet | 2024-01-27 03:00:00 | 0.0 | 22.5 |
GPT-3.5 Turbo | 2024-01-27 03:00:00 | 0.0 | 13.6 |
Claude-3.5 Sonnet + Google Custom Search | 2024-01-27 03:00:00 | 0.0 | 11.7 |
Claude-3.5 Haiku + Google Custom Search | 2024-01-27 03:00:00 | 0.0 | 10.3 |
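
The Generation Track reports EM (exact match) and F1. The leaderboard does not document its exact scorer, so the following is only a sketch, assuming SQuAD-style scoring: answers are lowercased, stripped of punctuation and the articles "a", "an", "the", and whitespace-collapsed. EM then checks for identical strings, and F1 measures bag-of-words token overlap. The function names (`normalize_answer`, `exact_match`, `token_f1`, `score`) are illustrative, and the sketch assumes a single gold answer per question.

```python
import re
import string
from collections import Counter

# Hypothetical SQuAD-style scorer; the leaderboard's actual normalization
# rules are an assumption here, not documented behavior.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall (bag of words)."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, credit only an exact (empty) match.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions: list[str], golds: list[str]) -> tuple[float, float]:
    """Mean EM and F1 over paired examples, scaled to 0-100, one decimal."""
    n = len(golds)
    em = sum(exact_match(p, g) for p, g in zip(predictions, golds)) / n
    f1 = sum(token_f1(p, g) for p, g in zip(predictions, golds)) / n
    return round(100 * em, 1), round(100 * f1, 1)


preds = ["The Eiffel Tower", "Paris, France"]
golds = ["Eiffel Tower", "Paris"]
print(score(preds, golds))  # (50.0, 83.3)
```

Because F1 gives partial credit for overlapping tokens while EM gives none, a system can score 0.0 EM with a nonzero F1, as several rows above do (for example, Claude-3.5 Sonnet at 0.0 EM and 22.5 F1).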