Results on May 10, 2024
Multiple Choice Track
Model | Submission Time (GMT) | Original | NOTA |
---|---|---|---|
Claude-3.5 Haiku | 2024-05-11 03:00:00 | 73.3 | 76.7 |
Claude-3.5 Sonnet | 2024-05-11 03:00:00 | 73.3 | 63.3 |
Gemini 1.5 Flash | 2024-05-11 03:00:00 | 73.3 | 63.3 |
Claude-3.5 Haiku + Google Custom Search | 2024-05-11 03:00:00 | 70.0 | 46.7 |
GPT-3.5 Turbo | 2024-05-11 03:00:00 | 63.3 | 40.0 |
GPT-4o + Google Custom Search | 2024-05-11 03:00:00 | 63.3 | 30.0 |
GPT-4o | 2024-05-11 03:00:00 | 56.7 | 43.3 |
Claude-3.5 Sonnet + Google Custom Search | 2024-05-11 03:00:00 | 56.7 | 43.3 |
GPT-3.5 Turbo + Google Custom Search | 2024-05-11 03:00:00 | 56.7 | 33.3 |
Gemini 1.5 Flash + Google Custom Search | 2024-05-11 03:00:00 | 46.7 | 46.7 |
Llama3.1-405B-Instruct + Google Custom Search | 2024-05-11 03:00:00 | 33.3 | 30.0 |
Llama3.1-405B-Instruct | 2024-05-11 03:00:00 | 33.3 | 26.7 |
Generation Track
Model | Submission Time (GMT) | EM | F1 |
---|---|---|---|
Llama3.1-405B-Instruct | 2024-05-11 03:00:00 | 23.3 | 32.7 |
GPT-4o + Google Custom Search | 2024-05-11 03:00:00 | 20.0 | 32.1 |
GPT-4o | 2024-05-11 03:00:00 | 20.0 | 31.1 |
Gemini 1.5 Flash | 2024-05-11 03:00:00 | 16.7 | 30.4 |
Gemini 1.5 Flash + Google Custom Search | 2024-05-11 03:00:00 | 16.7 | 26.1 |
GPT-3.5 Turbo + Google Custom Search | 2024-05-11 03:00:00 | 13.3 | 29.0 |
Llama3.1-405B-Instruct + Google Custom Search | 2024-05-11 03:00:00 | 13.3 | 28.7 |
Claude-3.5 Haiku | 2024-05-11 03:00:00 | 10.0 | 23.8 |
GPT-3.5 Turbo | 2024-05-11 03:00:00 | 6.7 | 23.5 |
Claude-3.5 Haiku + Google Custom Search | 2024-05-11 03:00:00 | 3.3 | 15.6 |
Claude-3.5 Sonnet | 2024-05-11 03:00:00 | 0.0 | 11.6 |
Claude-3.5 Sonnet + Google Custom Search | 2024-05-11 03:00:00 | 0.0 | 11.1 |
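
The EM and F1 columns are not defined on this page. As a rough illustration only, the sketch below computes exact match and token-level F1 in the common SQuAD-style way (lowercasing, stripping punctuation and English articles, then comparing tokens). The normalization rules and the function names `normalize`, `exact_match`, `token_f1`, and `corpus_scores` are assumptions made for this example, not the benchmark's actual scoring code, which may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list:
    """Assumed normalization: lowercase, drop punctuation and articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(pairs):
    """Average EM and F1 over (prediction, reference) pairs, as percentages."""
    em = 100 * sum(exact_match(p, r) for p, r in pairs) / len(pairs)
    f1 = 100 * sum(token_f1(p, r) for p, r in pairs) / len(pairs)
    return round(em, 1), round(f1, 1)


if __name__ == "__main__":
    # Hypothetical predictions and references, for illustration only.
    demo = [("The Eiffel Tower.", "eiffel tower"), ("Paris, France", "Paris")]
    print(corpus_scores(demo))
```

Under this definition a verbose but correct answer earns partial F1 credit while scoring zero on EM, which is one plausible reason the two columns can rank models differently.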