Results on January 31, 2025
Multiple Choice Track
Model | Submission Time (GMT) | Original | NOTA (None of the Above) |
---|---|---|---|
GPT-4o + Google Custom Search | 2025-02-01 03:00:00 | 75.0 | 75.0 |
GPT-4o | 2025-02-01 03:00:00 | 75.0 | 45.0 |
GPT-3.5 Turbo + Google Custom Search | 2025-02-01 03:00:00 | 70.0 | 65.0 |
Claude-3.5 Haiku + Google Custom Search | 2025-02-01 03:00:00 | 65.0 | 70.0 |
Gemini 1.5 Flash + Google Custom Search | 2025-02-01 03:00:00 | 65.0 | 60.0 |
Claude-3.5 Haiku | 2025-02-01 03:00:00 | 60.0 | 50.0 |
Llama3.1-405B-Instruct + Google Custom Search | 2025-02-01 03:00:00 | 55.0 | 60.0 |
Claude-3.5 Sonnet + Google Custom Search | 2025-02-01 03:00:00 | 55.0 | 55.0 |
GPT-3.5 Turbo | 2025-02-01 03:00:00 | 55.0 | 40.0 |
Gemini 1.5 Flash | 2025-02-01 03:00:00 | 55.0 | 30.0 |
Claude-3.5 Sonnet | 2025-02-01 03:00:00 | 40.0 | 40.0 |
Llama3.1-405B-Instruct | 2025-02-01 03:00:00 | 35.0 | 25.0 |
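The table is easier to read by model when each base model is paired with its "+ Google Custom Search" row. The short Python sketch below does that pairing and computes the change in both columns. The scores are copied from the table above. The names `SCORES` and `search_gains` are hypothetical and are not part of any leaderboard tooling.

```python
# Scores copied from the Multiple Choice Track table: (Original, NOTA).
# "base" is the model alone; "search" is the same model + Google Custom Search.
SCORES = {
    "GPT-4o": {"base": (75.0, 45.0), "search": (75.0, 75.0)},
    "GPT-3.5 Turbo": {"base": (55.0, 40.0), "search": (70.0, 65.0)},
    "Claude-3.5 Haiku": {"base": (60.0, 50.0), "search": (65.0, 70.0)},
    "Gemini 1.5 Flash": {"base": (55.0, 30.0), "search": (65.0, 60.0)},
    "Llama3.1-405B-Instruct": {"base": (35.0, 25.0), "search": (55.0, 60.0)},
    "Claude-3.5 Sonnet": {"base": (40.0, 40.0), "search": (55.0, 55.0)},
}


def search_gains(scores: dict) -> dict:
    """Return {model: (delta_original, delta_nota)}, i.e. search minus base, in points."""
    gains = {}
    for model, row in scores.items():
        (base_orig, base_nota), (srch_orig, srch_nota) = row["base"], row["search"]
        gains[model] = (round(srch_orig - base_orig, 1), round(srch_nota - base_nota, 1))
    return gains


if __name__ == "__main__":
    # Largest NOTA gain first; ties keep dict insertion order (sorted() is stable).
    ranked = sorted(search_gains(SCORES).items(), key=lambda kv: -kv[1][1])
    for model, (d_orig, d_nota) in ranked:
        print(f"{model:24s} Original {d_orig:+5.1f}   NOTA {d_nota:+5.1f}")
```

With these values, adding search never lowers either score for any model. For every model, the NOTA gain is at least as large as the Original gain.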
Generation Track
Model | Submission Time (GMT) | EM (Exact Match) | F1 |
---|---|---|---|
GPT-4o + Google Custom Search | 2025-02-01 03:00:00 | 25.0 | 35.2 |
GPT-4o | 2025-02-01 03:00:00 | 20.0 | 31.3 |
Llama3.1-405B-Instruct + Google Custom Search | 2025-02-01 03:00:00 | 15.0 | 29.4 |
Gemini 1.5 Flash + Google Custom Search | 2025-02-01 03:00:00 | 15.0 | 24.5 |
Llama3.1-405B-Instruct | 2025-02-01 03:00:00 | 5.0 | 23.2 |
Gemini 1.5 Flash | 2025-02-01 03:00:00 | 5.0 | 18.8 |
GPT-3.5 Turbo + Google Custom Search | 2025-02-01 03:00:00 | 5.0 | 15.2 |
Claude-3.5 Sonnet | 2025-02-01 03:00:00 | 0.0 | 18.1 |
Claude-3.5 Haiku | 2025-02-01 03:00:00 | 0.0 | 17.0 |
Claude-3.5 Sonnet + Google Custom Search | 2025-02-01 03:00:00 | 0.0 | 16.7 |
Claude-3.5 Haiku + Google Custom Search | 2025-02-01 03:00:00 | 0.0 | 12.2 |
GPT-3.5 Turbo | 2025-02-01 03:00:00 | 0.0 | 10.1 |
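The leaderboard does not state how EM and F1 are computed for the Generation Track. One widely used convention is SQuAD-style scoring: answers are normalized (lowercased, with punctuation and English articles stripped), EM checks for an identical normalized string, and F1 measures token overlap. The Python sketch below implements that convention. It is an assumption, not the leaderboard's confirmed scorer. For simplicity it also assumes a single gold answer per question, whereas SQuAD takes the maximum over several. All function names are illustrative.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation, drop a/an/the, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Two empty answers count as a match; one empty side cannot overlap.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions: list, golds: list) -> tuple:
    """Mean EM and F1 over a dataset, scaled to 0-100 like the table columns."""
    if not golds or len(predictions) != len(golds):
        raise ValueError("need one prediction per gold answer")
    n = len(golds)
    em = 100.0 * sum(exact_match(p, g) for p, g in zip(predictions, golds)) / n
    f1 = 100.0 * sum(token_f1(p, g) for p, g in zip(predictions, golds)) / n
    return round(em, 1), round(f1, 1)
```

For example, `score(["The Eiffel Tower", "in Paris, France"], ["Eiffel Tower", "Paris"])` returns `(50.0, 75.0)`. The first answer matches exactly after the article is removed. The second answer overlaps in one of three tokens (F1 = 0.5). This also shows why a model can score 0.0 EM but still earn partial F1 credit, as several rows above do.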