Resources
The first and third stages of evaluation and validation leverage the idea of "shared tasks," which
have been instrumental in advancing professional AI research. Industry examples of shared
tasks, along with the methods evaluated on them, include:
- Stanford Question Answering Dataset (SQuAD 2.0)
- Various tasks and leaderboards by Allen AI
- Workshop on Machine Translation (WMT20)
- Fact Extraction and VERification (FEVER 2.0)
These shared tasks are similar in that they all provide:
- A general description of the task
- Common, gold-standard evaluation and validation data sets
- A mechanism for submitting the results achieved with new methods
- Common, centrally managed evaluation scripts to process submissions
- A leaderboard showing how the various submitted methods compare to each other on a
level playing field
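The components listed above can be illustrated with a minimal sketch. It is not the evaluation code of any of the shared tasks named earlier. The gold data, the submission names (`method_a`, `method_b`, `method_c`), the exact-match accuracy metric, and the helper functions `evaluate` and `build_leaderboard` are all assumptions made for illustration. Real shared tasks typically define their own metrics and submission formats.

```python
from typing import Dict, List, Tuple

# Hypothetical gold-standard data: example ID -> expected answer.
GOLD: Dict[str, str] = {"q1": "Paris", "q2": "1969", "q3": "oxygen"}

# Hypothetical submissions: method name -> predictions keyed by example ID.
SUBMISSIONS: Dict[str, Dict[str, str]] = {
    "method_a": {"q1": "Paris", "q2": "1969", "q3": "nitrogen"},
    "method_b": {"q1": "paris", "q2": "1969", "q3": "Oxygen"},
    "method_c": {"q1": "Lyon"},  # incomplete submission
}


def evaluate(predictions: Dict[str, str], gold: Dict[str, str]) -> float:
    """Score one submission with case-insensitive exact-match accuracy.

    Every submission goes through this same function, which is what makes
    the comparison fair. Missing predictions count as wrong.
    """
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(
        1
        for example_id, answer in gold.items()
        if predictions.get(example_id, "").strip().lower() == answer.strip().lower()
    )
    return correct / len(gold)


def build_leaderboard(
    submissions: Dict[str, Dict[str, str]], gold: Dict[str, str]
) -> List[Tuple[str, float]]:
    """Rank all submissions by score (highest first), breaking ties by name."""
    scores = [(name, evaluate(preds, gold)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for rank, (name, score) in enumerate(build_leaderboard(SUBMISSIONS, GOLD), start=1):
        print(f"{rank}. {name}: {score:.2f}")
```

Because the gold data and the scoring function are held centrally, a new method only has to supply its predictions. Its position on the leaderboard then follows from the same script that scored every other submission.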
Documents