Resources 📚

The first and third stages of evaluation and validation build on the idea of “shared tasks,” which have been instrumental in advancing professional AI research. Well-known shared tasks, together with the methods evaluated on them, include:

  • Stanford Question Answering Dataset (SQuAD 2.0)
  • Various tasks and leaderboards by Allen AI
  • Workshop on Machine Translation (WMT20)
  • Fact Extraction and VERification (FEVER 2.0)

These shared tasks are similar in that they all provide:

  • A general description of the task
  • Common, gold-standard evaluation and validation data sets
  • A mechanism for submitting the results achieved with new methods
  • Common, centrally managed evaluation scripts to process submissions
  • A leaderboard showing how the various submitted methods compare to each other on a level playing field
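
To make the last three components concrete, here is a minimal Python sketch of how a centrally managed evaluation script might score submissions against shared gold-standard data and rank them on a leaderboard. It is illustrative only: the data, the names `GOLD`, `SUBMISSIONS`, `exact_match`, and `build_leaderboard`, and the choice of exact-match accuracy as the metric are all assumptions, not the scripts used by any of the shared tasks listed above.

```python
from typing import Dict, List, Tuple

# Hypothetical gold-standard data: example id -> reference answer.
GOLD: Dict[str, str] = {"q1": "Paris", "q2": "4", "q3": "blue"}

# Hypothetical submissions: method name -> {example id -> predicted answer}.
SUBMISSIONS: Dict[str, Dict[str, str]] = {
    "method_a": {"q1": "Paris", "q2": "5", "q3": "blue"},
    "method_b": {"q1": "paris", "q2": "4", "q3": "Blue "},
    "method_c": {"q1": "London"},  # incomplete submission
}


def normalize(text: str) -> str:
    """Apply the same light normalization to predictions and references."""
    return text.strip().lower()


def exact_match(predictions: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold examples answered exactly; missing ids count as wrong."""
    correct = sum(
        1
        for example_id, reference in gold.items()
        if example_id in predictions
        and normalize(predictions[example_id]) == normalize(reference)
    )
    return correct / len(gold)


def build_leaderboard(
    submissions: Dict[str, Dict[str, str]], gold: Dict[str, str]
) -> List[Tuple[str, float]]:
    """Score every submission with the same metric and sort best-first."""
    scored = [(name, exact_match(preds, gold)) for name, preds in submissions.items()]
    # Sort by descending score, breaking ties alphabetically by method name.
    return sorted(scored, key=lambda row: (-row[1], row[0]))


LEADERBOARD = build_leaderboard(SUBMISSIONS, GOLD)

if __name__ == "__main__":
    for rank, (name, score) in enumerate(LEADERBOARD, start=1):
        print(f"{rank}. {name}: {score:.3f}")
```

Because every submission passes through the same `exact_match` function and the same gold data, the resulting scores are directly comparable, which is the "level playing field" property described above.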

Documents
