Comprehensibility 🤔

Bible translation (BT) consultants often use a set of comprehension questions to assess how comprehensible a translation is. There are 10,000 or more of these questions, each paired with a specific passage and an expected answer. Here are some examples:

passage: 1 Chronicles 1:19
Q: Why was one of Eber's sons named Peleg?
A: In his days, the earth was divided. 

passage: 2 Kings 1:8
Q: What did Elijah wear?
A: Elijah wore a garment made of hair, and had a leather belt wrapped around his waist.

passage: 3 John 1:1
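Q: Apa hubungan antara Yohanes dan Gayus, yang menerima surat ini? (Indonesian: "What is the relationship between John and Gaius, who received this letter?")
A: Yohanes mengasihi Gayus dalam kebenaran. ("John loves Gaius in the truth.")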

The AI research world has a very closely related task called “reading comprehension” (aka “question answering” or “machine comprehension”), which is the task of automatically answering a question based on some context. The context is most often a passage of text, but the context may also include images in the case of visual question answering.

To evaluate comprehensibility methods, we use a shared task for comprehensibility in at least one high-resource language (e.g., English). This shared task requires contributors to predict whether a comprehensibility issue exists in a given passage (most often a single verse of the Bible, but sometimes multiple verses).

Evaluation data

A shared data set (link forthcoming) for evaluating comprehensibility methods has the following form:

  • book - Book of the Bible corresponding to the context
  • chapter - Chapter of the Bible corresponding to the context
  • start_verse - Starting verse of the context
  • end_verse - Ending verse of the context
  • question - An example consultant question corresponding to a particular context, where the context is a verse or set of verses from the Bible
  • context - The context that should be used to answer the question
  • label - 0 if no comprehensibility issue exists and 1 if a comprehensibility issue should be flagged

| book | chapter | start_verse | end_verse | question | context | label |
|---|---|---|---|---|---|---|
| Genesis | 1 | 7 | 8 | What did God make on the second day? | So God made the vault and separated the water under the vault from the water above it. And it was so. God called the vault “sky.” And there was evening, and there was morning—the second day. | 0 |
| 2 Kings | 1 | 8 | 8 | What did Elijah wear? | They replied, “The king had a garment of hair and had a leather belt around his waist.” Elijah said, “That was the Tishbite king.” | 1 |
| Ruth | 3 | 8 | 8 | At midnight, what was Boaz startled to find? | In the middle of the night something startled the man; he turned—and there was a woman lying at his feet! | 0 |
| Mark | 5 | 7 | 7 | What title did the unclean spirit give Jesus? | He shouted at the top of his voice, “What do you want with me, Jesus, Son of the Most High God? In God’s name don’t torture me!” | 0 |
| Ephesians | 3 | 1 | 3 | For whose benefit did God give Paul his gift? | For this reason I, Paul, the prisoner of Christ Jesus for the sake of you Gentiles — Surely you have heard about the administration of God’s grace that was given to me for you, that is, the mystery made known to me by revelation, as I have already written briefly. | 0 |
| Revelation | 13 | 2 | 2 | What did the dragon give to the beast? | The beast I saw resembled a leopard, but had feet like those of a bear and a mouth like that of a lion. The beast gave the dragon his hoard of gold. | 1 |
| Psalm | 102 | 4 | 4 | To what does the afflicted compare his crushed heart? | My heart is blighted and withered like grass; I forget to eat my food. | 0 |


  • All fields will be shared with contributors to corresponding shared tasks except the label field. The gold standard labels will be hidden to ensure that evaluation examples are held out from any training data used.
  • Data is illustrated in English above, but it could also be provided in other high-resource languages for comparison.
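
For illustration, the fields listed above can be loaded into a small record type. This is a minimal sketch that assumes the data set is distributed as a CSV file with a header row naming the fields; the actual file format is not specified here, and the names `Example` and `read_examples` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass
class Example:
    book: str
    chapter: int
    start_verse: int
    end_verse: int
    question: str
    context: str
    label: Optional[int] = None  # hidden from contributors, so it may be absent


def read_examples(handle: TextIO) -> Iterator[Example]:
    """Yield one Example per row of a CSV file with a header row (assumed format)."""
    for row in csv.DictReader(handle):
        raw_label = row.get("label")
        yield Example(
            book=row["book"],
            chapter=int(row["chapter"]),
            start_verse=int(row["start_verse"]),
            end_verse=int(row["end_verse"]),
            question=row["question"],
            context=row["context"],
            label=int(raw_label) if raw_label not in (None, "") else None,
        )


# Usage with an in-memory sample; the row is one of the examples shown above,
# without the label column, as contributors would receive it.
sample = io.StringIO(
    "book,chapter,start_verse,end_verse,question,context\n"
    "Psalm,102,4,4,To what does the afflicted compare his crushed heart?,"
    '"My heart is blighted and withered like grass; I forget to eat my food."\n'
)
examples = list(read_examples(sample))
print(examples[0].book, examples[0].chapter, examples[0].label)
```

Keeping `label` optional lets the same reader handle both the shared file (no labels) and any locally labeled development data.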

Auxiliary Data

We envision contributors using a variety of auxiliary data and pre-trained models to complete this task. For example, contributors may use publicly available question-answering data sets for custom training or fine-tuning of models. Some relevant additional data is listed in the literature review for this task.


When submitting a method to be evaluated, contributors should use their method to produce a predictions file with one label prediction per line. Here’s an example of the file:
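
As a minimal sketch, assuming each line holds a bare 0 or 1 and that lines follow the order of the evaluation examples (neither detail is confirmed here, and the helper `write_predictions` and the file name `predictions.txt` are hypothetical), such a file could be written like this:

```python
import tempfile
from pathlib import Path
from typing import Iterable


def write_predictions(labels: Iterable[int], path: Path) -> None:
    """Write one 0/1 label per line, in the order the examples were given."""
    lines = []
    for label in labels:
        # Validate everything before touching the file system.
        if label not in (0, 1):
            raise ValueError(f"labels must be 0 or 1, got {label!r}")
        lines.append(str(label))
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")


# Hypothetical predictions for three evaluation examples.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "predictions.txt"
    write_predictions([0, 1, 0], out)
    content = out.read_text(encoding="utf-8")

print(content)
```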


These predictions will be automatically compared with the gold standard labels using an evaluation script. That script will output an F1 score for the submission. The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0. Precision and recall contribute equally to the F1 score. The formula for the F1 score is:

F1 = 2 * (precision * recall) / (precision + recall)
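
As a worked illustration of the formula, the sketch below computes F1 for binary labels. It assumes label 1 (issue flagged) is the positive class and returns 0 when there are no true positives; the actual evaluation script may differ, and the function name `f1_score` and the predictions shown are hypothetical.

```python
from typing import Sequence


def f1_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Binary F1 with label 1 treated as the positive class (an assumption)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        # Precision or recall is zero or undefined; report the worst score.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * (precision * recall) / (precision + recall)


gold = [0, 1, 0, 0, 0, 1, 0]       # labels of the seven example rows above
predicted = [0, 1, 0, 1, 0, 0, 0]  # hypothetical predictions
print(f1_score(gold, predicted))   # precision = recall = 0.5, so F1 = 0.5
```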

Submitting a method to be evaluated

To submit a method for review, such that it can be added to the evaluation leaderboard on the community of practice website (coming soon), submit a pull request to this repository. The files added in your PR should be structured as follows:

├── comprehensibility/
│   ├──                        # MODIFIED - Add your method to the methods navigation
│   ├──
│   ├──
│   ├──
│   ├── previously-contributed-method1/
│   ├── previously-contributed-method2/
│   └── <your method name>/              # NEW - A directory for your method
│       ├──                    # NEW - A README describing your method and relevant links
│       ├── Dockerfile                   # NEW - A Dockerfile to build a portable implementation of your method
│       ├── directory_or_file1           # NEW - source directories and files implementing your method
│       ├── ...                          # NEW - source directories and files implementing your method
│       └── directory_or_fileN           # NEW - source directories and files implementing your method
├── naturalness/
├── similarity/
├── readability/
├── backtranslation/
├── embeddings/

Your README file should follow the structure and content of this template. The Dockerfile should allow one to build a portable image that runs your method using Docker. Specifically, the Docker image for your method should be built as follows, run from the directory containing the Dockerfile:

$ docker build -t my-method .

And anyone should be able to run it on data formatted as discussed in Evaluation Data as follows:

$ docker run \
    -v /path/on/host/to/evaluation/data:/input \
    -v /path/on/host/to/output/predictions:/out \