Touché Task 2: Comparative Argument Retrieval


  • Task: Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.
  • Input: [data]
  • Submission: [submit]


The goal of Task 2 is to support users who face a choice problem from "everyday life". Given a comparative question, the task is to retrieve and rank documents from the ClueWeb12 that help answer it.

Registration closed on Sunday, 26 April 2020.


The topics for Task 2 will be sent to each team via email once registration is complete. The topics will be provided as XML files.

Example topic for Task 2:

      <title>Which is better, laptop or desktop?</title>
      <description>A user wants to buy a new PC but has no prior preferences. They want to find arguments that show in what personal situation what kind of machine is preferable. This can range from situations like frequent traveling where a mobile device is to be favored to situations of a rather "stationary" gaming desktop PC.</description>
      <narrative>Highly relevant documents will describe what the major similarities and dissimilarities of laptops and desktops are along with the respective advantages and disadvantages for specific usage scenarios. A comparison of the technical and architectural characteristics without a personal opinion, recommendation or pros/cons is not relevant.</narrative>
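The exact layout of the topics file is not described here. The Python sketch below (standard library only) assumes a hypothetical `<topics>` root with one `<topic>` element per topic and a `<number>` child; none of these appear in the example above, so adjust the tag names to the file you actually receive. It reads each topic into a dictionary so that, for example, an automatic run can query with the title alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics file: the <topics>/<topic> wrapper and the <number>
# element are assumptions; check them against the file you receive.
EXAMPLE_TOPICS = """<topics>
  <topic>
    <number>1</number>
    <title>Which is better, laptop or desktop?</title>
    <description>A user wants to buy a new PC but has no prior preferences.</description>
    <narrative>Highly relevant documents will describe what the major similarities and dissimilarities of laptops and desktops are.</narrative>
  </topic>
</topics>"""


def parse_topics(xml_text):
    """Return one dict per <topic>, mapping each child tag to its stripped text."""
    root = ET.fromstring(xml_text)
    topics = []
    # iter() also matches the root itself, so a file whose root is a single
    # <topic> element is handled as well.
    for topic in root.iter("topic"):
        topics.append({child.tag: (child.text or "").strip() for child in topic})
    return topics


if __name__ == "__main__":
    for t in parse_topics(EXAMPLE_TOPICS):
        # An automatic run may only use the title as its query.
        print(t["number"], t["title"])
```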

Document Collection

Task 2 will use the ClueWeb12 corpus; you may index it with your favorite retrieval system. To ease participation, you may also use the API of the ChatNoir search engine directly for baseline retrieval. You will receive credentials for the ChatNoir API once your registration is complete.
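A minimal sketch of a baseline query against ChatNoir, using only the Python standard library. The endpoint URL, the `cw12` index name, and the request and response fields (`apikey`, `query`, `index`, `size`, `results`, `trec_id`, `score`) reflect our understanding of the public ChatNoir API documentation and are assumptions; check them against the current documentation before relying on this. The network call is kept separate from building the request and parsing the response, so those two parts can be tested offline.

```python
import json
import urllib.request

# Assumed endpoint; verify against the ChatNoir API documentation.
CHATNOIR_URL = "https://www.chatnoir.eu/api/v1/_search"


def build_request_body(api_key, query, size=100):
    """Build a JSON search request for the ClueWeb12 index (assumed name "cw12")."""
    return {"apikey": api_key, "query": query, "index": ["cw12"], "size": size}


def parse_results(response):
    """Extract (ClueWeb12 ID, score) pairs from a decoded JSON response."""
    return [(hit["trec_id"], float(hit["score"])) for hit in response.get("results", [])]


def search(api_key, query, size=100):
    """POST one search request and return (doc_id, score) pairs."""
    data = json.dumps(build_request_body(api_key, query, size)).encode("utf-8")
    req = urllib.request.Request(
        CHATNOIR_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return parse_results(json.load(resp))
```

For example, `search("YOUR-API-KEY", "Which is better, laptop or desktop?")` would return (ClueWeb12 ID, score) pairs for a topic title; the key is a placeholder for the credentials you receive on registration.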


For Task 2, you should ideally retrieve documents that contain convincing argumentation for or against one option or the other. Our human assessors will judge the ranked lists of retrieved documents along three dimensions: (1) document relevance, (2) whether sufficient argumentative support is provided (more information on support: paper), and (3) the trustworthiness and credibility of the web documents and arguments (more information on credibility: paper).


We encourage participants to submit via TIRA for better reproducibility. Please also have a look at the dedicated TIRA tutorial for Touché; in case of problems, we will be able to assist you. Although TIRA is the preferred way to submit runs, you may also submit runs via email if problems arise. We will try to review your TIRA or email submissions quickly and provide feedback.

Runs may be either automatic or manual. An automatic run does not use the topic descriptions or narratives and must not "manipulate" the topic titles via manual intervention. A manual run is anything that is not an automatic run. Upon submission, please let us know which of your runs are manual. For each topic, include up to 1,000 retrieved documents.

The submission format for the task will follow the standard TREC format:

qid Q0 doc rank score tag


  • qid: The topic number.
  • Q0: Unused, should always be Q0.
  • doc: The document ID returned by your system for the topic qid:
    • For Task 2: Use the official ClueWeb12 ID.
  • rank: The rank the document is retrieved at.
  • score: The score (integer or floating point) that generated the ranking. Scores must be in descending (non-increasing) order. Avoid tied scores: trec_eval orders documents by their score values, not by your rank values, so tied documents may not end up in the order you intended.
  • tag: A tag that identifies your group and the method you used to produce the run.
Separate the fields with whitespace. Column widths are not restricted (e.g., the score may use arbitrary precision so that no ties occur), but every line must include all six columns.
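One way to meet these requirements is sketched below in Python (3.9+ for `math.nextafter`). The function name `format_run` and its tie-breaking rule are illustrative choices, not part of the official specification: documents are sorted by score, cut to the 1,000-document limit, and any tied score is lowered to the next representable float, so scores become strictly decreasing while changing as little as possible.

```python
import math


def format_run(topic_id, scored_docs, tag, max_docs=1000):
    """Format one topic's results as TREC run lines.

    scored_docs is an iterable of (doc_id, score) pairs. The pairs are sorted
    by descending score (ties keep their input order) and cut to max_docs.
    A score that would tie with (or exceed) the previous one is nudged to the
    next smaller float, so trec_eval's score order matches the rank column.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:max_docs]
    lines = []
    previous = math.inf
    for rank, (doc_id, score) in enumerate(ranked, start=1):
        score = float(score)
        if score >= previous:
            # Break the tie while changing the score as little as possible.
            score = math.nextafter(previous, -math.inf)
        previous = score
        # repr() prints the shortest string that round-trips, so distinct
        # floats never print as the same value.
        lines.append(f"{topic_id} Q0 {doc_id} {rank} {score!r} {tag}")
    return lines
```

Printing `"\n".join(format_run(1, results, "myGroupMyMethod"))` for each topic, where `results` is a list of (ClueWeb12 ID, score) pairs, yields lines in the six-column format described above.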

An example run for Task 2 is:

1 Q0 clueweb12-en0010-85-29836 1 17.89 myGroupMyMethod
1 Q0 clueweb12-en0010-86-00457 2 16.43 myGroupMyMethod
1 Q0 clueweb12-en0010-86-09202 3 16.32 myGroupMyMethod
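Before submitting, a quick self-check can catch malformed lines. The following Python sketch is a hypothetical helper, not the official validator (TIRA or the organizers may apply their own checks). It tests a run against the rules listed in this section: six columns, a literal Q0, an integer rank, non-increasing scores with ties flagged, and at most 1,000 documents per topic.

```python
from collections import defaultdict


def validate_run(lines, max_docs=1000):
    """Return a list of human-readable problems found in a TREC run."""
    problems = []
    last_score = {}             # topic id -> score on the previous line
    counts = defaultdict(int)   # topic id -> number of documents seen
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 6:
            problems.append(f"line {n}: expected 6 columns, got {len(fields)}")
            continue
        qid, q0, _doc_id, rank, score, _tag = fields
        if q0 != "Q0":
            problems.append(f"line {n}: second column must be Q0")
        if not rank.isdigit():
            problems.append(f"line {n}: rank {rank!r} is not an integer")
        try:
            value = float(score)
        except ValueError:
            problems.append(f"line {n}: score {score!r} is not a number")
            continue
        counts[qid] += 1
        if counts[qid] == max_docs + 1:
            problems.append(f"line {n}: topic {qid} has more than {max_docs} documents")
        if qid in last_score:
            if value > last_score[qid]:
                problems.append(f"line {n}: score increases within topic {qid}")
            elif value == last_score[qid]:
                problems.append(f"line {n}: tied score within topic {qid}")
        last_score[qid] = value
    return problems
```

For a run file, call `validate_run(open(path).read().splitlines())`, where `path` is your file; an empty list means no problems were found. The three example lines above pass without complaints.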


Results

Team           Tag                        nDCG@5
Bilbo Baggins  ul_t2_voelkerschlacht      0.578
Inigo Montoya  MLU_Gruppe_2               0.569
Katana         MyBaselineFilterResponse   0.565
Katana         Baseline_CAM_OBJ           0.554
Frodo Baggins  t2_bach_default_old        0.544
Frodo Baggins  ir_t2_bach                 0.451
Zorro          UvATask2SVM                0.446
Katana         myBertSimilarity           0.405
Katana         ULMFIT_LSTM                0.200
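Runs are compared by nDCG@5. As an illustration only, the Python sketch below computes nDCG@k for one topic with linear gains and a log2(rank + 1) discount, normalized by the DCG of the ideal ordering of all judged documents. We believe this matches trec_eval's ndcg_cut measure, but the relevance scale and the exact evaluation setup behind the results above are not stated here, so treat it as an assumption rather than the official scoring code.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k gains (rank 1 has discount log2(2) = 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_doc_ids, qrels, k=5):
    """nDCG@k for one topic.

    ranked_doc_ids: documents in system order.
    qrels: dict mapping doc_id -> graded relevance; unjudged documents count as 0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # The ideal ranking orders all judged documents by relevance.
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A run's reported score would then be the mean of the per-topic values over all topics.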

Task Committee