QA2Text Dataset

This is the QA2Text dataset. It is built for identifying competent triples for knowledge graphs within a set of triples extracted by an OpenIE tool. A QA dataset supplies the reference questions and answer entities. The triples are extracted with the MinIE tool, and each triple's confidence value, head, relation, and tail are taken from MinIE's output. Each triple carries one of two labels: "Competent" (suitable for a knowledge graph) or "Incompetent" (not suitable for a knowledge graph).

Details

The QA2Text dataset contains 61500 triples, of which 1089 (about 1.8%) are competent, and is available in JSON format. Each triple is stored as a JSON object on its own line. An example entry is shown below, indented here for readability:

{
    "ques": "what kind of money to take to Bahamas?",
    "ans": ["Bahamian dollar"],
    "sen": "The Bahamas has its own currency called the Bahamian dollar, but when I visited I'm pretty sure I just used US dollars for every cash transaction.",
    "triple": {
        "con_val": 0.92,
        "head": "The Bahamas",
        "rel": "has",
        "tail": "its own currency called the Bahamian dollar",
        "t_token": [
            "Bahamas",
            "own",
            "currency",
            "Bahamian",
            "dollar"
        ]
    },
    "q_token": [
        "kind",
        "money",
        "Bahamas"
    ],
    "label": "Competent"
}

  • `ques`: reference question under which the sentence is extracted
  • `ans`: reference answer entities for the question
  • `sen`: sentence from which the triple is extracted
  • `triple`: triple extracted from the given sentence using the MinIE tool
  • `con_val`: confidence value generated by MinIE
  • `head`: head of the extracted triple
  • `rel`: relation part of the extracted triple
  • `tail`: tail of the extracted triple
  • `t_token`: tokens found in the whole triple (head; rel; tail)
  • `q_token`: tokens found in the question
  • `label`: ground-truth label (competent/incompetent)
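
Since each entry sits on its own line, the file can be read line by line. The following is a minimal Python sketch of loading the entries and keeping only the competent triples. It assumes the field names shown in the example above; the file name `qa2text.json`, the helper names `read_triples` and `competent_triples`, and the second sample entry are hypothetical and are not part of the dataset release.

```python
import json
from typing import Iterable, Iterator


def read_triples(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed entry per non-empty line (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def competent_triples(entries: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Return (head, rel, tail) for every entry labeled as competent."""
    return [
        (e["triple"]["head"], e["triple"]["rel"], e["triple"]["tail"])
        for e in entries
        # Case-insensitive match, since the exact label casing is an assumption.
        if e["label"].lower() == "competent"
    ]


# In-memory stand-in for the file. The first entry is abridged from the
# example above; the second is invented purely to illustrate filtering.
sample_lines = [
    json.dumps({
        "ques": "what kind of money to take to Bahamas?",
        "triple": {
            "con_val": 0.92,
            "head": "The Bahamas",
            "rel": "has",
            "tail": "its own currency called the Bahamian dollar",
        },
        "label": "Competent",
    }),
    json.dumps({
        "triple": {"head": "I", "rel": "used", "tail": "US dollars"},
        "label": "Incompetent",
    }),
]

triples = competent_triples(read_triples(sample_lines))
print(triples)

# With the downloaded file (hypothetical file name):
# with open("qa2text.json", encoding="utf-8") as f:
#     triples = competent_triples(read_triples(f))
```

Reading lazily with a generator avoids holding all 61500 parsed entries in memory at once, although the dataset is small enough that loading everything into a list would also work.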

License

This dataset is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

If you use this dataset for research purposes, please cite our paper [1].

Download

Contact

Esrat Farjana: mail address
Ryutaro Ichise: mail address

Reference

  1. Esrat Farjana, Natthawut Kertkeidkachorn, and Ryutaro Ichise. "Competent Triple Identification for Knowledge Graph Completion under the Open-World Assumption," IEICE Transactions on Information and Systems, vol. E105.D, no. 3, pp. 646-655, 2022.