Recent years have witnessed a massive increase in the amount of scientific literature and research data published online, offering insight into advances across many domains. The introduction of aggregator services such as CORE [1] has enabled unprecedented levels of open access to scholarly publications. The availability of the full text of research documents makes it possible to extend bibliometric studies by identifying the context of citations [2]. The shared task organized as part of WOSP 2020 focuses on classifying citation contexts in research publications according to their influence and purpose.
Subtask A: Multiclass classification of citations into one of six classes: Background, Uses, Compare_Contrast, Motivation, Extension and Future.
Subtask B: Binary classification of citations into one of two classes, Incidental and Influential, i.e. a task for identifying the importance of a citation. Given a citation context, participants are required to predict the intent of the citation.
The participants will be provided with a labeled dataset of 3000 instances annotated using the ACT platform [3].
The dataset is provided in csv format and contains the following fields:
Each citation context in the dataset contains the label "#AUTHOR_TAG", which marks the citation being considered. All other fields in the dataset correspond to the values associated with the #AUTHOR_TAG. The possible values of citation_class_label are:
and that of citation_influence_label are:
The following table illustrates a sample training dataset format.
| Field | Value |
| --- | --- |
| unique_id | 1998 |
| core_id | 81605842 |
| citing_title | Everolimus improves behavioral deficits in a patient with autism associated with tuberous sclerosis: a case report |
| citing_author | Ryouhei Ishii |
| cited_title | Learning disability and epilepsy in an epidemiological sample of individuals with tuberous sclerosis complex |
| cited_author | Joinson |
| citation_context | West syndrome (infantile spasms) is the commonest epileptic disorder, which is associated with more intellectual disability and a less favorable neurological outcome (#AUTHOR_TAG et al, 2003) |
| citation_class_label | 4 |
| citation_influence_label | 1 |
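The labelled data can be loaded and inspected with pandas. The snippet below is a minimal sketch using an in-memory row copied from the sample above; in practice, pass the distributed CSV file to `read_csv` (its filename is not specified here).

```python
import io
import pandas as pd

# A single row mimicking the sample shown above; replace the buffer with
# the path to the CSV file distributed for the task.
sample = io.StringIO(
    "unique_id,core_id,citation_context,"
    "citation_class_label,citation_influence_label\n"
    '1998,81605842,"West syndrome (infantile spasms) is the commonest '
    'epileptic disorder (#AUTHOR_TAG et al, 2003)",4,1\n'
)
df = pd.read_csv(sample)

# Every citation context marks the cited work with the #AUTHOR_TAG placeholder.
assert df["citation_context"].str.contains("#AUTHOR_TAG").all()

# Subtask A labels live in citation_class_label, Subtask B labels in
# citation_influence_label.
print(df[["unique_id", "citation_class_label", "citation_influence_label"]])
```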
The ACL-ARC dataset [4], which is compatible with our ACT dataset, may also be used by participants during the competition.
The evaluation will be conducted using the withheld test data containing 1000 instances. The evaluation metric used will be the F1-macro score.
$$\mbox{F1-macro} = {\frac{1}{n} \sum_{i=1}^{n}{\frac{2 \times P_i \times R_i}{P_i + R_i}}}$$
where $P_i$ and $R_i$ are the precision and recall of class $i$, and $n$ is the number of classes.
This year, we are hosting the shared task on the Kaggle InClass platform. Please note that the two subtasks will be hosted as separate competitions on Kaggle. Please make sure you sign in/register on Kaggle before clicking the following links.
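The metric in the formula above corresponds to scikit-learn's `f1_score` with `average="macro"`. The toy labels below are illustrative only; the manual computation from per-class precision and recall matches the averaged score.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy predictions over the six Subtask A classes (illustrative values only).
y_true = [0, 1, 2, 3, 4, 5, 0, 1]
y_pred = [0, 1, 2, 3, 4, 0, 0, 2]

# F1-macro: per-class F1 averaged with equal weight per class,
# i.e. the formula above with n = number of classes.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Equivalent manual computation from per-class precision and recall.
p = precision_score(y_true, y_pred, average=None, zero_division=0)
r = recall_score(y_true, y_pred, average=None, zero_division=0)
manual = sum(2 * pi * ri / (pi + ri) if pi + ri else 0.0
             for pi, ri in zip(p, r)) / len(p)
assert abs(macro_f1 - manual) < 1e-9
```

Because the average weights every class equally, rare classes affect the score as much as frequent ones.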
To participate in the Shared Task:
Automated subject area classification has not yet been widely adopted by digital libraries, owing to the lack of easily accessible multi-disciplinary training data and of high-quality machine learning models that provide accurate labels regardless of application domain. The purpose of this shared task is to provide a platform for developing and evaluating such datasets and models.
For this task we will provide a training set that the participants can use to train their models and a test set on which we will evaluate their submissions. The most successful submissions as assessed on the test set will be invited to present their approach at the workshop.
Task: Multiclass classification of publications into one of seven classes: Algorithms, Applications, Hardware, Network-Models, Neuron-Models, Supporting-Systems and Synapse-Models.
The dataset is provided in two CSV files, one containing the following:
and the second containing bibliographic information for the papers (Title, Author, Journal, etc.)
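A simple starting point for this task is a bag-of-words classifier over the bibliographic text. The sketch below uses TF-IDF features and logistic regression on invented toy titles; it is one illustrative baseline under assumed column contents, not a required or recommended approach.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy titles (invented); in practice, train on the text fields from the
# provided CSV files. The labels follow the seven task categories.
texts = [
    "spiking neuron simulation with adaptive thresholds",
    "gpu accelerator for convolutional networks",
    "plasticity rule for synaptic weight updates",
    "recurrent network model of cortical dynamics",
]
labels = ["Neuron-Models", "Hardware", "Synapse-Models", "Network-Models"]

# TF-IDF features feeding a multinomial logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Predict the subject area of an unseen title.
pred = clf.predict(["fpga implementation of neural hardware"])
```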
The evaluation will be conducted using an additional dataset containing unlabelled instances. These will be provided to the participants one week before the final submission date. The evaluation metrics will be Average Precision (AP) and macro-averaged F1 (F1-macro).
All submissions are made as a team. There is no limit on the number of participants in each team. Each team is allowed to submit a maximum of 5 runs to the competition for the final evaluation phase, and may participate in any or all of the tasks. The submission files must be in CSV format with the following fields:
Upload your submission file using Kaggle.
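A submission file can be written with the standard `csv` module. The column names below (`unique_id` plus a predicted label column) are an assumption for illustration; use the exact field names specified above for your subtask.

```python
import csv

# Hypothetical predictions keyed by unique_id (example values only).
predictions = {1998: 4, 1999: 0}

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Header: assumed column names; match the required submission fields.
    writer.writerow(["unique_id", "citation_class_label"])
    for uid, label in sorted(predictions.items()):
        writer.writerow([uid, label])
```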
For paper and code submission, please use EasyChair. First, register your team in EasyChair using the following steps:
May 11, 2020 — Competition Start Date
June 22, 2020 — Competition End Date
July 10, 2020 (extended from July 05, 2020) — Paper and code submission deadline
July 26, 2020 (extended from June 22, 2020) — Shared task acceptance notification
Aug 02, 2020 (extended from July 13, 2020) — Camera-ready deadline
August 5, 2020 — WOSP 2020
© 8th International Workshop on Mining Scientific Publications.