Recent years have witnessed a massive increase in the amount of scientific literature and research data published online, offering insight into advances across many domains. The introduction of aggregator services such as CORE [1] has enabled unprecedented levels of open access to scholarly publications. The availability of the full text of research documents makes it possible to extend bibliometric studies by identifying the context of citations [2]. The shared task organized as part of WOSP 2020 focuses on classifying citation contexts in research publications according to their influence and purpose.
Subtask A: Multiclass classification of citations into one of six classes: Background, Uses, Compare_Contrast, Motivation, Extension and Future.
Subtask B: Binary classification of citations into one of two classes, Incidental and Influential, i.e. a task for identifying the importance of a citation. Given a citation context, participants are required to predict the intent of the citation.
The participants will be provided with a labeled dataset of 3000 instances annotated using the ACT platform [3].
The dataset is provided in csv format and contains the following fields:
Each citation context in the dataset contains the label "#AUTHOR_TAG", which marks the citation being considered. All other fields in the dataset correspond to the values associated with the #AUTHOR_TAG. The possible values of citation_class_label are:
and that of citation_influence_label are:
The following table illustrates a sample training dataset format.
| Field | Value |
| --- | --- |
| unique_id | 1998 |
| core_id | 81605842 |
| citing_title | Everolimus improves behavioral deficits in a patient with autism associated with tuberous sclerosis: a case report |
| citing_author | Ryouhei Ishii |
| cited_title | Learning disability and epilepsy in an epidemiological sample of individuals with tuberous sclerosis complex |
| cited_author | Joinson |
| citation_context | West syndrome (infantile spasms) is the commonest epileptic disorder, which is associated with more intellectual disability and a less favorable neurological outcome (#AUTHOR_TAG et al, 2003) |
| citation_class_label | 4 |
| citation_influence_label | 1 |
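The labelled data can be loaded and inspected with pandas. The snippet below is a minimal sketch using an in-memory row copied from the sample above; in practice, pass the distributed CSV file to `read_csv` (its filename is not specified here).

```python
import io
import pandas as pd

# A single row mimicking the sample shown above; replace the buffer with
# the path to the CSV file distributed for the task.
sample = io.StringIO(
    "unique_id,core_id,citation_context,"
    "citation_class_label,citation_influence_label\n"
    '1998,81605842,"West syndrome (infantile spasms) is the commonest '
    'epileptic disorder (#AUTHOR_TAG et al, 2003)",4,1\n'
)
df = pd.read_csv(sample)

# Every citation context marks the cited work with the #AUTHOR_TAG placeholder.
assert df["citation_context"].str.contains("#AUTHOR_TAG").all()

# Subtask A labels live in citation_class_label, Subtask B labels in
# citation_influence_label.
print(df[["unique_id", "citation_class_label", "citation_influence_label"]])
```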
The ACL-ARC dataset [4], which is compatible with our ACT dataset, may also be used by participants during the competition.
The evaluation will be conducted using the withheld test data containing 1000 instances. The evaluation metric used will be the F1-macro score.
$$\mbox{F1-macro} = {\frac{1}{n} \sum_{i=1}^{n}{\frac{2 \times P_i \times R_i}{P_i + R_i}}}$$
where $P_i$ and $R_i$ are the precision and recall of class $i$, and $n$ is the number of classes.
This year, we are hosting the shared task on the Kaggle InClass platform. Please note that the two subtasks will be hosted as separate competitions on Kaggle. Please make sure you sign in/register on Kaggle before clicking the following links.
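The metric in the formula above corresponds to scikit-learn's `f1_score` with `average="macro"`. The toy labels below are illustrative only; the manual computation from per-class precision and recall matches the averaged score.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy predictions over the six Subtask A classes (illustrative values only).
y_true = [0, 1, 2, 3, 4, 5, 0, 1]
y_pred = [0, 1, 2, 3, 4, 0, 0, 2]

# F1-macro: per-class F1 averaged with equal weight per class,
# i.e. the formula above with n = number of classes.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Equivalent manual computation from per-class precision and recall.
p = precision_score(y_true, y_pred, average=None, zero_division=0)
r = recall_score(y_true, y_pred, average=None, zero_division=0)
manual = sum(2 * pi * ri / (pi + ri) if pi + ri else 0.0
             for pi, ri in zip(p, r)) / len(p)
assert abs(macro_f1 - manual) < 1e-9
```

Because the average weights every class equally, rare classes affect the score as much as frequent ones.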
To participate in the Shared Task:
Automated subject area classification has not yet been widely adopted by digital libraries, owing to the lack of easily accessible multi-disciplinary training data and of high-quality machine learning models that provide accurate labels regardless of application domain. The purpose of this shared task is to provide a platform for developing and evaluating such datasets and models.
For this task we will provide a training set that the participants can use to train their models and a test set on which we will evaluate their submissions. The most successful submissions as assessed on the test set will be invited to present their approach at the workshop.
Task: Multiclass classification of publications into one of seven classes: Algorithms, Applications, Hardware, Network-Models, Neuron-Models, Supporting-Systems and Synapse-Models.
The dataset is provided in two CSV files, one containing the following:
and the second containing bibliographic information for the papers (Title, Author, Journal, etc.)
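A simple starting point for this task is a bag-of-words classifier over the bibliographic text. The sketch below uses TF-IDF features and logistic regression on invented toy titles; it is one illustrative baseline under assumed column contents, not a required or recommended approach.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy titles (invented); in practice, train on the text fields from the
# provided CSV files. The labels follow the seven task categories.
texts = [
    "spiking neuron simulation with adaptive thresholds",
    "gpu accelerator for convolutional networks",
    "plasticity rule for synaptic weight updates",
    "recurrent network model of cortical dynamics",
]
labels = ["Neuron-Models", "Hardware", "Synapse-Models", "Network-Models"]

# TF-IDF features feeding a multinomial logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Predict the subject area of an unseen title.
pred = clf.predict(["fpga implementation of neural hardware"])
```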
The evaluation will be conducted using an additional dataset containing unlabelled instances. These will be provided to the participants one week before the final submission date. The evaluation metrics will be Average Precision (AP) and macro-averaged F1 (F1-macro).
All submissions are made as a team. There is no limit on the number of participants in each team. Each team is allowed to submit a maximum of 5 runs to the competition for the final evaluation phase, and may participate in any or all of the tasks. The submission files must be in CSV format with the following fields:
Upload your submission file using Kaggle.
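A submission file can be written with the standard `csv` module. The column names below (`unique_id` plus a predicted label column) are an assumption for illustration; use the exact field names specified above for your subtask.

```python
import csv

# Hypothetical predictions keyed by unique_id (example values only).
predictions = {1998: 4, 1999: 0}

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Header: assumed column names; match the required submission fields.
    writer.writerow(["unique_id", "citation_class_label"])
    for uid, label in sorted(predictions.items()):
        writer.writerow([uid, label])
```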
For paper and code submission, please use EasyChair. First, register your team in EasyChair using the following steps:
May 11, 2020 — Competition Start Date
June 22, 2020 — Competition End Date
July 10, 2020 (extended from July 05, 2020) — Paper and code submission deadline
July 26, 2020 (extended from June 22, 2020) — Shared task acceptance notification
Aug 02, 2020 (extended from July 13, 2020) — Camera-ready deadline
August 5, 2020 — WOSP 2020
© 8th International Workshop on Mining Scientific Publications.