## '3C' Citation Context Classification Task

Recent years have witnessed a massive increase in the amount of scientific literature and research data published online, offering insight into advances across many domains. The introduction of aggregator services such as CORE [1] has enabled unprecedented levels of open access to scholarly publications. The availability of the full text of research documents makes it possible to extend bibliometric studies by identifying the context of citations [2]. The shared task organized as part of WOSP 2020 focuses on classifying citation contexts in research publications according to their influence and purpose.

Subtask A: Multiclass classification of citations into one of six classes: Background, Uses, Compare_Contrast, Motivation, Extension, and Future.

Subtask B: Binary classification of citations into one of two classes, Incidental and Influential, i.e. identifying the importance of a citation. Given a citation context, participants are required to predict the intent of the citation.

#### Dataset

The participants will be provided with a labeled dataset of 3000 instances annotated using the ACT platform [3].

The dataset is provided in CSV format and contains the following fields:

- Unique Identifier
- COREID of Citing Paper
- Citing Paper Title
- Citing Paper Author
- Cited Paper Title
- Cited Paper Author
- Citation Context
- Citation Class Label
- Citation Influence Label

Each citation context in the dataset contains the tag "#AUTHOR_TAG", which marks the citation being considered. All other fields in the dataset correspond to the values associated with the #AUTHOR_TAG. The possible values of citation_class_label are:

0 - BACKGROUND
1 - COMPARES_CONTRASTS
2 - EXTENSION
3 - FUTURE
4 - MOTIVATION
5 - USES

and those of citation_influence_label are:

0 - INCIDENTAL
1 - INFLUENTIAL

The following table illustrates the training dataset format with a sample record:

| Field | Sample value |
| --- | --- |
| unique_id | 1998 |
| core_id | 81605842 |
| citing_title | Everolimus improves behavioral deficits in a patient with autism associated with tuberous sclerosis: a case report |
| citing_author | Ryouhei Ishii |
| cited_title | Learning disability and epilepsy in an epidemiological sample of individuals with tuberous sclerosis complex |
| cited_author | Joinson |
| citation_context | West syndrome (infantile spasms) is the commonest epileptic disorder, which is associated with more intellectual disability and a less favorable neurological outcome (#AUTHOR_TAG et al, 2003) |
| citation_class_label | 4 |
| citation_influence_label | 1 |
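Assuming the CSV header uses snake_case column names matching the fields above (the real header and quoting may differ), a record can be parsed with the standard library like this:

```python
import csv
import io

# Hypothetical sample mirroring the record shown above; the real file's
# header row and field contents may differ.
sample = '''unique_id,core_id,citing_title,citing_author,cited_title,cited_author,citation_context,citation_class_label,citation_influence_label
1998,81605842,Everolimus improves behavioral deficits ...,Ryouhei Ishii,Learning disability and epilepsy ...,Joinson,"West syndrome (infantile spasms) is the commonest epileptic disorder, which is associated with more intellectual disability and a less favorable neurological outcome (#AUTHOR_TAG et al, 2003)",4,1
'''

# Class indices as listed above (0 = BACKGROUND ... 5 = USES).
CLASS_NAMES = ["BACKGROUND", "COMPARES_CONTRASTS", "EXTENSION",
               "FUTURE", "MOTIVATION", "USES"]

for row in csv.DictReader(io.StringIO(sample)):
    # Every context contains the #AUTHOR_TAG placeholder marking the
    # citation being classified.
    assert "#AUTHOR_TAG" in row["citation_context"]
    label = int(row["citation_class_label"])
    print(row["unique_id"], CLASS_NAMES[label], row["citation_influence_label"])
    # → 1998 MOTIVATION 1
```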

The ACL-ARC dataset [4], which is compatible with our ACT dataset, may also be used by participants during the competition.

#### Evaluation

The evaluation will be conducted on withheld test data containing 1000 instances. The evaluation metric will be the F1-macro score.

$$\text{F1-macro} = \frac{1}{n} \sum_{i=1}^{n} \frac{2 \times P_i \times R_i}{P_i + R_i}$$

where $P_i$ and $R_i$ are the precision and recall for class $i$, and $n$ is the number of classes.
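As an illustration, the metric can be computed with the standard library alone. This is a sketch equivalent to scikit-learn's `f1_score` with `average='macro'`, under the convention that a class with zero precision and recall contributes 0:

```python
from collections import Counter

def f1_macro(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (the competition metric)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    f1_scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

print(round(f1_macro([0, 1, 1, 2], [0, 1, 0, 2], labels=[0, 1, 2]), 4))  # → 0.7778
```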

#### Team Registration

This year, we are hosting the shared task on the Kaggle InClass platform. Please note that the two subtasks will be hosted as separate Kaggle competitions. Please make sure you sign in or register on Kaggle before clicking the following links.

To participate in the Shared Task:

## Subject Area Classification Task

Automated subject area classification has not yet been widely adopted by digital libraries, owing to the lack of easily accessible multi-disciplinary training data and of high-quality machine learning models that provide accurate labels regardless of application domain. The purpose of this shared task is to provide a platform for developing and evaluating such datasets and models.

For this task we will provide a training set that participants can use to train their models and a test set on which we will evaluate their submissions. The teams with the most successful submissions, as assessed on the test set, will be invited to present their approach at the workshop.

Task: Multiclass classification of publications into one of seven classes: Algorithms, Applications, Hardware, Network-Models, Neuron-Models, Supporting-Systems, and Synapse-Models.

#### Dataset

The dataset is provided in two CSV files. The first contains the following fields:

- Paper ID
- Year
- Category
- Subcategory

The second contains bibliographic information for the papers (title, author, journal, etc.).
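Assuming snake_case column names such as `paper_id` (the actual headers may differ), the two files can be joined on the paper identifier:

```python
import csv
import io

# Hypothetical miniature stand-ins for the two provided files; the real
# column names and values may differ.
labels_csv = """paper_id,year,category,subcategory
P1,2019,Neuron-Models,Spiking
"""
biblio_csv = """paper_id,title,author,journal
P1,A sample title,A. Author,A Journal
"""

# Index the bibliographic records by paper ID, then merge each label row
# with its matching bibliographic row.
biblio = {r["paper_id"]: r for r in csv.DictReader(io.StringIO(biblio_csv))}
merged = [{**row, **biblio.get(row["paper_id"], {})}
          for row in csv.DictReader(io.StringIO(labels_csv))]
print(merged[0]["category"], "|", merged[0]["title"])  # → Neuron-Models | A sample title
```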

#### Evaluation

The evaluation will be conducted using an additional dataset of unlabelled instances, which will be provided to the participants one week before the final submission date. The evaluation metrics will be Average Precision (AP) and the macro-averaged F1 (F1-macro) score.
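The exact scoring procedure is defined by the organizers; as a reference point, one common formulation of average precision for a single class can be sketched as follows (how AP is then averaged across the seven classes is an assumption left to the official scorer):

```python
def average_precision(y_true, scores):
    """Mean of the precision values at each rank where a relevant item occurs.

    y_true: 1/0 relevance labels; scores: model confidence per item.
    """
    # Rank items by descending score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

print(round(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]), 4))  # → 0.8333
```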

## Submission Guidelines

All submissions are done as a team. There is no limit on the number of participants in each team. All teams are allowed to submit a maximum of 5 runs to the competition for the final evaluation phase. Each team can participate in any of the tasks or all of them. The submission files need to be in CSV format with the following fields:

unique_id, citation_class_label (Subtask A)
unique_id, citation_influence_label (Subtask B)
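A minimal sketch of producing a Subtask A submission file with the standard library (the output file name and the prediction values are placeholders, not mandated by the task description):

```python
import csv

# Hypothetical predictions keyed by unique_id.
predictions = {"1998": 4, "1999": 0}

with open("subtask_a_submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["unique_id", "citation_class_label"])  # required fields
    for uid, label in sorted(predictions.items()):
        writer.writerow([uid, label])
```

For Subtask B, the second column would instead be `citation_influence_label` with values 0 or 1.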

#### Registering in EasyChair

For paper and code submission, please use EasyChair. First, register your team in EasyChair using the following steps:

1. Login as an author to EasyChair
2. For citation context classification based on purpose/influence, choose the track WOSP 2020 -- Shared task 1: Citation Classification
3. For subject area classification, choose the track WOSP 2020 -- Shared Task 2: Subject Classification
4. Add the author details - These are the team members
5. Based on the task, your submission title should be one of:
    1. [kaggle_team_name]_WOSP2020_3C_citation_classification_[A]
    2. [kaggle_team_name]_WOSP2020_3C_citation_classification_[B]
6. EasyChair requires a brief abstract and at least 3 keywords: add a simple tentative abstract here; this can be modified at any point
7. Submit

#### Submitting paper and code

1. Login as an author to EasyChair
2. Choose the track WOSP 2020 -- Shared task 1: Citation Classification or WOSP 2020 -- Shared Task 2: Subject Classification
3. Submit

## Important Dates

May 11, 2020 — Competition Start Date

June 22, 2020 — Competition End Date

~~July 05, 2020~~ July 10, 2020 — Paper and code submission deadline

~~July 13, 2020~~ Aug 02, 2020 — Camera-ready

August 5, 2020 — WOSP 2020

## References

1. Knoth, Petr, and Zdenek Zdrahal. "CORE: three access levels to underpin open access." D-Lib Magazine 18.11/12 (2012): 1-13.
2. Pride, David, and Petr Knoth. "Incidental or influential? – A decade of using text-mining for citation function classification." (2017).
3. Pride, David, Petr Knoth, and Jozef Harag. "ACT: An Annotation Platform for Citation Typing at Scale." 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). IEEE, 2019.
4. Jurgens, David, et al. "Measuring the evolution of a scientific field through citation frames." Transactions of the Association for Computational Linguistics 6 (2018): 391-406.