Organised by:

WOSP Shared Tasks

'3C' Citation Context Classification Task

Recent years have witnessed a massive increase in the amount of scientific literature and research data published online, offering new insight into advances across different domains. The introduction of aggregator services like CORE [1] has enabled unprecedented levels of open access to scholarly publications. The availability of the full text of research documents makes it possible to extend bibliometric studies by identifying the context of citations [2]. The shared task organised as part of WOSP 2020 focuses on classifying citation contexts in research publications according to their influence and purpose.

Subtask A: Multiclass classification of citations into one of six classes: Background, Uses, Compares_Contrasts, Motivation, Extension and Future.

Subtask B: Binary classification of citations into one of two classes, Incidental and Influential; a task for identifying the importance of a citation.

In both subtasks, participants are given a citation context and are required to predict the intent of the citation.
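To make the prediction target concrete, a trivial majority-class baseline for Subtask A can be sketched as follows. This is only an illustration, not an official baseline; real systems should model the citation context text itself.

```python
from collections import Counter

def majority_class_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test instance.
    A trivial reference point; any learned model should beat this."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test

# Toy labels using the Subtask A scheme (0 = BACKGROUND ... 5 = USES).
preds = majority_class_baseline([0, 0, 5, 1, 0], n_test=3)
```

Because the class distribution in citation data is typically skewed towards Background, such a baseline can look deceptively strong on accuracy while scoring poorly on the macro-averaged F1 used for evaluation.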

Dataset for 3C Shared Task

The participants will be provided with a labeled dataset of 3000 instances annotated using the ACT platform [3].

The dataset is provided in CSV format and contains the following fields:

  • Unique Identifier
  • COREID of Citing Paper
  • Citing Paper Title
  • Citing Paper Author
  • Cited Paper Title
  • Cited Paper Author
  • Citation Context
  • Citation Class Label
  • Citation Influence Label

Each citation context in the dataset contains the marker "#AUTHOR_TAG", which represents the citation being considered. All other fields in the dataset correspond to the values associated with the #AUTHOR_TAG. The possible values of the citation_class_label are:

0 - BACKGROUND
1 - COMPARES_CONTRASTS
2 - EXTENSION
3 - FUTURE
4 - MOTIVATION
5 - USES

and that of citation_influence_label are:

0 - INCIDENTAL
1 - INFLUENTIAL
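The two label schemes above can be captured directly as lookup tables, which is convenient for converting model output back to readable class names (the dictionary names are our own; only the integer-to-name mappings come from the task description):

```python
# Integer-to-name mappings as given in the task description.
CITATION_CLASS_LABELS = {
    0: "BACKGROUND",
    1: "COMPARES_CONTRASTS",
    2: "EXTENSION",
    3: "FUTURE",
    4: "MOTIVATION",
    5: "USES",
}

CITATION_INFLUENCE_LABELS = {
    0: "INCIDENTAL",
    1: "INFLUENTIAL",
}
```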

The following example illustrates the format of a training record:

unique_id: 1998
core_id: 81605842
citing_title: Everolimus improves behavioral deficits in a patient with autism associated with tuberous sclerosis: a case report
citing_author: Ryouhei Ishii
cited_title: Learning disability and epilepsy in an epidemiological sample of individuals with tuberous sclerosis complex
cited_author: Joinson
citation_context: West syndrome (infantile spasms) is the commonest epileptic disorder, which is associated with more intellectual disability and a less favorable neurological outcome (#AUTHOR_TAG et al, 2003)
citation_class_label: 4
citation_influence_label: 1

The ACL-ARC dataset [4], which is compatible with our ACT dataset, may also be used by the participants during the competition.
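The #AUTHOR_TAG marker described above can be used to focus a model on the local context of the target citation. A hypothetical preprocessing helper is sketched below; the window width and whitespace tokenisation are our own assumptions, not part of the task:

```python
def context_window(citation_context, width=5):
    """Return up to `width` tokens on each side of the #AUTHOR_TAG marker.
    The marker may appear inside punctuation, e.g. "(#AUTHOR_TAG", so we
    match by substring rather than exact token equality."""
    tokens = citation_context.split()
    idx = next(i for i, t in enumerate(tokens) if "#AUTHOR_TAG" in t)
    return tokens[max(0, idx - width): idx + width + 1]

ctx = "is the commonest epileptic disorder (#AUTHOR_TAG et al, 2003)"
window = context_window(ctx, width=2)
```

Narrow windows emphasise the immediate syntactic role of the citation, while wider windows retain more of the surrounding argument; which works better is an empirical question for participants.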

Evaluation

The evaluation will be conducted on withheld test data containing 1000 instances. The evaluation metric is the macro-averaged F1 (F1-macro) score.
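Macro-averaged F1 computes the F1 score per class and then averages with equal weight, so rare classes count as much as frequent ones. Libraries such as scikit-learn provide this (f1_score with average="macro"); a from-scratch sketch for clarity:

```python
def f1_macro(y_true, y_pred, labels):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)
```

For the skewed label distributions typical of citation data, this metric rewards systems that handle minority classes such as Future and Extension, not just the dominant Background class.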


Subject Area Classification Task

Automated subject area classification has not yet been widely adopted by digital libraries, owing to the lack of easily accessible multi-disciplinary training data and of high-quality machine learning models that can provide accurate labels regardless of application domain. The purpose of this shared task is to provide a platform for developing and evaluating such datasets and models.

For this task we will provide a training set that the participants can use to train their models and a test set on which we will evaluate their submissions. The most successful submissions as assessed on the test set will be invited to present their approach at the workshop.

Task: Multiclass classification of publications into one of seven classes; Algorithms, Applications, Hardware, Network-Models, Neuron-Models, Supporting-Systems and Synapse-Models.

Dataset for Subject Area Classification Shared Task

The dataset is provided in two CSV files, the first containing the following fields:

  • Paper ID
  • Year
  • Category
  • Subcategory
  • Additional Subcategories
and the second containing bibliographic information for the papers (Title, Author, Journal, etc.)

Evaluation

The evaluation will be conducted using an additional dataset containing unlabelled instances. These will be provided to the participants one week before the final submission date. The evaluation metrics will be Average Precision (AP) and the macro-averaged F1 (F1-macro) score.
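Average precision summarises how well positive instances are ranked above negative ones: it is the mean of the precision values at each rank where a positive item is retrieved. scikit-learn's average_precision_score implements this for the binary case; a minimal from-scratch sketch:

```python
def average_precision(y_true, scores):
    """Binary average precision: mean of the precision values at each
    rank where a positive item appears, with items sorted by descending
    score."""
    ranked = sorted(zip(scores, y_true), key=lambda x: -x[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

For the multiclass subject labels, AP is typically computed one-vs-rest per class and then averaged, although the task description does not specify the exact averaging scheme.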


Team Registration

This year, we are hosting the shared tasks on the Kaggle InClass platform. Please note that the subtasks will be hosted as separate competitions on Kaggle. Please make sure you sign in or register on Kaggle before clicking the following links.

To participate in the Shared Task:


Submission Guidelines

All submissions are done as a team. There is no limit on the number of participants in each team. All teams are allowed to submit a maximum of 5 runs to the competition for the final evaluation phase. Each team can participate in any of the tasks or all of them. The submission files need to be in CSV format with the following fields:

For Subtask A:

unique_id,citation_class_label

For Subtask B:

unique_id,citation_influence_label
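A submission file in the required format can be produced with the standard csv module. The helper below is illustrative (the function name is our own); only the header fields come from the guidelines above:

```python
import csv

def write_submission(path, ids, labels, label_field="citation_class_label"):
    """Write a submission CSV with the header row the task expects.
    Pass label_field="citation_influence_label" for Subtask B."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["unique_id", label_field])
        writer.writerows(zip(ids, labels))
```

Keeping the header exactly as specified matters: Kaggle scores submissions by matching column names, and a misspelled field will cause the upload to be rejected.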

Upload your submission file using Kaggle.

For paper and code submission, please use EasyChair.

First, register your team on EasyChair using the following steps:

  1. Login as an author to EasyChair.
  2. For citation context classification (both purpose and influence), choose the track (WOSP 2020 -- Shared Task 1: Citation Classification).
  3. For subject area classification, choose the track (WOSP 2020 -- Shared Task 2: Subject Classification).
  4. Add the author details; these are the team members.
  5. Based on the task, add the title as:

     [kaggle_team_name]_WOSP2020_3C_citation_classification_[A]

     OR

     [kaggle_team_name]_WOSP2020_3C_citation_classification_[B]

     OR

     [kaggle_team_name]_WOSP2020_subject_area_classification_task

  6. EasyChair requires a brief abstract and at least 3 keywords; add a simple tentative abstract here, which can be modified at any point.
  7. Submit.

To submit paper and code:

  1. Login as an author to EasyChair.
  2. Choose the corresponding track, e.g. (WOSP 2020 -- Shared Task 1: Citation Classification).
  3. Click "add a submission" to upload the paper and code.
  4. Submit.

Important Dates

Research Track

July 05, 2020 (extended from May 18, 2020) — Paper submission deadline

July 26, 2020 (extended from June 22, 2020) — Paper acceptance notification

Aug 02, 2020 (extended from July 13, 2020) — Camera-ready deadline

August 1-5, 2020 — JCDL 2020

Kaggle Competition Timeline for 3C Shared Task

May 11, 2020 — Competition Start Date

June 22, 2020 — Competition End Date


References

[1]. Knoth, Petr, and Zdenek Zdrahal. "CORE: three access levels to underpin open access." D-Lib Magazine 18.11/12 (2012): 1-13.

[2]. Pride, David, and Petr Knoth. "Incidental or influential?–A decade of using text-mining for citation function classification." (2017).

[3]. Pride, David, Petr Knoth, and Jozef Harag. "ACT: An Annotation Platform for Citation Typing at Scale." 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). IEEE, 2019.

[4]. Jurgens, David, et al. "Measuring the evolution of a scientific field through citation frames." Transactions of the Association for Computational Linguistics 6 (2018): 391-406.

©8th International Workshop on Mining Scientific Publications.