WOSP 2020 Keynote Presentations
In the wake of the global outbreak of COVID-19, the workshop will be a fully online event.
The entire body of research literature is currently estimated at 100-150 million publications, with an annual increase of around 1.5 million. Research literature constitutes the most complete representation of knowledge we have assembled as a species. It enables us to develop cures for diseases, solve difficult engineering problems and address many of the challenges the world faces today. Systematically reading and analysing the full body of knowledge is now beyond the capacity of any human being. Consequently, it is important to better understand how Natural Language Processing and Text Mining techniques can be leveraged to aid knowledge creation and improve the way research is done.
This workshop aims to bring together people from different backgrounds who:
The topics of the workshop will be organised around the following themes:
Topics of interest relevant to theme 1 include but are not limited to:
Topics of interest relevant to theme 2 include but are not limited to:
Topics of interest relevant to theme 3 include but are not limited to:
This year we invite workshop participants to take part in two special shared tasks.
'3C' Citation Context Classification Task
We are extremely excited to introduce the new Citation Context Classification '3C' task, which will require teams to build models that classify citations according to their purpose and influence. A brand-new dataset of annotated citations will be released for training and testing. Teams of all levels are welcome to submit their entries.
For more information, see the 3C Shared Task page.
Subject Classification Shared Task
We are also excited to announce a shared task on subject area classification. This task will require teams to build models that identify the subject area into which a publication falls. An expert-curated dataset will be released for training and testing. Teams of all levels are welcome to submit their entries.
For more information, see the Subject Classification Task page.
WOSP was the first workshop at a major conference to specifically address the topic of mining scientific papers. The six previous editions of WOSP were held in conjunction with the JCDL conferences.
We have also organised the Workshop on Scholarly Web Mining (SWM 2017), which was associated with WSDM 2017 in Cambridge, UK. The proceedings of the SWM 2017 workshop are available here.
This year, we have joined forces with the BIRNDL workshop to organise the First Workshop on Scholarly Document Processing (SDP 2020), which will be held in conjunction with EMNLP 2020. The workshop features a research track and a shared task track.
All editions of the workshop have been very successful in attracting submissions and participants from leading institutions in the area, including Cambridge University, Microsoft, British Library, Elsevier, National Library of Medicine, Library of Congress, University of Pennsylvania (CiteSeerX), Know-Center Graz, University of Athens (OpenAIRE project) and Mendeley.
We invite submissions related to the workshop's topics. Long papers should not exceed 10 pages and short papers should not exceed 4 pages in the JCDL (ACM) style. We also welcome demo presentations of systems or methods. A demonstration submission should consist of a description, at most two pages long, of the system, method or tool to be demonstrated.
The ACM proceedings template is available here: ACM Template. Papers should be submitted via EasyChair. Papers do not need to be anonymized for review.
Each paper will be reviewed by three reviewers for correctness, originality, technical strength, quality of presentation, and relevance to the workshop topics.
Paper submission deadline: July 10, 2020 (originally July 5, 2020)
Paper acceptance notification: July 26, 2020 (originally June 22, 2020)
Camera-ready deadline: August 2, 2020 (originally July 13, 2020)
WOSP 2020: August 5, 2020
Competition start date: May 11, 2020
Competition end date: June 22, 2020
Paper and code submission deadline: July 10, 2020 (originally July 5, 2020)
Shared task acceptance notification: July 26, 2020 (originally June 22, 2020)
Camera-ready deadline: August 2, 2020 (originally July 13, 2020)
WOSP 2020: August 5, 2020
The Special Case of Scientific Argumentation: Analyzing Scitorics
The exponential growth in the number of scientific publications creates a need for automatically understanding scientific text. However, the complex nature of scientific literature requires attention to the domain- and community-specific rhetorical aspects of scientific writing, which we collectively dub "scitorics". In this talk, we touch on the special case of scientific argumentation by presenting our work on analyzing scitorics in computer graphics literature. We investigate the link between the argumentative structure of publications and rhetorical layers, such as discourse categories and citation contexts. To this end, we (1) augment a corpus of scientific publications, already annotated with four layers of rhetorical information, with argumentation annotations and (2) investigate neural multi-task learning architectures that combine argument extraction with rhetorical classification tasks. Finally, we (3) present ArguminSci, a tool that enables multi-layered analysis of scientific publications.
Supporting Systematic Reviews in Medicine
Systematic reviews synthesise the results of multiple clinical trials to obtain a more significant result. While systematic reviews are essential for evidence-based medicine, they have the disadvantage of being very time-consuming to prepare. In this talk, I describe how computers can support the creation of systematic reviews in medicine, and the challenges that must be solved to improve this support.
Mitigating document collection biases with citations: A case study on CORD-19
With the broad adoption of data science in decision-making processes, recent years have witnessed increasingly frequent cases in which biases in datasets or analytical algorithms lead to unfortunate and sometimes harmful outcomes. Being mindful of potential biases and actively taking measures to mitigate them has become a necessary second nature for data scientists and decision makers alike. Citations in scholarly publications have long been known to represent crowd-sourced collective judgments on scientific reports and can be a valuable source of information for analyzing scholarly documents. This study describes a methodology that uses citations to identify biases in the COVID-19 Open Research Dataset (CORD-19), a document collection created to advance the development of intelligent technologies that can help scientists navigate the voluminous COVID-19 literature. Expanding CORD-19 with three distinct algorithms to the articles in the citation networks it seeds shows that CORD-19 is strongly tilted in favor of recent articles and has uneven coverage of topical fields and publication venues. For example, using CORD-19 to identify critical knowledge and assess journal importance leads to conclusions that differ from those of analyses based on the three expanded datasets, whose results largely agree with one another. CORD-19 does not, however, appear to exhibit biases in describing research collaborations in terms of team size or geolocation. Currently, the three citation network traversal algorithms use only bibliographic records. Possible improvements to these algorithms, such as more sophisticated use of citation contexts, will also be discussed.
LBD: Beyond the ABCs
In this keynote, Neil will cover recent research from his lab, the Smalheiser Laboratory, some of which aligns with the themes of WOSP 2020. He will briefly discuss the following topics:
(1) Infrastructure for accessing scientific publications: the Citation Cloud surrounding a biomedical article, a visualization tool that enables anyone to perform citation analysis.
(2) Information extraction and text mining approaches: an automated probabilistic tagger for the publication types and study designs of biomedical articles.
(3) Analysing large databases of scientific publications to identify high-impact research: new models of literature-based discovery (LBD) in light of the scientific reproducibility crisis.
Citation Classification for Behavioral Analysis of a Scientific Field
Citations are an important indicator of the state of a scientific field, reflecting how authors frame their work and influencing uptake by future scholars. In this talk, I'll describe the development of a new method for analyzing the purpose of citations and a large-scale behavioral study of citation framing and uptake. I will demonstrate that authors are sensitive to discourse structure and publication venue when citing, and that the way a paper cites related work is predictive of its citation count. Finally, I will use changes in citation roles to show that the field of NLP has undergone a systematic change in its citation practices, becoming a rapid discovery science.
Please note that all session times listed below are in UTC+1.
For times in other time zones please see the JCDL website.
9:00-10:30 | Session 1
9:00-9:10 |
9:10-9:40 | Keynote talk | The Special Case of Scientific Argumentation: Analyzing Scitorics | Anne Lauscher
9:40-9:55 | Short paper | Representing and Reconstructing PhySH: Which Embedding Competent? | Xiaoli Chen and Zhixiong Zhang
9:55-10:10 | Short paper | The Normalized Impact Index for Keywords in Scholarly Papers to Detect Subtle Research Topics | Daisuke Ikeda, Yuta Taniguchi and Kazunori Koga
10:10-10:25 | Short paper | Term-Recency for TF-IDF, BM25 and USE Term Weighting | Divyanshu Marwah and Joeran Beel
10:30-11:00 | Break
11:00-12:30 | Session 2
11:00-11:30 | Keynote talk | Supporting Systematic Reviews in Medicine | Allan Hanbury
11:30-11:50 | Long paper | Synthetic vs. Real Reference Strings for Citation Parsing, and the Importance of Re-training and Out-Of-Sample Data for Meaningful Evaluations: Experiments with GROBID, GIANT and CORA | Mark Grennan and Joeran Beel
11:50-12:10 | Long paper | Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric For Uncited Documents | Paul Molloy, Joeran Beel and Akiko Aizawa
12:10-12:25 | Short paper | SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning | Chenrui Guo, Haoran Cui, Li Zhang, Jiamin Wang, Wei Lu and Jian Wu
12:30-16:00 | JCDL keynote and break
16:00-17:30 | Session 3
16:00-16:30 | Keynote talk | Mitigating document collection biases with citations: A case study on CORD-19 | Kuansan Wang
16:30-16:50 | Shared task overview | Overview of the 2020 WOSP 3C Citation Context Classification Task | Suchetha N. Kunnath, David Pride, Bikash Gyawali and Petr Knoth
16:50-17:00 | Shared task paper | Combining Representations For Effective Citation Classification | Claudio Moisés Valiense de Andrade and Marcos André Gonçalves
17:00-17:10 | Shared task paper | Find influential articles in a dataset | Paul Larmuseau
17:10-17:20 | Shared task paper | Scubed at 3C task A - A simple baseline for citation context purpose classification | Shubhanshu Mishra and Sudhanshu Mishra
17:20-17:30 | Shared task paper | Amrita_CEN_NLP @ WOSP 3C Citation Context Classification Task | B. Premjith and K. P. Soman
17:30-18:00 | Break
18:00-19:30 | Session 4
18:00-18:30 | Keynote talk | LBD: Beyond the ABCs | Neil R. Smalheiser
18:30-18:50 | Long paper | Citations Beyond Self Citations: Identifying Authors, Affiliations, and Nationalities in Scientific Papers | Yoshitomo Matsubara and Sameer Singh
18:50-19:20 | Keynote talk | Citation Classification for Behavioral Analysis of a Scientific Field | David Jurgens
19:20-19:30 | Closing
Petr Knoth, Knowledge Media Institute, The Open University, UK
Christopher Stahl, Oak Ridge National Laboratory, USA
Bikash Gyawali, Knowledge Media Institute, The Open University, UK
David Pride, Knowledge Media Institute, The Open University, UK
Drahomira Herrmannova, Oak Ridge National Laboratory, USA
Suchetha N. Kunnath, Knowledge Media Institute, The Open University, UK
Sepideh Mesbah, Delft University of Technology, Netherlands
Akiko Aizawa, National Institute of Informatics, Japan
Marc Bertin, Université Claude Bernard Lyon 1, France
Federico Nanni, University of Mannheim, Germany
Saeed-Ul Hassan, Information Technology University, Pakistan
José Borbinha, Universidade de Lisboa, Portugal
Radim Hladik, Institute of Philosophy of the Czech Academy of Sciences, Czech Republic
Tirthankar Ghosal, Indian Institute of Technology Patna, India
Martin Klein, Los Alamos National Laboratory, USA
Wojtek Sylwestrzak, ICM, University of Warsaw, Poland
Paolo Manghi, ISTI-CNR, Italy
Jian Wu, Old Dominion University, USA
Roman Kern, Graz University of Technology, Austria
Monica Ihli, University of Tennessee, USA
Antoine Isaac, Europeana, The Netherlands
Birger Larsen, Aalborg University Copenhagen, Denmark
Peter Mutschke, GESIS Leibniz Institute for the Social Sciences, Germany
Francesco Osborne, The Open University, UK
Robert M. Patton, Oak Ridge National Laboratory, USA
Eloy Rodrigues, Universidade do Minho, Portugal
Pravallika Devineni, Oak Ridge National Laboratory, USA
Vetle Torvik, University of Illinois, USA
Virtual workshop co-located with the Joint Conference on Digital Libraries (JCDL 2020)
Wuhan, Hubei Province, China
© 8th International Workshop on Mining Scientific Publications.