The entire body of research literature is currently estimated at 100-150 million publications, with an annual increase of around 1.5 million. Research literature constitutes the most complete representation of knowledge we have assembled as a human species. It enables us to develop cures for diseases, solve difficult engineering problems and address many of the challenges the world is facing today. Systematically reading and analysing the full body of knowledge is now beyond the capacity of any human being. Consequently, it is important to better understand how we can leverage Natural Language Processing/Text Mining techniques to aid knowledge creation and improve the process by which research is done.
This workshop aims to bring together people from different backgrounds who:
The topics of the workshop will be organised around the following themes:
Topics of interest relevant to theme 1 include but are not limited to:
Topics of interest relevant to theme 2 include but are not limited to:
Topics of interest relevant to theme 3 include but are not limited to:
This year we would like to invite workshop participants to make use of the CORE publications dataset, which contains over 8 million full texts of research papers from a wide variety of research areas. The dataset contains not only full texts but also an enriched version of the publications' metadata. It provides a framework for developing and testing methods and tools addressing the workshop topics. Use of this dataset is not mandatory, but it is encouraged. The dataset is available through the CORE portal.
In addition to offering the dataset, we are also considering running a shared task involving the use of the OpenMinTeD infrastructure for mining scientific papers.
WOSP was the first workshop to specifically address the topic of mining scientific papers at a major conference. The 6 previous instances of WOSP were held in conjunction with the JCDL conferences.
We have also organised the Workshop on Scholarly Web Mining (SWM 2017), which was associated with WSDM 2017 in Cambridge, UK. The proceedings of the SWM 2017 workshop are available here.
All runs of the workshop have been extremely successful in attracting submissions and participants from leading institutions in the area, including Cambridge University, Microsoft, British Library, Elsevier, National Library of Medicine, Library of Congress, University of Pennsylvania (CiteSeerX), Know-Center Graz, University of Athens (OpenAIRE project) and Mendeley.
We invite submissions related to the workshop’s topics. Long papers should not exceed 8 pages and short papers should not exceed 4 pages in the LREC style. Furthermore, we welcome demo presentations of systems or methods. A demonstration submission should consist of a description, of at most two pages, of the system, method or tool to be demonstrated. All submissions will be uploaded to the START system for peer review.
The LREC proceedings template can be found on the LREC website. Papers should be submitted using the START system.
Wednesday, March 7, 23:59 (Hawaii time) — Submission deadline
Wednesday, March 14, 23:59 (Hawaii time) — Extended submission deadline
Saturday, April 7 — Notification of acceptance
Saturday, April 21 — Camera-ready
Monday, May 7 — Workshop
Scientists worldwide are confronted with exponential growth in the number of scientific documents being made available. For example, Elsevier publishes over 250K scientific articles per year (or one every two minutes) and has over 7 million publications; MedLine, the most important source in biomedical research, contains 21 million scientific references; and the World Intellectual Property Organization (WIPO) holds some 70 million records. This unprecedented volume of information complicates the task of researchers, who face the pressure of keeping up to date with discoveries in their own disciplines and the challenge of searching for innovation and new, interesting problems to solve, checking already solved problems or hypotheses, and getting information on past and current methods, solutions or techniques. At the same time, with the rise of open science initiatives and social media, research is more connected and open, creating new opportunities but also challenges for the scientific community.
In this scenario of scientific information overload, natural language processing has a key role to play. Over the past few years we have seen the development of a number of tools for analysing the structure of scientific documents (e.g. transforming PDF to XML), and of methods for extracting keywords or classifying sentences into argumentative categories. However, deep analysis of scientific documents, such as finding key claims, assessing the argumentative quality and strength of the research, or summarizing the key contributions of a piece of work, is less common. Besides, most research in scientific text processing is carried out for the English language, neglecting both the share of scientific information available in other languages and the fact that scientific publications are often bilingual.
In this talk, I will present work carried out in our laboratory towards the development of a system for “deep” analysis and annotation of scientific text collections. Originally developed for the English language, it has now been adapted to Spanish. After a brief overview of the system and its main components, I will present our current work on a bilingual (Spanish and English), fully annotated text resource in the field of natural language processing, which we have created with our system, together with a faceted-search and visualization system for exploring the resource.
With this scenario in mind, I will speculate on the challenges and opportunities that the scientific field brings to our community, not only in terms of language but also from the point of view of social media and science education.
Time | Session | Title | Authors
09:30-09:40 | Introduction | |
09:40-10:30 | Keynote Talk | Mining and Enriching Multilingual Scientific Text Collections: Current Challenges and Opportunities | Horacio Saggion
10:30-11:00 | Break | |
11:00-11:35 | Long Paper | Scithon™ - An evaluation framework for assessing research productivity tools | Ronin Wu, Valentin Stauber, Viktor Botev, Jacobo Elosua, Anita Brede, Maria Ritola and Kaloyan Marinov
11:35-12:10 | Long Paper | OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content | Penny Labropoulou, Dimitris Galanis, Antonis Lempesis, Mark Greenwood, Petr Knoth, Richard Eckart de Castilho, Stavros Sachtouris, Byron Georgantopoulos, Stefania Martziou, Lucas Anastasiou, Katerina Gkirtzou, Natalia Manola and Stelios Piperidis
12:10-12:45 | Long Paper | Studying Uncertainty in Science: a distributional analysis through the IMRaD structure | Iana Atanassova, François-C. Rey and Marc Bertin
12:45-13:00 | Poster Presentation | Exploring Textual and Social Hierarchies in Czech Sociological Articles | Radim Hladik
13:00-14:00 | Lunch | |
14:00-14:35 | Long Paper | Data-driven Summarization of Scientific Articles | Nikola Nikolov, Michael Pfeiffer and Richard Hahnloser
14:35-15:10 | Long Paper | Experiments in Detection of Implicit Citations | Ahmed AbuRa'ed, Luis Chiruzzo and Horacio Saggion
15:10-15:35 | Short Paper | Goal-Oriented Representation of Scientific Papers | Jumana Nassour, Michael Elhadad and Arnon Strum
15:35-16:00 | Short Paper | DeepPDF: A Deep Learning Approach to Extracting Text from PDFs | Christopher Stahl, Steven Young, Drahomira Herrmannova, Robert Patton and Jack Wells
16:00-16:30 | Break | |
16:30-17:05 | Long Paper | Investigating Domain Features For Scope Detection and Classification of Scientific Articles | Tirthankar Ghosal, Ravi Sonam, Sriparna Saha, Asif Ekbal and Pushpak Bhattacharyya
17:05-17:20 | Demo Paper | An End-to-End PDF Toolchain for Marking Up Scientific Documents | Sanna Hulkkonen and Oliver Ray
17:20-17:30 | Closing | |
Petr Knoth, Knowledge Media Institute, The Open University, UK
Drahomira Herrmannova, Oak Ridge National Laboratory, USA
Richard Eckart de Castilho, Technische Universität Darmstadt, Germany
Iana Atanassova, Université de Bourgogne Franche-Comté, France
Joeran Beel, Trinity College, University of Dublin, Ireland
Marc Bertin, Université Claude Bernard Lyon 1, France
Debsindhu Bhowmik, Oak Ridge National Laboratory, USA
Johan Bollen, Indiana University, USA
José Borbinha, Universidade de Lisboa, Portugal
Tanmoy Chakraborty, University of Maryland, USA
Daniel Duma, Alan Turing Institute, UK
Shang Gao, Oak Ridge National Laboratory, USA
Stephen Gilbert, Iowa State University, USA
C. Lee Giles, Pennsylvania State University, USA
Christopher G. Harris, SUNY Oswego, USA
Saeed Ul Hassan, Information Technology University, Pakistan
Monica Ihli, University of Tennessee, USA
Antoine Isaac, Europeana, The Netherlands
Roman Kern, Graz University of Technology, Austria
Martin Klein, Los Alamos National Laboratory, USA
Birger Larsen, Aalborg University Copenhagen, Denmark
Paolo Manghi, Italian National Research Council, Italy
Bruno Martins, Universidade de Lisboa, Portugal
Philipp Mayr, GESIS Leibniz Institute for the Social Sciences, Germany
Peter Mutschke, GESIS Leibniz Institute for the Social Sciences, Germany
Francesco Osborne, The Open University, UK
Robert M. Patton, Oak Ridge National Laboratory, USA
Eloy Rodrigues, Universidade do Minho, Portugal
Angelo Antonio Salatino, The Open University, UK
Pavel Smrz, Brno University of Technology, Czech Republic
Christopher G. Stahl, Oak Ridge National Laboratory, USA
Wojtek Sylwestrzak, University of Warsaw, Poland
Dominika Tkaczyk, Trinity College Dublin, Ireland
Ziqi Zhang, Nottingham Trent University, UK
Miyazaki Prefecture
Miyazaki, Japan
©7th International Workshop on Mining Scientific Publications.