JCDL 2017: 6th International Workshop on Mining Scientific Publications (WOSP 2017)

1. INTRODUCTION

Digital libraries that store scientific publications are becoming increasingly central to the research process. They are not only used for traditional tasks, such as finding and storing research outputs, but also as a source for discovering new research trends or evaluating research excellence. With the current growth of scientific publications deposited in digital libraries, it is no longer sufficient to provide only access to content. To aid research, it is especially important to leverage the potential of text and data mining technologies to improve the process of how research is being done.

This workshop aims to bring together people from different backgrounds who:

are interested in analysing and mining databases of scientific publications

develop systems that enable such analysis and mining of scientific databases (especially those who run databases of publications)

develop novel technologies that improve the way research is being done.

2. TOPICS

The topics of the workshop will be organised around the following themes:

The whole ecosystem of infrastructures including repositories, aggregators, text- and data-mining facilities, impact monitoring tools, datasets, services and APIs that enable analysis of large volumes of scientific publications and surrounding issues, such as interoperability and data sharing.
Semantic enrichment of scientific publications by means of text and data mining, crowdsourcing or other methods.
Analysis of large databases of scientific publications to identify research trends, high impact, cross-fertilisation between disciplines, research excellence etc.

Topics of interest relevant to theme 1 include, but are not limited to:

Infrastructures including repositories, aggregators, text- and data-mining facilities, impact monitoring tools, datasets, services and APIs for accessing scientific publications and/or research data.
Interoperability issues in research TDM workflows
around integration of cutting-edge tools in production systems

Topics of interest relevant to theme 2 include, but are not limited to:

Information extraction and text-mining applied to scholarly data
Automatic categorization and clustering of scholarly data
Approaches to information retrieval of academic publications
Academic recommender systems
Models for semantically representing and annotating publications (ontologies, interoperability issues, etc.)
Literature-based discovery
(Reproducible) text and data mining workflows for scientific publications
Scholarly knowledge graphs

Topics of interest relevant to theme 3 include, but are not limited to:

Measuring impact of publications (bibliometrics, webometrics, altmetrics, semantometrics)
Higher-level impact metrics to assess performance of researchers, departments, universities, etc.
Analysing research collaboration networks
Methods for identifying research trends and cross-fertilization between research disciplines
Applications and case studies of mining from scientific databases and publications

3. SPECIAL OPEN PUBLICATIONS DATASET TRACK

We would like to invite the workshop participants to make use of the CORE publications dataset, which contains a large volume of research publications from a wide variety of research areas. The dataset contains not only full texts, but also an enriched version of the publications' metadata. This dataset provides a framework for developing and testing methods and tools addressing the workshop topics. The use of this dataset is not mandatory, but it is encouraged. The dataset is available through the CORE portal.

4. EXPECTED AUDIENCE

The workshop on Mining Scientific Publications aims to bring together researchers, digital library developers and practitioners from government and industry to address the current challenges in the domain of mining scientific publications.

5. PREVIOUS ORGANISATION

The 1st International Workshop on Mining Scientific Publications was held in conjunction with JCDL 2012. The 2nd run of this workshop was held in conjunction with JCDL 2013. The 3rd run was associated with DL 2014 in London. The 4th run took place together with JCDL 2015. Finally, the 5th run of this workshop was associated with JCDL 2016. All runs of the workshop have been extremely successful in terms of attracting submissions and participants from leading institutions in the area including Cambridge University, Microsoft, British Library, Elsevier, National Library of Medicine, Library of Congress, University of Pennsylvania (CiteSeerX), Know-Center Graz, University of Athens (OpenAIRE project) and Mendeley.

6. FORMAT

We plan this workshop as a full-day event. The workshop is organized this year for the sixth time (four of the five previous workshops were held in association with JCDL) and is planned to take place yearly. The workshop will consist of two invited talks, a series of presentations followed by a short discussion, a short group-work session dedicated to addressing specific issues in the field and a final round table discussion at the end of the day. The workshop participants will also be encouraged to visit and experience demonstrations that will be presented during coffee breaks. In the evening, the workshop participants will have the possibility to attend an informal dinner.

7. SUBMISSION FORMAT

We invite submissions related to the workshop's topics. Long papers should not exceed 8 pages and short papers should not exceed 4 pages in the ACM style. Furthermore, we welcome demo presentations of systems or methods. A demonstration submission should consist of a description, of at most two pages, of the system, method or tool to be demonstrated. All submissions will be uploaded to EasyChair for peer review.

Papers should be submitted using the EasyChair system.

Successful submissions will be published as a special issue of the D-Lib journal. Previous proceedings are listed in the Publication section.

8. PEER REVIEW

All submissions will be peer-reviewed and meta-reviewed by members of the Programme Committee. Each submission will be assigned a score and the best submissions will be selected. In this sense, the process will be the same as in previous years.

9. PUBLICATION

This year, we have applied to publish accepted short and full papers in the ACM International Conference Proceeding Series (ICPS). We are currently awaiting ACM's decision on the matter.

The proceedings of the special issues from previous years are available at:

D-Lib July/August 2012 contents

D-Lib September/October 2013 contents

D-Lib November/December 2014 contents

D-Lib November/December 2015 contents

D-Lib September/October 2016 contents

10. KEYNOTE SPEAKERS

Waleed Ammar, Allen Institute for Artificial Intelligence
Waleed Ammar is the research team lead for semanticscholar.org. He develops models for converting natural language text into structured representations, with a special focus on scientific publications. Before doing his Ph.D. at Carnegie Mellon University, Waleed was an SDE2 at Microsoft Research, web developer at eSpace Technologies, and teaching assistant at Alexandria University. He was awarded the Google PhD fellowship award and two Microsoft Research Tech Transfer awards.

Towards a more efficient, less painful discovery of scientific research findings

How do we help scientists find their needle in a haystack of scientific publications? In this talk, I will first give an overview of several projects we're working on at the Allen Institute for Artificial Intelligence to address this question, including advances in ranking, figure extraction, metadata extraction, document similarity and question answering. Then, I will describe the literature graph, our approach to capture semantics via a symbolic representation of the scientific literature, and discuss preliminary results and future work.

Jevin D. West, University of Washington
Jevin West is an Assistant Professor at the Information School at the University of Washington and co-director of the DataLab. He develops tools and methods for reading the literature at the scale of millions of publications. These tools include auto-categorization approaches, network visualization designs, and recommender systems. Using these tools, he investigates biases in science, the origin of ideas and disciplines, and reward structures in science. He co-founded several research projects around these ideas including Eigenfactor.org and Viziometrics.org.

Viziometrics: building a figure-centric search engine for the scholarly literature

Figures are a primary mode for communicating scientific results, yet little has been done to extract and analyze this information at scale. Most of the work in mining the literature has been on full text, citations, or metadata associated with an article. These visual objects are information dense and complex, but as the saying goes, worth a thousand words. In this talk, I will present some methods for extracting this information and provide some ways that this information can be used for better searching the scholarly literature and for asking basic questions around visual communication and impact.

11. IMPORTANT DATES

~~Sunday, 23rd April 2017 11:59 (Hawaii time) - Submission deadline~~
~~Friday, 5th May 2017 11:59 (Hawaii time) - Extended Submission deadline~~
~~Thursday, 18th May 2017 - Notification of acceptance~~
~~Monday, 12th June 2017 - Camera-ready~~
~~Monday, 19th June 2017 - Workshop~~

12. PROGRAM

Time	Type	Title	Presenter / Authors
9:00-9:10	Introduction
9:10-9:45	Keynote talk	Towards a more efficient, less painful discovery of scientific research findings	Waleed Ammar
9:45-10:05	Long paper	Analyzing Semantic Concept Patterns to Detect Academic Plagiarism	Norman Meuschke, Nicolas Siebeck, Moritz Schubotz and Bela Gipp
10:05-10:20	Short paper	Investigating Convolutional Networks and Domain-Specific Embeddings for Semantic Classification of Citations	Anne Lauscher, Goran Glavas, Simone Paolo Ponzetto and Kai Eckert
10:20-10:40	Long paper	AppTechMiner: Mining Applications and Techniques from Scientific Articles	Mayank Singh, Soham Dan, Sanyam Agarwal, Pawan Goyal and Animesh Mukherjee
10:40-11:10	Break
11:10-11:30	Long paper	Word importance-based similarity of documents metric (WISDM)	Viktor Botev, Kaloyan Marinov and Florian Schäfer
11:30-11:45	Short paper	Audience Based View of Publication Impact	Robert Patton, Drahomira Herrmannova, Christopher Stahl, Jack Wells and Thomas Potok
11:45-12:05	Long paper	Multi-level mining and visualization of scientific text collections. Exploring a bilingual scientific repository	Pablo Accuosto, Francesco Ronzano, Daniel Ferrés and Horacio Saggion
12:05-12:20	Demo paper	Content Analytics Toolbench (CAT): a flexible single point of access for content enhancement and data analytics across massive corpora	Ron Daniel and Michael Lauruhn
12:20-12:40	Long paper	Rapid Tagging and Reporting for Functional Language Extraction in Scientific Articles	Mahmood Ramezani, Vijay Kalivarapu, Stephen Gilbert, Sarah Huffman, Elena Cotos and Annette O'Connor
12:40-13:00	Invited talk	Towards effective research recommender systems	Petr Knoth
13:00-14:00	Lunch
14:00-14:35	Keynote talk	Viziometrics: building a figure-centric search engine for the scholarly literature	Jevin West
14:35-14:55	Long paper	HyPRec: a Weighted Hybrid Approach for Scientific Paper Recommendation	Anas Alzoghbi, Mostafa M. Mohamed, Omar Nada, Ibrahim Alshibani, Victor Anthony Arrascue Ayala and Georg Lausen
14:55-15:10	Short paper	Comparing citation numbers between articles at two stages of a Model Organism Database curation workflow	Michael Lauruhn and Gillian Millburn
15:10-15:30	Long paper	Methods for Synthesis of Funding Agency & Publisher Data	Monica Ihli
15:30-16:00	Break
16:00-16:20	Long paper	Geographical Distribution of Biomedical Research in the USA	Yingjun Guan, Jing Du and Vetle Torvik
16:20-16:35	Demo paper	Iris.AI - Science Assistant	Viktor Botev
16:35-16:50	Short paper	A Discipline-Enriched Dataset for Tracking the Computational Turn of European Universities	Federico Nanni and Giulia Paci
16:50-17:00	Closing

13. ORGANIZING COMMITTEE

Petr Knoth, Knowledge Media Institute, The Open University, UK

Robert Patton, Oak Ridge National Laboratory, USA

Drahomira Herrmannova, Oak Ridge National Laboratory, USA

David Pride, Knowledge Media Institute, The Open University, UK

Anita Khadka, Knowledge Media Institute, The Open University, UK

14. PROGRAMME COMMITTEE

Iana Atanassova, CRIT, Université de Bourgogne Franche-Comté, France

Joeran Beel, Trinity College, University of Dublin, Ireland

Marc Bertin, Paris-Sorbonne University, France

Pável Calado, Instituto Superior Técnico, Universidade de Lisboa, Portugal

Tanmoy Chakraborty, University of Maryland, USA

Aristotelis Charalampous, KMi, The Open University, UK

Daniel Duma, University of Edinburgh, UK

Shang Gao, Oak Ridge National Laboratory, USA

Christopher G. Harris, SUNY Oswego, USA

Saeed Ul Hassan, Information Technology University, Pakistan

Antoine Isaac, Europeana & VU University Amsterdam, The Netherlands

Roman Kern, Graz University of Technology, Austria

Martin Klein, Los Alamos National Laboratory, USA

Birger Larsen, Aalborg University Copenhagen, Denmark

Paolo Manghi, ISTI-CNR, Italy

Bruno Martins, Instituto Superior Técnico, Universidade de Lisboa, Portugal

Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Germany

Peter Mutschke, GESIS - Leibniz Institute for the Social Sciences, Germany

Franco Maria Nardini, ISTI-CNR, Italy

Francesco Osborne, KMi, The Open University, UK

John X. Qiu, Oak Ridge National Laboratory/University of Tennessee, USA

Eloy Rodrigues, Universidade do Minho, Portugal

Angelo Antonio Salatino, KMi, The Open University, UK

Pavel Smrz, Brno University of Technology, Czech Republic

Mike Thelwall, University of Wolverhampton, UK

Vetle Torvik, University of Illinois, USA

Michael T. Young, Oak Ridge National Laboratory, USA

15. LOCATION

University of Toronto

College View Ave, Toronto

Canada
