Organised by:

The European Library
The Open University


The 3rd International Workshop on Mining Scientific Publications will take place in conjunction with DL 2014 in London, UK, on 12th September 2014.


Digital libraries that store scientific publications are becoming increasingly important in research. They are used not only for traditional tasks such as finding and storing research outputs, but also as sources for mining this information, discovering new research trends and evaluating research excellence. With the rapid growth in the number of scientific publications deposited in digital libraries, it is no longer sufficient to provide access to content to human readers only. It is equally important to allow machines to analyse this information and, by doing so, to facilitate the processes by which research is accomplished. Recent developments in natural language processing, information retrieval, the semantic web and other disciplines make it possible to transform the way we work with scientific publications. However, to make this happen, researchers first need to be able to easily access and use large databases of scientific publications and research data to carry out experiments.

This workshop aims to bring together people from different backgrounds who:
(a) are interested in analysing and mining databases of scientific publications,
(b) develop systems, infrastructures or datasets that enable such analysis and mining,
(c) design novel technologies that improve the way research is accomplished, or
(d) support the openness and free availability of publications and research data.


The topics of the workshop will be organised around the following three themes:

  1. Infrastructures, systems, open datasets or APIs that enable analysis of large volumes of scientific publications.
  2. Semantic enrichment of scientific publications by means of text-mining, crowdsourcing or other methods.
  3. Analysis of large databases of scientific publications to identify research trends, high-impact work, cross-fertilisation between disciplines and research excellence, and to aid content exploration.

Topics of interest relevant to theme 1 include, but are not limited to:

  • Systems, services, datasets or APIs for accessing scientific publications and/or research data. The existence of datasets, services, systems and APIs (in particular those that are open) providing access to large volumes of scientific publications and their metadata is an essential prerequisite for researching and developing new technologies that can transform the way people do research. We invite papers presenting new systems, services, APIs or datasets that enable people to access databases of scientific publications and analyse them. Papers addressing Open Access are of special interest. We also invite papers that discuss issues and current challenges in the design of these systems or address the issues of accessing and managing scientific publications and/or research datasets.

Topics of interest relevant to theme 2 include, but are not limited to:

  • Novel information extraction and text-mining approaches to semantic enrichment of publications. This might range from mining publication structure, such as title, abstract, authors and citation information, to more challenging tasks, such as extracting names of applied methods and research questions (or scientific gaps), or identifying parts of the scholarly discourse structure.
  • Automatic categorization and clustering of scientific publications. Methods that can automatically categorize publications according to an established subject-based classification/taxonomy (such as Library of Congress classification, UNESCO thesaurus, DOAJ subject classification, Library of Congress Subject Headings) are of particular interest. Other approaches might involve automatic clustering or classification of research publications according to various criteria.
  • New methods and models for connecting and interlinking scientific publications. Scientific publications in digital libraries are not isolated islands. Connecting publications using explicitly defined citations is very restrictive and has many disadvantages. We are interested in innovative technologies that can automatically connect and interlink publications or parts of publications, according to various criteria, such as semantic similarity, contradiction, argument support or other relationship types.
  • Models for semantically representing and annotating publications. This topic is related to aspects of semantically modeling publications and scholarly discourse. Models that are practical with respect to the state-of-the-art in Natural Language Processing (NLP) technologies are of special interest.
  • Semantically enriching/annotating publications by crowdsourcing. Crowdsourcing can be used in innovative ways to annotate publications with richer metadata or to approve/disapprove annotations created using text-mining or other approaches. We welcome papers that address the following questions: (a) what incentives should be provided to motivate users to contribute metadata, (b) how to apply crowdsourcing in the specialized domains of scientific publications, (c) which tasks in the domain of organising scientific publications crowdsourcing is suitable for and where it might fail, (d) other crowdsourcing topics relevant to the domain of scientific publications.

Topics of interest relevant to theme 3 include, but are not limited to:

  • New methods, models and innovative approaches for measuring impact of publications. The most widely used metrics for measuring impact are based on citations. However, counting citations does not take into account the publication content and the qualitative nature of the citation. In addition, there is a delay between the publication and the measurable impact in citations. We particularly encourage papers addressing new ways of evaluating publications’ impact beyond standard citation measures.
  • New methods for measuring performance of researchers. Methods for assessing the impact of a publication can often be extended to methods that assess the impact of individual researchers. However, there are also other criteria for measuring impact in addition to publications, such as the development and publication of research data and economic and market impact, that should also be taken into account. We welcome papers addressing these aspects.
  • Evaluating impact of research groups. What holds for the impact of individual researchers also holds for research groups and communities.
  • Methods for identifying research trends and cross-fertilization between research disciplines. Identifying research trends should allow discovering newly emerging disciplines or it should help to explain why certain fields are attracting the attention of a wider research community. Such monitoring is important for research funders and governments in order to be able to quickly respond to new developments. We invite papers discussing new methods for identifying trends and cross-fertilization between research disciplines using methods ranging from social network analysis and text- and data-mining to innovative visualization approaches.
  • Application of mining from scientific databases. New methods and models developed for mining from scientific publications can be applied in many different scenarios, such as improving access to scientific publications, providing exploratory search in digital collections (using novel interfaces, visualisations, etc.) and identifying experts. We encourage papers describing innovative approaches that use scientific publications and data to solve real-world problems.


The workshop on Mining Scientific Publications aims to bring together researchers, digital library developers, practitioners from government and industry and open access enthusiasts to address the current challenges in the domain of mining scientific publications.


We invite submissions related to the workshop’s topics. Long papers should not exceed 8 pages and short papers should not exceed 4 pages in the ACM style. Furthermore, we welcome demo presentations of systems or methods. A demonstration submission should consist of a description, of at most two pages, of the system, method or tool to be demonstrated.

Papers should be submitted using the EasyChair system provided here.

All submissions will be peer-reviewed and meta-reviewed by members of the Programme Committee. Each publication will be assigned a score and the best publications will be selected.


All accepted papers have been published as a special issue of D-Lib Magazine. The guest editorial introduces the papers in this issue.


The workshop will include keynote presentations by Johan Bollen (Indiana University) and Kris Jack (Mendeley Ltd.).

Johan Bollen: Studying scholarly communication from social media data

Online social networking services play an increasingly important role in the private and public lives of hundreds of millions of individuals, capturing the most minute details of their whereabouts, thoughts, opinions, feelings, and activities in real time. Advances in social network analysis and natural language processing have enabled computational social science, which leverages computational methods and large-scale data to develop models of individual and collective behavior to explain and predict a variety of economic, financial, and social phenomena. In this keynote, I will outline our recent efforts to leverage social media data to study online scholarly communication, with a particular focus on metrics of scholarly impact and indicators of how scholarly information travels through the scholarly community.

Johan Bollen is associate professor at the Indiana University School of Informatics and Computing where he is a member of the Center for Complex Networks and Systems and the Cognitive Science Program. He was formerly a staff scientist at the Los Alamos National Laboratory. He obtained his PhD in Experimental Psychology from the University of Brussels (VUB) in 2001. His present research interests are computational social science, web science, behavioral finance, and informetrics. In his free time he enjoys P90x and performs as DJ Angst in the local Bloomington clubs.

Kris Jack: Mendeley's Research Catalogue: building it, opening it up and making it even more useful for researchers

This presentation focusses on Mendeley's Research Catalogue. We'll look at the tools, algorithms and technologies employed in crowdsourcing the 80 million+ articles in the catalogue from over 2 million users throughout the world. We'll then look at how the catalogue is exposed through Mendeley's Developer Portal, enabling third parties to build their own research tools. Finally, we'll explore some of Mendeley's plans for the future to make the catalogue even richer and more useful for researchers.

Kris Jack is the Chief Data Scientist at Mendeley where he manages their R&D activities and leads the Data Science team. He is passionate about creating tools for researchers with a particular focus on helping researchers to make new discoveries.


June 2, 2013 (extended from May 26, 2013) - Submission deadline
June 23, 2013 - Notification of acceptance
July 7, 2013 - Camera-ready
July 26, 2013 - Workshop


8:00-8:10 Introduction
8:10-9:00 Kris Jack, Mendeley Ltd. Mendeley's Research Catalogue: building it, opening it up and making it even more useful for researchers.
9:00-9:40 Roman Kern and Stefan Klampfl. Extraction of References Using Layout and Formatting Information from Scientific Articles
9:40-10:10 Robert Patton, Christopher Stahl, Jayson Hines, Thomas Potok and Jack Wells. Multi-year Content Analysis of User Facility Related Publications
10:10-10:40 Coffee Break
10:40-11:30 Johan Bollen, Indiana University. Studying scholarly communication from social media data
11:30-12:10 Nicolai Erbs, Iryna Gurevych and Marc Rittberger. Bringing Order to Digital Libraries: From Keyphrase Extraction to Index Term Assignment
12:10-12:40 Francesco Osborne and Enrico Motta. Exploring Research Trends with Rexplore
12:40-13:40 Lunch
13:40-14:20 Petr Knoth, Open University. From Open Access Metadata to Open Access Content: Towards an Infrastructure for Mining Scientific Publications
14:20-15:00 Muhammad Imran, Syed Zeeshan Haider Gillani and Maurizio Marchese. A Real-time Heuristic-based Unsupervised Method for Name Disambiguation in Digital Libraries
15:00-15:30 Coffee Break
15:30-16:20 Roundtable discussion
16:20-16:30 Conclusions & Closing


Petr Knoth, Knowledge Media Institute, The Open University, UK
Zdenek Zdrahal, Knowledge Media Institute, The Open University, UK
Markus Muhr, The European Library/Europeana, The Netherlands
Nuno Freire, The European Library/Europeana, The Netherlands


Robert Sanderson, Los Alamos National Laboratory, United States
Paolo Manghi, ISTI-CNR (DRIVER, OpenAIRE), Italy
Jan Hajic, Charles University in Prague, Czech Republic
Antoine Isaac, Europeana, The Netherlands
Loukas Anastasiou, The Open University, United Kingdom
Kris Jack, Mendeley Ltd., United Kingdom
Xiaolin Shi, Microsoft, United States
José Borbinha, Instituto Superior Técnico, Portugal
Johan Bollen, Indiana University, United States
Pável Calado, Instituto Superior Técnico, Portugal
Roman Kern, Know Center Graz, Austria
Bruno Martins, Instituto Superior Técnico, Portugal
Andreas Juffinger, CSC, Austria
Iryna Gurevych, Darmstadt University of Technology, Germany
Shane Bergsma, Johns Hopkins University, United States
Tanja Urbancic, University of Nova Gorica & Jozef Stefan Institute, Slovenia
Ziqi Zhang, University of Sheffield, United Kingdom

© 2nd International Workshop on Mining Scientific Publications.