The 5th International Workshop on Mining Scientific Publications website is now available here.
Digital libraries that store scientific publications are becoming
increasingly central to the research process. They are not only
used for traditional tasks, such as finding and storing research
outputs, but also as a source for discovering new research trends
or evaluating research excellence. With the current growth of
scientific publications deposited in digital libraries, it is no longer
sufficient to provide only access to content. To aid research, it is
especially important to leverage the potential of text and data
mining technologies to improve the process of how research is
being done.
This workshop aims to bring together people from different
backgrounds who: (a) are interested in analysing and mining
databases of scientific publications, (b) develop systems that
enable such analysis and mining of scientific databases (especially
those who run databases of publications), or (c) develop novel
technologies that improve the way research is being done.
The topics of the workshop will be organised around the
following themes:
- The whole ecosystem of infrastructures including
repositories, aggregators, text- and data-mining facilities,
impact monitoring tools, datasets, services and APIs
that enable analysis of large volumes of scientific
publications.
- Semantic enrichment of scientific publications by means
of text and data mining, crowdsourcing or other
methods.
- Analysis of large databases of scientific publications to
identify research trends, high impact, cross-fertilisation
between disciplines, research excellence etc.
Topics of interest relevant to theme 1 include, but are not limited to:
Infrastructures including repositories, aggregators, text- and
data-mining facilities, impact monitoring tools, datasets,
services and APIs for accessing scientific publications
and/or research data
The existence of datasets, services,
systems and APIs (in particular those that are open)
providing access to large volumes of scientific publications
and research data is an essential prerequisite for being able
to research and develop new technologies that can transform
the way people do research. We invite papers presenting
innovative approaches to the development of these systems
that enable people to access databases and carry out their
analysis. Papers addressing Open Access are of special
interest. We also welcome submissions discussing the
technical aspects of supporting Open Science, in particular
reproducibility of research, sharing of scientific workflows
and linking research data with publications. Finally, we also
invite papers discussing issues and current challenges in the
design of these systems.
Topics of interest relevant to theme 2 include, but are not limited to:
Information extraction approaches
Novel information extraction and text-mining approaches to
semantic enrichment of publications. This might range from
mining publication structure, such as title, abstract, authors,
citation information etc., to more challenging tasks, such as
extracting names of applied methods and research questions (or
scientific gaps), or identifying parts of the scholarly discourse.
Automatic categorization and clustering of scientific
publications.
Methods that can automatically categorize
publications according to an established subject-based classification/taxonomy (such as Library of Congress
classification, UNESCO thesaurus, DOAJ subject
classification, Library of Congress Subject Headings) are of
particular interest. Other approaches might involve
automatic clustering or classification of research
publications according to various criteria.
New methods and models for connecting and interlinking
publications.
Scientific publications in digital
libraries are not isolated islands. Connecting publications
using explicitly defined citations is very restrictive and has
many disadvantages. We are interested in innovative
technologies that can automatically connect and interlink
publications or parts of publications according to various
criteria, such as semantic similarity, contradiction, argument
support or other relationship types.
Models for semantically representing and annotating
publications. This topic is related to the aspect of
semantically modeling publications and scholarly discourse.
Models that are practical with respect to the state of the art
in Natural Language Processing (NLP) technologies are of
special interest.
Semantically enriching/annotating publications by
crowdsourcing.
Crowdsourcing can be used in innovative
ways to annotate publications with richer metadata or to
approve/disapprove annotations created using text-mining or
other approaches. We welcome papers that address the
following questions: (a) what incentives should be provided
to motivate users to contribute, (b) how to apply
crowdsourcing in the specialized domains of scientific
publications, (c) what tasks in the domain of organising
scientific publications crowdsourcing is suitable for and
where it might fail, and (d) other crowdsourcing topics
relevant to the domain of scientific publications.
Topics of interest relevant to theme 3 include, but are not limited to:
New methods, models and innovative approaches for
measuring impact of publications.
The most widely used
metrics for measuring impact are based on citations. However,
counting citations does not take into account the publication
content or the qualitative nature of the citation. In addition,
there is a delay between publication and measurable
impact in citations. We particularly encourage papers
addressing new ways of evaluating publications’ impact
beyond standard citation measures.
New methods for measuring performance of researchers.
Methods for assessing the impact of a publication can often be
extended to methods that assess the impact of individual
researchers. However, there are also other criteria for
measuring impact in addition to publications, such as the
development and publication of research data and economic and
market impact, that should also be taken into account. We
welcome papers addressing these aspects.
Evaluating impact of research groups.
The same considerations as for the impact
of individuals hold for research communities.
Methods for identifying research trends and cross-fertilization
between research disciplines.
Identifying research trends
should make it possible to discover newly emerging disciplines
or help to explain why certain fields are attracting the
attention of a wider research community. Such monitoring is
important for research funders and governments, so that they
can respond quickly to new developments. We invite
papers discussing new methods for identifying trends and
cross-fertilization between research disciplines using methods
ranging from social network analysis and text and data mining
to innovative visualization approaches.
Application and case studies of mining from scientific
databases and publications.
New methods and models
developed for mining from scientific publications can be
applied in many different scenarios, such as improving access
to scientific publications, providing exploratory search in
digital collections or identifying experts. We encourage papers
describing innovative approaches that use scientific
publications and data to solve real-world problems.
Improving the infrastructure of repositories to support the
development and integration of new impact and performance
metrics.
New ways of improving the repository infrastructure
can include, for example, tracking accesses and downloads,
researcher profiling and the interlinking of repository data
with external services. These can in turn be used for
developing new impact metrics. We welcome papers
addressing these issues.
3. SPECIAL OPEN PUBLICATIONS DATASET TRACK
As in the previous edition of the workshop, we invite participants to make use of the CORE publications dataset, which contains a large volume of research publications from a wide variety of research areas. The dataset contains not only full texts, but also an enriched version of the publications’ metadata. The aim is to provide a framework for developing and testing methods and tools addressing the workshop topics. The use of this dataset is not mandatory, but it is encouraged. The dataset is now available through the CORE portal.
4. EXPECTED AUDIENCE
The workshop on Mining Scientific Publications aims to bring
together researchers, digital library developers and practitioners
from government and industry to address the current challenges in
the domain of mining scientific publications.
5. PREVIOUS ORGANISATION
The 1st International Workshop on Mining Scientific Publications was held in conjunction with JCDL 2012. The 2nd run of this workshop was held in conjunction with JCDL 2013. The 3rd run was associated with DL 2014 in London. All runs of the workshop have been extremely successful in terms of attracting submissions and participants from leading institutions in the area, including Cambridge University, British Library, Elsevier Labs, National Library of Medicine, Library of Congress, University of Pennsylvania (CiteSeerX), Know-Center Graz, University of Athens (OpenAIRE project) and Mendeley.
6. SUBMISSION FORMAT
We invite submissions related to the workshop’s topics. Long papers should not exceed 8 pages and short papers should not exceed 4 pages in the ACM two-column style. Furthermore, we welcome demo presentations of systems or methods. A demonstration submission should consist of a description of at most two pages of the system, method or tool to be demonstrated.
Papers should be submitted using the EasyChair system provided here.
Successful submissions will be published in D-Lib Magazine.
The 1st International Workshop on Mining Scientific Publications proceedings are available here.
The 2nd International Workshop on Mining Scientific Publications proceedings are available here.
The 3rd International Workshop on Mining Scientific Publications proceedings are available here.
6. KEYNOTE SPEAKERS
The workshop will include keynote presentations by Alex Wade
from Microsoft Research and Robert Patton
from Oak Ridge National Laboratory.
Alex Wade, Microsoft Research
Alex Wade is Director for Scholarly Communication at Microsoft Research, where he oversees a portfolio of research-focused products and services. Alex holds a Bachelor's degree in Philosophy from U.C. Berkeley, and a Master of Librarianship degree from the University of Washington. During his career at Microsoft, Alex has managed Microsoft’s internal corporate search and taxonomy management services, and has served as Senior Program Manager for Windows Search for multiple Windows OS releases. Prior to joining Microsoft, Alex was Systems Librarian at the University of Washington, and held technical library positions at the University of Michigan and the University of California at Berkeley. Alex is currently responsible for Microsoft Academic, a service designed to enable rich entity-based discovery and navigation of researchers, topics, conferences, and publications.
Knowledge Extraction and Conversational Agents
Web-scale search has been around for more than twenty years, and the latest evolution is moving it beyond simple keyword search and into better understanding of the content, large-scale mapping of the world’s knowledge, and richer ways to understand and respond to users’ intent. As a result, web-scale search is beginning to provide richer discovery than ever before, and to bring the right content and answers to users when and where they are needed. This talk will cover several new approaches to academic information discovery and specifically how Microsoft Research is bringing academic knowledge to Bing and Cortana.
Robert M. Patton, Oak Ridge National Laboratory
Robert Patton is an analytics researcher at Oak Ridge National Laboratory (ORNL), where he is the lead scientist for mining scientific publications for ORNL user facilities. Robert holds a Ph.D. in Computer Engineering from the University of Central Florida. His research at ORNL has focused on nature-inspired analytic techniques to enable knowledge discovery from large and complex data sets, and has resulted in more than 40 publications, 3 patents, 2 R&D 100 Awards, and 2 commercial licenses. He has developed several algorithms and software tools for the purposes of data mining and temporal analysis of text data.
How Scientific User Facilities Impact Science
Scientific user facilities provide resources and support that enable scientists to conduct experiments or simulations pertinent to their respective research. In order to fully understand the scientific impact that these facilities make, the subject area must be taken into account in addition to citation counts of the publications. In addition, users of these facilities are often teams of people with varying degrees of expertise. This talk will highlight how mining user facility publications reveals not only the facilities' influence on the development of new scientific knowledge but also the dynamics of science team performance and development.
7. IMPORTANT DATES
May 3rd, 2015 11:59 (Hawaii time) - Submission deadline
May 10th, 2015 11:59 (Hawaii time) - New submission deadline
May 24th, 2015 - Notification of acceptance
June 17th, 2015 - Camera-ready
June 24th, 2015 - Workshop
Knowledge Extraction and Conversational Agents
Efficient Table Annotation for Digital Articles
Matthias Frey and Roman Kern
Structured affiliations extraction from the scientific literature
Dominika Tkaczyk, Bartosz Tarnawski and Łukasz Bolikowski
MapAffil: A bibliographic tool for mapping author affiliation strings to cities and their geocodes worldwide
NLP4NLP: the cobbler’s children won’t go unshod
Gil Francopoulo, Joseph Mariani and Patrick Paroubek
How Scientific User Facilities Impact Science
Robert M. Patton
Tool Demonstration: A 2nd Order Emergence Indicator
Jon Garner, Stephen Carley, Alan Porter and Denise Chiavetta
CORE API v2.0
PubIndia: A Framework for Analyzing Indian Research Publications in Computer Sciences
Tanmoy Chakraborty, Mayank Singh and Soumajit Pramanik
Semantometrics: Fulltext-based measures for analysing research collaboration
Drahomira Herrmannova and Petr Knoth
Conclusions & closing
9. ORGANIZING COMMITTEE
Petr Knoth, Knowledge Media Institute, The Open University, UK
Kris Jack, Mendeley Ltd., United Kingdom
Nuno Freire, The European Library/Europeana, The Netherlands
Drahomira Herrmannova, Knowledge Media Institute, The Open University, UK
Nancy Pontika, Knowledge Media Institute, The Open University, UK
Lucas Anastasiou, Knowledge Media Institute, The Open University, UK
10. PROGRAMME COMMITTEE
Bruno Martins, Technical University of Lisbon (IST), Portugal
Martin Klein, Los Alamos National Laboratory, USA
Ziqi Zhang, University of Sheffield, UK
C. Lee Giles, Pennsylvania State University, USA
Francesco Osborne, Knowledge Media Institute, The Open University, UK
Natalia Manola, University of Athens, Greece
Antoine Isaac, Europeana & VU University Amsterdam, Netherlands
Iryna Gurevych, Darmstadt University of Technology, Germany
Paolo Manghi, ISTI-CNR, Italy
Tanja Urbancic, Jozef Stefan Institute, Slovenia
Pável Calado, IST/INESC-ID, Portugal
Robert Patton, Oak Ridge National Laboratory, USA
Roman Kern, Graz University of Technology, Austria
Eloy Rodrigues, Universidade do Minho, Portugal
Jose Borbinha, IST/INESC-ID, Portugal
University of Tennessee Conference Center
Knoxville, Tennessee, USA