Digital libraries that store scientific publications are becoming increasingly central to the research process. They are not only used for traditional tasks, such as finding and storing research outputs, but also as a source for discovering new research trends or evaluating research excellence. With the current growth of scientific publications deposited in digital libraries, it is no longer sufficient to provide only access to content. To aid research, it is especially important to leverage the potential of text and data mining technologies to improve the process of how research is being done.
This workshop aims to bring together people from different backgrounds who: (a) are interested in analysing and mining databases of scientific publications, (b) develop systems that enable such analysis and mining of scientific databases (especially those who run databases of publications), or (c) develop novel technologies that improve the way research is being done.
The topics of the workshop will be organised around the following themes:
Topics of interest relevant to theme 1 include, but are not limited to:
Topics of interest relevant to theme 2 include, but are not limited to:
Topics of interest relevant to theme 3 include, but are not limited to:
We would like to invite the workshop participants to make use of the CORE publications dataset, which contains a large volume of research publications from a wide variety of research areas. The dataset contains not only full texts, but also an enriched version of the publications' metadata. The aim is to provide a framework for developing and testing methods and tools addressing the workshop topics. The use of this dataset is not mandatory, but it is encouraged. The dataset is now available through the CORE portal here.
The workshop on Mining Scientific Publications aims to bring together researchers, digital library developers and practitioners from government and industry to address the current challenges in the domain of mining scientific publications.
The 1st International Workshop on Mining Scientific Publications was held in conjunction with JCDL 2012. The 2nd run of this workshop was held in conjunction with JCDL 2013. The 3rd run was especially popular and was associated with DL 2014 in London. The 4th run was held together with JCDL 2015. All runs of the workshop have been extremely successful in terms of attracting submissions and participants from leading institutions in the area including Cambridge University, British Library, Elsevier Labs, National Library of Medicine, Library of Congress, University of Pennsylvania (CiteSeerX), Know-Center Graz, University of Athens (OpenAIRE project) and Mendeley.
We invite submissions related to the workshop's topics. Long papers should not exceed 8 pages and short papers should not exceed 4 pages in the ACM style. Furthermore, we welcome demo presentations of systems or methods. A demonstration submission should consist of a description, at most two pages long, of the system, method or tool to be demonstrated.
The ACM proceedings template can be found on the ACM website. Papers should be submitted using the EasyChair system provided here.
Successful submissions will be published in D-Lib Magazine.
The 1st International Workshop on Mining Scientific Publications proceedings are available here.
The 2nd International Workshop on Mining Scientific Publications proceedings are available here.
The 3rd International Workshop on Mining Scientific Publications proceedings are available here.
The 4th International Workshop on Mining Scientific Publications proceedings are available here.
Michael Kurtz is an astronomer and computer scientist at the Harvard-Smithsonian Center for Astrophysics in Cambridge, Massachusetts, which he joined after receiving a PhD in Physics from Dartmouth College in 1982. Kurtz is the author or co-author of over 300 technical articles and abstracts on subjects ranging from cosmology and extragalactic astronomy to data reduction and archiving techniques to information systems and text retrieval algorithms.
Kurtz is the founder and project scientist of the Smithsonian/NASA Astrophysics Data System (ADS) for which he won the van Biesbroeck prize of the American Astronomical Society. He has received the Citation research award from the American Society for Information Science; he is a fellow in the astrophysics section of the American Physical Society, and a fellow in the Information, Computing and Communication section of the American Association for the Advancement of Science.
He is on the board of directors of the Classification Society and the board of advisors of Force11. He is the moderator of the astrophysics Instrumentation and Methods section of arXiv, and is an editor of the Journal of the Association for Information Science and Technology.
List of publications via ADS: (h index 35)
List of publications via Google Scholar: (h index 40)
Wikipedia article: http://en.m.wikipedia.org/wiki/Michael_J._Kurtz
Website: www.cfa.harvard.edu/~kurtz
The Smithsonian/NASA Astrophysics Data System (ADS) is one of the oldest web-based scholarly information systems. Next year we will have been online for a quarter of a century. Today it contains metadata on more than 11 million articles, and the full text for 5 million, including nearly every refereed article in physics, astrophysics, or geophysics. The ADS is used daily by several tens of thousands of scientists, including essentially every research astronomer on earth, weekly to monthly by a few hundred thousand more students and researchers, and occasionally by several million members of the general public.
With substantial help from its collaborators, the ADS uses a plethora of techniques to build, maintain, and enhance its services. These include text mining of articles and metadata; data mining of usage logs; the development and implementation of new bibliometric measures for papers, people, and organizations; semantic tagging and the creation of links to external data sources; machine learning and text classification; recommender systems; real-time network analysis; and various user interface issues.
The ADS is available at ads.harvard.edu, and a full-featured API exists for developer and researcher use at https://github.com/adsabs
April 17th — Submission deadline
April 24th — New submission deadline
April 27th — New submission deadline
May 27th — Notification of acceptance
June 15th — Camera-ready
June 22nd (afternoon)-June 23rd (morning) — Workshop
Day 1
13:30-14:00 | Registration and posters
13:30-14:00 | Poster | Subject Area Visual Analytics of Scientific User Facility Publications | Robert Patton, Christopher Stahl, Chelsey Stahl, Thomas Potok and Jack Wells
13:30-14:00 | Poster | Extracting biological knowledge from literature using SQL | Yannis Foufoulas, Anna Gogolou, Lefteris Stamatogiannakis, Harry Dimitropoulos, Natalia Manola and Yannis Ioannidis
13:30-14:00 | Poster | Towards deeper level of scientific publications world | Marcin Skulimowski
13:30-14:00 | Poster | Language infrastructures in support of text mining | Stelios Piperidis, Maria Gavrilidou and Penny Labropoulou
14:00-14:10 | Introduction
14:10-14:45 | Keynote | AMiner: toward understanding big scholar data | Yuxiao Dong
14:45-15:05 | Long paper | Quantifying conceptual novelty in the biomedical literature | Shubhanshu Mishra and Vetle Torvik
15:05-15:20 | Short paper | Capturing Interdisciplinarity from Academic Abstracts | Federico Nanni, Laura Dietz, Stefano Faralli, Goran Glavas and Simone Paolo Ponzetto
15:20-15:50 | Break and posters (continued)
15:50-16:15 | Invited talk | Making sense of scientific textual content | Stelios Piperidis
16:15-16:30 | Short paper | Crawling Scientific Repositories: Challenges and Solutions for Automated Retrieval from Google Scholar and Co. | Philipp Meschenmoser, Manuel Hotz, Bela Gipp and Norman Meuschke
16:30-16:45 | Demo | Extraction of Text from PDF Research Articles Using Font Analysis | Stephen Gilbert, Nirav Kamdar, Vijay Kalivarapu and Annette O'Connor
16:45-17:00 | Demo | COBRA: Publication Discovery and Management System | Christopher Stahl, Robert Patton and Jack Wells
18:00 | Social dinner
Day 2
09:00-09:35 | Keynote | ADS: The Joy of Text | Michael J. Kurtz
09:35-09:55 | Long paper | Rhetorical Classification of Anchor Text for Citation Recommendation | Daniel Duma, Maria Liakata, Amanda Clare, James Ravenscroft and Ewan Klein
09:55-10:10 | Short paper | Temporal Properties of Recurring In-text References | Marc Bertin and Iana Atanassova
10:10-10:25 | Break
10:25-10:50 | Invited talk | Making sense of unstructured textual data to enhance information discovery and linking. Challenges and potential of text mining in scholarly IR | Peter Mutschke
10:50-11:10 | Long paper | Measuring Scientific Impact Beyond Citation Counts | Robert Patton, Christopher Stahl and Jack Wells
11:10-11:25 | Short paper | Preliminary Studies on the Impact of Literature Curation by Model Organism Databases on Article Citation Rates | Michael Lauruhn, Tanya Berardini, Leonore Reiser and Ronald Daniel
11:25-11:40 | Break
11:40-12:00 | Long paper | The Impact of Academic Mobility on the Quality of Graduate Programs | Thiago Silva, Alberto Laender, Clodoveu Davis Jr, Ana Paula Silva and Mirella Moro
12:00-12:20 | Long paper | An Analysis of the Microsoft Academic Graph | Drahomira Herrmannova and Petr Knoth
12:20-12:50 | Discussion
12:50-13:00 | Closing
13:00-14:00 | Lunch
Petr Knoth, Knowledge Media Institute, The Open University, UK
Drahomira Herrmannova, Knowledge Media Institute, The Open University, UK
Lucas Anastasiou, Knowledge Media Institute, The Open University, UK
Nancy Pontika, Knowledge Media Institute, The Open University, UK
Pável Calado, Instituto Superior Técnico, Universidade de Lisboa, Portugal
Bradford Demarest, Indiana University Bloomington, USA
Iryna Gurevych, Darmstadt University of Technology, Germany
Antoine Isaac, Europeana & VU University Amsterdam, Netherlands
Roman Kern, Graz University of Technology, Austria
Martin Klein, Los Alamos National Laboratory, USA
Paolo Manghi, ISTI-CNR, Italy
Bruno Martins, Instituto Superior Técnico, Universidade de Lisboa, Portugal
Franco Maria Nardini, ISTI-CNR, Italy
Francesco Osborne, KMi, The Open University, UK
Eloy Rodrigues, Universidade do Minho, Portugal
Angelo Antonio Salatino, KMi, The Open University, UK
Pavel Smrz, Brno University of Technology, Czech Republic
Wojtek Sylwestrzak, ICM University of Warsaw, Poland
Vetle Torvik, University of Illinois at Urbana-Champaign, USA
Saeed Ul Hassan, Information Technology University, Pakistan
Ziqi Zhang, University of Sheffield, UK
Rutgers University
Newark, NJ
USA