PRC Report Reveals: Researcher Awareness of Text Mining Relatively Low, Majority Open to Learning More

London, May 2016: The Publishing Research Consortium (PRC) has commissioned Maralte BV to investigate researchers' knowledge of, views on, and experiences with text mining of the journal literature.

The report's key findings are:

  • A growing number of papers refer to text mining techniques, but they still represent less than 0.1% of the total published literature, so their visible impact remains small.
  • Awareness of text mining techniques is still relatively low. Three out of four respondents had not used the technique, and two thirds of these had not heard of text mining before this survey.
  • By contrast, current users see great promise in the technique and have typically been using it for some years. Their focus is on applying it to their research rather than on the IT tools themselves.
  • Among those not currently using the technique, the most appealing application is automated systematic literature review, and over two thirds are open to evaluating text mining for their research.
  • Experienced users see the benefits in extracting information, concepts and new facts.
  • Although views on the infrastructure available for text mining are mixed, there is more agreement than disagreement that the current infrastructure supports the use of text mining in terms of software tools, availability of journal sources, and institutional support.
  • Text mining is not yet ‘plug and play’: it still mostly requires programming skills, and many see it as being at an experimental stage.

Marten Stavenga, Director of Maralte BV, comments: “Our respondents across all the main subject areas using text mining techniques are very positive about text mining of journal literature, and they envisage that text mining has the potential to be of relevance for every researcher in their field.”

Michael Mabe, Chair of the PRC, comments: “PRC has commissioned this study because we are aware of rising attention being given to the potential for text mining techniques, and we are curious to understand the state of play with researchers. This study serves as a benchmark for further study and helps inform the current debate around text and data mining.”

The study combined a survey of over 15,000 researchers, conducted in March and April 2016, which generated 520 responses, with extensive desk research. The desk research used keyword searches of the journal literature in Scopus, Elsevier’s abstract and citation database, to gauge the extent to which text mining techniques are being applied in current research.

This survey and literature search represent the most up-to-date snapshot of text mining activity and of researchers' awareness of the techniques. Although the responses do not come from a fully representative, statistically significant sample, their range nevertheless provides a valuable benchmark for future surveys and a reference point for current debates about the value of text mining to science, an area that is still very much in flux.

The report Text Mining of Journal Literature 2016: Insights from researchers worldwide is available from

The Publishing Research Consortium is a group of associations and publishers that support research into global issues that impact scholarly communication, in order to promote evidence-based discussion. Our steering group comprises representatives from the International Association of STM Publishers, The Publishers Association, Association of Learned and Professional Society Publishers, Association of American Publishers, Taylor & Francis, Elsevier, Springer Nature, and Wiley.