Introduction to text mining and machine learning in systematic reviews

By Tom Roper, Clinical Librarian, Royal Sussex County Hospital

A group of librarians from NICE, Public Health England, universities and NHS Library and Knowledge services were privileged to attend a workshop on Text Mining and Machine Learning in Systematic Reviews, led by [James Thomas](http://iris.ucl.ac.uk/iris/browse/profile?upi=JTHOA32), Professor of Social Research and Policy at the EPPI-Centre. James designed [EPPI-Reviewer](https://eppi.ioe.ac.uk/CMS/Default.aspx?alias=eppi.ioe.ac.uk/cms/er4), software to manage all types of literature review, including systematic reviews, meta-analyses, ‘narrative’ reviews and meta-ethnographies, and leads Cochrane’s [Project Transform](https://community.cochrane.org/help/tools-and-software/project-transform).

James outlined the problem: we systematically lose research, and then spend a great deal of effort and money on trying to find it again. We need to use correct methods, and, moreover, need to be seen to be correct. There are quantitative issues as well: Cochrane reviewers screen more than 2 million citations a year.  Can this considerable human effort be made more manageable by the judicious use of text mining and machine learning? While tools are being developed to help this task, their development is uneven, as is their adoption.

James distinguished between three types of machine learning: rules-based (unfashionable in computer science circles, he warned), unsupervised, and supervised. He gave us opportunities to try out tools based on these approaches using our own devices.

Rules-based approaches are accurate, but fragile: they either work, or fail completely. Unsupervised approaches leave a machine to identify patterns in the data, for example by clustering documents. One such tool is [LDAVis](http://eppi.ioe.ac.uk/ldavis/index.html#topic=6&lambda=0.63&term=), based, you don’t need me to tell you, on Latent Dirichlet Allocation.
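
To make the fragility of a rules-based approach concrete, here is a minimal Python sketch. The rule, the function name `looks_like_rct` and the example titles are all invented for illustration; they are not taken from any tool shown at the workshop.

```python
import re

# Hypothetical rule: flag a record as an RCT only if the title
# contains this exact phrase (British or American spelling).
RCT_RULE = re.compile(r"\brandomi[sz]ed controlled trial\b", re.IGNORECASE)

def looks_like_rct(title):
    """Return True if the title matches the hand-written rule."""
    return bool(RCT_RULE.search(title))

# The rule works when the wording is exactly what it expects...
print(looks_like_rct("A randomised controlled trial of aspirin"))  # True
# ...and fails completely when the same idea is phrased differently.
print(looks_like_rct("Participants were randomly allocated to aspirin or placebo"))  # False
```

The rule never mislabels a title that contains the phrase, but it has no way of coping with wording its author did not anticipate.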

Supervised approaches require a human or humans to give the machine training data. From enough of it (a 280,000-row spreadsheet in an example James quoted), a statistical model can be constructed, which can then be used with new material to determine whether or not a study is a randomised controlled trial (RCT). Training data comes from people, including data generated for other purposes, data created for the project itself and crowd-sourced data, as in the case of [Cochrane Crowd](http://crowd.cochrane.org/index.html), which mobilises Cochrane Citizen Scientists to decide whether or not the subject of a database record is an RCT.
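
As a rough illustration of the supervised idea, the sketch below trains a tiny naive Bayes classifier on six hand-labelled titles and then applies it to unseen ones. Naive Bayes is simply a convenient technique for a short example; I am not suggesting it is the model behind the tools James described. The training titles, labels and function names are all invented, and a real classifier would need vastly more data.

```python
import math
import re
from collections import Counter

# Invented training data: (title, is_rct) pairs labelled by a human.
TRAINING = [
    ("A randomised controlled trial of aspirin versus placebo", True),
    ("Patients were randomly allocated to intervention or control", True),
    ("Double blind randomised trial of exercise therapy", True),
    ("A qualitative interview study of nurses' experiences", False),
    ("Retrospective cohort study of hospital admissions", False),
    ("Case report of a rare skin condition", False),
]

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(labelled):
    """Count words and documents per label (assumes both labels occur)."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labelled:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return True if the text looks more like an RCT than not."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        # Start from the proportion of training records with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids taking the log of zero.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]

model = train(TRAINING)
print(predict(model, "Randomised placebo controlled trial of statins"))   # True
print(predict(model, "Retrospective cohort study of hospital admissions data"))  # False
```

Unlike a hand-written rule, the model is derived entirely from the labelled examples, so its quality depends on how many records people have classified and how well they did it.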

In systematic reviews, these approaches may be used to identify studies by citation screening or classification, to map research activity, and to automate data extraction, including performing Risk of Bias assessment and extraction of statistical data. Readers may be familiar with tools that take a known set of citations, and use word frequency counts, or analysis of phrases and adjacent terms, to create word or phrase lists or visualisations. Similarly, term extraction and automatic clustering can be used for statistical and linguistic analysis of text; the results are reviewed by a human and, if deemed useful, used to modify an initial search strategy. [Voyant Tools](https://voyant-tools.org/) is one example, as are [Bibexcel](https://homepage.univie.ac.at/juan.gorraiz/bibexcel/), [Termine](http://www.nactem.ac.uk/software/termine/) and even the use of EndNote’s subject bibliography feature to generate lists of keywords.
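
The word frequency and adjacent-term counts mentioned above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how any of the named tools work internally; the three records, the stopword list and the function names are invented for the example.

```python
import re
from collections import Counter

# Invented set of "known relevant" citation titles.
RECORDS = [
    "Text mining for study identification in systematic reviews",
    "Machine learning to reduce screening workload in systematic reviews",
    "Text mining tools for citation screening",
]

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on", "or"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def term_frequencies(records):
    """Count how often each non-stopword appears across all records."""
    counts = Counter()
    for record in records:
        counts.update(w for w in tokenize(record) if w not in STOPWORDS)
    return counts

def phrase_frequencies(records):
    """Count adjacent word pairs, skipping pairs that include a stopword."""
    counts = Counter()
    for record in records:
        words = tokenize(record)
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return counts

# Candidate search terms and phrases, most frequent first.
print(term_frequencies(RECORDS).most_common(5))
print(phrase_frequencies(RECORDS).most_common(3))
```

A searcher would then look down the resulting lists for terms or phrases missing from the draft strategy.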

Citation networks can be used for supplementary searching – will this change, James asked, if or when all bibliographic data becomes open? Useful tools here, apart from traditional ones such as Web of Science, include [VOSviewer](http://www.vosviewer.com/). We also spent some time playing with [EPPI-Reviewer](https://eppi.ioe.ac.uk/eppireviewer-web/home), the EPPI-Centre’s own tool for systematic reviewers, and with [Carrot2 Search](http://search.carrot2.org/stable/search).

Looking to the future, James suggested that there is a great deal of interest in a “surveillance” approach to finding evidence, which can automatically identify whether a review or some guidance needs updating. Cochrane are developing the [Cochrane Evidence Pipeline](https://community.cochrane.org/help/tools-and-software/evidence-pipeline), in which citations found by machine or crowd-sourced methods can either be triaged by the relevant Cochrane Review Group or assessed using machine learning.

While the workshop focussed on systematic reviews, for a jobbing librarian like me in a clinical setting, searches to support systematic reviews will make up only a small part of the workload. Nevertheless, searches still need to be conducted soundly and rigorously. Can artificial intelligence and machine learning help? Certainly some of the tools James showed are useful when formulating search strategies. A group within London and Kent, Surrey and Sussex NHS Libraries is developing a search protocol for the region. We may well find ourselves referencing some of these tools. It is always stimulating to hear a world leader in a field talk, and I’m sure all the workshop participants would join me in thanking both Professor Thomas for giving up his time, and Health Education England for organising the workshop.

The tools James described, and more, may be found on the [EPPI-Centre website](http://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3677). See also the National Centre for Text Mining’s page of [software tools](http://www.nactem.ac.uk/software.php).

For a systematic review on the subject see:

O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015 Jan 14;4:5. doi: 10.1186/2046-4053-4-5.

For a more recent overview, I would recommend Julie Glanville’s chapter on Text Mining for Information Specialists in Paul Levay and Jenny Craven’s new book on systematic searching:

Glanville J. Text mining for information specialists. In: Craven J, Levay P, editors. Systematic searching: practical ideas for improving results. London: Facet Publishing; 2018. p. 147-169.

Evaluating the STEP literature searching eLearning modules

In September 2018, the STEP project team launched the final module of “How to Search the Literature Effectively” on e-Learning for Healthcare.

We are really keen to find out whether librarians are using the eLearning and to capture what they think. Please could you take 5 minutes to complete our survey and tell us how you are using these modules to develop the skills of your end users.

The survey will close on 24th May 2019.

Survey link: https://www.surveymonkey.com/r/VDY3WYS

If you have any queries, please contact

tracey.pratchett@lthtr.nhs.uk or sarah.lewis23@nhs.net

TripPro – liberating the librarian

Thank you to everyone who completed our online survey about TripPro back in December. We had a good response from an estimated 35% of NHS library services in England in addition to feedback from Public Health library teams.  Your feedback helped inform the decision for HEE to fund TripPro nationally for a further two years from April 2019.

What you said about the benefits of TripPro 

  • It’s used by many to support mediated literature search services, often as a scoping tool in the initial stages of a search 
  • It’s a handy short cut to grey literature, particularly useful for guidelines and systematic reviews  
  • It’s useful in training sessions, especially to demonstrate the hierarchy of evidence to newer users as it is so clearly laid out  
  • It’s a quick and easy way to focus non-professional searchers on quality documents 
  • It complements NICE Evidence search 

What’s new? 

  • Trip has recently started working with Unpaywall and, as a result, up to 70% of all articles (depending on document age and clinical area) are now available as full-text link-outs 
  • The search technology has been upgraded, so the indexing is better and search results are more focused than previously 
  • Automated evidence maps have been added as a novel way of exploring the evidence base for interventions in a given topic area 
  • The Trip app is now available on the Apple App Store – and an Android version will be available soon 

What’s next? 

Your feedback also highlighted some of the limitations of TripPro and the scope for closer integration with other resources, more access routes and better signposting. We are working with Jon Brassey, Director of TripPro, to explore ways to address these and will keep you updated.

How do I access it? 

Access to TripPro is via IP address, so you should get seamless access to the Pro version from your workplace. If you find you can’t access the Pro features, contact Jon Brassey with your organisation’s IP address (jon.brassey@tripdatabase.com).

If you would like further details or a copy of the full survey report, please contact Helene Gorring, Library & Knowledge Services Lead in London and KSS.