This is the first post on this blog in 881 days, so it must be significant! It’s cross-posted from the main Trip Database Blog. If you’re interested in rapid reviews – how does ten minutes sound?
Anyway, on to the post:
Some years ago, I worked at Public Health Wales (PHW), and I’ve remained connected to the organisation and interested in the field of public health. Recognising that the evidence needs in public health often differ from those in mainstream clinical medicine, I was curious to explore how automation—particularly large language models (LLMs)—might support the production of public health evidence syntheses.
To test this, I selected a topic at random from PHW’s evidence service: A rapid review of barriers and facilitators to cancer screening uptake (breast, cervical, and bowel) in underserved populations, which had been published on medRxiv.
To explore this further, I adapted the methodology we use in our automated Q&A system, introducing an additional step – citation chasing – into the evidence-gathering process. Starting with 11 highly relevant original articles from the initial search, we used both backward and forward citation chasing to identify an additional 28 studies. With further layers of chasing, we could have found even more, but for this proof-of-concept exercise, the goal was simply to see whether the approach would work.
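As an aside for the technically minded, here is a minimal sketch of what one round of backward and forward citation chasing can look like in Python. The Semantic Scholar Graph API and the placeholder DOIs are purely illustrative assumptions, not necessarily what our system actually uses.

```python
import requests

S2_API = "https://api.semanticscholar.org/graph/v1/paper"

def chase_citations(doi):
    """One round of citation chasing for a single seed paper.

    Backward chasing = the papers this article cites ("references");
    forward chasing = the papers that cite it ("citations").
    The Semantic Scholar Graph API is used purely for illustration --
    the exercise described above may have used a different citation index.
    """
    found = set()
    for endpoint, key in (("references", "citedPaper"), ("citations", "citingPaper")):
        resp = requests.get(
            f"{S2_API}/DOI:{doi}/{endpoint}",
            params={"fields": "title,externalIds", "limit": 100},
            timeout=30,
        )
        if resp.status_code != 200:  # unknown DOI, rate limiting, etc.
            continue
        for item in resp.json().get("data", []):
            ids = (item.get(key) or {}).get("externalIds") or {}
            if ids.get("DOI"):
                found.add(ids["DOI"].lower())
    return found

# Placeholder seeds -- in the real exercise these were the 11 highly relevant articles.
seed_dois = ["10.1000/example-1", "10.1000/example-2"]
candidates = set()
for doi in seed_dois:
    candidates |= chase_citations(doi)
candidates -= {d.lower() for d in seed_dois}  # drop the seeds themselves
print(f"{len(candidates)} additional papers found in one round of chasing")
```

In the real exercise, one round of chasing like this took the 11 seed articles up to 39 studies in total; further rounds would have widened the net again.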
The combined set of 39 studies was analysed by the LLM, which extracted relevant content and drafted a comprehensive review. In addition to the main review, we asked the system to generate an abstract, a one-page summary for policymakers, and a formal methods section. The final output totalled approximately 10 pages. As a taster, here is the abstract (let me know if you’d like to see the full report):
“Cancer screening is a critical tool for early detection and improved health outcomes, yet underserved populations face significant barriers to accessing these life-saving services. This report synthesizes findings from multiple studies to identify key barriers and facilitators to cancer screening uptake for cervical, breast, and colorectal cancers among underserved groups. Common barriers include lack of knowledge, cultural and social norms, financial constraints, psychological fears, and distrust in the healthcare system. Facilitators include healthcare provider recommendations, culturally tailored interventions, alternative screening methods (e.g., self-sampling), and organized programs with support systems. The report highlights the need for multi-level, community-centered interventions that address systemic, cultural, and individual barriers. Policy recommendations include expanding access to affordable screening, increasing culturally competent care, and leveraging community-based outreach to improve screening rates and reduce health disparities.”
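Stepping back to the mechanics for a moment: the drafting step described above is essentially a series of prompted calls to an LLM over the pooled study extracts. The sketch below assumes a generic chat-completion API; the OpenAI Python SDK, the model name and the prompts are stand-ins for illustration, not the ones our system actually uses.

```python
from openai import OpenAI  # any chat-completion API would do; the OpenAI SDK is just an example

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative section prompts -- not the ones actually used.
SECTIONS = {
    "review": ("Write a comprehensive narrative review of barriers and facilitators to "
               "breast, cervical and bowel cancer screening uptake in underserved "
               "populations, organised by theme and citing the numbered studies."),
    "abstract": "Write a structured abstract of roughly 200 words for the review.",
    "policy_one_pager": "Write a one-page summary for policymakers with actionable recommendations.",
    "methods": ("Write a formal methods section covering the initial search, the citation "
                "chasing step and the LLM-assisted extraction and synthesis."),
}

def draft_outputs(study_extracts, model="gpt-4o"):
    """Draft each output section from the content extracted from the included studies."""
    corpus = "\n\n".join(f"[Study {i + 1}] {text}" for i, text in enumerate(study_extracts))
    outputs = {}
    for name, instruction in SECTIONS.items():
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are an evidence synthesis assistant."},
                {"role": "user", "content": f"{instruction}\n\nSource material:\n{corpus}"},
            ],
        )
        outputs[name] = response.choices[0].message.content
    return outputs
```

In this sketch all four outputs (review, abstract, policy one-pager and methods) are drafted from the same pool of extracts, which keeps them consistent with one another.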
To evaluate the quality and usefulness of the output, we asked ChatGPT to compare our generated review with the original medRxiv article:
“Both the automated report and the medRxiv article explore the same core issue: understanding the barriers and facilitators to cancer screening uptake among underserved populations, specifically in relation to breast, cervical, and colorectal cancers. They arrive at broadly similar conclusions, highlighting key obstacles such as limited knowledge and awareness, cultural stigma, psychological fear, logistical and financial constraints, and widespread distrust in healthcare systems—particularly among marginalised groups. Likewise, both recognise similar facilitators: strong recommendations from healthcare providers, culturally tailored interventions, community-based outreach, the use of peer advocates, and alternative screening methods like self-sampling.
Despite this common ground, the two documents differ significantly in style, methodology, and intended use. The automated report takes a narrative and practice-oriented approach. It uses large language models (LLMs) to guide literature search and synthesis, offering rich thematic analysis with detailed contextual commentary for healthcare professionals. It includes a policy-maker one-pager, targeted insights for frontline practitioners, and practical recommendations rooted in real-world applicability. Its tone is accessible and it’s structured for immediate relevance to policy and service design.
In contrast, the medRxiv article follows a more formal academic structure, presenting a systematic review conducted using PRISMA guidelines. Its distinctive feature is the use of the COM-B model (Capability, Opportunity, Motivation – Behaviour), a behavioural science framework used to categorise and interpret barriers and enablers to screening. This gives it strong utility for designing or evaluating interventions, especially in academic or research settings. It is less practice-facing but more tightly aligned to theoretical models of behaviour change.
In terms of utility, the automated report is particularly valuable for practitioners, community health programme designers, and policy-makers seeking actionable guidance and context-sensitive strategies. It is grounded in practical implementation. The medRxiv article, by contrast, would be most useful for researchers, public health academics, and those developing behaviourally-informed interventions where theoretical coherence is important.
In essence, the two are complementary. One is a tool for action; the other, a tool for theory-driven design. Used together, they offer a robust foundation for improving cancer screening in underserved populations.”
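As another aside, the comparison step itself is easy to repeat: we simply gave both documents to ChatGPT, but if you wanted to automate it, a rough sketch, with the same illustrative API and prompt caveats as above, might look like this.

```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()

def compare_reviews(automated_path, published_path, model="gpt-4o"):
    """Ask the model for a structured comparison of the two documents.

    In practice we pasted both documents into ChatGPT; this shows what an
    automated version of that step could look like. Prompt wording is illustrative.
    """
    automated = Path(automated_path).read_text()
    published = Path(published_path).read_text()
    prompt = (
        "Compare the two reviews below. Cover: (1) where their findings agree, "
        "(2) differences in style, methodology and intended use, and (3) who each "
        "is most useful for.\n\n"
        f"--- Automated review ---\n{automated}\n\n"
        f"--- Published medRxiv review ---\n{published}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```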
It’s worth noting that this may not be a fully fair comparison, as the medRxiv article follows a formal academic journal format. There may well be a more practitioner-focused, internal version of the review that could have provided a closer parallel to the automated output. It’s also important to emphasise that this was a proof of concept designed to test the feasibility of the approach. With further refinement and more examples, the process and the quality of outputs are likely to improve significantly.
The PHW review involved eight authors and likely required hundreds of hours of work over several months. That effort undoubtedly brought depth, rigour, and valuable expert input—qualities that may exceed the automated version in certain areas.
In contrast, the fully automated approach we tested could produce a review in under ten minutes.
To be clear, I am not suggesting that this type of automation should replace traditional review processes used by organisations like PHW. However, I do see two clear potential use cases:
1. Efficiency and Acceleration – Could automation be used to do the heavy lifting in the early stages of a review, significantly speeding up the process and freeing up expert time for interpretation and refinement?
2. Capacity Gaps – During my time at PHW, the evidence service often had to decline requests due to limited capacity. In such cases, a fully automated review – while not perfect – may be far better than no review at all, or one done hastily under resource constraints.
It’s still early days, but the potential is hugely exciting. The question now is: where could this take us next?