Training Workshop: Combining imaging data using machine learning
This two-day workshop held in September 2021 provided theoretical learning and practical training for over 120 early career researchers on machine learning approaches to combine datasets (Day 1) and modalities (Day 2) in the context of dementia research.
The main goal was to inform early career researchers with experience or interest in using imaging data about innovative machine learning approaches and algorithms that may be used to characterise the structure and function of the nervous system, identify meaningful patterns and support clinical decisions in dementia.
Combining Datasets | Day 1
The first day of the workshop focused on combining datasets. Dr Tim Rittman, co-lead of the DEMON network imaging working group, started the day by talking about the opportunities and pitfalls of combining data from different cohorts. He highlighted the pros and cons of combining data to increase statistical power versus using them separately to test the reproducibility of the findings, and gave examples of study designs that align diagnostic groups and/or imaging protocols to make the most of multiple datasets.
Our second talk was from Dr Esther Bron, Assistant Professor at the Department of Radiology and Nuclear Medicine of the Erasmus MC – University Medical Center Rotterdam and the new chair of the DEMON reproducibility and open science special interest group.

Dr Esther Bron emphasised the fact that the dementia population is very broad and often oversimplified, but by using multiple datasets we can take into account differences across diseases and in the population. Esther also underlined how objective validation is key to evaluate methods and how open science can serve this purpose.
As examples of objective validation she presented several "grand challenges" in dementia, for the objective comparison of algorithms to answer different clinical questions (screening, diagnosis, prediction, monitoring).
Dr Richard Bethlehem, Research Associate at the Autism Research Centre and Brain Mapping Unit at Cambridge University, showed us how to use www.brainchart.io, an online resource to quantify individual differences in brain structure from any current or future magnetic resonance imaging (MRI) study against models of expected age-related trends obtained by combining over 100,000 scans from over 100 studies (preprint available here).
Dr David Cash, principal research fellow at the Dementia Research Centre in the UCL Queen Square Institute of Neurology, prepared a great Binder-ready repository (available on GitHub and Binder) that attendees could quickly access to follow along with the tutorial. The lecture covered various techniques for handling variability in imaging data arising from multi-site and multi-scanner acquisition. Starting with some example differences between scans of a single subject acquired on different scanners, David presented some visualisations of quality control and pre-processing metrics. He then described how to reduce or remove these differences, first by including site as a covariate in a linear regression and then by using a more advanced harmonisation approach, ComBat.
We further explored ComBat with Dr. Joanne Beer, postdoctoral researcher at the University of Pennsylvania. In the last tutorial of Day 1, Joanne presented longComBat, her R package implementing longitudinal ComBat, a method for harmonising multi-batch longitudinal data.
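The simpler of the two harmonisation strategies from David's lecture, including site as a covariate in a linear regression, can be sketched as follows. This is a minimal illustration on synthetic data, not the code from the workshop repository: the function name `remove_site_effect`, the simulated brain volumes and the effect sizes are all assumptions made for the example. It is also not ComBat, which additionally models site-specific scaling and shrinks the site estimates with empirical Bayes.

```python
import numpy as np


def remove_site_effect(values, age, site):
    """Regress out an additive site offset while keeping the age effect.

    values, age: 1-D float arrays; site: 1-D array of site labels.
    Returns the adjusted values and the fitted coefficients
    (intercept, age slope, then one offset per non-reference site).
    """
    values = np.asarray(values, dtype=float)
    age = np.asarray(age, dtype=float)
    site = np.asarray(site)
    labels = np.unique(site)
    # Dummy-code site, using the first label as the reference site.
    dummies = (site[:, None] == labels[None, 1:]).astype(float)
    design = np.column_stack([np.ones_like(age), age, dummies])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    # Subtract only the estimated site offsets; intercept and age stay in.
    return values - dummies @ beta[2:], beta


# Synthetic example (hypothetical numbers): two scanners, with the second
# one reading about 150 units higher than the first.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(55, 85, n)
site = rng.integers(0, 2, n)
volume = 4000.0 - 20.0 * age + 150.0 * site + rng.normal(0, 30, n)

adjusted, beta = remove_site_effect(volume, age, site)
print(f"estimated site offset: {beta[2]:.1f}")
```

Keeping age in the model matters: if the sites recruit people of different ages, simply centring each site on its own mean would also remove part of the biological age effect.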
Combining Imaging Modalities | Day 2

The second day focused on combining imaging modalities. Dr. Elena Rodriguez-Vieitez, a senior researcher at Karolinska Institutet in Stockholm (Sweden), opened the day with an introductory talk on Alzheimer's disease and multimodal approaches.
Dr. Rodriguez-Vieitez presented a beautiful series of studies that used PET and MR imaging in combination to study pathology progression and predict clinical decline in people with Alzheimer's disease. She discussed the pros and cons of implementing machine learning and data-driven algorithms on multimodal imaging data, and how in vivo findings can be related to post-mortem evidence.
The second talk was given by Faeze Heidari, a master's student at the Iran University of Medical Sciences and one of the event organisers. Heidari introduced EasyMKLFS, an approach for combining heterogeneous data sources for the diagnosis of Alzheimer's disease. It combines a very large number of base kernels with a feature selection step and seeks an optimal, sparse solution to aid interpretability. The main goal is to find an optimal combination of sources that improves predictions, given different neuroimaging modalities and other clinical information, by treating each source of information as a kernel. Using multiple kernels instead of a single kernel can therefore improve classification performance. Moreover, multiple kernel learning (MKL) allows information to be extracted from the weights assigned to the kernels, so applying MKL to neuroimaging-based diagnosis might help the discovery of biomarkers of neurological and psychiatric disorders. EasyMKLFS automatically selects and re-weights the relevant information, yielding sparse models, and was reported to outperform baselines (e.g. SVM and SimpleMKL) as well as state-of-the-art random forests and feature selection methods. A clear advantage of EasyMKL over other MKL approaches is its high scalability with respect to the number of kernels to be combined.
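The core idea of treating each data source as a kernel and combining the kernels with non-negative weights can be sketched in a few lines. This is only an illustration under stated assumptions, not EasyMKL or EasyMKLFS: the weights here are fixed by hand, whereas MKL methods learn them from the data, and the "MRI" and "PET" features are synthetic. The helper names (`linear_kernel`, `combine_kernels`, `fit_predict_kernel_ridge`) are hypothetical, and a simple kernel ridge classifier stands in for the margin-based classifier an MKL method would use.

```python
import numpy as np


def linear_kernel(features):
    """Linear kernel, rescaled so that its average diagonal entry is 1."""
    gram = features @ features.T
    return gram * (len(gram) / np.trace(gram))


def combine_kernels(kernels, weights):
    """Convex combination sum_k w_k * K_k with w_k >= 0 and sum_k w_k = 1."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    weights = weights / weights.sum()
    combined = sum(w * k for w, k in zip(weights, kernels))
    return combined, weights


def fit_predict_kernel_ridge(kernel, labels, ridge=1.0):
    """Fit kernel ridge regression to +/-1 labels and predict the training set."""
    alpha = np.linalg.solve(kernel + ridge * np.eye(len(kernel)), labels)
    return np.sign(kernel @ alpha)


# Synthetic data (assumption): one informative source and one pure-noise source.
rng = np.random.default_rng(0)
labels = np.repeat([-1.0, 1.0], 50)
mri = 2.0 * labels[:, None] + rng.normal(size=(100, 5))
pet = rng.normal(size=(100, 5))

kernels = [linear_kernel(mri), linear_kernel(pet)]
combined, weights = combine_kernels(kernels, [0.7, 0.3])  # hand-picked weights
train_accuracy = np.mean(fit_predict_kernel_ridge(combined, labels) == labels)
print(f"weights: {weights}, training accuracy: {train_accuracy:.2f}")
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, any kernel-based classifier can be trained on the combined matrix, and the learned weights indicate how much each source contributes.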

Next, Dr. Ottavia Dipasquale, from King's College London, gave an amazing tutorial on the use of the Receptor-Enriched Analysis of Functional Connectivity by Targets (REACT) method. In the presentation, she explored the use of fMRI in combination with PET imaging in the REACT analysis and also demonstrated the use of those data (the data used in the tutorial can be accessed here).
Dr. Andreas Schindele, fellow at University Hospital Augsburg, closed the last day of the workshop with a tutorial on ML for Alzheimer's disease diagnosis, in which he brilliantly demonstrated the use of Python for [18F]FDG-PET and fMRI analysis.
Organising Committee
The organising committee was formed by early career researchers: Ludovica Griffanti (University of Oxford), Faeze Heidari (Iran University of Medical Sciences), Luiza Machado (Universidade Federal do Rio Grande do Sul), Maura Malpetti (University of Cambridge) and Henry Musto (Goldsmiths, University of London), supported by Director Prof David Llewellyn, Deputy Director Dr Janice Ranson, and Imaging Working Group Leads Dr Michele Veldsman and Dr Tim Rittman.
They comment on the experience:

"We had great fun organising the event! We met online every Friday from 5 cities in 3 continents!"
The organising committee would like to thank all speakers and attendees for the success of the educational and interactive sessions!
Missed The Event? | Explore Recordings From The Event Below
Combining imaging data using machine learning workshop – Day 1
- Tim Rittman, Combining Clinical Cohorts – pitfalls and opportunities
- Esther Bron, Cross-cohort validation of machine learning for dementia diagnosis and prediction
- Richard Bethlehem, Brain chart for the human lifespan
- David Cash, Harmonisation strategies for multi-centre imaging studies
- Joanne Beer, Harmonising longitudinal multi-scanner imaging data with Long ComBat
Combining imaging data using machine learning workshop – Day 2
- Elena Rodriguez-Vieitez, Data-driven multivariate approaches in PET imaging, and their application to understand heterogeneity in AD
- Faeze Heidari, Combining heterogeneous data sources for diagnosis AD
- Ottavia Dipasquale, Enriching fMRI analysis with molecular imaging using REACT – Receptor-Enriched Analysis of functional Connectivity by Targets (talk and demo)
- Andreas Schindele, Deep Learning for Alzheimer diagnosis combining FDG-PET and fMRI images with Python