“A pre-established perfect route to data science does not exist” – my path into neuroepigenomics

Written by Dr Sarah Marzi, Experimental Models working group lead

My route to my current position as group leader in neuroepigenomics at the Imperial UK DRI was rather unusual. I started out studying mathematics at the University of Freiburg. This was before Germany introduced Bachelor’s and Master’s programmes. There was only the ‘Diplom’, and you were in it for the long run (typically 5-6 years). At the same time, I had always been fascinated by the human brain. So halfway through the maths degree, and after an exciting year as an exchange student in Rome, I also started an undergraduate degree in psychology, while shifting the focus of my maths degree from algebra and geometry to applied statistics and epidemiology. It was during this time that I had my first encounter with the world of epigenetics, my future research area, via a somewhat obscure paper applying Hidden Markov Models to classify CpG islands. I fell in love instantly. After a series of applications, an opportunity opened up to work on the epigenetics of complex and neuropsychiatric disease for my PhD at King’s College London, and I was all in.

My PhD peer group at graduation.

I absolutely loved my PhD. I got to work primarily on the data science side of various large and unique epigenetic studies, using samples from population-based cohorts as well as post-mortem brain tissue. The community of PhD students at my institute was outstanding and really became my scientific family. There are two things I would highlight in terms of skill development.

1) Because I had a maths background, everyone expected me to have programming skills, but before embarking on the PhD I had none. I had focussed on pure maths for large parts of my degree, and in hindsight even the ‘applied statistics’ was heavily theoretical. It did involve one dreadful piece of SAS code, written for a simulation study, which I would gladly bury and forget! Nonetheless, I managed to build respectable coding skills throughout my PhD, and I would encourage other ‘outsiders’ not to shy away from bioinformatics just because they have not encountered it before. However, if you get the opportunity (note to my coding-resistant 19-year-old undergrad self), take any chance to learn coding early on: it’s one of the most useful skills to build!

2) I was incredibly keen to gain some wet lab experience during my PhD. Having essentially dropped biology after 10th grade (age 15), I have always felt like a slight impostor in my field. Pushing for a chance to work in the lab really paid off in my case. Thankfully, I had extremely supportive supervisors (eternally grateful to Jon Mill and Leo Schalkwyk) and got to work with and learn from an amazing postdoc (shout-out to Teodora Ribarska, whom I still consult for tricky protocol optimisation questions). I got to undertake what arguably became the most exciting project of my PhD, one that extended well beyond it: studying histone acetylation in post-mortem brain samples from Alzheimer’s disease (AD) cases and controls.

In the whirlwind years after finishing my PhD, I worked in Vardhman Rakyan’s lab at Queen Mary University of London on developmental programming, ribosomal DNA, and epigenetic correlates of obesity. I learned so much from thinking deeply about the functional elements of gene regulation and from working with lots of new and exciting technologies, including nanopore sequencing. By luck of timing and fit, I managed to get an Edmond and Lily Safra Research Fellowship at the UK Dementia Research Institute (DRI) at Imperial College to start my own group at the end of 2019. Setting up the lab has been very exciting but also rather intimidating, particularly since the COVID-19 pandemic hit only a few months into my fellowship. What really helped was that I didn’t have to do it all on my own. Our DRI centre has been recruiting a whole set of new group leaders in neurogenomics, alongside whom I was lucky to start. One of them is a fellow DEMON working group lead: Nathan Skene, who leads the genetics and omics working group. Figuring out the challenges of being a group leader together has been immensely helpful. I have since recruited my first lab members, who are absolutely brilliant, and after a series of rejections I have even won some funding. We are now slowly returning to working on site, and after so many months of Zoom and social deprivation we have started organising a series of lab retreats and activities to bring the labs together, establish connections and foster our newly formed scientific community. This has been so important after all the virtual confinement.

Our first in-person lab retreat with the newly established neurogenomics groups at the UK DRI at Imperial College, featuring the Marzi and Skene labs.

When I first learned about the DEMON network in its relatively early days, I was really keen to join and was lucky enough to be appointed as lead for the experimental models working group. Our working group is particularly interested in bringing robust data science and AI approaches to experimental medicine, to improve reproducibility and cross-model translation in the field. It has been an exciting journey developing the network and working groups and interacting with many talented and enthusiastic scientists from around the UK and worldwide. It is so nice to see the first series of position papers and collaborative projects come to life this summer. More recently, I was thrilled to be officially appointed as an Emerging Leader of the UK DRI. This recognition feels like the next step in establishing scientific independence and a validation of the team and science we are building.

To finish up, I thought I would share some advice with aspiring data scientists. I firmly believe that to be a great data scientist, you need to combine statistical and programming skills with field-specific knowledge: in my case, molecular genomics and neuroscience. A pre-established perfect route to data science does not exist. I have met excellent data scientists from both quantitative (maths, physics, computer science) and life science or medical backgrounds. What it really takes is a strong drive to learn and practise. It’s never too late to join the field!
