Statistical Machine Learning
for ocean surface current prediction
Supervised by Dr Adam Sykulski and Dr Marina Evangelou at Imperial College London. Funded by the EPSRC Centre for Doctoral Training in Modern Statistics and Statistical Machine Learning (StatML).
Multivariate probabilistic prediction of ocean surface currents using NGBoost
Developing multivariate extensions of Natural Gradient Boosting (NGBoost) for probabilistic regression in spatiotemporal systems. The core challenge is learning structured uncertainty over correlated velocity components from surface drifter observations that are sparse, irregularly sampled, and contaminated by inertial oscillations.
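To make "structured uncertainty over correlated velocity components" concrete, here is a minimal sketch of the kind of scoring rule a multivariate probabilistic model might minimise. It assumes a bivariate Gaussian predictive distribution over the eastward and northward velocity components (u, v); that distributional choice, the function name `bivariate_gaussian_nll`, and all parameter names are illustrative assumptions rather than a description of the project's actual implementation.

```python
import math


def bivariate_gaussian_nll(u, v, mu_u, mu_v, sigma_u, sigma_v, rho):
    """Negative log-likelihood of one observed velocity (u, v) under a
    bivariate Gaussian with means (mu_u, mu_v), standard deviations
    (sigma_u, sigma_v) and correlation rho.

    Illustrative only: the Gaussian family is an assumption of this sketch.
    """
    if sigma_u <= 0 or sigma_v <= 0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma_u, sigma_v > 0 and -1 < rho < 1")

    # Standardised residuals for each velocity component.
    zu = (u - mu_u) / sigma_u
    zv = (v - mu_v) / sigma_v

    one_minus_rho2 = 1.0 - rho * rho
    # Quadratic form (Mahalanobis distance squared) with correlation.
    quad = (zu * zu - 2.0 * rho * zu * zv + zv * zv) / one_minus_rho2
    # Log of the normalising constant 2*pi*sigma_u*sigma_v*sqrt(1 - rho^2).
    log_norm = math.log(2.0 * math.pi * sigma_u * sigma_v) + 0.5 * math.log(one_minus_rho2)
    return log_norm + 0.5 * quad


if __name__ == "__main__":
    # An observation where both components exceed their predicted means
    # is scored better when the model predicts positive correlation.
    print(bivariate_gaussian_nll(1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.9))
    print(bivariate_gaussian_nll(1.0, 1.0, 0.0, 0.0, 1.0, 1.0, -0.9))
```

The correlation parameter is what distinguishes this from fitting two independent univariate models: a forecast that gets the sign of the dependence between components wrong is penalised even if each marginal is well calibrated.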
Methodological contributions
- Robust interpolation and processing pipelines for drifter data and Level 4 data products
- Extension of O'Malley et al. (2023) by incorporating sea surface temperature gradients as covariates
- Rigorous uncertainty quantification, calibration, and out-of-sample diagnostics
Context & motivation
- Ocean surface currents drive heat and carbon transport; accurate prediction is crucial for climate modelling
- Probabilistic forecasts quantify predictive uncertainty, which point forecasts alone cannot provide
- Presented work-in-progress at Changing the Face of Science Symposium 2025
Statistical theory
& ocean model evaluation
Two 2.5-month research projects completed during the first year of the CDT, each resulting in theoretical or methodological contributions.
Robust models for time-varying signals: PDF derivation for complex random variables
Developed alongside the working paper Robust CDF-Filtering of a Location Parameter (Catania, Harvey & Luati, 2025), this project derived a general form of the probability density function (PDF) for complex random variables under symmetry and regularity conditions. I produced a full proof of the PDF of a weighted infinite sum of i.i.d. uniform random variables, supplying details absent from the working paper. I also derived a general PDF for an infinite sum of i.i.d. symmetrically beta distributed random variables, and established the conditions on the distributions and weights under which it exists.
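To give a feel for the objects involved, the sketch below simulates a truncated weighted sum of i.i.d. uniform random variables and compares its sample variance with the variance of the infinite sum. The Uniform(-1, 1) distribution, the geometric weights, the truncation length, and every function name are assumptions chosen for illustration; they are not taken from the project or the working paper.

```python
import random


def geometric_weights(ratio, n_terms):
    """Weights w_k = ratio**k for k = 0, ..., n_terms - 1 (assumed for illustration)."""
    return [ratio ** k for k in range(n_terms)]


def truncated_weighted_sum(weights, rng):
    """One draw of sum_k w_k * U_k with U_k i.i.d. Uniform(-1, 1)."""
    return sum(w * rng.uniform(-1.0, 1.0) for w in weights)


def limiting_variance(ratio):
    """Variance of the infinite sum with geometric weights, |ratio| < 1.

    Var(U) = 1/3 for Uniform(-1, 1), and sum_k ratio**(2k) = 1 / (1 - ratio**2).
    """
    if not abs(ratio) < 1.0:
        raise ValueError("square-summable geometric weights need |ratio| < 1")
    return (1.0 / 3.0) / (1.0 - ratio ** 2)


def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    weights = geometric_weights(0.5, 30)
    draws = [truncated_weighted_sum(weights, rng) for _ in range(20000)]
    print("sample variance:  ", sample_variance(draws))
    print("limiting variance:", limiting_variance(0.5))
```

Square-summable weights are what make the infinite sum well defined here: for independent, zero-mean terms with finite variance, a finite sum of variances guarantees almost sure convergence.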
Key results
- General PDF under symmetry conditions for complex random variables
- Proof of PDF for weighted infinite sum of i.i.d. uniform variables
- General PDF for infinite sums of i.i.d. symmetrically beta distributed variables with existence conditions
Related work
- Catania, Harvey & Luati (2025) - Robust CDF-Filtering of a Location Parameter
- Score-driven models for time-varying parameters
Physically interpretable error metrics for ocean surface current models
Proposed a system of model evaluation and uncertainty metrics for a phenomena-centred approach to ocean surface modelling, aimed at maximising information gain from observational ocean data. The framework was designed to connect statistical model diagnostics to physically meaningful quantities, a gap in existing evaluation methodology for data-driven ocean models.
Contributions
- Proposed evaluation metrics grounded in oceanographic phenomena
- Demonstrated practical application using drifter velocity model examples
- Showed added value of physically interpretable metrics over standard statistical measures
Dissemination
- Poster and presentation at SIAM UKIE Annual Meeting 2024, University of Manchester
- Awarded SIAM UKIE Student Travel Award
- Invited to Uncertainty Quantification Workshop, National Physical Laboratory (2024)
Other research
(because it's fun)
Alongside my formal PhD work I pursue original questions in recreational mathematics. I investigate new ideas often, and occasionally one of them turns into something worth presenting.
On sequences of consecutive odd semi-primes
A semi-prime is a positive integer that is the product of exactly two primes (not necessarily distinct). This project investigates sequences of consecutive odd semi-primes, characterising their structure, identifying patterns, and exploring upper bounds on their length. The question arose from a prior investigation into consecutive semi-primes (including even ones), and the restriction to odd numbers introduces substantially richer arithmetic structure.
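The definition above translates directly into a small search. The sketch below reads "consecutive odd semi-primes" as a run of consecutive odd integers (n, n + 2, n + 4, ...) that are all semi-primes; that reading, along with the function names, is an assumption of this example and not a statement of the project's methods or results.

```python
def prime_factor_count(n: int) -> int:
    """Number of prime factors of n counted with multiplicity."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:  # whatever remains is a single prime factor
        count += 1
    return count


def is_semiprime(n: int) -> bool:
    """True if n is a product of exactly two primes (not necessarily distinct)."""
    return n > 1 and prime_factor_count(n) == 2


def odd_semiprime_runs(limit: int) -> list[list[int]]:
    """Maximal runs of consecutive odd integers below `limit` that are all semi-primes."""
    runs: list[list[int]] = []
    current: list[int] = []
    for n in range(3, limit, 2):
        if is_semiprime(n):
            current.append(n)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def longest_run(limit: int) -> list[int]:
    """First longest run below `limit`, or an empty list if there is none."""
    return max(odd_semiprime_runs(limit), key=len, default=[])


if __name__ == "__main__":
    print(longest_run(100))
```

For example, `longest_run(100)` returns `[91, 93, 95]`, since 91 = 7 × 13, 93 = 3 × 31 and 95 = 5 × 19, while 89 and 97 are prime.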
Questions investigated
- What is the longest possible chain of consecutive odd semi-primes?
- What arithmetic conditions determine where such chains can and cannot occur?
- How does the density of odd semi-primes relate to chain length?
Dissemination
- MathsJam Annual Gathering 2025
Further recreational work in progress
The ideas I am currently interested in include extensions of my work on arithmetic sequences of semi-primes and the relationship between curve-stitching and cyclic groups. Get in touch if you'd like to hear more or work together!
Mathematics education
& university pedagogy
My experience teaching mathematics at the London School of Economics across analysis, algebra, algorithms, and more has raised questions I want to investigate formally. This is a nascent but growing part of my research identity.
Further pedagogical work
Ideas in development. If you work in mathematics education research or are interested in collaborating on questions around undergraduate or postgraduate pedagogy, I'd be very glad to hear from you.
Current supervisors &
collaborators
Imperial College London · EPSRC CDT StatML
Dr Adam Sykulski
Specialisms: Time series analysis, ocean statistics, spectral methods
Dr Marina Evangelou
Specialisms: Statistical machine learning, biostatistics
Get in touch
I'm always happy to talk about my research, potential collaborations, or questions at the intersection of statistics, oceanography, and mathematics education. I'm particularly interested in connecting with others working on probabilistic machine learning for physical systems, recreational mathematics, or university pedagogy.
vanessa@vanessamadu.com