Blog
2026
Most labeled datasets for environmental and geospatial applications, especially those that represent true “ground truth” from field measurements, are collected in fairly limited regions. Whether I am reading research papers or reviewing student proposals in my machine learning and spatial data class, I see one common mistake: adding latitude and longitude as features to machine-learning models. The trouble is that adding lat/lon does seem to improve classification, regression, or estimation performance on these limited datasets. Reported metrics go up, errors go down, and everything looks better on paper. So why do I call this a mistake? Because in many cases, the model isn’t learning the underlying environmental process at all. It is memorizing location, learning how to map lat/lon to the target.
2025
There has been considerable excitement, along with reasonable amounts of skepticism, around geospatial foundation models and so-called Earth embeddings. These models promise reusable representations of places learned from large, heterogeneous datasets, and they are increasingly treated as a general-purpose building block for downstream geospatial machine learning tasks. Yet one question kept resurfacing: what actually happens when static Earth embeddings are used in a highly dynamic estimation problem?
2022
Forecasting COVID-19 geographic spread is a challenging task. Reliable forecasting is crucial, as it is used in resource allocation and allows local authorities and health officials to implement timely interventions.