Sunday, 27 March 2016

Smoothing data

I quite frequently use a sequence of processes to smooth data. In this short post I hope to show how the data change at each stage of the process.

In many cases, daily records for individual species are too few or too volatile to be particularly meaningful. This is especially true in early spring and, to a lesser extent, on weekdays at more productive times of year. The solution is therefore to group the data into blocks. In this example I have used Eristalis pertinax, a very abundant spring species that starts to dominate the data from around the end of March. I have extracted photographic data for the past three years (2013, 2014 and 2015) to provide a bit more context. In each successive year the data have gained strength, because there are more active recorders and therefore more records from all parts of the country (especially northern England and Scotland).

The data have been split into three zones, based on 200km sections of the OS grid, that roughly equate to:
  • South of a line between the Thames and the Severn
  • The Midlands, between the Thames-Severn line and a further line between the Humber and the Mersey
  • North of the Humber-Mersey line.
  • Occasionally, I add a further division north of the Solway, but on this occasion I have not.
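As a rough illustration, the zoning could be sketched in Python as below. This is a minimal sketch, not necessarily the method used here: it assumes each record carries an OS grid northing in metres, and that the 200km sections fall at northings of 200 km and 400 km, standing in for the Thames-Severn and Humber-Mersey lines. The names `assign_zone`, `SOUTH_LIMIT_M` and `MIDLANDS_LIMIT_M` are hypothetical.

```python
# Hypothetical thresholds: OS National Grid northings in metres, assumed to
# stand in for the Thames-Severn and Humber-Mersey lines.
SOUTH_LIMIT_M = 200_000
MIDLANDS_LIMIT_M = 400_000


def assign_zone(northing_m):
    """Return the latitude zone for an OS grid northing (metres)."""
    if northing_m < SOUTH_LIMIT_M:
        return "South"
    if northing_m < MIDLANDS_LIMIT_M:
        return "Midlands"
    return "North"
```

For example, `assign_zone(250_000)` returns `"Midlands"`. Working from the northing alone keeps the rule simple; an extra division north of the Solway could be added as one more threshold.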

The first stage is to count the records for each successive week (Figure 1). As can be seen, the data vary hugely from week to week, and the numbers of records inevitably differ between zones and between years. Over the three years the volume of data has clearly increased dramatically, which makes it harder to put the successive years into context.
Figure 1. Weekly records for the years 2013 to 2015 for Eristalis pertinax according to three zones of latitude.
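Counting records per week could be sketched as follows, assuming (hypothetically) that each record has already been reduced to a `(date, zone)` pair. It uses ISO week numbers from Python's `date.isocalendar()`; how weeks are actually defined in the analysis is not stated, so that choice is an assumption, as is the function name `weekly_counts`.

```python
from collections import Counter
from datetime import date


def weekly_counts(records):
    """Count records per (ISO year, zone, ISO week).

    `records` is an iterable of (date, zone) pairs, a hypothetical layout.
    """
    counts = Counter()
    for record_date, zone in records:
        # isocalendar() gives (ISO year, ISO week, ISO weekday); using the
        # ISO year keeps week numbers consistent across year boundaries.
        iso_year, iso_week, _weekday = record_date.isocalendar()
        counts[(iso_year, zone, iso_week)] += 1
    return counts


# Illustrative records only, not real data.
example = weekly_counts([
    (date(2015, 3, 30), "South"),
    (date(2015, 4, 1), "South"),
    (date(2015, 4, 7), "North"),
])
print(example[(2015, "South", 14)])  # 2
print(example[(2015, "North", 15)])  # 1
```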
From this, I construct a further table in which each week's records are converted into a percentage of the total records for that zone in that year. Again, the results tend to be a bit volatile, but the zones and years are now directly comparable, because the much greater numbers of records from the south no longer swamp the other zones (Figure 2).
Figure 2. Proportion of records of Eristalis pertinax for each week according to year and zone.
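The percentage conversion might look like the minimal sketch below. It assumes a hypothetical layout in which weekly counts are held in a dictionary keyed by `(year, zone, week)`; each count is divided by the total for its zone and year, so the weekly values for any one zone and year sum to 100. The name `to_percentages` is illustrative only.

```python
from collections import defaultdict


def to_percentages(counts):
    """Convert weekly counts to a percentage of the zone's total for the year.

    `counts` maps (year, zone, week) -> number of records (assumed layout).
    """
    # First pass: total records for each (year, zone).
    totals = defaultdict(int)
    for (year, zone, _week), n in counts.items():
        totals[(year, zone)] += n

    # Second pass: express each week as a share of its zone-year total.
    percentages = {}
    for (year, zone, week), n in counts.items():
        total = totals[(year, zone)]
        # Guard against a zone-year with no records at all.
        percentages[(year, zone, week)] = 100.0 * n / total if total else 0.0
    return percentages
```

Dividing by the zone-year total is what lets a sparse northern zone and a heavily recorded southern zone sit on the same scale.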
Finally, I calculate a three-week centred running mean of the percentage values: each week's value is replaced by the average of that week, the week before and the week after. This smooths the results, removing the idiosyncratic big rises and falls in the data and picking out the overall trend, either rising or falling numbers of records (Figure 3).

Figure 3. Three-week centred running mean for records of Eristalis pertinax for each zone in the years 2013 to 2015.
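A three-week centred running mean can be sketched as below. It assumes the weekly percentages for one zone and year are held in a list ordered by week, with a zero for any week that had no records, so that neighbouring entries really are adjacent weeks. How the first and last weeks are treated is not stated; here they are simply left as `None`, which is one possible convention. The function name is hypothetical.

```python
def centred_running_mean(values):
    """Three-week centred running mean of a list of weekly values.

    Each interior week becomes the mean of itself and its two neighbours.
    The first and last weeks lack a neighbour on one side, so they are
    returned as None (an assumed convention).
    """
    smoothed = [None] * len(values)
    for i in range(1, len(values) - 1):
        smoothed[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return smoothed
```

For example, `centred_running_mean([3, 6, 9, 3, 0])` returns `[None, 6.0, 6.0, 4.0, None]`: the sharp peak of 9 is spread across its neighbours, which is exactly the damping of big week-to-week swings described above.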
The resulting graphs clearly show how the overall phenology pattern differs according to latitude, and how emergence times also vary from year to year. In this example, the indications are that E. pertinax emerged a little earlier in 2014 than in either 2013 or 2015, with the possible exception of the North in 2013. Unfortunately, the dataset for the North in 2013 was rather sparse and therefore I would treat the result with a little caution. Nevertheless, the data do help to show year-on-year regional and latitudinal variation.

The results also show that emergence is far more protracted in the Midlands and southern England, whilst in the northern zone it is far more compressed. Because each zone's weekly values are percentages of that year's total, a shorter season packs the same 100% into fewer weeks, hence the higher peaks in 2014 and 2015.


This sort of analysis is only possible for species with substantial blocks of data. Nevertheless, it is clear that photographic data can be used for some forms of monitoring, and can represent species' phenology with quite remarkable precision.

If the number of recorders in more northerly regions increases, it may be possible to split the data into four zones in future, providing the opportunity to investigate a much wider range of parameters. There may also be scope to split the country further into east and west sub-units of each zone, something I may try in due course.

