Sunday, 12 August 2018

Impact of drought upon numbers of hoverfly species recorded

Ad-hoc, or opportunistic, data are always very difficult to interpret. There is no consistency in the method of recording, with the numbers of people involved varying from year to year. The relative skills of recorders will also change as people change interests, lose interest or, sadly, depart this world! So, all analysis must be accompanied by caveats.

I have been trying to make sense of this year's drought in the UK. Can we use the numbers of records? For this exercise I looked at the numbers of hoverfly records extracted from social media per week in 2018 and the preceding five years and produced two graphs. One compares the years (Figure 1) and the other compares 2018 against the average for the preceding five years (Figure 2). It is clear from Figure 1 that each year is so different that it is almost impossible to make any comparison. It should be noted that the numbers of records in 2013 and 2014 were small in comparison with 2016, and that the numbers of records in late summer and autumn 2016 were much greater than in any of the other four years. 2016 was the peak year for data extraction direct from the UK Hoverflies Facebook page; since then, many of the most active recorders of the time have switched to maintaining their own spreadsheets. In addition, the autumn of 2016 was unusually warm and saw recording extend far longer than normal (into early November).
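The week-by-week tallies behind Figures 1 and 2 amount to counting records per ISO week within each year. A minimal sketch in Python, using a handful of made-up records (the dates and species here are hypothetical stand-ins for the extracted data):

```python
from collections import Counter
from datetime import date

# Hypothetical sample of extracted records: (observation date, species).
records = [
    (date(2016, 8, 3), "Episyrphus balteatus"),
    (date(2016, 8, 5), "Eristalis tenax"),
    (date(2016, 8, 12), "Episyrphus balteatus"),
    (date(2018, 8, 4), "Eristalis tenax"),
]

# Count records per (year, ISO week) so years can be compared week by week.
weekly_counts = Counter(
    (d.year, d.isocalendar()[1]) for d, _species in records
)

for (year, week), n in sorted(weekly_counts.items()):
    print(year, week, n)
```

Grouping by ISO week rather than calendar date smooths out day-to-day noise and makes a year-on-year overlay straightforward.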

Figure 1. Numbers of hoverfly records extracted from social media between 2013 and 2018

There is more to be made of Figure 2 insofar as it is clear that the general trend is similar, with peak numbers occurring in August. A clear drop in the numbers of records during the drought is also apparent, as is the approximately two-week delay in the start of the season as a result of the length of the last winter. Nevertheless, the range in numbers of records between 2013 and 2017 is substantially skewed by the first year, when the UK Hoverflies Facebook group was launched (mid-summer 2013). In that year there was relatively little activity compared with the levels seen in 2016.

Figure 2. Numbers of hoverfly records extracted from social media in 2018 against an average for the previous 5 years (2013 to 2017)

So, we can detect a general narrative, but it would be unwise to rely simply on the numbers of records. Is there an alternative metric that might tell us more? I have previously discussed the effects of the drought on recorder activity (28 July 2018: Recorder activity - a possible proxy for looking at the impact of weather on datasets?). In that analysis it seemed that recorder activity had diminished at a time when growth in activity might have been expected. So, with recorder activity diminished, it may be that the numbers of records simply follow suit.

This time, my attention turned to the numbers of species recorded. Again, the year-by-year totals vary hugely, making it difficult to place 2018 into context (Figure 3). When placed into the context of the 5-year average, 2018 does stand out quite markedly (Figure 4). I think the crucial point is that the overall trend for 2018 was similar to both 2016 and 2017, so I have also plotted 2018 against the average for 2015 to 2017, the three years in which recorder effort was similar to, or exceeded, that of 2018 (Figure 5). The results work very nicely, with 2018 clearly fitting the 3-year average until the third week of June, when the numbers of species recorded crashed. Numbers of species seem to be on the rise now. The rise is partly explained by the arrival of second generations of some species and possibly also by the effects of the big mass occurrence event ten days ago.
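The species-count comparison in Figure 5 boils down to counting distinct species per week and averaging across the three comparable years. A sketch of that calculation, again with hypothetical records standing in for the real dataset:

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracted records: (observation date, species name).
records = [
    (date(2015, 6, 2), "Episyrphus balteatus"),
    (date(2015, 6, 3), "Eristalis pertinax"),
    (date(2016, 6, 1), "Episyrphus balteatus"),
    (date(2017, 6, 1), "Eristalis tenax"),
    (date(2018, 5, 29), "Episyrphus balteatus"),
]

# Sets of distinct species recorded per (year, ISO week).
species_by_week = defaultdict(set)
for d, sp in records:
    species_by_week[(d.year, d.isocalendar()[1])].add(sp)

def species_count(year, week):
    """Number of distinct species recorded in a given ISO week of a year."""
    return len(species_by_week.get((year, week), set()))

# Compare each 2018 week against the 2015-2017 baseline average.
for week in (22, 23):
    baseline = sum(species_count(y, week) for y in (2015, 2016, 2017)) / 3
    print(week, species_count(2018, week), round(baseline, 2))
```

Using distinct species rather than raw record counts gives a metric that is somewhat less sensitive to how many records each recorder happens to submit in a given week.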

The big question is whether the numbers of species recorded will recover by September. If not, we must also ask 'what will be the knock-on effects into 2019, and how can we establish whether any reduction in numbers arises because of the 2018 extreme weather?'

Figure 3. Numbers of species recorded from social media in the years 2013 to 2018. Note that whilst the totals for 2013 and 2014 are lower than those for later years, the disparity is not as great as in the numbers of records (Figure 1).

Figure 4. Numbers of species recorded in 2018 compared with the 5-year average from 2013 to 2017. Using this metric we might assume that 2018 was actually species-rich until the crash in June; however, the effects of lower levels of recording in 2013 and 2014 are clear when the graph excludes these years (Figure 5).

Figure 5. Numbers of species in 2018 compared with the 3-year average for 2015 to 2017. Using this metric it seems that 2018 was broadly comparable with previous years until the third week of June when numbers crashed. There seems to be recovery in the last week, perhaps as a result of cooler wetter conditions.


  1. Interesting as ever, Roger. If I understand correctly, the numbers are based on date of extraction from social media by you or collaborators, and not on date of submission (though that may be very close) or the date of observation. The numbers for the latter evolve, of course. For example, although due to travel and other commitments I haven't posted any Syrphidae records for many months, I have still recorded and photographed when I could. Eventually, these will be submitted, leading to a tiny increase in the numbers per week based on observation date. No doubt many other observers wait till "quiet times" (like winter) before asking for ID help (as has been requested in the past) from which you create records. It will be interesting to see the 'evolved' graphs based on observation date in, say, 6 months' time.

  2. The dates used are the actual dates of the record and NOT the date submitted or extracted - that would produce utter crap! What sort of numpty do you take me for? I do know how to manage data and have spent 30 years doing so!

    They are data extracted by me from Facebook, Flickr and iSpot up until 2016; since then Ian Andrews has extracted iSpot data and Geoff Wilkinson has extracted data from a small suite of FB contributors. I extract the rest - in 2016 that was about 25,000 records to species and 7,000 records to gender only (i.e. about 1,000 hours' work per year). Since then numbers have dropped as many members maintain their own spreadsheets and submit the data at the end of the year. As far as I am aware no other scheme does this - which makes the HRS unusual in the sense that it can produce 'real-time' data analysis.

    Records comprise on each line the species, date recorded, grid reference, locality name, Recorder, Determiner, stage/gender, abundance, URL for the post and then notes such as flower visit etc. They are the same data as would be required by any sensible biological recording scheme.

  3. Paul does have a point: if you're comparing incomplete 2018 data with previous years' completed data, then the results might be skewed by the delay in submission corresponding with peak 'backlog' amongst recorders. You could implement an equivalent August submission cut-off for prior years to at least partially account for this, but in the end I think it reinforces your core point - casual records are hard to interpret.

    1. I'm afraid not, Matt. These data are STRICTLY data extracted from Facebook. 98% of records submitted this way arrive within two days of the date of recording. That is why I use the FB data - because it is statistically as robust as one can get. It is not about absolute numbers but the shape of the graph.

      The full data do arrive later in the year but are of a very different composition, and this is about comparing like with like, i.e. data collected in the same way - in the same way that Birdtrack operates.