Tuesday, 22 May 2018

What do we want data for?

It is all very well me moaning – that gets nobody anywhere and generates a feeling of ‘what a miserable old git’. So, if one is going to moan, one really must do something positive to justify the moans. Hopefully the effort I make for the HRS goes some way to addressing this imbalance!

The key to resolving the real (or perceived) issues with increasing demands to generate more data is to try to understand who wants the data and the uses to which they wish to put them. At a strategic level it seems to me that there are a small number of critical uses:

  • An ongoing audit of the composition of the UK's wildlife assets – what is new and what might have been lost.
  • Where is that wildlife? Data are needed at a variety of scales depending upon uses; these may range from 10km resolution for national maps to 1km, 100m or even 1m resolution for more local usage.
  • Where are the conservation priorities? If based on IUCN criteria, what is needed is data that are sufficiently refined to inform analysis for Red Lists, both national and local. However, we also have BAP and Section 41/42 species to consider.
  • Why is our wildlife asset expanding/declining? Use of data to inform the debate about how wildlife assets are changing in response to a wide variety of environmental factors.
  • Reporting on the status of a small sub-set of our wildlife for national/international audit and site safeguard purposes.

There may be other obvious reasons for needing data, but these five bullet points seem to me to encapsulate the main issues. There are, within those headlines, several common themes so the headline reasons might be trimmed further.

Data needs


If there are defined reasons for needing the data, then the next stage is to determine what data are needed. Do we simply want any old records, or do we want something more structured? Well, ideally, science would be best served by data collected under a stratified random process such as that employed in the Breeding Bird Survey. Data collected according to other set protocols such as those employed by WeBS counts, the Rothamsted Moth Survey or Butterfly Transects are also very powerful.

The main drawback of these structured programmes is that they may overlook a proportion of our wildlife, so we also need something else: a way of ensuring that highly localised and specialised species are recorded on at least an intermittent basis. Structured programmes address a tiny fraction of the 50,000+ organisms known from the UK; so, an alternative is needed. This is where the use of ad-hoc or ‘opportunistic’ data comes into play. Such data might include records of protected species or casual sightings from gardens, but they can (and often do) involve something more useful.

So, what makes data really useful?


Meaningful interpretation relies on data that have been accumulated over a long timescale. A single record of an animal or plant is meaningless unless there is something to place it into context. For example, a record of a rare beetle from a given site might or might not imply a breeding population. A single record of the same beetle from a site where it has been recorded on 100 previous occasions suggests that there has been a resident population and that this population is still present (to some degree).

So, for data to be robust and useful they need to be part of a much bigger picture. That picture might be created by a single person visiting a single site for a given timeframe; or it might be the same site visited by multiple people over the same timeframe. Crucially, if everybody who visits the site records a full list of what they see and can reliably identify, the sum of those data becomes very powerful. This is the principle that underpins BirdTrack, for example. Combined with data from other locations, such lists can also be used to look at trends. We have thus established two further critical points:

  • To be most useful, submission of complete species lists needs to be encouraged – rather than just the report of a single supposedly rare species. The wider list provides the local and longer-term context. 
  • Combinations of complete lists by different recorders can be used to investigate trends. The power of the data increases with the numbers of recorders making submissions, so the number of recorders submitting full lists becomes a critical differentiator.
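
To make the second point concrete, here is a minimal sketch of how complete lists from several recorders might be turned into a simple trend measure: the proportion of lists in each year that include a given species (often called a 'reporting rate'). The visit data, recorder labels and the function name reporting_rate are all invented for illustration, and this is not a description of how BirdTrack or any recording scheme actually analyses its data.

```python
from collections import defaultdict

# Hypothetical complete lists: (year, recorder, species recorded on one visit).
# All of these records are invented purely to illustrate the calculation.
visits = [
    (2016, "A", {"Eristalis tenax", "Episyrphus balteatus"}),
    (2016, "B", {"Episyrphus balteatus"}),
    (2017, "A", {"Eristalis tenax", "Episyrphus balteatus", "Volucella zonaria"}),
    (2017, "B", {"Episyrphus balteatus", "Volucella zonaria"}),
    (2017, "C", {"Eristalis tenax"}),
]


def reporting_rate(visits, species):
    """Return {year: fraction of that year's complete lists that include species}."""
    totals = defaultdict(int)  # number of complete lists per year
    hits = defaultdict(int)    # number of those lists containing the species
    for year, _recorder, species_seen in visits:
        totals[year] += 1
        if species in species_seen:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for name in ("Eristalis tenax", "Volucella zonaria"):
        print(name, reporting_rate(visits, name))
```

The calculation only makes sense because the lists are assumed to be complete: a species missing from a list can then be read as 'looked for and not found' rather than 'not bothered with'. The more recorders submit full lists, the more lists there are per year and the less any one visit sways the result.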

We then reach the issue of composition of species lists. If lists are compiled by recorders with a limited grasp of a given group, they will inevitably be short. Much longer lists will be supplied by specialists in that group, and the combined lists provided by those specialists will be considerably more powerful because they provide so much more contextual information.

Real data needs


We have therefore arrived at the critical stage in developing a strategic approach to data collection. What we need is for all recorders to record everything they see, but for them also to develop sufficient specialism to provide important context. If a dataset comprises records compiled by generalists then it will be heavily skewed towards the common and easily recognisable. If, on the other hand, the data are supplied by specialists who cannot be bothered with the common and easily identified, then there will be a different skew. Neither is helpful!

We need, however, to inject an element of practical reality into this analysis. There are currently lots of generalists and relatively few specialists, so the data are inevitably skewed. We need to change this imbalance by focussing on why there are so few specialists and what is preventing people from deepening their knowledge and broadening their coverage. I submit that at least part of the problem lies in 30+ years of the mantra ‘take nothing but photographs, leave nothing but footprints’. There is a new generation that is naturally resistant to taking specimens (quite understandably). It will be a brave leader to take on this challenge, but without such an approach, there will always be an imbalance in the taxonomic coverage.

There is hope, however. What is needed is a higher-profile effort to show how data can be used and to show the power of mass data collection. Over the last few years I have tried to do this in my blog, but one person will have little effect unless they are influential. So, if we want a ‘Citizen Science Revolution’ we really need to educate the potential contributors, so they understand what is important and what is not important. Those that get the general message ‘more volume’ will help to address numerical targets, whilst those who get the message ‘depth and breadth’ will help to make a real difference.

The trick is how to shift effort towards better structured data assembly without alienating those who want to contribute but don’t want to become dedicated recorders. One obvious way is to promote the adoption of a local ‘patch’ and to encourage regular/constant effort at differing levels of intensity.

1 comment:

  1. Hi Roger, while looking through citations of data on GBIF I came across this paper: https://academic.oup.com/sysbio/advance-article-abstract/doi/10.1093/sysbio/syy044/5034972?redirectedFrom=fulltext

    Might be of interest. Unfortunately, I don't have an Oxford account to read it in full. You may have more luck..?
