Wednesday, 23 May 2018

Data requirements for occupancy modelling


In the past ten years, several models have been developed to make use of ‘ad-hoc’ or ‘opportunistic’ data. They are regularly used in analyses of trends in Britain’s wildlife and are the black boxes behind the banner headlines of x or y changes in the abundance of Britain’s wildlife (largely declines). The processes are complicated, so my description is necessarily brief and open to correction by those in the know. However, for the basic purpose of explaining how different datasets perform, the following may be useful:

These models take existing data and use them to predict where a given species might occur. To do so, they develop a list of the species that occur in surrounding squares that contain similar land-cover characteristics. The lists will comprise a mixture of those species that might be expected almost everywhere, those that are more specialised but are still widespread and abundant, and scarcer species that have more demanding ecological parameters.

The completeness of coverage of surrounding squares will determine the degree to which a model can predict presence or absence. It has been assumed that models will smooth out irregularities in recording effort, but I have felt for a very long time that they will be affected by the composition of species lists. If the list is complete, there is more chance of predicting the presence of scarcer species or of species that are difficult to identify. On the other hand, incomplete lists will make it more difficult for the model to identify critical ecological factors and species will not be predicted.

Crucially, a test of whether a list is complete will depend upon those species that occur consistently across the landscape. There are arguably three classes of species that fall into this category:

  • readily recognisable species that almost everybody records;
  • species that are difficult to find but are still very widespread and are therefore less well recorded; and
  • species that are very widespread, but difficult to identify and hence are under-recorded.

If a species list contains all the above species, it can be assumed that the square is comprehensively recorded. The shorter the list of these ‘constant’ species, the less well the square is recorded. The problem that dogs these models is the issue of completeness of coverage. So, inevitably, if coverage is weak, the models will have trouble predicting presence or absence. This shows up quite well in models covering, say, the west coast of Scotland where there has been very little recording at any time. At the moment, I am unconvinced that we really know what the constants are amongst the taxonomically challenging parts of our wildlife.
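The idea of testing a list against the ‘constant’ species can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how any published occupancy model works: the species names are hypothetical placeholders (one for each of the three classes above), and the score is simply the fraction of assumed constants that turn up on a square's list.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical placeholders, one per class of 'constant' species described above.
CONSTANTS: Set[str] = {
    "easy_to_recognise_sp",   # recorded by almost everybody
    "hard_to_find_sp",        # widespread but seldom encountered
    "hard_to_identify_sp",    # widespread but needs specialist skills
}


def completeness_score(recorded: Iterable[str], constants: Set[str]) -> float:
    """Return the fraction of the assumed constants present on a species list."""
    if not constants:
        raise ValueError("at least one constant species is required")
    found = constants & set(recorded)
    return len(found) / len(constants)


# Two invented squares: one worked by a specialist, one with casual records only.
squares: Dict[str, List[str]] = {
    "well_recorded_square": [
        "easy_to_recognise_sp", "hard_to_find_sp", "hard_to_identify_sp", "scarce_sp",
    ],
    "casually_recorded_square": ["easy_to_recognise_sp"],
}

scores = {name: completeness_score(spp, CONSTANTS) for name, spp in squares.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

The weak point is the one already noted: the score is only as good as the choice of constants, and for the difficult groups that choice is itself uncertain.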

So, the question then arises:

What can we do to improve the accuracy of predictive models?


Readers who use BirdTrack will be aware that the system requires the recorder to say whether they have submitted a complete or partial submission. If your list only notes the rare and unusual, it is not included in the analysis, and likewise if there were species that you were unable to identify then the list is incomplete and should not be included in the analysis. BirdTrack takes opportunistic recording one step closer to providing the robust data that occupancy models need to deliver reliable results.

In most other taxonomic groups, ‘opportunistic’ data is a complete hotchpotch of complete lists and casual single records. All have an important role to play because they all help to fill in little parts of the jigsaw. But, of course, if a visit is made to a site and only part of what was seen is reported, then the model only has part of the species list to work with. Repeat visits by a range of recorders will fill in some of the gaps over time, but unless the range of recorders includes people who tackle the tricky species, the lists will always be incomplete, and the model will inevitably have less to work on.

So, if we want to improve the accuracy of predictive models, the answer is quite simple. We need to improve overall coverage, both in terms of geographical extent and in terms of depth of species composition. This is one reason why a general call for more recording may not have the desired effect; indeed, it could compound model shortfalls by focussing on a larger volume of the easily identified species and give the impression that more challenging species are declining or declining at a faster rate than they actually are. 
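A small worked example shows how extra recording of easy species alone can mimic a decline. It uses a naive reporting rate (the share of visit lists that include a species), which is far cruder than an occupancy model, and all the numbers and species names are invented for illustration.

```python
from typing import List, Set


def reporting_rate(visits: List[Set[str]], species: str) -> float:
    """Share of visit lists that include the given species."""
    if not visits:
        raise ValueError("no visits supplied")
    return sum(species in visit for visit in visits) / len(visits)


# Before: 10 visits by recorders who tackle the tricky species; it is found on 4.
before = [{"easy_sp", "tricky_sp"}] * 4 + [{"easy_sp"}] * 6

# After: the same 10 visits plus 30 new ones that only ever report the easy species.
after = before + [{"easy_sp"}] * 30

rate_before = reporting_rate(before, "tricky_sp")  # 4 of 10 lists
rate_after = reporting_rate(after, "tricky_sp")    # 4 of 40 lists

print(f"tricky_sp: {rate_before:.2f} -> {rate_after:.2f}")
print(f"easy_sp:   {reporting_rate(before, 'easy_sp'):.2f} -> {reporting_rate(after, 'easy_sp'):.2f}")
```

Nothing has happened to the tricky species on the ground, yet its apparent frequency drops from 0.40 to 0.10 while the easy species stays at 1.00. Occupancy models are designed to correct for effects like this; the concern here is that they may not do so fully.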

I have shown in previous posts how the trend for Portevinia maculata has sharply altered upward since photographic recording became the preferred recording medium. The Portevinia maculata model, however, illustrates a second issue. It was probably greatly under-recorded and is now much better recorded. So, the army of recorders who have looked for it and added new squares have made an important contribution to our knowledge of its true distribution. So, there are definite benefits from certain increases in recorder effort.

It therefore follows that if one of the significant objectives of biological recording is to improve our knowledge of the distribution and status of Britain’s wildlife, we need to think about how to improve the data that underpin these predictive models. These models were used to produce the maps in the WILDGuide to hoverflies and doubtless in other guides too. So, there are also benefits to the avid recorder if the models are improved - the next generation of guide books should be more accurate!

Thus, rather than a general cry for more data, I think the new cry should be – complete lists please! Or, if you are not one for retaining specimens, do please try to ensure that your coverage is as complete as possible. We have seen a strong shift in this direction in the UK Hoverflies Facebook group and it is much welcomed. I think this shift illustrates two important points:

  • More active group members have developed the ability to create such lists; and
  • These members have developed the key skill of logging all observations rather than just a checklist of the unusual.

Whatever your interest in wildlife recording, it is worth thinking about the added value of full species lists. They will make a difference.

Tuesday, 22 May 2018

What do we want data for?

It is all very well me moaning – that gets nobody anywhere and generates a feeling of ‘what a miserable old git’. So, if one is going to moan, one really must do something positive to justify the moans. Hopefully the effort I make for the HRS goes some way to addressing this imbalance!

The key to resolving the real (or perceived) issues with increasing demands to generate more data is to try to understand who wants the data and the uses to which they wish to put them. At a strategic level it seems to me that there are a small number of critical uses:

  • An ongoing audit of the composition of the UK’s wildlife assets – what is new and what might have been lost.
  • Where is that wildlife? Data is needed at a variety of scales depending upon uses; these may range from 10km resolution for national maps to 1km, 100m or even 1m resolution for more local usage.
  • Where are the conservation priorities? If based on IUCN criteria, what is needed is data that are sufficiently refined to inform analysis for Red Lists, both national and local. However, we also have BAP and Section 41/42 species to consider.
  • Why is our wildlife asset expanding/declining? Use of data to inform the debate about how wildlife assets are changing in response to a wide variety of environmental factors.
  • Reporting on the status of a small sub-set of our wildlife for national/international audit and site safeguard purposes.

There may be other obvious reasons for needing data, but these five bullet points seem to me to encapsulate the main issues. There are, within those headlines, several common themes so the headline reasons might be trimmed further.

Data needs


If there are defined reasons for needing the data, then the next stage is to determine what data are needed. Do we simply want any old records, or do we want something more structured? Well, ideally, science would be best served by data collected under a stratified random process such as that employed in the Breeding Bird Survey. Data collected according to other set protocols such as those employed by WeBS counts, the Rothamsted Moth Survey or Butterfly Transects are also very powerful.

The main drawback of these structured programmes is that they may overlook a proportion of our wildlife, so we also need something else: a way of ensuring that highly localised and specialised species are recorded on at least an intermittent basis. Structured programmes address a tiny fraction of the 50,000+ organisms known from the UK; so, an alternative is needed. This is where the use of ad-hoc or ‘opportunistic’ data comes into play. Such data might include records of protected species or casual sightings from gardens, but they can (and often do) involve something more useful.

So, what makes data really useful?


Meaningful interpretation relies on data that have been accumulated over a long timescale. A single record of an animal or plant is meaningless unless there is something to place it into context. For example, a record of a rare beetle from a given site might or might not imply a breeding population. A single record of the same beetle from a site where it has been recorded on 100 previous occasions suggests that there has been a resident population and that this population is still present (to some degree).

So, for data to be robust and useful they need to be part of a much bigger picture. That picture might be created by a single person visiting a single site for a given timeframe; or it might be the same site visited by multiple people over the same timeframe. Crucially, if everybody who visits the site records a full list of what they see and can reliably identify, the sum of those data becomes very powerful. This is the principle that underpins BirdTrack, for example. It can be used to look at trends, when combined with data from other locations too. We have thus established two further critical points:

  • To be most useful, submission of complete species lists needs to be encouraged – rather than just the report of a single supposedly rare species. The wider list provides the local and longer-term context. 
  • Combinations of complete lists by different recorders can be used to investigate trends. The power of the data increases with the numbers of recorders making submissions, so the number of recorders submitting full lists becomes a critical differentiator.
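The two points above can be sketched as a simple calculation: keep only the visits flagged as complete lists, pool them across recorders, and ask in what proportion of each year's lists a species appears. This is a rough illustration only. The `Visit` structure, the `complete` flag and all the records are hypothetical, and real trend analyses do a good deal more than this.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Visit:
    year: int
    recorder: str
    complete: bool            # did the recorder report everything they could identify?
    species: FrozenSet[str]


def yearly_reporting_rate(visits: List[Visit], species: str) -> Dict[int, float]:
    """Proportion of complete lists per year that include the species."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for visit in visits:
        if not visit.complete:
            continue  # a lone record of a 'rare' species carries no context
        totals[visit.year] += 1
        if species in visit.species:
            hits[visit.year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Invented records from three recorders at one site.
visits = [
    Visit(2016, "recorder_a", True, frozenset({"sp_x", "sp_y"})),
    Visit(2016, "recorder_b", True, frozenset({"sp_y"})),
    Visit(2016, "recorder_c", False, frozenset({"sp_x"})),  # single casual record
    Visit(2017, "recorder_a", True, frozenset({"sp_x", "sp_y"})),
    Visit(2017, "recorder_b", True, frozenset({"sp_x"})),
]

rates = yearly_reporting_rate(visits, "sp_x")
print(rates)
```

The casual single record in 2016 is left out of the calculation because, without the rest of the list, there is no way of knowing what else was looked for and not found. The more recorders who submit full lists, the more visits there are in each year's denominator and the steadier the rates become.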

We then reach the issue of composition of species lists. If lists are compiled by recorders with a limited grasp of a given group, they will inevitably be short. Much longer lists will be supplied by specialists in that group, and the combined lists provided by those specialists will be considerably more powerful because they provide so much more contextual information.

Real data needs


We have therefore arrived at the critical stage in developing a strategic approach to data collection. What we need is for all recorders to record everything they see, but for them also to develop sufficient specialism to provide important context. If a dataset comprises records compiled by generalists then it will be heavily skewed towards the common and easily recognisable. If on the other hand, the data are supplied by specialists who cannot be bothered with the common and easily identified, then there will be a different skew. Neither is helpful!

We need, however, to inject an element of practical reality into this analysis. There are currently lots of generalists and relatively few specialists, so the data are inevitably skewed. We need to change this imbalance by focussing on why there are so few specialists and what is preventing people from deepening and broadening their coverage. I submit that at least part of the problem lies in 30+ years of the mantra ‘take nothing but photographs, leave nothing but footprints’. There is a new generation that is naturally resistant to taking specimens (quite understandably). It will be a brave leader to take on this challenge, but without such an approach, there will always be an imbalance in the taxonomic coverage.

There is hope, however. What is needed is a higher profile effort to show how data can be used and to show the power of mass data collection. Over the last few years I have tried to do this in my blog, but one person will have little effect unless they are influential. So, if we want a ‘Citizen Science Revolution’ we really need to educate the potential contributors, so they understand what is important and what is not important. Those that get the general message ‘more volume’ will help to address numerical targets, whilst those who get the message ‘depth and breadth’ will help to make a real difference.

The trick is, how to shift effort towards better structured data assembly without alienating those who want to contribute but don’t want to become a dedicated recorder. One obvious way is to promote the adoption of a local ‘patch’ and to encourage regular/constant effort at differing levels of intensity.

Monday, 21 May 2018

Is a shortage of biological records the real problem?


Having been alerted to Chris Packham’s call for a wildlife recording revolution, I put on my ‘Mr Grumpy’ hat yet again! It seems to me that there are for ever calls for more ‘citizen science’ and a mass ‘call to arms’ amongst those with limited expertise and masses of enthusiasm. We saw this with the FoE Great British Bee Count and saw its results so neatly expressed in comparative maps produced by BWARS. We have also had an ongoing chorus of effort to increase biological recording through various NBN and OPAL initiatives.

Those of us at the very sharp end of biological recording (i.e. the Recording Schemes) are fully aware that there is a general belief that there are insufficient data. As scheme organisers we ought to be extremely grateful for the raised profile and the flood of incoming records. It therefore feels ungrateful to be saying anything negative but, as is my wont, I feel I do have to say something. Unlike most people I am no ‘shrinking violet’ – I say what needs to be said and am doubtless dismissed as a ‘moaning Minnie’. I’ll bet the groan goes up ‘oh hell, Morris is at it again – I wish that b….r would just shut up and let us get on with generating records’.

Unfortunately, somebody needs to say something because there seems to be a belief that there is a magical expert tree. If the records are there the experts will crawl out just clamouring to deal with them. Well, I don’t see a great deal of evidence for this. There seem to me to be two groups within ‘expert’ circles: those who will engage and those who stick their heads down and avoid any contact with ‘citizen science’. Thus, the actual numbers of specialists who can assist in delivering reliable data are painfully small, and there is a big danger that as demand for their service increases they in turn get so worn out that they don’t want to engage.

A serious discussion is needed


Before we rush into a clamour for more biological records, perhaps we should ask ourselves why we want them and how they are going to be used? That bit of the circle does not seem to have been properly thought through.

When biological recording took off in the 1960s it was all about biogeography – mapping projects. We did not have a clue as to the distribution and abundance of wildlife and the first simple step was to map it and then to look for patterns that relate to environmental factors such as land-cover, lat/long, hard and drift geology, hydrology etc. This first step has very largely been achieved, but today we are also able to link it to climate envelopes and to chart changes that result from climate warming.

In the 1980s there was a real push to develop a way of expressing ‘rarity’. Various Red Data Books emerged. Having had a hand in some of the invertebrate projects I think the best we can say is that at the time we had limited information and at least some of the statuses attributed to species were way out! Over time, we have seen statuses revised and refined; but we have also seen how statuses can change quite dramatically over relatively short periods of time. So, one additional purpose of biological recording must be about monitoring and creation of a feedback loop.

More recently, powerful computing has facilitated a flurry of interest in modelling using a variety of Bayesian techniques. In theory, modern occupancy models smooth out irregularities of sampling intensity; however, Stuart and I now have robust evidence that the limited spread of most biological recording is skewing outputs. Yes, all models and all datasets show major declines, but the steepness and the breadth of the decline are affected by the depth of taxonomic coverage. Very little thought is being given to the depth and breadth of records issue.

This brings me on to the critical biological recording bottleneck. As I see it, the problem is not a lack of biological recording. This must be the golden age of biological recording, with datasets growing at unprecedented levels. For the Hoverfly Recording Scheme we have seen record volumes grow from 20,000-25,000 a year between 1980 and 2010 to around 60,000 a year since 2016. But it has come at a cost – both Stuart and I spend a great deal more of our lives running the scheme, and we have had to recruit five new assistants to help meet demand. We are still operating at full capacity and if we want to step back and retire (which we do) we must find somebody to take on the central roles. That is easier said than done. I am sure other Recording Schemes find themselves in a similar boat!

So, it is all very well making a call to arms for more recording, but please remember that the whole of the biological recording process is dependent upon a minuscule group of willing technical specialists (‘experts’). That group is not expanding at the same rate as the capacity to generate data. Real ‘expertise’ only develops over many years and after careful analysis of the full range of taxa within one’s subject group. Weeks, months, years of peering down a microscope, comparing preserved specimens, thinking about better ways of identifying species are required to be capable of providing the know-how to ensure that datasets are reliable. These are not skills that can be replaced by a computer (at least not yet).

So, the real debate must be about how we meet the demand for reliable data. How do we make the prospect of spending many hours a year validating datasets and providing determinations a desirable thing to do? Most people have partners and families who won’t thank them if they disappear off for hours on end running a recording scheme. Many people with a passion for a technical area won’t want to be bothered checking the umpteenth photograph of a tricky fly, bee or beetle that they know will only be given a reliable determination from a preserved specimen. Indeed, there is still outright antipathy towards ‘citizen science’ amongst a not inconsiderable part of the technically savvy.

Some ideas


We therefore need to make the science of photographic ID and ‘citizen science’ more attractive to people who are inclined to become ‘alpha taxonomists’. I have droned on for a long while that we need to be thinking about a new discipline of ‘live animal taxonomy’.

If I was freed up from the HRS, I would certainly want to put the past ten years’ experience to good use and produce a very different complete guide to Britain’s hoverflies – I have lots of ideas but no time. I think there is wider scope for a Europe-wide project that might have been led by a mixed team of German, Dutch and UK specialists. That would have been a viable European project. Sadly, Brexit has blown that one out of the water – there is no chance of UK funding for such projects and we would bring nothing to the table other than our own willingness to participate – we would be totally dependent upon European money and are soon to be on the outside.

The other big challenge is to look at the composition of Recording Scheme organisation. If record flow increases, we must generate a bigger circle of people involved in the various aspects of data management. Data management is the big drudge job in its various forms, including active data farming (from Facebook), data gathering (from independent recorders) and data verification; plus, of course, actually managing the database and importing data.

There are three areas where there has, however, been real progress. Firstly, we have seen that Facebook groups are great media for mentoring basic ID skills. I have been greatly impressed with the progress of several people on the UK Hoverflies Facebook page – their participation certainly eases the pressure on the ‘resident experts’ (Ian Andrews, Joan Childs, Geoff Wilkinson and me).

Even more importantly, we have seen a substantial shift towards group members maintaining their own spreadsheets. This is a significant shift because it means that there is a growing group of recorders in whom we have confidence and who have the confidence to make their own records. It has vastly eased the burden on me – had this shift not happened I would have had to pack up for my own sanity’s sake!

Finally, there is no shortage of data analysts. Various University groups regularly make use of HRS and other opportunistic data and there are also independent analysts and PhD students to whom we have supplied data. But, for me, there is a fly in the ointment. When we took on the Recording Scheme it was my hope that we (i.e. Stuart and I) would do a lot of the analytical work and actually publish some ground-breaking work. Today, the workload is such that the best we can do is to assemble data and pass it on to someone else to do the real science. I’m not sure that is what I signed up for and I certainly have not signed up to servicing the biological record production industry!

So, some thought needs to go into making sure that the question is asked ‘what makes a Recording Scheme organiser tick and how do we make running a scheme attractive?’