Wednesday, 23 May 2018

Data requirements for occupancy modelling

In the past ten years, several models have been developed to make use of ‘ad-hoc’ or ‘opportunistic’ data. They are regularly used in analyses of trends in Britain’s wildlife and are the black boxes behind the banner headlines of x or y changes in the abundance of Britain’s wildlife (for the most part, declines). The processes are complicated, so my brief description is necessarily simplified and open to correction by those in the know. However, for the basic purpose of explaining how different datasets perform, the following may be useful:

These models take existing data and use them to predict where a given species might occur. To do so, they develop a list of the species that occur in surrounding squares that contain similar land-cover characteristics. The lists will comprise a mixture of those species that might be expected almost everywhere, those that are more specialised but are still widespread and abundant, and scarcer species that have more demanding ecological parameters.

The completeness of coverage of surrounding squares will determine the degree to which a model can predict presence or absence. It has been assumed that models will smooth out irregularities in recording effort, but I have felt for a very long time that they will be affected by the composition of species lists. If the list is complete, there is more chance of predicting the presence of scarcer species or of species that are difficult to identify. On the other hand, incomplete lists will make it more difficult for the model to identify critical ecological factors, and species that are actually present may not be predicted.

Crucially, a test of whether a list is complete will depend upon those species that occur consistently across the landscape. There are arguably three classes of species that fall into this category:

  • readily recognisable species that almost everybody records;
  • species that are difficult to find but are still very widespread and are therefore less well recorded; and
  • species that are very widespread, but difficult to identify and hence are under-recorded.

If a species list contains all the above species, it can be assumed that the square is comprehensively recorded. The fewer of these ‘constant’ species the list contains, the less well the square is recorded. The problem that dogs these models is completeness of coverage: inevitably, if coverage is weak, the models will have trouble predicting presence or absence. This shows up quite well in models covering, say, the west coast of Scotland, where there has been very little recording at any time. At the moment, I am unconvinced that we really know what the constants are amongst the taxonomically challenging parts of our wildlife.
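The completeness test described above can be sketched as a simple score: the fraction of the assumed ‘constant’ species that appear on a square's list. This is only an illustration of the idea, not how any published model measures completeness. The species names, the choice of constants and the function name completeness_score are all hypothetical placeholders.

```python
def completeness_score(recorded, constants):
    """Fraction of the assumed 'constant' species that appear on a square's list."""
    constants = set(constants)
    if not constants:
        raise ValueError("need at least one constant species")
    return len(constants & set(recorded)) / len(constants)


# Hypothetical constants: one placeholder for each of the three classes
# (easily recognised, hard to find, hard to identify).
CONSTANTS = {"easy_common", "hard_to_find", "hard_to_identify"}

# Hypothetical species lists for two squares.
squares = {
    "square_A": {"easy_common", "hard_to_find", "hard_to_identify", "scarce_specialist"},
    "square_B": {"easy_common", "scarce_specialist"},
}

scores = {name: completeness_score(spp, CONSTANTS) for name, spp in squares.items()}
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f} of constants recorded")
```

A square that scores well below 1 is probably under-recorded, so the absence of a scarcer species from its list says little about whether that species is really absent.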

So, the question then arises:

What can we do to improve the accuracy of predictive models?

Readers who use BirdTrack will be aware that the system requires the recorder to say whether their submission is a complete or partial list. If your list only notes the rare and unusual, it is not included in the analysis; likewise, if there were species that you were unable to identify, then the list is incomplete and should not be included in the analysis. BirdTrack takes opportunistic recording one step closer to providing the robust data that occupancy models need to deliver reliable results.
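Why the complete/partial flag matters can be shown with a small sketch. Only a complete list allows a species that is missing from it to be treated as a non-detection, which is the raw material an occupancy model works with. The code below is an assumption-laden illustration and not BirdTrack's actual processing: the record layout (site, complete, species) and the function name detection_histories are hypothetical.

```python
from collections import defaultdict


def detection_histories(visits, species):
    """Build per-site detection histories (1 = seen, 0 = not seen) from complete lists."""
    histories = defaultdict(list)
    for visit in visits:
        if not visit["complete"]:
            # A partial list cannot tell us what was NOT seen, so skip it here.
            continue
        histories[visit["site"]].append(1 if species in visit["species"] else 0)
    return dict(histories)


# Hypothetical visits: three to site S1 (one partial), one to site S2.
visits = [
    {"site": "S1", "complete": True, "species": {"sp_a", "sp_b"}},
    {"site": "S1", "complete": True, "species": {"sp_a"}},
    {"site": "S1", "complete": False, "species": {"sp_rare"}},
    {"site": "S2", "complete": True, "species": {"sp_b"}},
]

hist = detection_histories(visits, "sp_b")
print(hist)
```

The partial list from S1 is dropped from this particular analysis, although the record itself still has value as a confirmed presence.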

In most other taxonomic groups, ‘opportunistic’ data is a hotchpotch of complete lists and casual single records. All have an important role to play because they all help to fill in little parts of the jigsaw. But, of course, if a visit is made to a site and only part of what was seen is reported, then the model only has part of the species list to work with. Repeat visits by a range of recorders will fill in some of the gaps over time, but unless the range of recorders includes people who tackle the tricky species, the lists will always be incomplete, and the model will inevitably have less to work on.

So, if we want to improve the accuracy of predictive models, the answer is quite simple. We need to improve overall coverage, both in terms of geographical extent and in terms of depth of species composition. This is one reason why a general call for more recording may not have the desired effect; indeed, it could compound model shortfalls by focussing on a larger volume of the easily identified species and give the impression that more challenging species are declining or declining at a faster rate than they actually are. 
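A toy calculation shows how a rise in recording of easily identified species could make a harder species look as though it is declining. It uses a naive reporting rate (the share of lists that include the species), which is much cruder than an occupancy model, and every number is invented purely for illustration.

```python
def reporting_rate(lists_with_species, lists_total):
    """Naive reporting rate: share of submitted lists that include the species."""
    if lists_total <= 0:
        raise ValueError("lists_total must be positive")
    return lists_with_species / lists_total


# Invented figures: recording triples, mostly through lists of easy species,
# while records of the difficult species rise only from 20 to 30.
before = reporting_rate(20, 100)
after = reporting_rate(30, 300)

print(f"before: {before:.2f}, after: {after:.2f}")
```

In this made-up case the species is recorded more often than before, yet its reporting rate halves. Whether a real model is misled in the same way depends on how it accounts for list length and recorder effort.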

I have shown in previous posts how the trend for Portevinia maculata has turned sharply upward since photographic recording became the preferred recording medium. The Portevinia maculata model, however, illustrates a second issue. The species was probably greatly under-recorded and is now much better recorded, so the army of recorders who have looked for it and added new squares have made an important contribution to our knowledge of its true distribution. There are, therefore, definite benefits from certain increases in recorder effort.

It therefore follows that if one of the significant objectives of biological recording is to improve our knowledge of the distribution and status of Britain’s wildlife, we need to think about how to improve the data that underpin these predictive models. These models were used to produce the maps in the WILDGuide to hoverflies and doubtless in other guides too. So, there are also benefits to the avid recorder if the models are improved - the next generation of guide books should be more accurate!

Thus, rather than a general cry for more data, I think the new cry should be – complete lists, please! Or, if you are not one for retaining specimens, do please try to ensure that your coverage is as complete as possible. We have seen a strong shift in this direction in the UK Hoverflies Facebook group and it is much welcomed. I think this shift illustrates two important points:

  • More active group members have developed the ability to create such lists; and
  • These members have developed the key skill of logging all observations rather than just a checklist of the unusual.

Whatever your interest in wildlife recording, it is worth thinking about the added value of full species lists. They will make a difference.


  1. With occupancy models, we can estimate the total number of species (or taxa) present, for example, using multi-season, multi-species occupancy models. Not only can these models help with false absences, but they can help us with false positives too. Improving coverage and species identifications will increase accuracy when there is an associated increase in the number of surveys, and similarly it will help to record the common as well as the rare species.

    A really important consideration is the number of repeat surveys at our sampling sites of interest: the accuracy of the estimates can be improved more effectively by repeating more surveys at a single sampling location rather than by spreading fewer surveys across many sampling locations. The exact number of surveys required will depend on the detection probabilities of the species involved, their occupancy rates and available resources.

    Another key feature of occupancy models is that they allow us to account for the different abilities and detection rates of different observers. This has the potential to help with observers that are more likely to record some species than others.

    Darryl MacKenzie et al. (2018) have published a 2nd edition of Occupancy Estimation and Modeling, which is a useful update on these methods, and there are some good packages in R that can be used to help with any analysis.

    Apologies, I was unable to comment on the UK Hoverflies Facebook post, so hope it is ok to discuss on here.

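The point made in the first comment about repeat surveys can be illustrated with a standard piece of arithmetic. If a species is present at a site and each survey detects it independently with probability p, the chance of detecting it at least once in k surveys is 1 - (1 - p)^k. The sketch below assumes constant, independent detection across surveys, which real data will rarely satisfy exactly. The function names and the 0.95 target are illustrative choices only.

```python
import math


def prob_detected_at_least_once(p, k):
    """Chance of at least one detection in k independent surveys, given presence."""
    return 1.0 - (1.0 - p) ** k


def surveys_needed(p, target=0.95):
    """Smallest number of surveys for which the cumulative detection chance meets target."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Compare an easily detected species with a hard-to-detect one (illustrative p values).
for p in (0.5, 0.2):
    k = surveys_needed(p)
    print(f"p={p}: {k} surveys give {prob_detected_at_least_once(p, k):.3f}")
```

With these illustrative values, a species detected on half of all visits needs 5 surveys to reach the target, while one detected on a fifth of visits needs 14, which is why the number of repeat visits matters most for the species that are hardest to find or identify.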