Sunday 30 September 2018

Full data - why does it matter?

In the early days of biological recording, all that mattered was the creation of dots on maps. Nobody quite knew what occurred where, so a record comprising a four-figure or even two-figure grid reference and a year date was sufficient to create such a dot - job done!

In those early years the information was, at best, sparse. There was a certain amount of interest in first and last dates for species, so there were plenty of recorders who would say 'I'll give you first and last dates but cannot be bothered with the rest'. Again, the early data are not great for complete runs of records throughout the year and, even now, we get a proportion of recorders who don't see a lot of point in recording common species throughout the season. I think this partly stems from a lack of understanding of how data can be used and why it is desirable to have the most comprehensive information possible.

So, what do we need, ideally?


This is a complex question because one could collect all manner of data but it has to be stored and retrieved and then used. Even the very best databases are not well suited to retention of every form of information. For example, habitat information is often very difficult to capture because different recorders interpret habitat differently. Instead, I think we need to look at how the data are currently used and work to that requirement.

From the HRS perspective, we use data to create maps - so grid reference is essential. Most requests for data are for mapping to 1km resolution but may be more refined on occasions, so a minimum of a four-figure grid reference is desirable. Higher resolution (e.g. 100 metres, 10 metres or 1 metre) may be possible for a single record, but when one moves about whilst recording, 100 metre resolution is probably the best that can be achieved. That is normally enough to locate a record within a polygon forming the outline of a site, so will probably work OK with GIS investigations. Features such as rot holes in individual trees may well be recorded at much finer resolution if you have a GPS.
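As a rough illustration of how the number of figures relates to resolution, here is a minimal Python sketch. It assumes British National Grid references written as two letters followed by an even number of digits; the function name grid_ref_resolution_m is my own invention for this example and is not part of any HRS tool.

```python
import re


def grid_ref_resolution_m(grid_ref: str) -> int:
    """Return the resolution, in metres, implied by a grid reference.

    Assumes the British National Grid style: two letters naming a
    100km square, then an even number of digits (half easting,
    half northing), e.g. 'TQ 12 34'.
    """
    compact = grid_ref.replace(" ", "").upper()
    match = re.fullmatch(r"([A-Z]{2})(\d*)", compact)
    if match is None:
        raise ValueError(f"not a recognised grid reference: {grid_ref!r}")

    digits = match.group(2)
    if len(digits) % 2 != 0 or len(digits) > 10:
        raise ValueError(f"expected 0-10 digits in pairs: {grid_ref!r}")

    # Each easting/northing digit pair divides the 100km square by ten.
    return 100_000 // 10 ** (len(digits) // 2)


print(grid_ref_resolution_m("TQ12"))      # two-figure:  10000 (10km)
print(grid_ref_resolution_m("TQ 12 34"))  # four-figure: 1000 (1km)
print(grid_ref_resolution_m("TQ123345"))  # six-figure:  100 (100m)
```

A check of this kind could be used to flag records that fall short of the four-figure minimum before they are mapped.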

The full date is also absolutely essential. Unless a record is tied to a single date, it cannot readily be used in any analysis of phenological changes or relationships with local or national weather patterns. So, data that comprise a date range (e.g. 5-7 July 2018) cannot be used in such analyses. They are also very tricky to store in the database. We do accept data from Malaise traps and other trapping systems, which by their nature cover a date range, but the information has much more limited uses.

Giving first and last dates actually distorts the dataset, so this approach is not helpful. When we look at phenological change and any links to climate change, we look at the deviation from the historic median, so we need to calculate the new median. That depends on the data being as complete and precise as possible. So, no matter how common a species is, we want all records and not just first and last dates.
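The distortion is easy to see with a small worked example. The Python sketch below uses invented dates for a hypothetical species (they are not HRS records) and compares the median of a full run of records with the median obtained when only the first and last dates are submitted. The helper median_day_of_year is simply a name chosen for this illustration.

```python
from datetime import date
from statistics import median


def median_day_of_year(dates):
    """Median of the records expressed as day-of-year (1 January = 1)."""
    return median(d.timetuple().tm_yday for d in dates)


# Invented records: a spring peak plus one late straggler.
full_records = [
    date(2018, 4, 20),
    date(2018, 5, 5),
    date(2018, 5, 10),
    date(2018, 5, 12),
    date(2018, 5, 15),
    date(2018, 5, 20),
    date(2018, 9, 30),
]

# What the scheme would receive from a 'first and last only' recorder.
first_and_last = [min(full_records), max(full_records)]

print(median_day_of_year(full_records))    # 132 (12 May)
print(median_day_of_year(first_and_last))  # 191.5 (about 10 July)
```

With the full run the median sits in the spring peak; with first and last dates alone it lands nearly two months later, on a date when the species was never actually recorded.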

Likewise, we are interested in all records for a given site on a given date. Full lists convey much more information, and the coverage of all species is a critical part of modern occupancy models that I have written about in the past (Data requirements for occupancy modelling 23/05/2018). Common species form the constant background for understanding what might or might not be present using occupancy models.

I like detail on the gender of the animal seen. Until I started extracting these data from photographs, we had very little information on the differences in male and female phenology, but we now have quite a lot. Not all species behave in the same ways, as I have also shown in my posts. We continue to benefit from improvements in this aspect of data collection and can now look at how males and females respond to changing weather patterns.

There is a huge amount of interest in flower visit information related to pollinator studies. Demand can only be expected to grow. So, records of flowers visited are very useful. Only last year I was asked for HRS data on visitors to ivy and was able to confidently supply some 5,000 records. What we need, though, are records where the animal is actually visiting the flower and showing signs of nectaring or taking pollen. Unfortunately, historic data contain all manner of information that might or might not be correct, with lots of records of, say, 'on sycamore', which might mean visiting sycamore flowers or could mean sitting on a sycamore leaf. So notes need to be accurate - I note as 'at x or y' - with the 'at' denoting that a flower visit is involved.

Behavioural notes are also very useful, e.g. seen in copula or defending a territory. Other observations can also be helpful as they start to build up a picture of the life of the animal. Such notes are especially useful where larvae have been recorded. If you do record larvae, do make sure that this is noted - we have a lot of records of species such as Cheilosia grossa that clearly don't occur in July as adults, but the data give no hint that they are larval records. Such records are flagged as doubtful and don't get used in analysis or mapping!

Hopefully this short discussion will help to explain why I can be such a pain in pressing for full dates or proper grid references. The critical issue is that datasets such as those compiled by the HRS are a key tool in understanding what is happening to wildlife and may in some small way influence policy-makers to do the right thing!

