Saturday, 9 July 2016

The challenge of identifying from photographs

After my last post, which generated a bit of additional comment, I thought I really ought to go back and look at the data to try to explain the challenges faced by specialists working on photographic identification. As a starting point I thought it might be helpful to look at the genus Syrphus and what causes the problems:

The first and most significant challenge is that we don't actually know how many species there are in Britain! Some years ago, Syrphus rectus was added to the UK fauna. This is a North American species that may well be holarctic, but we don't know. Males are almost identical to Syrphus vitripennis, especially in the distribution of microtrichia on the wings. Females are intermediate between S. vitripennis and S. ribesii, insofar as they have the microtrichia of vitripennis but the hind femur has a dark ring on it. So, maybe the females are doable from photographs? Field photographs never generate the necessary definition to pick up the microtrichia, so unless one can detect eye hairs (S. torvus) or the wholly yellow hind femur of S. ribesii, there is considerable uncertainty about the identity of most animals. In theory, there are additional characters of hair colour and extent of yellow on the legs that can help in some circumstances, but these are highly variable features and can be influenced by the angle from which one views the animal. So Syrphus is already a problem, and that is compounded by our uncertainty over how many species are present. Further difficulties arise because some very high-quality photographs have shown that even S. ribesii possesses minute hairs on the eyes, and that in some situations these can be mistaken for the hairs of S. torvus (as demonstrated in females with a yellow hind femur).

The following photos may help to explain. These are high-resolution stackshots taken by John Bridges (North East Wildlife). John has been doing some really interesting work this year compiling detailed shots of hovers, and here are some of the results (many thanks to John for permission to use them).
Syrphus torvus female, showing the eye hairs. The eye hairs of females are far less obvious than those of males, but this shot nicely shows how small they are and how fine the resolution of a photo needs to be to confirm the identification.
Syrphus ribesii male with the second basal cell showing complete coverage by microtrichia. These minute hairs are only seen when the animal is carefully orientated to the light.

Syrphus vitripennis second basal cell with area devoid of microtrichia highlighted (area above the green line)

So, what can we do? Well, the problem does partially resolve itself if the photos are high-quality macro shots such as those taken by Brian Valentine (LordV on Flickr). Even so, getting the right angles is tricky and one has to work with what is presented. The other option is perhaps to retain specimens, anaesthetise them and use stackshot to take more detailed photos. That is quite an undertaking and is likely to be beyond the average photographer. So we must accept that at least some species are unlikely to be taken to a full ID unless one captures all the salient features. This is demonstrated by the table later in this post, in which the problem areas are highlighted.

Challenges such as this can be found in many genera, but even so, there are plenty of hoverflies that can be identified from photographs, provided the photo is sharp and of sufficiently good resolution. The smaller the animal is in the frame, the less likely it is that a firm identification can be achieved. This is illustrated in the table below. I have compared the data for 2015 and 2016 so that as big a sample as possible is considered. Clearly there are year-on-year differences in success rates. In many cases the differences between years are very minor, but for some genera they are quite substantial - something I will have to look at when I have time.

Success rate for identification of photographs in 2015 and 2016 at generic level. Genera where major problems arise are highlighted in orange.
It should not be assumed that a 100% success rate implies that identification is easy. In many cases the number of species in a genus is small, or the species/genus is encountered so rarely that the sample size is too small for any realistic assessment. Furthermore, the uncertain genus category may contain records that still could not be taken to species.

Thursday, 7 July 2016

Facebook vs. iRecord - a conundrum

An exchange with a member of the UK Hoverflies Facebook Page these last couple of days prompted me to put some thoughts down on the relative benefits of different approaches to biological recording and the way in which it has developed with the advent of digital photography.

When Stuart Ball and I took on the HRS in 1991, the data were mainly supplied on record cards that had to be entered into a database. The BRC at Monks Wood did that job, whilst the scheme organisers acted as the interface with recorders and checked the data to ensure they made sense. Unfortunately, BRC were not sufficiently well funded to keep pace with the volume of data and a big backlog developed. In the case of the HRS this amounted to approximately 2 cubic metres of cards. In the five years after Stuart and I took the job on, we took over data entry ourselves and gathered in machine-readable data. I think we can say we were the first scheme to do this. That effort generated 375,000 records, but it was organised so that data management took place in the winter and we were able to concentrate on fieldwork in the summer.

A developing paradigm

Things changed with the advent of digital photography, improved internet access and, of course, such advances as the WILDGuide, which made hoverflies accessible to a much wider audience. Nevertheless, until around 2012 the vast bulk of our activities centred upon traditional interactions with recorders, many of whom had been contributing for 30-40 years and were personally known to us. Over 50% of incoming data came from fewer than 25 people.

The FB page generates about 25,000 records per year and, in addition, has helped to develop about a dozen people who now do their own IDs and submit data as spreadsheets - perhaps a further 5,000 records. Historically, the HRS attracted around 20,000-25,000 records a year through traditional routes, including perhaps 2,000 records a year that I generated myself from fieldwork. That traditional resource is now advancing in age - most of the top 25 recorders (who have contributed about 50% of all records) have been involved for over 20 years and several for over 40 years. In the last few years we have lost two, and a third is far from well and no longer able to record. That means we have got to grow a new generation. We are doing this through two avenues:

  • Regular training events that Stuart and I run across the length and breadth of the UK (from Lerwick and Kirkwall to Exeter, Studland and West Sussex). We run between 5 and 8 such courses per year, but probably generate only about one new recorder of the calibre of our old guard for every 50 trainees; so perhaps one per year.
  • Interaction with recorders on the web. Facebook has proven to be exceptionally effective in this respect. FB not only helps with ID skills but also helps to develop wider recording skills and an understanding of what data are needed. That said, we must also accept that a very high proportion of contributors are (or were) first and foremost photographers who wanted IDs for their shots. Importantly, the spread of involvement has widened considerably and this has made a huge difference to the recording scheme.

New challenges

Working with photographic recording brings with it a completely new set of challenges, not least expectations. Many contributors happily accept that not all photos can be identified, but occasionally they express frustration. I withdrew from one forum after receiving abuse because I was not prepared to put names to most photos of the genus Syrphus, which is far from straightforward, even from specimens. The problem of Syrphus identification crops up again from time to time and is frustrating for everybody, especially as it is also one of the most frequently photographed genera. We can only do what is possible, and I'm afraid that there is also an issue of best use of resources.

I take the view that it is unwise to call oneself an 'expert' - one is setting oneself up for a fall. So I tend to use the term specialist and accept that I too have a great deal to learn. The difference between me and the relative novice is that I am acutely aware of the pitfalls, and have probably fallen into a good many holes of my own making! That is how we learn. But if we do use the term 'expert', the number available to provide identifications is extremely small - well below 20 across the country, and probably fewer than 10 who can make a reliable job of it.

Data harvesting

The big question, then, is how data should be harvested. Should one extract data directly from FB posts or should one direct contributors to iRecord? I have probably made a rod for my own back by harvesting directly from FB, but I do have sound reasons for doing this:

  • A very substantial number of FB members started as photographers who either wish to know what their subject is or enjoy sharing their experiences with others who share an interest in getting a good shot. As such, contributing to iRecord or another medium is not their highest priority - we would lose a great deal of data if I did not extract it from this site.

  • There are considerable advantages to compiling a dataset that has been checked by a small group of the more reliable specialists. This improves confidence that the data are robust, provided that partially identified records are not simply discarded, as these provide perspective; hence I extract all records.

  • I extract a great deal of additional data that often gets overlooked by recorders: the gender of the animal, morphs, abundance, behaviour and flower visits (not the plant the animal was sitting on). It is a comprehensive dataset.

  • I think the page would be a far less effective resource without the feedback that I manage to post on trends in species abundance or record numbers. If we are to generate a new cohort of recorders (and hopefully replacements for the existing team) then we must educate and mentor people.

  • The impact of FB can be seen from the attached graph (it will be bigger still in 2016, as we are dealing with about 50% more records than in 2015).
Figure 1. Numbers of records held within the HRS database, separated according to origin: NBN data are held separately from the main HRS dataset.

But, what about iRecord?

This was built as part of a wider initiative to increase biological recording activity. It has an admirable objective, but starts from the premise that recording scheme organisers are there to validate records. In theory this is the case, but most RS organisers took the job on many years ago when things were less complicated - they maintained a database, did their own recording and gathered in records from a relatively small cohort of reliable recorders, most of whom they knew individually. iRecord is very impersonal and photo ID is an art that has to be developed - not everybody is willing to do this, and relatively few RS organisers have signed up to iRecord - much to the frustration of the Country Agencies, who want the data.

In the first year of iRecord, 14,000 hoverfly records were posted, 11,000 of which were a single data dump from an LRC whose data we already had. I had to work through the lot to clear them, especially as quite a few had to be shifted from full species to aggregate after splits changed the status of species (lots in Platycheirus). That job took me about 100 hours.

This year the pace has dropped and there are currently 2,457 records awaiting verification. By the time autumn arrives I reckon that number will have risen to about 6,000 records. So it is a substantial but manageable job in the winter. But it does frustrate me hugely for a number of reasons:

  • There are quite a few contributors who post a set of photos of several different animals, all lumped under the same species name - that has to be disentangled.
  • Records often lack detail - when I extract data from the FB page I also log the gender, flower visits, behaviour, etc. Posts on FB often lack this or say 'on rose' when they mean 'sitting on the leaves of a rose bush' rather than 'visiting the flower of a rose' - there is a huge difference in the value of such data, and as there is interest in pollinators, my approach provides a far more robust dataset.
  • A fair few records are misidentified - there is one regular contributor who rarely achieves 50% correct and seems not to have learned at all in the past 2 years.
  • Where records are not accompanied by photos one gets no real feel for the actual skill of the recorder. This is illustrated by people whose data cover Syrphus - lots of records without photos but the odd one with a photo that clearly cannot be taken to species (e.g. males of poor resolution). At that point one must be wary of the overall quality of the data from that person. These have to be dealt with - iRecord is not a particularly good interactive medium and FB is far better in this respect.
  • The dataset that emerges is a mish-mash of occasional records and records from one or two more advanced recorders, so there is not much chance of advancing the science of recording. This is compounded by individual recorders going back through their diaries and adding records that they submitted to the RS many years ago on record cards and that I have already computerised - so I am doing a lot of repeat work for relatively little return.

Where do we go from here?

We have seen a paradigm shift in the way biological recording works. The internet and digital photography have changed the relationship between recording schemes and contributors. They have brought a plethora of benefits, but have also exposed significant weaknesses in the system. The most worrying weakness is the relatively small number of people with sufficient experience to deal with identification, coupled with raised expectations that they will give their time in line with demand. Unfortunately, there are limits to what they can do or are willing to do. Some are not particularly computer literate, and do not have the spare time to respond with the immediacy of modern life. Others deal with groups of organisms that require dissection or high magnification and the checking of numerous characters that are difficult or impossible to depict in photographs. Others still just don't want their lives ruled by a computer: a comment that resonates is 'I like the fieldwork but I don't want to become involved in administration'.

Thus, my view is that we are witnessing a turning point in biological recording. If we want to use interactive media, then we have got to grow capacity to respond to demand. My guess is that the resident specialists on the HRS FB page jointly contribute over 2,000 hours a year to this one medium. It has achieved a huge amount, but there have to be limits to what can be done with existing capacity. So we must grow new capacity - which of course depends upon the same limited cohort of specialists! We will get there, I think, but we must also ask for expectations to be tempered:

We won't ever manage to identify everything posted as a photograph, and we probably will never fulfil all the aspirations of data users. Nevertheless, the UK is in a far better position than anywhere else in the world, with the possible exception of the Netherlands, where biological recording is also well served by the non-vocational ethos.