Saturday 6 February 2016

Open access data – are we missing something?


 
On balance, I am inclined towards improving access to data, but I have a feeling that Natural England's recent announcement on service level agreements with LERCs has opened Pandora's Box. There are widely differing views about the degree to which access to data should be free to all, and I suspect that many of the views expressed are based upon misconceptions and wrong assumptions. So, let us try to disentangle some of this:

1. There is no reason why data held by LERCs should not be fully open access.


Wrong: not all data providers are necessarily happy to have their data made publicly available. That is a matter of personal choice, and many permutations exist. LERCs may well have a variety of different levels of commitment to provide data, and may not be free to make it available to all and sundry. In addition, some recorders think that their data is actually worth something and are extremely reluctant to see it released for anything other than strictly nature conservation reasons.

Nevertheless, it also seems to be the case that some LERCs are unwilling to make their data available via the NBN Gateway. That could be interpreted as an 'own goal' in the light of the NE announcement.

2. Withholding data means that it has a commercial 'worth'.

Producing a species list that can be replicated in an environmental statement is just one stage in the process. There is also a need to interpret lists. This is something that demands particular skills and context. Whilst a consultant might get a staff member to provide some sort of interpretation, this is often a bigger task when tackled without context than when undertaken by somebody who has an intimate knowledge of the area concerned and the taxa involved. The majority of consultancies employ generalists and not specialists; indeed in many areas of biology the numbers of available specialists are very low.

LERCs are to my mind far more likely to have links with relevant local specialists and are probably far better placed to provide this capacity. As such, it is likely that it will be more cost-effective to use the LERC to provide interpretation than do it oneself. Although I do not have corroborating evidence, I am told that LERCs that have made their data freely available have actually found that they get more trade because there is a better knowledge of the data that they hold.

Some empirical evidence from LERCs for the various models would help to resolve this question.

3. Consultancies make a lot of money out of data from LERCs.

I suspect not! Most clients want the job as cheap as possible, but they do want the job to be done properly. Thus, if the consultant knows that the job can be done quicker and cheaper by the LERC than by their staff, they will factor this into their quote to the client. Of course there are consultancies that see the client as a 'cash cow' but when they do they will get found out and develop a reputation that is hard to shake off.

4. It is cheaper to employ LERCs than consultants because consultants pay their staff much more.

This is a popular misconception, especially in the public sector. In reality, consultants rarely have defined benefits pension schemes and often pay the office junior to do the jobs that an LERC might do. True, salaries in consultancies do rise with experience and perhaps reach higher levels than in other employment streams, but as far as I can see there are few real differences during normal economic times. At the moment, salaries in consultancies might be a bit more liberal than in the public sector, but everybody is squeezed.

It is also worth bearing in mind that if there is an economic contraction then consultancies are far more likely to shed staff quickly (and sometimes on very poor terms when the finances hit the rocks). I seem to recall huge job losses in several of the major consultancies as the recession hit in 2008/09 (I heard of 25% losses in some of the biggest) and am aware of several smaller ones that have gone out of business in recent years.

The real difference between LERC costs and consultancies is that LERCs should be able to add value because they provide a specialist service.

5. Making data available to developers means that it is helping to destroy the countryside.

In theory, this is a misconception, but that depends upon data being used responsibly. Nevertheless, in the course of my career I have come across several cases where data have been withheld and as a consequence the assessment of a site's importance has been less favourable to wildlife.

I wonder how often sites have been lost because data were lacking?

In today's climate, the general assumption greatly favours the developer and wildlife issues are unlikely to carry much weight in planning decisions; but, if the information is not available there can be no fight for wildlife at all.

6. Adding data to the NBN means that my carefully validated data is corrupted by dodgy data.

Wrong. Each dataset is retained as a separate source. Yes there may be poor datasets but unless valid datasets are available it is not possible to minimise the impacts of weak ones. All data analysis depends upon specialist skills to provide a valid interpretation; this is the skill that is vested in quite a small cohort of taxonomic specialists such as recording scheme organisers.

7. Efficiencies can be achieved by centralising data collection and validation.

One part of this approach depends upon an assumption that LERCs will continue to exist even if NE/Defra funding is withdrawn. That is a brave judgment when one bears in mind that the loss of a key partner often means that other funders feel less of an obligation to participate.

What will happen if LERCs fold? A second assumption then obtains: that without LERCs, recorders will simply use other tools to submit records. This too is a brave conclusion because people often have local allegiances.

8. There is a network of specialists that can be called upon to provide data validation services.

In theory this might be true, but in reality such specialists are not sitting there waiting to be called upon to provide a free service to biological recording. If the call coincides with their objectives they may well participate, but in many cases I suspect this not to be the case. I can think of several major recording scheme organisers that are unlikely to participate, and as such this leaves huge gaps in the validation process.

It is also unwise to assume that existing voluntary validators will continue to be available. In the past ten years, the role of Recording Scheme organiser has changed out of all recognition. In the case of more active schemes it has been necessary to increase the technical capacity and to start to introduce internal administrative processes to keep the scheme running. I have not forgotten the response I got from one of the most able hoverfly recorders when I approached him to become a scheme organiser: 'I enjoy the fieldwork but do not want to become involved in administration'. Wise words I think!

Validating photographic records is one such difference. In some cases, the job of validating has reached almost unmanageable proportions. There is a serious danger that demand for free data administration will reach such a level that specialists whose primary interest is fieldwork will withdraw their services.

Doubtless, there will be others who will offer their services, but will they really have the requisite skills? Some may, but I suspect we can all think of people who display a serious over-estimation of their abilities!

9. Improving data streams is about increasing the numbers of recorders.

This is arguably the biggest misconception of all. Yes one can increase the numbers of records, but sheer volume does not equate to quality. As an example, I can go out and record hoverflies of all taxa and maybe generate 20-30 records from a single site on a good day. Alternatively I could go out and record those taxa across all biodiversity and generate a list of 100 unexceptional easily recognised species. Which is the more useful in terms of site interpretation or site protection?

What is more, increasing the numbers of records requiring administration simply places more pressure on the existing technical capacity. I guess this could be considered as an 'efficiency gain', but it may not feel that way to the volunteers concerned.

To my mind, the crucial issue is to grow the numbers of people with sufficient skills and experience to mentor others, assist in validation and provide interpretive skills. This takes time and depends upon a very narrow spectrum of existing specialists to deliver the desired skills and standards. These are the same people that are expected to provide data validation services and of course to continue to provide the detailed taxonomic records that are the foundation of current specialist recording.

10. Greater efficiency can be achieved by funding centralised services, with fewer local centres.

An interesting assumption that forgets some fundamental aspects of biological recording. Firstly, most people still have some sort of affinity to a region or local area. Those who range far and wide tend to go to honeypot sites to add the tick of x or y to their lists (especially true when I was into moths and I dare say it still holds good where it comes to recording moths, orchids or dragonflies).

LERCs are most suited to engaging with people whose focus is their County or a particular local reserve. Such people may think about submitting records locally because there is relevant feedback or there are events that appeal to them; they may not bother if it is just a matter of submitting to some big national repository that is a cost-efficient data collection service.

To my mind, the issue is not about robotic data collection, it is about a human interaction that gives recorders a warm feeling and a sense of being valued. As I have mentioned in previous posts, LERCs also provide an important training interface and their loss will remove one of the critical support mechanisms required to build a bigger and more resilient recorder base.

The lesson I have learned from running training courses is that one is sadly mistaken if one thinks that all the participants are there in order to become recorders of that taxonomic group. They are not. Some want to enjoy their walks in the countryside; some will develop partial skills amongst a range of other interests, and the odd one will become a devotee of the group in question (at least for a year or two). I suspect that the same holds for the range of biological recorders – some are interested in contributing to national schemes; some are interested locally but not nationally, and some have an allegiance to a particular project.

And the moral of the story …


The emphasis on collection of data at a national scale suggests that biological recording is regarded as simply an unpaid arm of a professional body. It is all too easy to fall into this trap. I would like to think that when I worked for NE I might still have said what I have in this analysis; but maybe not, as perhaps I too would be beguiled by the desire to access data that would allow me to help to conserve England's wonderful wildlife.

Perhaps it is time to make allowances? But, there remains the need to ask 'when did anybody really inquire what motivates people to collect and submit records?' And, who thought to determine 'what is it that we can do for the community of biological recorders?' LERCs continue to be needed because they provide the mechanisms for local communication that cannot be achieved by highly automated national-scale data assembly processes. Without some of the LERCs I know that we would not have been able to deliver the training programme that we have committed to over the past seven or eight years; in which case we would not have the growing network of recorders who are motivated to get involved locally.

However, the real test of the local value of LERCs must depend upon the views of those who contribute to them. Would they record anyway? Would they simply submit data in a different way? And would they notice the difference? I doubt the LERCs themselves can actually speak for the recorders, and unless there is a substantial body of opinion from contributors, the concerns of LERC employees will largely be ignored as special pleading.

The critical issue for biological recording is to find a way of developing an enhanced network of motivators and organisers who engage locally. In that way, data volume will improve, as will quality. Perhaps in such conditions there is the chance that a new generation of specialists will develop to fill the shoes of the existing generation.

6 comments:

  1. Point 6: Adding data to the NBN means that my carefully validated data is corrupted by dodgy data.

    You say that this widely held belief is wrong Roger but I fear that it is actually correct. This is all to do with the way data is displayed on the NBN - Each dataset, although kept separate, is displayed with a parity of esteem. Many, many users do not take the trouble to check the sources of the dots they are looking at, or read the accompanying metadata. NBN should give esteem to those datasets that HAVE approved [by BRC?] validation and verification protocols in place and only display those BY DEFAULT. All other datasets should of course be visible, but only through ticking a checkbox. The thought of 823k records of goodness knows what from FoE being on the Gateway and swamping out 520k records from BWARS does not fill me with cheer!

    Replies
    1. My response is correct Stuart. It simply depends on whether you as the user of the NBN believe everything that is there or whether you use a certain amount of discrimination. That is why LERCs are important, as are data validators and recording schemes.

      I would not disagree that there are problems, but these are largely procedural and about drawing on available skills to make sure the data are reliable. Hence my further points about potential overload of specialists - the NBN and all biological data are utterly reliant upon a small cohort of specialists who have taken the trouble to develop deeper skills.

      It seems to me that if we are to have a sensible conversation with the agencies/Defra about biological data, we have to start from the position that the glass is half full and not half empty, but that there might be ways of making it a great deal fuller.

    2. I take your point completely. This is about the difference between a fact (the data are kept separate) and a perception (they look like they are all lumped together). It is too easy for many to make this erroneous assumption though. Checking the traffic to the BWARS dataset and also to the BWARS metadata shows that the latter is consulted less than 10% as many times as the data itself. As you say, not all data is equal, even if at a casual glance it may appear so.

    3. In the end, it probably does not matter that much if all the data are lumped together. If that is how they are used by consultancies then they will come unstuck and maybe start to use real specialists. If they are used by conservation agencies, then if they don't exercise some caution then they need their knuckles rapped.

      I think the solution is to have a clear mechanism to highlight the degree to which data have been scrutinised and verified. If a lot of rubbish gets onto the system then NE and others will have to have a rethink about how to secure reliable data - making the role of the Recording Schemes of paramount importance. The big issue, therefore, is how to maintain/grow skills, and what can be done locally in the event of demise of LRCs.

  2. Stuart, take a look at the Atlas of Living Scotland (AoLS). Each record has a host of standardised quality metadata. This is better than the free text quality metadata on the NBN Gateway. I'm not sure it has been implemented on AoLS yet but I think the intention is to enable these to be search filters.

    Mike Beard
