The following post has been written by Dom Fripp – senior metadata curation developer at Jisc and part of the UKRDDS team.
Earlier this year, Torsten Reimer wrote a blog post entitled “Less is more? A metadata schema for discovery of research data”. In it, he considered what metadata schemas were in use in UK HEI repositories and catalogues to aid the discovery of research datasets. He also raised the possibility that, given the similar motivations acting on the HEIs (funder mandates, research integrity and the increasing awareness of the value of data), the metadata requirements might be very similar, maybe even the same.
His rationale was to compare various metadata schemas from institutional research data repositories and look at what was common between them. This conversation was extended into a Birds of a Feather session at the recent IDCC16 conference in Amsterdam. The outcomes of that session are included as an addendum to his original blog post, from which he drew two conclusions. More on those later.
Torsten’s list compared the metadata fields currently used for research data at Imperial College and Cambridge University. The list of shared fields was:
- Author/contributor name(s)
- Author/contributor ORCID iD(s)
- Licence (e.g. CC BY)
- Identifier (ideally DOI)
- Publication date
- Institution(s) (of the authors/contributors)
- Funder(s) (ideally with grant references; can also be “none/not externally funded”)
At the same time (and unbeknownst to each other) I was taking a similar approach in the preparation of a metadata schema for Jisc’s UK Research Data Discovery Service. This schema was based on Use Cases and User Requirements, provided by a mixture of HEIs and specialist data centres in the UK. I detailed the process in a recent blog post.
Due to the similarity of the work, it was straightforward to compare Torsten’s list with the newly minted schema. As the table below indicates, there is complete overlap.
|Imperial & Cambridge|UK Research Data Discovery Service profile|
|---|---|
|Author/contributor ORCID iD(s)|Creator identifier|
|Licence (e.g. CC BY)|License|
|Identifier (ideally DOI)|Unique Resource Identifier|
|Version|Relation type / related identifier*|
|Institution(s) (of the authors/contributors)|Publisher / creator affiliation**|
|Grant reference|Project number|
* In the UK Research Data Discovery Service metadata profile, following DataCite, Version is handled within the related fields: rather than being numbered, versions are linked by successor / previous identifiers. These can be numbered accordingly, e.g. (adapted from DataCite table 9)
** This is not a clear mapping as the publisher is not necessarily the creator or contributor institution. In the UK Research Data Discovery Service metadata profile, this can be handled in the creator affiliation field.
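To illustrate the first footnote, here is a minimal sketch of how version numbers could be derived from linked identifiers rather than stored directly. It assumes each record carries a list of (relation type, related identifier) pairs and uses the DataCite relation type `IsNewVersionOf`. The record layout, the `version_chain` and `numbered_versions` helpers and the example identifiers are all hypothetical and not taken from the profile itself.

```python
# Hypothetical records keyed by identifier; the identifiers and the
# "related" layout are illustrative assumptions, not the real profile.
records = {
    "doi:10.0000/example.v1": {"related": []},
    "doi:10.0000/example.v2": {
        "related": [("IsNewVersionOf", "doi:10.0000/example.v1")],
    },
    "doi:10.0000/example.v3": {
        "related": [("IsNewVersionOf", "doi:10.0000/example.v2")],
    },
}


def version_chain(records, identifier):
    """Return identifiers from the earliest known version up to `identifier`."""
    chain = [identifier]
    seen = {identifier}
    current = identifier
    while True:
        # Find the record that the current one declares itself a new version of.
        previous = next(
            (
                target
                for relation, target in records[current]["related"]
                if relation == "IsNewVersionOf"
            ),
            None,
        )
        # Stop at the first version, at an unknown record, or on a cycle.
        if previous is None or previous in seen or previous not in records:
            break
        chain.append(previous)
        seen.add(previous)
        current = previous
    chain.reverse()
    return chain


def numbered_versions(records, identifier):
    """Assign 1-based version numbers by position in the chain."""
    chain = version_chain(records, identifier)
    return {ident: number for number, ident in enumerate(chain, start=1)}
```

Calling `numbered_versions(records, "doi:10.0000/example.v3")` walks back through the links and numbers the three records in order, so the number is a by-product of the relationships instead of a field that has to be kept in sync.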
This result gives credence to Torsten’s first conclusion in his blog post, that the minimum metadata requirement “…may be, at least partly, a UK-specific issue.”
If it is, then the good news is that there seems to be a lot in common between UK HEI and data centre repository metadata. It can be argued that this indicates a set of common experiences and requirements to which similar metadata fields have been applied. There’s a reason why the profile for the UK Research Data Discovery Service has a lot in common with the DataCite and Dublin Core schemas: not only have those schemas been driven by user requirements, they are also currently sufficient for most cases of discipline-neutral descriptive metadata for research data.
The second part of the preparatory work undertaken to develop the UK Research Data Discovery Service profile was to look at good metadata practice in the schemas that support other research data aggregators and discovery tools around the world.
In terms of establishing what is common between different metadata profiles, I took a similar route to Torsten by looking at a variety of profiles that supported internationally implemented discovery services. The approach is documented in more detail in my previous post. To simplify the task (some of the profiles are expansive) I listed only what was mandatory within the schemas, as a fair assessment of what was core. The results are shown in the following bar chart.
Analysis revealed that even mandatory fields aren’t always common, and that very few fields other than creator, title, type and the resource identifier are considered mandatory in most schemas.
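The kind of comparison described above can be sketched in a few lines. The example below tallies how many profiles mandate each field and then picks out the fields shared by all, or by a majority. The profile names and field lists are placeholders invented for illustration; they are not the schemas or the results from the analysis.

```python
from collections import Counter

# Hypothetical mandatory-field lists. Profile names and contents are
# placeholders, NOT the real schemas compared in the post.
mandatory_fields = {
    "profile_a": {"title", "identifier"},
    "profile_b": {"title", "identifier", "creator", "type"},
    "profile_c": {"title", "identifier", "creator", "publisher", "date"},
}


def tally_mandatory(profiles):
    """Count how many profiles mark each field as mandatory."""
    counts = Counter()
    for fields in profiles.values():
        counts.update(fields)
    return counts


def shared_by_all(profiles):
    """Fields that every profile treats as mandatory."""
    return set.intersection(*profiles.values())


def shared_by_most(profiles):
    """Fields that more than half of the profiles treat as mandatory."""
    threshold = len(profiles) / 2
    return {
        field
        for field, count in tally_mandatory(profiles).items()
        if count > threshold
    }
```

The counts returned by `tally_mandatory` are the sort of figures a bar chart of mandatory fields would plot, while `shared_by_all` and `shared_by_most` give two possible readings of "core".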
Why is this? There are two reasons I’d like to mention (and many more I won’t have thought of so please feel free to comment below).
Firstly, the metadata requirements for a discovery service are likely to have grown up around the scope and requirements of the project. It is important to stress that this analysis does not compare the quality or success of the schemas – they support different projects with different aims – but merely the fields they share, to see if the findings shed any light on the potential for an international metadata requirement.
Secondly, aggregators take different approaches to mandatory metadata requirements. ANDS uses the RIF-CS schema, which has many mandatory elements. This is because ANDS is addressing a national solution that runs from creation through to preservation. This includes discovery, value, access and re-use standards that require administrative and disciplinary metadata.
On the other hand, B2FIND (the discovery element of EUDAT) requires only a title and a uniform resource identifier, yet offers integration with Dublin Core, ISO 19115, MarcXML, CMDI and DDI. It too is part of a larger infrastructure, in this case EUDAT.
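The practical effect of these two approaches can be shown with a small sketch: the same record checked against a minimal mandatory set and against a stricter one. The minimal set follows the description of B2FIND above (title and identifier), although the key names are assumed. The stricter set, the `missing_mandatory` helper and the example record are hypothetical and do not represent RIF-CS or any real profile.

```python
def missing_mandatory(record, mandatory):
    """Return the mandatory fields that are absent or empty, sorted by name."""
    return sorted(field for field in mandatory if not record.get(field))


# Minimal set as described in the post; the key names are assumptions.
MINIMAL = {"title", "identifier"}

# Hypothetical stricter set, for contrast only.
STRICTER = MINIMAL | {"creator", "licence", "publisher"}

# Hypothetical record with a placeholder identifier.
record = {
    "title": "Example dataset",
    "identifier": "https://example.org/dataset/1",
}
```

Here `missing_mandatory(record, MINIMAL)` returns an empty list, whereas `missing_mandatory(record, STRICTER)` reports creator, licence and publisher as missing. A record that one aggregator accepts as complete may therefore be rejected by another.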
So it seems that there is no consensus to be drawn from internationally implemented schemas. If Torsten’s hunch is correct, and the minimum metadata set is a UK problem, then there is potential to develop a stronger answer during the Jisc Research Data Shared Service pilot. This will bring more HEIs together and get them talking about the research data metadata standards that will play a key role in the resultant infrastructure.
Torsten’s second conclusion in his post was “When engaging in discussions with metadata experts there is no such thing as a pragmatic definition.”
This I agree with. If the ongoing work with the UK Research Data Discovery Service and the broader requirements of the Shared Services Pilot are anything to go by, the pragmatism required is not a matter of definition but of approach. The cumulative effect of use cases, requirements and behaviour in these projects could potentially result in a consensus on minimum metadata requirements in the UK.