UKRDDS Phase 3 – enabling discovery of research data

In my previous post, I described the latest phase (three) of the Research Data Discovery Service. In this post I’d like to describe in more detail the plans for harvesting metadata from other UK HEIs and Data Centres with research data collections.

In phase 2 metadata was harvested from 9 HEIs (Hull, St Andrews, Glasgow, Oxford Brookes, Edinburgh, Oxford, Southampton, Leeds and Lincoln) and 6 Data Centres (Archaeology Data Service, Cambridge Crystallographic Data Centre, ISIS/ICAT – STFC, UK Data Service, Visual Arts Data Service and NERC), all funded to participate in the pilot. The participants also provided a set of requirements for a discovery service, provided harvestable endpoints and helped test the alpha system as it developed. As the project progressed, a further five HEIs (Sheffield, Bath, Nottingham, Lancaster and Bristol) volunteered to be involved and we started to incorporate their metadata into the system near the end of phase 2.

I’m glad to say that all of the participants from phase 2 are keen to continue to be involved in the project. In phase 3 we plan to add as many UK HEIs and Data Centres with research data collections as possible to the Discovery Service. There are a number of institutions that are also part of the Research Data Shared Service and the Research Data Metrics for Usage projects and, if they haven’t already been involved in this project, we will be looking to include them as well. However, at this point we would like to hear from any other institutions that have a research data collection and an endpoint that we can use to harvest the metadata into the Discovery Service. It was clear in phase 2 that some institutions have well established research data management policies and practices, while others are less well advanced. Whatever stage of this process you have reached, we would still like to hear from you.

We will be working to enhance the current test service, adding functionality to match requirements, and ensuring there is a fully functional and tested system ready to transfer to a production service (provided it meets the relevant criteria and the business case is agreed within Jisc). Incorporating other participants’ metadata (potentially for all HEIs with research data collections) is an important objective of the project.

If you are interested in being involved in this latest phase of the project, or would like to discuss this further, please contact Christopher Brown.

 


UKRDDS Phase 3

The latest phase of the UKRDDS will run from October 2016 to September 2017 and follows on from the second phase of the project. This post summarises work from the second phase and what’s planned for this third phase.

Phase 2

This Jisc-led second phase of the project ran from March 2015 to September 2016 and included support from the Digital Curation Centre and the UK Data Service, on HEI and Data Centre engagement respectively. It built on the pilot work with the aim of running a test UK Research Data Discovery Service. The main aim of the second phase was to lay the firm foundations for the service by harvesting metadata from 9 HEIs and 6 Data Centres, each funded to participate in the project. These pilot organisations provided metadata of their research data collections for harvesting, provided a set of user requirements and helped to test the alpha system. The alpha service was made publicly available during development to ensure the research community had the opportunity to test its functionality. This phase came to an end with a final workshop for all participants where the alpha system was tested, requirements were reviewed and the plans for the next phase were presented.

Phase 3

The third phase of the project has the following objectives:

  • moving the test service from alpha to beta;
  • enhancing the service by adding further requirements;
  • incorporating other participants’ metadata (potentially for all HEIs with research data collections);
  • running as an enhanced beta service to allow for further testing;
  • having, at the end of the project, a fully functional system ready to operate as a service (provided it meets the relevant criteria and the business case is agreed within Jisc).

It’s hoped that all participants from phase 2 continue to be involved during phase 3 of the project. In phase 2 there were three Advisory Groups – User, Researcher and Technical & Metadata. However, in phase 3 there will be one Advisory Group with voluntary participation from all those HEIs and Data Centres having their metadata harvested into the service. It’s expected that sub-groups could form to discuss specific issues, for example metadata mapping. This structure will ensure the project continues to get input and feedback from participants to ensure the system satisfies the needs of its users.

At the final workshop (see previous post) valuable feedback was provided as to how to make sure the project is a success. This includes ensuring the project engages with researchers and other users as soon as possible to further test the system and make sure it is satisfying their needs and not just those of the participants. Also, requirements have been mainly coming from the data collection perspective, but these need to be gathered from the user perspective sooner rather than later. Phase 2 focussed on primary types of data but we should look at secondary types of data (see scope of datasets) in the context of researchers using the service. Other use cases need to be considered, such as those from a funder’s perspective. Other questions raised included: What about all the other data internationally in subject based data centres – do we want that or not? Is the distinction between UK and non UK data important? For now, the focus remains with UK datasets.

This work will allow us to move from a test service to a production-ready one. We will be able to harvest from more data sources, do more formal and informal system testing, look at further requirements (refining and implementing them) and develop a business case for the service, with the ultimate aim of delivering a more mature and tested service to Digital Resources (the area of Jisc that runs and supports services, such as the Archives Hub).

In developing additional functionality we will review existing requirements set to “won’t” (from the MoSCoW prioritisation process performed early on in phase 2) or marked out of scope, gather further requirements from the final workshop and potentially elsewhere, and integrate more closely with the Research Data Shared Service work and the IRUSdataUK project.

How to get involved

In phase 2 metadata was harvested from 9 HEIs and 6 Data Centres funded to participate in the pilot. A further five HEIs volunteered to be involved in the project and their metadata was added to the system near the end of phase 2. In phase 3 the plan is to add more (if not all) HEIs and Data Centres that have research data collections. This will necessitate a set of requirements to join the service. These are currently being finalised, but the minimum requirements are:

  • Research data metadata can be provided
  • There is a harvestable endpoint
  • It’s a supported schema
  • A named contact is available for support, to:
    • Check harvest and metadata
    • Report issues
    • Request manual harvest, if required
    • Liaise with the developers when adding metadata to the service
  • Jisc will provide a developer/admin to liaise with the support person
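
As a rough illustration only, the minimum requirements above could be expressed as a simple checklist. This is a sketch, not the project's actual joining process: the `Applicant` fields, the `unmet_requirements` helper and the contents of `SUPPORTED_SCHEMAS` are all hypothetical, since the final list of supported schemas has not been published here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: placeholder values only; the real list of supported schemas
# is still being finalised by the project.
SUPPORTED_SCHEMAS = {"datacite"}


@dataclass
class Applicant:
    """Hypothetical description of an institution asking to be harvested."""
    name: str
    provides_research_data_metadata: bool
    endpoint_url: Optional[str]
    schema: Optional[str]
    contact_email: Optional[str]


def unmet_requirements(applicant: Applicant,
                       supported_schemas=SUPPORTED_SCHEMAS) -> List[str]:
    """Return the minimum requirements the applicant does not yet meet."""
    problems = []
    if not applicant.provides_research_data_metadata:
        problems.append("no research data metadata")
    if not applicant.endpoint_url:
        problems.append("no harvestable endpoint")
    if applicant.schema is None or applicant.schema.lower() not in supported_schemas:
        problems.append("unsupported schema")
    if not applicant.contact_email:
        problems.append("no named contact")
    return problems
```

An empty list means the institution meets the minimum requirements as modelled here; anything else gives the support contact a concrete list to work through.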

If you are interested in being involved in this latest phase of the project, or would like further information, please email Christopher Brown.


UKRDDS Phase 2 Final Workshop

The project team and representatives from all participating pilot HEIs and Data Centres convened in London for the third workshop of the UK Research Data Discovery Service on 13 October 2016. This was the final workshop of what is now known as phase 2 of the project, which ran from March 2015 to September 2016 (extended from the original end date of July 2016).

The objectives of the workshop were to review the second phase of the project, discuss what still needs to be achieved in the next (third) phase of the project and how people can be involved and engaged.

Prior to the workshop, all the relevant sources of information were collated on the workshop’s padlet. This includes links to the shared notes, an online app for collecting sticky notes, all supporting documentation and slides.

To collect as much feedback as possible during the workshop, in addition to the exercises, posters were put up titled Questions, Issues, Ideas and a FLAP (Future considerations, Lessons learned, Accomplishment and Problem areas) board for phases 2 and 3. Any notes added to these posters have been transcribed into the shared spreadsheet mentioned in the group exercises.

Presentations

The day started with Catherine Grout describing what had changed in the landscape since phase 2. The research data discovery service sits within a suite of Jisc work called “Research at Risk”, which offers tools, services, advice and guidance to those involved with research data management in the UK. In particular the Research Data Shared Service will offer a simple solution that meets the needs of institutions and the requirements for funders.

The project is managed by Christopher Brown and he summarised the work of phase 2. This phase had brought the pilot into alpha status, laying the firm foundations for a potential service. A further year of work will make this a more hardened system with further testing and user feedback, making the service more valuable and useful. Further HEIs have been brought into the project, in addition to the original participating pilots.

User stories, supported by a “MoSCoW” prioritisation process, have driven the development of a range of outputs resulting in the alpha system and associated research and documentation (links to the latter, and a list of participants, are available via the padlet).

Recent focus has been on system testing, with changes made on the staging server and using the live server as a benchmark for testing. Harvesting continues, alongside development on other requirements and specific issues (NERC, VADS) and the addition of other HEIs (Nottingham, Sheffield, Lancaster, Bath and Bristol). Feedback on the project and participating pilots’ involvement will be an important method of assessing phase 2 and directing phase 3.

Dom Fripp has worked on metadata mapping for the project. A new “metadata profile document” has been circulated and is open for comments and questions (currently on version 1.1), alongside a mapping document. These documents inform the work of our developer in building the metadata schema into CKAN. The mapping exercise is very important work and is of interest globally, so comments are very welcome. This is still a live process, and issues that arise should be shared and reported so that they can be addressed in future development (the example of issues related to migration between DataCite 3 and 4 was noted). In future this documentation will be migrated to GitHub.
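
To give a feel for what a mapping of this kind involves, here is a minimal sketch of a field crosswalk in Python. It is not the project's mapping document: the `CROSSWALK` table and the `map_record` helper are hypothetical, with DataCite-style names on the left and CKAN-style dataset field names on the right chosen purely for illustration.

```python
# Hypothetical crosswalk: source (DataCite-style) field -> target (CKAN-style) field.
# The real mapping is defined in the project's metadata profile and mapping documents.
CROSSWALK = {
    "title": "title",
    "creator": "author",
    "description": "notes",
    "rights": "license_id",
}


def map_record(source, crosswalk=CROSSWALK):
    """Map one harvested record; also report source fields with no mapping."""
    mapped = {}
    unmapped = []
    for key, value in source.items():
        target = crosswalk.get(key)
        if target is None:
            unmapped.append(key)  # candidates for discussion in the mapping exercise
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)
```

Returning the unmapped fields alongside the mapped record makes gaps visible, which is the kind of issue the live mapping process is meant to surface.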

Group Exercises

The main focus of the workshop wasn’t to listen to presentations but for participants to engage in a number of group exercises.

The first exercise was to assess and test the current alpha system on the staging server. Delegates could work alone or in groups at their tables. There were four tables and reporting back was done one table at a time. The areas suggested for testing included – your organisation’s metadata; any fields missing; is the harvested data correct; search functionality; presentation of results; usability. These were suggestions and other areas could be tested.

Notes were added to a poster under the categories of Bug, Error and Feedback. These have been transcribed into the following shared spreadsheet (along with notes from the Requirements exercise). These will be reviewed and checked against existing JIRA tickets. For any new issues a new ticket will be created.

The second exercise was a follow on to the first and delegates were asked the following questions:

  • Does the service satisfy the requirements of your organisation?
  • What further requirements should be added?
  • What should be improved?

They were asked to write their answers down on sticky notes and put them on a poster under the following categories: Drop, Add, Keep or Improve.

As with the first exercise, the notes have been transcribed into the shared spreadsheet.

Both exercises provided valuable feedback on the current system and ideas for future requirements.

The Road Ahead

The day finished with Christopher Brown describing plans for the next phase of work and how participants could be involved.

In phase 2 we engaged with participants and gathered user stories, prioritised and implemented requirements based on these user stories, evaluated software and chose CKAN, developed an Alpha system, harvested metadata from participants into this system and are now moving to Beta.

Phase 3 will run from October 2016 to September 2017 and will allow us to move from a test service to a production-ready one. We will be able to harvest from more data sources, do more formal and informal system testing, look at further requirements (refining and implementing them) and develop a business case for the service, with the ultimate aim of delivering a more mature and tested service to Digital Resources (the area of Jisc that runs and supports services, such as the Archives Hub).

In developing additional functionality we will review existing requirements set to “won’t” or marked out of scope, gather further requirements from this workshop and potentially elsewhere, and integrate more closely with the Research Data Shared Service work and the IRUSdataUK project.

It’s hoped that all participants would continue to be involved during phase 3 of the project. This would be at a level expected from all new participants wishing to have their metadata harvested into the discovery service.

The day ended with a thank you to all the participants and the project team for the help and support in running an engaging and productive workshop, and to all the participants who have helped throughout phase 2 of the project.


Discovery Service demo at Digifest16

Jisc’s annual Digifest (#digifest16) took place in Birmingham on 2-3 March 2016. The Discovery Service presentation and demo featured on day 2 at the Shared Services Stand. You can view all slides from the programme page, but the presentation from this project’s session is available from SlideShare.


Following the presentation, which gave a brief overview and update on the current status of the project, Mark Winterbottom gave a demo of the alpha service. The session was well attended (36 people with seating for 20) and feedback was positive. I stressed that this was currently an alpha service and, as such, there was still plenty of work to do. Following the demo there has been interest from other institutions in having their data harvested by the service.

I encouraged everyone to try out the service and report back any issues through the “Feedback” link. Development work is proceeding through an iterative release cycle so the service is having new features and functionality added every 2-3 weeks. If you are interested in knowing what’s changed please check the blog for updates. Although the focus is harvesting metadata from the funded participating pilots, we are keen to involve other institutions as the project progresses. If you are interested in participating you can email via the “Feedback” link or contact me directly.

I’d like to thank Mark for driving the demo, the team members who attended and supported the session and @JiscLive for tweeting the session.


Second Workshop – feedback and testing

The project team and representatives from all participating pilot HEIs and Data Centres convened in London for the second workshop of the UK Research Data Discovery Service on 18 February 2016. As well as taking the opportunity to meet face-to-face, rather than rely on the monthly webinars, the purpose of this workshop was to:

  • Review and test the current alpha version of the Discovery Service
  • Discuss requirements that still require clarification
  • Finalise the metadata schema
  • Discuss the scope of datasets
  • Ensure participants had the opportunity to raise issues, ideas and questions
  • Clarify the support required from participants for the remainder of the project

The workshop started with a presentation from Catherine Grout on how the Discovery Service fits in with the Research@Risk co-design theme from Jisc. One particular project that is closely related to this work is the Research Data Management Shared Service, providing interoperable systems for researchers and institutions to adhere to best practice throughout the RDM lifecycle. The metadata work in the Discovery Service is particularly relevant to the Shared Service work and the production of a discipline neutral schema.

I gave a brief update on the status of the project, what’s been done, where we are and what needs to be done. The highlights have been gathering user stories and requirements, developing the alpha site and harvesting datasets to populate the service. The presentation described the purpose of the workshop and gave an introduction to the group exercise. The aim of this exercise was to collect valuable feedback from participants that we as a team could act on to improve the project. The starfish exercise involves people working in groups and then putting items on sticky notes under the following categories: start, stop, do more, do less, continue. These were all related to what the project should be doing, or not doing, to support the participants and build a Discovery Service. Once all the ideas were added everyone voted on the most important. There was a wide range of suggestions, but the ones with the most votes were:

  • Start – Feeding metadata from UKRDDS to data centres / repositories
  • Stop – Harvesting anything but datasets, data services, software…
  • Continue – Liaison with other initiatives looking at metadata for data e.g. DataCite; Developing a roadmap for the service separating out what is in version 1 and version 2.
  • Do Less – Discussion on browse options
  • Do More – Clarity on what the aim of the service is; Systematic user testing with researchers; Be clear about sustainability of service post project; What is the minimum useful metadata profile? Need a standard.

The exercise provided valuable feedback to the team and, although we are already working on some of the items, we will be making sure we have responded to these over the coming weeks.

To ensure as much feedback as possible was gathered there were three posters on the wall to collect Ideas, Issues and Questions throughout the day. These were reviewed at the end of the workshop to make sure we’d covered everything.

Following the group exercise, there was a session to review some of the requirements that still needed clarity. For example, what do we mean by “quality” of metadata? Is this completeness of metadata? What related datasets and information should be shown with a data record in the service? One of the issues raised with the project’s monthly webinars was that people would prefer to talk about things like requirements face-to-face and this provided the opportunity to do so. Although there were many requirements to review, I did manage to get some discussion going around the main issues and clarity on specific requirements. The conclusion of this session was that the team has enough information to implement all the requirements, if necessary, and where there is any uncertainty it can be raised with the advisory groups. Requirements will continue to be implemented and system testing will ensure functionality can be tested against these requirements. Any issues will be raised at that testing stage. As the development work is taking an agile approach we don’t have to wait until the end of the project to find out if there’s an issue. Releases will be every 2-3 weeks allowing functionality to be tested against requirements on an iterative basis.
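
One possible reading of the “quality as completeness” question can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a measure the project has adopted: the `CORE_FIELDS` tuple is hypothetical (the core metadata profile was still being defined) and `completeness` is an invented helper name.

```python
# Assumption: an illustrative set of core fields, not the project's agreed profile.
CORE_FIELDS = ("title", "creator", "description", "licence", "date")


def completeness(record, core_fields=CORE_FIELDS):
    """Fraction of core fields that are present and non-blank in a record."""
    filled = sum(
        1 for field in core_fields
        if str(record.get(field) or "").strip()  # missing, None and blank all count as empty
    )
    return filled / len(core_fields)
```

A score like this only says whether fields are filled in, not whether their content is accurate or useful, which is part of why the meaning of “quality” needed discussion.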

Veerle Van den Eynden and Diana Sisu deal with engagement with the participating data centres and HEIs respectively. They have been supporting development work as metadata has been harvested to the discovery service. Over recent weeks they had been collecting feedback on the sort of support participants will require for the final months of the project. As the new metadata profile is implemented, participants will require support in making sure the profile is complete. Support will also be provided on issues around licensing, extra metadata elements and working with repository suppliers.

The second half of the workshop focussed on the following three areas:

  • metadata development
  • system testing
  • scope of datasets.

Before breaking into three groups to discuss these, a short introduction was provided for each one.

Metadata development – Dom Fripp gave an introduction to the proposed new metadata schema for the service. This had already been circulated within the team and to participants in a shared online document. The draft metadata schema should satisfy user requirements, be simple enough, learn from existing schemas rather than create a new one and be flexible enough to develop along with the service. There was agreement within the group about the core fields with discussion around specific issues such as licences, dates and harmonisation of terms. The schema will be updated to reflect the discussion and circulated to participants prior to publicising more widely.

Scope of Datasets – Veerle Van den Eynden and Diana Sisu ran this group to continue the discussion on what datasets should be included in the service. A report had been shared with participants and discussed in advisory group meetings so this session was to get further feedback prior to finalising the document. Feedback was that the service should be as inclusive as possible but differentiate as appropriate, and be clear about access issues such as currently embargoed data. Although the focus is on UK data, much of the data stored in the UK was not created there and has complex ownership issues.

System Testing – Mark Winterbottom gave a demo of the alpha system and Ade Stevenson collected feedback. The alpha site had been available online well in advance of the workshop and had functionality added on a regular basis. This session gave participants the opportunity to look at how the system works, delve into some of the more technical aspects of the service and ask technical questions of the team. Following this session, Mark collected all the actions as tickets in the project’s JIRA system, which is being used to track development work. After the workshop I reviewed these with Mark and planned out the development work for the next two months. The test site will be updated every 2-3 weeks to allow participants to test added functionality and features.

All three of these groups provided valuable feedback and produced engaging discussions. After reporting back there was time to review the ideas, issues and questions collected throughout the day before the final wrap up. Details of each part of the workshop have been provided in the links above. As the project progresses further posts will look at some of the issues dealt with in more detail, for example the metadata profile. The FAQ section of the Discovery Service will be added to and will include issues raised at the workshop.

The day ended with a thank you to all the participants and the project team for the help and support in running an engaging and productive workshop.


Discovery Service Update – Alpha Site

The early phase of this project focussed on gathering use cases and requirements from our participating pilot universities and data centres. The use cases and requirements have been discussed and refined by the participants, mostly through participation in the Advisory Groups that support the project. The three groups (Technical & Metadata, User, and Researcher) meet online monthly to discuss the project’s progress and include the pilots and other interested parties. Having a clear set of prioritised requirements has helped ensure that we can develop a Discovery Service that meets the needs of our users.

Recent work has focussed on technical development and implementing some of the “Must have” requirements within CKAN, which is the selected technical solution for building the Discovery Service. The main task has been harvesting metadata from the pilots using their available endpoints. Once this was completed it meant there was data in the service allowing the search functionality to be added. I should stress that both of these areas are still being worked on and there is other functionality to add. However, I am pleased to say that this work has progressed well and that the alpha version of the service is online at http://ckan.data.alpha.jisc.ac.uk/.

Please note: This is not a launch, but a chance to show what has been developed so far. Before using the site please be aware of the following:

  • This is an *alpha* version so it is far from a finished product. Consider everything as “under development”;
  • We welcome feedback and there’s a feedback link on the top right. The team’s priority is to test functionality against the set of requirements from our pilots who will be testing the system. All feedback is welcomed and will be added to our tracking system. At the moment the feedback link will allow you to email your feedback, but in the future we’ll include a feedback form;
  • The metadata shown has been harvested from existing endpoints as is. A core set of metadata for the service is in the process of being defined so the metadata you see will change;
  • The Advanced Search is under development. Eventually it will allow all core metadata fields to be searched;
  • The look and feel of the system complies with Jisc requirements for an alpha version but this will change as it moves to a beta version;
  • Only metadata from the participating pilots is shown and the pilots are at different stages with their metadata collection.
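
For anyone wanting to query the alpha site programmatically, CKAN provides an Action API with a `package_search` call. The sketch below is a minimal illustration in Python. It assumes the alpha site exposes the standard CKAN API under the usual `/api/3/action/` path, which has not been confirmed here, and the helper names are hypothetical. It only builds the request URL and reads titles from an already-decoded response, so it makes no network call itself.

```python
from urllib.parse import urlencode, urljoin

# Address of the alpha site given above; it may change or go offline
# as the service moves towards beta.
ALPHA_BASE = "http://ckan.data.alpha.jisc.ac.uk/"


def package_search_url(query, rows=10, base=ALPHA_BASE):
    """Build a CKAN package_search URL (assumes the standard Action API path)."""
    params = urlencode({"q": query, "rows": rows})
    return urljoin(base, "api/3/action/package_search") + "?" + params


def dataset_titles(response):
    """Extract dataset titles from a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [dataset.get("title", "") for dataset in response["result"]["results"]]
```

Fetching the URL and decoding the JSON is left to whichever HTTP client you prefer; as the caveats above say, the metadata returned will change as the core profile is defined.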

This month there will be a face-to-face workshop (18 Feb) where there will be system testing ensuring the functionality meets the requirements, clarifying the remaining requirements, finalising the core metadata schema and the scope of datasets. I’ll provide an update on the project later this month after the workshop.

 

 


Use cases for a national research data discovery service

Veerle Van den Eynden works for the UK Data Service and is part of the UKRDDS project team focussing on engagement with the Data Centres that are participating as pilots in the project. Veerle has written the following post to summarise the finalised set of use cases for the discovery service.

The partnership that is busy developing a UK-wide registry for research data collections held in UK research institutions and subject data centres has finalised a list of use cases for such a service. The use cases reflect the technical, functional, metadata and other requirements a service should have from the point of view of different actors in the research and data landscape: researchers, funders, project/research managers, data repositories, system machines and machines. Examples of some use cases are: promoting datasets held by different repositories/data centres for cross-disciplinary research; enabling researchers to showcase their data as impact for the Research Excellence Framework; helping institutions to find research datasets created by all their own researchers; and helping researchers to understand the quality and reusability of datasets and their metadata.

The use cases have developed from the pilot phase of the project, a workshop with all partners in April 2015, various discussions between the project team and the participating data centres and institutional repositories, and feedback from the project’s advisory groups.

The current partnership for the Jisc UK Research Data Discovery Service project includes 7 data centres and 9 university repositories which are representative of the UK research data landscape: Jisc, Digital Curation Centre, UK Data Archive, Archaeology Data Service, NERC Data Catalogue Service, ISIS ICAT data catalogue, UK Energy Data Centre, Visual Arts Data Service, Cambridge Crystallographic Data Centre, University of Hull, University of St Andrews, University of Glasgow, Oxford Brookes University, University of Edinburgh, University of Oxford, University of Southampton, University of Leeds and University of Lincoln. These partners form 3 advisory groups that oversee the various activities: Technical and Metadata Advisory Group, User Advisory Group and Researcher Advisory Group.

The use cases are visually represented in a simplified way in the following image:

[Image: use cases slide]

The full list of use cases, with their detailed descriptions, has been published as a shared Google Doc at https://docs.google.com/document/d/1lZ03_oCoqd5wgwQoo_VkdGAMilPxr3UP13FRaexB3R0/edit?usp=sharing.

 

 


Use Cases

As mentioned in the previous post, there are three advisory groups for the UKRDDS project and on 23 July they met online for the first time. The main purpose of the first set of meetings was to go through the user stories that were gathered at the first workshop. These have since been developed into use cases, from which we will be able to extract a set of requirements for the service.

All three meetings took place on Blackboard Collaborate and a recording is available at https://ca-sas.bbcollab.com/mr.jnlp?suid=M.12C3A74D539E3E1DB1369DB9CC7B34&sid=2009077. If you’re really interested, you may just want to listen to the first meeting as this turned out to be the most productive and the following two meetings had the same agenda.

In each meeting I went through the use case document (UKRDDS-UserStories-RefinedListdoc 20150729) and each use case in turn. This document had been made available on Google Docs so that all participants could provide comments in advance of the meetings. Participants were encouraged to talk about the comments they’d added, if any, and contribute to the discussion.

The initial set of user stories had been collected from the first workshop. These had been collated and then categorised with similar user stories combined. The idea behind collecting user stories was that an initial set could be collected and then expanded on to form more detailed use cases, from which a clear set of requirements for the discovery service would emerge. Each user story had been prioritised using the MoSCoW method and these priorities were reviewed during each meeting. Also, each user story has an owner (someone who has an interest in that particular user story and is willing to take responsibility for ensuring the relevant information is correct), usually the person who suggested the user story, and these owners were reviewed too.
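
For readers unfamiliar with MoSCoW (Must have, Should have, Could have, Won't have), the ordering it implies can be sketched in Python as follows. This is only an illustration: the story records, their `id`, `priority` and `owner` keys, and the `prioritise` helper are hypothetical and do not reflect how the project's use case document is actually stored.

```python
# MoSCoW categories in descending order of priority.
MOSCOW_ORDER = {"must": 0, "should": 1, "could": 2, "wont": 3}


def prioritise(stories):
    """Order user stories by MoSCoW priority.

    sorted() is stable, so stories with the same priority keep the
    order in which they were listed.
    """
    return sorted(stories, key=lambda story: MOSCOW_ORDER[story["priority"]])


# Hypothetical example records, for illustration only.
example_stories = [
    {"id": "US-03", "priority": "could", "owner": "Owner A"},
    {"id": "US-01", "priority": "must", "owner": "Owner B"},
    {"id": "US-07", "priority": "wont", "owner": "Owner C"},
    {"id": "US-02", "priority": "must", "owner": "Owner D"},
]
```

Calling `prioritise(example_stories)` puts the two “must” stories first, which mirrors how the “Must have” requirements are tackled before anything else.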

Once the project team had reviewed the user stories, they were shared with the project participants via Google Docs. This initial set of 34 use cases then grew to 43 as extra ones were added. Each one was reviewed in the meetings. Although future meetings will have an agenda framed around the different types of user/expert in each group, these first meetings had the same agenda and I had the unenviable job of going through the 43 use cases in three meetings. While not all team members were able to make the meetings, David Wilson helped out with the Technical and Metadata Group and Alex Ball kindly helped out in all three. My thanks to them both.

I’m not planning on going through all the comments and 43 use cases in turn here, but the document has been updated to reflect the actions required and the discussions from each meeting. The next stage was to update the document to merge some use cases, update those that require clarification and produce a final set (now 44 use cases). This document (UKRDDS-UserStories-Updated 20150729) will be shared with all participants in the project for comment. This final step should be a quick process as it’s important that a line is drawn in the sand and the set of requirements is finalised. We are open to hearing about other use cases or requirements, but there is a danger of scope creep and delays to the project if we don’t derive an agreed set of requirements, from which we can evaluate the ANDS and CKAN software and start to build the discovery service, as soon as possible.

As the requirements evolve and the advisory groups meet on a regular basis, updates will be provided on this blog. If you would like to contribute or provide feedback please contact me.
