Discovery Service demo at Digifest16

Jisc’s annual Digifest (#digifest16) took place in Birmingham on 2-3 March 2016. The Discovery Service presentation and demo featured on day 2 at the Shared Services Stand. You can view all slides from the programme page, and the presentation from this project’s session is also available on SlideShare.


Following the presentation, which gave a brief overview and update on the current status of the project, Mark Winterbottom gave a demo of the alpha service. The session was well attended (36 people with seating for 20) and feedback was positive. I stressed that this was currently an alpha service and, as such, there was still plenty of work to do. Following the demo there has been interest from other institutions in having their data harvested by the service.

I encouraged everyone to try out the service and report back any issues through the “Feedback” link. Development work is proceeding through an iterative release cycle so the service is having new features and functionality added every 2-3 weeks. If you are interested in knowing what’s changed please check the blog for updates. Although the focus is harvesting metadata from the funded participating pilots, we are keen to involve other institutions as the project progresses. If you are interested in participating you can email via the “Feedback” link or contact me directly.

I’d like to thank Mark for driving the demo, the team members who attended and supported the session and @JiscLive for tweeting the session.

Second Workshop – feedback and testing

The project team and representatives from all participating pilot HEIs and Data Centres convened in London for the second workshop of the UK Research Data Discovery Service on 18 February 2016. As well as taking the opportunity to meet face-to-face, rather than rely on the monthly webinars, the purpose of this workshop was to:

  • Review and test the current alpha version of the Discovery Service
  • Discuss requirements that still require clarification
  • Finalise the metadata schema
  • Discuss the scope of datasets
  • Ensure participants had the opportunity to raise issues, ideas and questions
  • Clarify the support required from participants for the remainder of the project

The workshop started with a presentation from Catherine Grout on how the Discovery Service fits in with the Research@Risk co-design theme from Jisc. One particular project that is closely related to this work is the Research Data Management Shared Service, providing interoperable systems for researchers and institutions to adhere to best practice throughout the RDM lifecycle. The metadata work in the Discovery Service is particularly relevant to the Shared Service work and the production of a discipline-neutral schema.

I gave a brief update on the status of the project, what’s been done, where we are and what needs to be done. The highlights have been gathering user stories and requirements, development of the alpha site and harvesting datasets to populate the service. The presentation described the purpose of the workshop and gave an introduction to the group exercise. The aim of this exercise was to collect valuable feedback from participants that we as a team could act on to improve the project. The starfish exercise involves people working in groups and then putting items on sticky notes under the following categories: start, stop, do more, do less, continue. These were all related to what the project should be doing, or not doing, to support the participants and build a Discovery Service. Once all the ideas were added everyone voted on the most important. There was a wide range of suggestions, but the ones with the most votes were:

  • Start – Feeding metadata from UKRDDS to data centres / repositories
  • Stop – Harvesting anything but datasets, data services, software…
  • Continue – Liaison with other initiatives looking at metadata for data e.g. DataCite; Developing a roadmap for the service separating out what is in version 1 and version 2.
  • Do Less – Discussion on browse options
  • Do More – Clarity on what the aim of the service is; Systematic user testing with researchers; Be clear about sustainability of service post project; What is the minimum useful metadata profile? Need a standard.

The exercise provided valuable feedback to the team and, although we are already working on some of the items, we will be making sure we have responded to these over the coming weeks.

To ensure as much feedback as possible was gathered there were three posters on the wall to collect Ideas, Issues and Questions throughout the day. These were reviewed at the end of the workshop to make sure we’d covered everything.

Following the group exercise, there was a session to review some of the requirements that still needed clarity. For example, what do we mean by “quality” of metadata? Is this completeness of metadata? What related datasets and information should be shown with a data record in the service? One of the issues raised with the project’s monthly webinars was that people would prefer to talk about things like requirements face-to-face and this provided the opportunity to do so. Although there were many requirements to review, I did manage to get some discussion going around the main issues and clarity on specific requirements. The conclusion of this session was that the team has enough information to implement all the requirements, if necessary, and where there is any uncertainty it can be raised with the advisory groups. Requirements will continue to be implemented and system testing will ensure functionality can be tested against these requirements. Any issues will be raised at that testing stage. As the development work is taking an agile approach we don’t have to wait until the end of the project to find out if there’s an issue. Releases will be every 2-3 weeks allowing functionality to be tested against requirements on an iterative basis.
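One reading of metadata “quality” raised in the discussion above is completeness. As a minimal sketch of that interpretation only, the snippet below scores a record by the fraction of core fields that are populated. The field names are hypothetical placeholders, since the core schema was still being agreed at this point, and this is not how the service itself measures quality.

```python
# Hypothetical core fields; the project's actual schema was still being agreed.
CORE_FIELDS = ("title", "description", "creator", "licence", "date")


def completeness(record, core_fields=CORE_FIELDS):
    """Return the fraction of core fields holding a non-empty value."""
    filled = sum(1 for f in core_fields if str(record.get(f) or "").strip())
    return filled / len(core_fields)


# Invented record for illustration: two of the five core fields are populated.
record = {"title": "Soil survey 2014", "creator": "A. Researcher", "licence": ""}
print(completeness(record))
```

A score like this says nothing about whether the values are accurate or useful, which is part of why the question of what “quality” means was left open for the advisory groups.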

Veerle Van den Eynden and Diana Sisu deal with engagement with the participating data centres and HEIs respectively. They have been supporting development work as metadata has been harvested to the discovery service. Over recent weeks they had been collecting feedback on the sort of support participants will require for the final months of the project. As the new metadata profile is implemented participants will require support making sure the profile is complete. Support will also be provided on issues around licensing, extra metadata elements and working with repository suppliers.

The second half of the workshop focussed on the following three areas:

  • metadata development
  • system testing
  • scope of datasets.

Before breaking into three groups to discuss these, a short introduction was provided for each one.

Metadata development – Dom Fripp gave an introduction to the proposed new metadata schema for the service. This had already been circulated within the team and to participants in a shared online document. The draft metadata schema should satisfy user requirements, be simple enough, learn from existing schemas rather than create a new one and be flexible enough to develop along with the service. There was agreement within the group about the core fields with discussion around specific issues such as licences, dates and harmonisation of terms. The schema will be updated to reflect the discussion and circulated to participants prior to publicising more widely.

Scope of Datasets – Veerle Van den Eynden and Diana Sisu ran this group to continue the discussion on what datasets should be included in the service. A report had been shared with participants and discussed in advisory group meetings so this session was to get further feedback prior to finalising the document. Feedback was that the service should be as inclusive as possible but differentiate as appropriate, be clear about access issues such as currently embargoed data, and although the focus is on UK data there is much stored in the UK that wasn’t created there with complex ownership issues.

System Testing – Mark Winterbottom gave a demo of the alpha system and Ade Stevenson collected feedback. The alpha site had been available online well in advance of the workshop and had functionality added on a regular basis. This session gave participants the opportunity to look at how the system works, delve into some of the more technical aspects of the service and ask technical questions of the team. Following this session, Mark collected all the actions as tickets in the project’s JIRA system, which is being used to track development work. After the workshop I reviewed these with Mark and planned out the development work for the next two months. The test site will be updated every 2-3 weeks to allow participants to test added functionality and features.

All three of these groups provided valuable feedback and produced engaging discussions. After reporting back there was time to review the ideas, issues and questions collected throughout the day before the final wrap up. Details of each part of the workshop have been provided in the links above. As the project progresses further posts will look at some of the issues dealt with in more detail, for example the metadata profile. The FAQ section of the Discovery Service will be added to and will include issues raised at the workshop.

The day ended with a thank you to all the participants and the project team for the help and support in running an engaging and productive workshop.

Discovery Service Update – Alpha Site

The early phase of this project focussed on gathering use cases and requirements from our participating pilot universities and data centres. The use cases and requirements have been discussed and refined by the participants, mostly through participation in the Advisory Groups that support the project. The three groups (Technical & Metadata, User, and Researcher) meet online monthly to discuss the project’s progress and include the pilots and other interested parties. Having a clear set of prioritised requirements has helped ensure that we can develop a Discovery Service that meets the needs of our users.

Recent work has focussed on technical development and implementing some of the “Must have” requirements within CKAN, which is the selected technical solution for building the Discovery Service. The main task has been harvesting metadata from the pilots using their available endpoints. Once this was completed it meant there was data in the service allowing the search functionality to be added. I should stress that both of these areas are still being worked on and there is other functionality to add. However, I am pleased to say that this work has progressed well and that the alpha version of the service is online at http://ckan.data.alpha.jisc.ac.uk/.
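As a rough illustration of how the harvested records might be searched programmatically, the sketch below builds a query for CKAN’s Action API (`package_search`) and pulls dataset titles out of a response. It assumes the alpha site exposes the standard CKAN API under the default `/api/3/action/` path, which has not been confirmed for this instance, and the sample response is invented to show the general shape CKAN returns.

```python
import json
from urllib.parse import urlencode

# Base URL of the alpha site; the API path used below is the CKAN default
# and is an assumption about how this instance is configured.
BASE_URL = "http://ckan.data.alpha.jisc.ac.uk"


def build_search_url(query, rows=10, base_url=BASE_URL):
    """Return a package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def extract_titles(response_text):
    """Return dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [d.get("title", "") for d in payload["result"]["results"]]


# Invented sample response, for illustration only.
SAMPLE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "example", "title": "Example dataset"}],
    },
})

print(build_search_url("crystallography", rows=5))
print(extract_titles(SAMPLE))
```

Keeping URL construction and response parsing separate from the network call means both can be checked without relying on the alpha site being available.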

Please note: This is not a launch but a preview of what has been developed so far. Before using please be aware of the following:

  • This is an *alpha* version so it is far from a finished product. Consider everything as “under development”;
  • We welcome feedback and there’s a feedback link on the top right. The team’s priority is to test functionality against the set of requirements from our pilots who will be testing the system. All feedback is welcomed and will be added to our tracking system. At the moment the feedback link will allow you to email your feedback, but in the future we’ll include a feedback form;
  • The metadata shown has been harvested from existing endpoints as is. A core set of metadata for the service is in the process of being defined so the metadata you see will change;
  • The Advanced Search is under development. Eventually it will allow all core metadata fields to be searched;
  • The look and feel of the system complies with Jisc requirements for an alpha version but this will change as it moves to a beta version;
  • Only metadata from the participating pilots is shown and the pilots are at different stages with their metadata collection.

This month there will be a face-to-face workshop (18 Feb) where there will be system testing ensuring the functionality meets the requirements, clarifying the remaining requirements, finalising the core metadata schema and the scope of datasets. I’ll provide an update on the project later this month after the workshop.

Use cases for a national research data discovery service

Veerle Van den Eynden works for the UK Data Service and is part of the UKRDDS project team focussing on engagement with the Data Centres that are participating as pilots in the project. Veerle has written the following post to summarise the finalised set of use cases for the discovery service.

The partnership that is busy developing a UK-wide registry for research data collections held in UK research institutions and subject data centres has finalised a list of use cases for such a service. The use cases reflect the technical, functional, metadata and other requirements a service should have from the point of view of different actors in the research and data landscape: researchers, funders, project/research managers, data repositories, system machines and machines. Examples of some use cases are: promoting datasets held by different repositories/data centres for cross-disciplinary research; enabling researchers to showcase their data as impact for the Research Excellence Framework; helping institutions to find research datasets created by all their own researchers; and helping researchers to understand the quality and reusability of datasets and their metadata.

The use cases have developed from the pilot phase of the project, a workshop with all partners in April 2015, various discussions between the project team and the participating data centres and institutional repositories, and feedback from the project’s advisory groups.

The current partnership for the Jisc UK Research Data Discovery Service project includes 7 data centres and 9 university repositories which are representative of the UK research data landscape: Jisc, Digital Curation Centre, UK Data Archive, Archaeology Data Service, NERC Data Catalogue Service, ISIS ICAT data catalogue, UK Energy Data Centre, Visual Arts Data Service, Cambridge Crystallographic Data Centre, University of Hull, University of St Andrews, University of Glasgow, Oxford Brookes University, University of Edinburgh, University of Oxford, University of Southampton, University of Leeds and University of Lincoln. These partners form 3 advisory groups that oversee the various activities: Technical and Metadata Advisory Group, User Advisory Group and Researcher Advisory Group.

The use cases are visually represented in a simplified way in the following image:

[Image: use cases slide]

The full list of use cases, with their detailed descriptions, has been published as a shared Google Doc at https://docs.google.com/document/d/1lZ03_oCoqd5wgwQoo_VkdGAMilPxr3UP13FRaexB3R0/edit?usp=sharing.

Use Cases

As mentioned in the previous post, there are three advisory groups for the UKRDDS project and on 23 July they met online for the first time. The main purpose of the first set of meetings was to go through the user stories that were gathered at the first workshop and have been developed into use cases and will allow us to extract a set of requirements for the service.

All three meetings took place on Blackboard Collaborate and a recording is available at https://ca-sas.bbcollab.com/mr.jnlp?suid=M.12C3A74D539E3E1DB1369DB9CC7B34&sid=2009077. If you’re really interested, you may just want to listen to the first meeting as this turned out to be the most productive and the following two meetings had the same agenda.

In each meeting I went through the use case document (UKRDDS-UserStories-RefinedListdoc 20150729) and each use case in turn. This document had been made available on Google Docs so that all participants could provide comments in advance of the meetings. Participants were encouraged to talk about the comments they’d added, if any, and contribute to the discussion.

The initial set of user stories had been collected from the first workshop. These had been collated and then categorised with similar user stories combined. The idea behind collecting user stories was that an initial set could be collected and then expanded on to form more detailed use cases, from which a clear set of requirements for the discovery service would emerge. Each user story had been prioritised using the MoSCoW method and these were reviewed during each meeting. Also, each user story has an owner (someone who has an interest in that particular user story and is willing to take responsibility for ensuring the relevant information is correct), usually the person who suggested the user story, and these were reviewed.
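To make the prioritisation step concrete, here is a small sketch of how user stories with a MoSCoW priority and an owner could be recorded and ordered for review. The class, field names and example stories are invented for illustration and are not taken from the project’s use case document.

```python
from dataclasses import dataclass

# MoSCoW categories in descending priority.
MOSCOW_ORDER = {"Must": 0, "Should": 1, "Could": 2, "Won't": 3}


@dataclass
class UserStory:
    summary: str
    priority: str  # one of the MOSCOW_ORDER keys
    owner: str     # person responsible for keeping the story accurate


def prioritise(stories):
    """Return stories ordered Must, Should, Could, Won't.

    sorted() is stable, so stories keep their original relative order
    within each category.
    """
    return sorted(stories, key=lambda s: MOSCOW_ORDER[s.priority])


# Invented examples, for illustration only.
stories = [
    UserStory("Browse datasets by subject", "Could", "Owner A"),
    UserStory("Search across all harvested metadata", "Must", "Owner B"),
    UserStory("Export search results", "Should", "Owner C"),
]

for story in prioritise(stories):
    print(story.priority, "-", story.summary, "-", story.owner)
```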

Once the project team had reviewed the user stories, they were shared with the project participants via Google Docs. This initial set of 34 use cases then grew to 43 as extra ones were added. Each one was reviewed in the meetings. Although future meetings will have an agenda framed around the different types of user/expert in each group, these first meetings had the same agenda and I had the unenviable job of going through the 43 use cases in three meetings. While not all team members were able to make the meetings, David Wilson helped out with the Technical and Metadata Group and Alex Ball kindly helped out in all three. My thanks to them both.

I’m not planning on going through all the comments and 43 use cases in turn here, but the document has been updated to reflect the actions required and the discussions from each meeting. The next stage was to update the document to merge some use cases, update those that require clarification and produce a final set (now 44 use cases). This document (UKRDDS-UserStories-Updated 20150729) will be shared with all participants in the project for comment. This final step should be a quick process as it’s important that a line is drawn in the sand and the set of requirements is finalised. We are open to hearing about other use cases or requirements, but there is a danger of scope creep and delays to the project if we don’t derive an agreed set of requirements, from which we can evaluate the ANDS and CKAN software and start to build the discovery service, as soon as possible.

As the requirements evolve and the advisory groups meet on a regular basis, updates will be provided on this blog. If you would like to contribute or provide feedback please contact me.

Advisory Groups

There are three advisory groups for the UKRDDS project and on 23 July they met online for the first time (see next blog post). The monthly online advisory group meetings are for participants to discuss progress, issues, method of working, communication, etc. for the UK Research Data Discovery Service.

The three advisory groups and their remit are as follows:

Technical and Metadata Group

This group has a combined remit to investigate and discuss technical and metadata issues, as follows:

  1. Looking at the service from a technical standpoint. Scope includes consideration of issues such as handling duplicates, deletions, choice of crosswalks for support, QA of crosswalks, and other relevant issues. Comprised of developers and architects within the project, plus developers from ANDS and CKAN and relevant technical experts in participating data centres and HEIs.
  2. Advising on the development of the metadata schema, including the necessary and desirable metadata elements to achieve discovery functionality and which conventions should be adopted when using these and other relevant issues. Comprised of metadata experts from within the project and relevant metadata experts in participating Data Centres and HEIs.
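Two of the technical issues in this group’s scope, crosswalks and duplicate handling, can be sketched in a few lines. The example below maps Dublin Core element names onto illustrative target field names and then keeps one record per identifier. The target names, the choice of Dublin Core as the source format and the sample records are all assumptions made for illustration; they are not the project’s agreed schema or harvesting logic.

```python
# Hypothetical crosswalk from Dublin Core element names to illustrative
# target field names; neither side is the project's agreed schema.
DC_CROSSWALK = {
    "dc:title": "title",
    "dc:creator": "creator",
    "dc:identifier": "identifier",
    "dc:rights": "licence",
}


def apply_crosswalk(source, crosswalk=DC_CROSSWALK):
    """Map a harvested record onto target field names, dropping unmapped fields."""
    return {crosswalk[k]: v for k, v in source.items() if k in crosswalk}


def drop_duplicates(records):
    """Keep the first record seen for each identifier.

    Records without an identifier are always kept, since there is
    nothing to compare them on.
    """
    seen = set()
    unique = []
    for record in records:
        key = record.get("identifier")
        if key is not None and key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


# Invented harvested records, for illustration only.
harvested = [
    {"dc:title": "Survey A", "dc:identifier": "doi:10.0000/example-a", "dc:format": "csv"},
    {"dc:title": "Survey A (copy)", "dc:identifier": "doi:10.0000/example-a"},
    {"dc:title": "Survey B", "dc:identifier": "doi:10.0000/example-b"},
]

mapped = [apply_crosswalk(r) for r in harvested]
print(drop_duplicates(mapped))
```

Matching on a single identifier is the simplest possible rule. In practice the same dataset may be harvested from two sources under different identifiers, which is one reason duplicates are listed as an open issue for the group.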

User Group

To ensure there is active engagement with the data contributors from HEIs and Data Centres throughout the project. This is to help gather user requirements, provide feedback on the project progress and deliverables, ask questions and share experience. This group includes those HEIs and Data Centres that are funded to support the project through sharing catalogues and submitting data. It should include research managers and other groups (e.g. data contributors). There is likely to be overlap between this group and others.

User (Researcher) Group

As the overall aim of the project is production of a service to provide improved discoverability of research data for reuse in research, it is critical that we provide a mechanism for researchers to interact with and give feedback on the development of the service. This may be achieved by representative bodies and / or nomination of researchers by project partner institutions. Engaging with the UK research community will help ensure they are prepared for the Discovery Service and with international metadata development. This group includes those HEIs and Data Centres that are funded to support the project through sharing catalogues and submitting data. There is likely to be overlap between this group and others.

Meetings will occur online every 4 weeks, or as required. The next blog post will cover the discussions from the first meetings where the focus was on use cases.

Initial Workshop (Once more unto the breach…)

On St George’s Day members of the UKRDDS team from Jisc, the DCC and the UK Data Archive gathered together with pilot institutions and data centres for the project’s initial workshop.

It was the first opportunity, since the initial pilot, for all stakeholders to come together and to discuss the plan for the project, the work involved, the governance structures, what’s expected from the pilots and the supporting structures required to ensure the project’s success. The workshop was also the start of the process of gathering user requirements and the development of a set of use cases for the Discovery Service.

The day was split into two parts. The morning was taken up by presentations from the project team. The afternoon was the key part of the workshop with groups working together to gather requirements for a Discovery Service and define an initial set of use cases.

Presentations

The following briefly summarises each presentation and includes links to the relevant slides.

Welcome and Introduction – Catherine Grout introduced the workshop and gave some background into how the Discovery Service fits in with Jisc’s Research at Risk co-design challenge.

UK RDDS Project Overview and Plan walkthrough – Christopher Brown gave an overview of the Jisc-led project, including who is involved, and went through the plan and each work package. This also included advisory group structures and how the project will be run.

HEI Engagement – Laura Molloy summarised the HEI engagement work from the initial pilot and the requirements for this project.

Data Centre Engagement – Veerle Van den Eynden summarised the Data Centre engagement work from the initial pilot and the requirements for this project.

Metadata Development – Alex Ball gave an overview of how far the metadata work had progressed in the pilot and what was now required for metadata standard work.

Software Evaluation – David Wilson talked about the technical aspects of the project for further evaluating ANDS, evaluating CKAN and developing a set of evaluation criteria that could be used against any other potential solution, if not within this project then by anyone in the future.

Prior to lunch, there was a short group discussion with the opportunity to ask questions of the team on the plan, work packages, communication, governance or anything related to the project. It was good to hear such positive comments and the clear need for such a Discovery Service.

Requirements Gathering and Use Cases

Delegates were split into three groups of seven or eight people. Each group was given the list of HEI and Data Centre requirements that came out of the original pilot’s workshop (“Excerpt-UKResearchDataRegistryPilot_reportWP5_v1-2” and “Excerpt-UKResearchDataRegistryPilot_reportWP4_03”) and asked to say if these were still relevant and what, if anything, was missing. They were also given printed copies of a Use Case Template. Rather than come up with detailed use cases, they were asked to think about user stories using the following structure to describe each one:

“As a <role>, I want <goal/desire> so that <benefit>”

The “role” included stakeholders such as Research Manager, Researcher, Funder, Developer, etc.

It was a very productive afternoon and an initial set of user stories and requirements was gathered. Technical and metadata requirements came up as did the importance of a search feature, and the functionality around this, metadata quality, and an easy-to-use interface. The team is in the process of grouping the user stories together, prioritising them and ensuring that they are linked to the requirements. Further details will be published on this blog as they are developed.

I would like to thank everyone who was able to attend and contribute to the success of the workshop.

Welcome

Welcome to the blog for the Jisc UK Research Data Discovery Service. Further details about the project can be found on the About page, but this initial post provides an introduction to the project.

In the initial pilot (phase 1), the Digital Curation Centre (DCC) piloted an approach to a registry service aggregating metadata for research data held within UK universities and national, discipline-specific data centres. This project (phase 2) builds on that work with the aim of running a UK Research Data Discovery service. It will lay firm foundations for the service, including a service operation plan and business case for its delivery into the future.

Background

In 2013, the Digital Curation Centre (DCC) piloted an approach to a registry service to aggregate metadata for research data held within UK universities and national, discipline-specific data centres.

This six-month pilot, which engaged the support of a number of higher education institutions (HEIs), tested an existing data registry architecture, based on the software and metadata requirements of Research Data Australia developed by the Australian National Data Service (ANDS). Its aims were to demonstrate the feasibility of a research data discovery service for the UK and to develop a better understanding of the optimal technical platform, metadata strategy and harvesting mechanism.

An essential feature was to initiate the engagement with stakeholders from the HEIs and data centres to ensure the pilot was designed to meet stakeholder requirements.

The issue

In order to be reused, research data must be discoverable. Universities are making research data assets available through repositories or other data portals. The Engineering and Physical Sciences Research Council (EPSRC) requires research organisations to maintain a data catalogue.

It is likely that some mechanism for aggregation will be necessary to increase visibility, to promote discovery and linking between datasets in related subject areas held in different institutions. Whereas document repositories can, in principle, make articles open to full-text searching by Google, this recourse is not available to data archives relying on metadata.

The solution

As UK universities become more involved in the management of research data and capacity develops, the requirement for a UK research data discovery service has grown. The benefits of such a service include:

  • Breaking down data silos, encouraging linking and reuse of related data collections, particularly in interdisciplinary research
  • Facilitating linking data to other research outputs, making data citation and referencing easier, and thereby incorporating data in research achievements and impact

About the project

In this second phase of work, we will:

  • Build on the stakeholder engagement
  • Further evaluate the ANDS solution and explore an alternative such as the Comprehensive Knowledge Archive Network (CKAN)
  • Assess whether any other solutions are potential candidates
  • Continue the metadata standards work
  • Move the pilot to a suitable instantiation for a future service.

This Jisc-led initiative, with support from the Digital Curation Centre and the UK Data Archive, will develop a discovery service that enables the discovery of UK research data, meets our customer requirements and is in a position to be taken forward into a service run by Jisc.