UKRDDS Phase 3 – beta update

Headline

Research Data Discovery Service development work on hold as priority is to deliver the Research Data Shared Service.

Phase 3 Summary

It’s been some time since there’s been an update from the Research Data Discovery Service project and I will explain why in this post. First of all I’ll summarise phase 3 work in 2017.

The most recent webinar for phase 3 of the project was held earlier this year, in April. The aims of the webinar were to welcome new participants, provide an update on the project, introduce the new beta version (http://researchdiscoveryservice.jisc.ac.uk), highlight progress and review requirements from phases 2 and 3. You can read the report from this webinar on the project blog.

From May to June, development work for phase 3 continued in a series of sprints, each lasting two weeks. The beta version of the Discovery Service was updated every fortnight and a list of changes was sent to the project mailing list and posted on this blog. During this time our developer, Mark, was busy creating a custom harvest extension and working on harvesting participants’ metadata.

Latest Update

The most recent of our fortnightly sprint posts concluded with the following update:

“Mark is moving to the Research Data Shared Service project for a few weeks so the current development work in RDDS is on hold. The rest of the team will be using the time to focus on testing and ensuring the current set of requirements are updated and prioritised for the next phase of development. Unfortunately, there won’t be any technical updates during this time. The two weekly cycle of technical updates will restart once Mark is back on the project.”

Unfortunately for the Discovery Service, those “few weeks” were extended and then became permanent, and Mark has remained on the Shared Service (RDSS) project since July. This was because the Shared Service became the priority for development work, to ensure it delivers. Although this is good for the Shared Service, you can imagine that it has had a huge impact on the Discovery Service. The non-technical work mentioned in the above quote has been completed, but without a developer further progress cannot be made. In short, with the prioritisation of the Shared Service, the Discovery Service project has been put on hold.

I have delayed making this announcement as I have been working on revising the plan and attempting to get a developer on the project to complete the custom harvester and re-harvest from as many participants as possible. Unfortunately, this has not been possible and the project remains on hold.

For those participants who are also piloting the Shared Service and for anyone who’s seen a presentation from the Shared Service team, you might be aware that the Discovery Service will be a component of the Shared Service. This remains the case and current market research work, which will go into the business case, has looked at running the Discovery Service as such a component. At some point integration with the Shared Service will be implemented.

Participation

I would like to thank all the participants who have been involved in the project. In particular, I would like to thank everyone who has supported the project, helped identify requirements, provided metadata, tested harvested metadata, provided feedback and engaged with the team. The project wouldn’t have achieved anything without your contribution.

As a reminder, here’s a list of participants:

Phase 2 participants – 9 HEIs (University of Hull, University of St Andrews, University of Glasgow, Oxford Brookes University, University of Edinburgh, University of Oxford, University of Southampton, University of Leeds and University of Lincoln) and 7 Data Centres (Archaeology Data Service, Cambridge Crystallographic Data Centre, ISIS/ICAT – STFC, UK Data Service, Visual Arts Data Service, UK Energy Research Centre and Natural Environment Research Council).

Phase 2 volunteers – 5 HEIs (University of Sheffield, University of Bath, University of Nottingham, Lancaster University and University of Bristol).

Phase 3 volunteers (in addition to all the above 14 HEIs and 7 Data Centres) – 11 HEIs (Sheffield Hallam, Royal College of Art, King’s College London, University of Cambridge, University of Stirling, Aston University, Cranfield University, University of Sussex, University of Warwick, University of Liverpool, Open University).

That’s 32 organisations contributing to the project, in addition to other organisations who have been involved in some way, such as the Natural History Museum, figshare, British Library, Thomson Reuters, Australian National Data Service and the Digital Curation Centre.

To have so many organisations involved shows the level of interest in, and the importance of, a Research Data Discovery Service.

Finally

If the situation changes and development work can continue on the Discovery Service I will announce it via this blog and the project mailing list. However, for now, the project is on hold until approximately the middle of 2018.

If anyone has any questions or feedback on the above please do not hesitate to contact me at christopher.brown@jisc.ac.uk.

Beta update – 30 June 2017

The beta version of the Discovery Service is updated every fortnight and a list of changes is sent to the project mailing list and posted on this blog.

Here’s the latest update from our developer Mark Winterbottom.

Technical Update (30 June, 2017)

Key Improvements

This sprint, Mark has been working on creating the custom harvest extension. Here are the items that are completed and working:

  • Database table models setup.
  • Database setup command.
  • Creating new harvest endpoints.
  • Updating and deleting harvest endpoints in the system.
  • Adding custom field mappings.
  • Creating new harvest jobs in the job table.

Mark also made some progress on getting the asynchronous tasks working, although this work is not yet complete.
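To give a flavour of the pieces listed above, here is a minimal Python sketch of harvest endpoints, custom field mappings and harvest jobs. It is not the code of the actual extension: the class names, method names, the in-memory storage and the example URL are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HarvestEndpoint:
    """One harvestable source (hypothetical model, not the real table schema)."""
    name: str
    url: str
    field_mappings: Dict[str, str] = field(default_factory=dict)


@dataclass
class HarvestJob:
    """A queued request to harvest one endpoint."""
    endpoint_name: str
    status: str = "pending"


class HarvestRegistry:
    """In-memory stand-in for the endpoint and job tables."""

    def __init__(self) -> None:
        self.endpoints: Dict[str, HarvestEndpoint] = {}
        self.jobs: List[HarvestJob] = []

    def create_endpoint(self, name: str, url: str) -> HarvestEndpoint:
        if name in self.endpoints:
            raise ValueError(f"endpoint already exists: {name}")
        endpoint = HarvestEndpoint(name=name, url=url)
        self.endpoints[name] = endpoint
        return endpoint

    def update_endpoint(self, name: str, url: str) -> None:
        self.endpoints[name].url = url

    def delete_endpoint(self, name: str) -> None:
        del self.endpoints[name]

    def add_mapping(self, name: str, source_field: str, target_field: str) -> None:
        self.endpoints[name].field_mappings[source_field] = target_field

    def create_job(self, name: str) -> HarvestJob:
        # Only allow jobs for endpoints that are registered.
        if name not in self.endpoints:
            raise KeyError(f"unknown endpoint: {name}")
        job = HarvestJob(endpoint_name=name)
        self.jobs.append(job)
        return job


# Example usage with a placeholder endpoint.
registry = HarvestRegistry()
registry.create_endpoint("example-hei", "https://repository.example.ac.uk/oai")
registry.add_mapping("example-hei", "dc:title", "title")
job = registry.create_job("example-hei")
```

In a real system the jobs would be picked up by background workers, which is where the asynchronous tasks mentioned above would come in.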

Focusing on next

Mark is moving to the Research Data Shared Service project for a few weeks so the current development work in RDDS is on hold. The rest of the team will be using the time to focus on testing and ensuring the current set of requirements are updated and prioritised for the next phase of development. Unfortunately, there won’t be any technical updates during this time. The two weekly cycle of technical updates will restart once Mark is back on the project.

Other notes

I gave a presentation and demo of the beta version of the system at the Research Data Network event (27/28 June in York). The slides are available at https://www.slideshare.net/JiscRDM/a-discovery-service-for-uk-research-data.

All the slides from the RDN event are on the RDN programme page at https://research-data-network.readme.io/docs/4th-research-data-network-york-university. For those who weren’t able to attend, I’d recommend checking out the presentations and I hope it will encourage you to attend the next RDN event later this year. Check the #JiscRDM stream for an announcement.

That’s the end of this update. If you would like any further information about the project, or contact details, check the project page on the Jisc website.

Beta update – 16 June 2017

The beta version of the Discovery Service is updated every fortnight and a list of changes is sent to the project mailing list and posted on this blog.

Here’s the latest update from our developer Mark Winterbottom.

Technical Update (16 June, 2017)

Key Improvements

The focus of this sprint has been on the following:

  • Deploy the CSW harvester and geo-spatial map search to the Staging site.
  • Debugging issues with the spatial search.
  • Getting the AWS CloudWatch logging working to have visibility of the container logs (RDD-276).
  • Deployed a fix (to staging) for the issue that was preventing harvesting for University of Edinburgh and a number of others (RDD-181).
  • Downloaded the database from the old Alpha site and requested that it is shut down.
  • Investigated a bug with Solr when harvesting 400k+ datasets. For now, removed CCDC as a workaround and created a separate ticket (RDD-327) to implement a long-term solution.
  • Harvested data for Aston University and University of Edinburgh on Staging.
  • Added UK Energy Research Centre to staging (RDD-310).
  • Added the ability to create new harvest endpoints in the custom harvest extension (RDD-320).
  • Had a productive sprint retrospective and planning meeting with Chris and Dom where we decided on some process improvements for the bi-weekly sprints.

Focusing on next

The next sprint will focus on:

  • Deploy the CSW and bug fixes to live.
  • Add more harvest endpoints to live where applicable.
  • Continue work on the dynamic harvest endpoint.

Other notes

Just a heads up that Mark will be deploying changes to the live site on Monday morning. In order to add support for geo-spatial search, he needs to re-install the RDS Database Instance to upgrade PostgreSQL to a newer version. As a result, this will require some downtime (1 hour maximum).

That’s the end of this update. If you would like any further information about the project, or contact details, check the project page on the Jisc website.

Beta update – 2 June 2017

The beta version of the Discovery Service is updated every fortnight and a list of changes is sent to the project mailing list and posted on this blog.

Here’s the latest update from our developer Mark Winterbottom.

Technical Update (2 June, 2017)

Key Improvements

The focus of this sprint has been to get the remaining institutions into the system. Here are the main items completed in this sprint:

  • Got the CSW Harvesting working on the Staging site.
  • Got the geo-spatial search working.
  • Enabled PostGIS Postgres extension in RDS.
  • Created an automated build for the Solr Docker image.
  • Harvested data from CCDC.
  • Harvested data from the remaining HEIs and Data Centres where an endpoint is available.
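For readers curious about what a geo-spatial search does conceptually, the sketch below checks whether a dataset's bounding box overlaps the box a user draws on a map. This is a plain Python illustration only, not the PostGIS query used by the service; the type name, the function name and the coordinates are assumptions, and boxes that cross the 180° meridian are ignored for simplicity.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Longitude/latitude extent in decimal degrees (illustrative type)."""
    west: float
    south: float
    east: float
    north: float


def intersects(a: BoundingBox, b: BoundingBox) -> bool:
    """Return True when the two boxes overlap or touch."""
    return (
        a.west <= b.east
        and b.west <= a.east
        and a.south <= b.north
        and b.south <= a.north
    )


# Rough, made-up extents for illustration.
dataset_extent = BoundingBox(west=-8.0, south=49.9, east=1.8, north=60.9)
search_near_york = BoundingBox(west=-1.2, south=53.9, east=-1.0, north=54.0)
search_far_away = BoundingBox(west=110.0, south=-45.0, east=155.0, north=-10.0)

matches_york = intersects(dataset_extent, search_near_york)
matches_far_away = intersects(dataset_extent, search_far_away)
```

A spatial database extension performs the same kind of test, but with spatial indexes so that it stays fast across many records.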

The staging site has been rebuilt and I’ll be sending out some emails asking for feedback from specific sites early next week.

Focusing on next

In the next sprint I will focus on creating a dynamic harvester which gives us more control over field mappings in the web interface in CKAN. We’ll be publishing a blog post shortly to explain how this will work.
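As a rough idea of what configurable field mappings involve, the following Python sketch renames the fields of a harvested record according to a mapping table. It is an assumption-laden illustration rather than the planned implementation: the function name, the mapping and the sample record are all hypothetical.

```python
from typing import Any, Dict


def apply_field_mappings(record: Dict[str, Any], mappings: Dict[str, str]) -> Dict[str, Any]:
    """Rename harvested fields using a source-to-target mapping.

    Source fields without a mapping are left out of the result.
    """
    mapped: Dict[str, Any] = {}
    for source_field, target_field in mappings.items():
        if source_field in record:
            mapped[target_field] = record[source_field]
    return mapped


# Hypothetical mapping from Dublin Core style names to dataset fields.
example_mappings = {
    "dc:title": "title",
    "dc:description": "notes",
    "dc:creator": "author",
}

# Hypothetical harvested record.
example_record = {
    "dc:title": "Sample dataset",
    "dc:creator": "A. Researcher",
    "dc:rights": "CC BY",
}

mapped_record = apply_field_mappings(example_record, example_mappings)
```

Keeping the mapping as data rather than code is what would allow it to be edited through a web interface without redeploying the harvester.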

That’s the end of this update. If you would like any further information about the project, or contact details, check the project page on the Jisc website.

Beta update – 19 May 2017

As mentioned in the previous post, the beta version of the Discovery Service is updated every fortnight and a list of changes is sent to the project mailing list and posted on this blog.

Here’s the latest update from our developer Mark Winterbottom.

Technical Update (19 May, 2017)

Key Improvements

Items completed in this sprint:

  • Fixed a bug where some harvesting processes were becoming unresponsive. The issue was related to a bug in the Terraform setup scripts and is now resolved (RDD-309).
  • Created a process for adding unit tests for CKAN extensions which are executed on the Strider Continuous Integration server (RDD-150).
  • Got the CSW harvester working in the Docker containers. However, there is still some work to do on this before I can deploy it to the live site (RDD-306).
  • Added Science and Technology Facilities Council (STFC) to the site and harvested metadata (RDD-61).
  • Fixed issue with the ‘format’ and ‘language’ fields being duplicated (RDD-302).
  • Attempted to harvest from Natural Environment Research Council, Archaeology Data Service and UK Data Archive; however, I am having issues with the endpoints (I’m working with the site contacts to resolve these).
  • Worked on CKAN extension that allows us to dynamically configure the field mappings in the config (RDD-303).

Focusing on next

The next sprint will be focused around adding the remaining Phase 2 data centres (when the endpoints are accessible) and the Phase 3 participants.

That’s the end of this update. If you would like any further information about the project, or contact details, check the project page on the Jisc website.

Beta update – 5 May, 2017

If you attended the recent webinar for this phase of the project, or read the report on the webinar, you will know that we have launched the beta version of the Discovery Service and will be providing regular updates on the progress of the project.

We are now re-harvesting metadata from participants involved in phase 2 of the project, before harvesting from new participants who volunteered in phase 3. Following the re-harvesting, the focus will be on reviewing the prioritisation of existing and new requirements. The work is being done in two-week sprints on our test server with fortnightly updates to the beta system. When there’s an update to the beta, an email is sent to the project’s mailing list to inform participants of what has changed or is new in this version.

From now on we’ll be posting the updates via the blog, as well as the mailing list. If you want to get more involved in the project you can subscribe to the mailing list (JISC-UKRDDS@JISCMAIL.AC.UK) or check this blog for regular updates.

Here’s the first of our updates from our developer Mark Winterbottom.

Technical Update (5 May, 2017)

The focus so far has been on making the system stable, reliable and scalable, as well as harvesting data on the new beta site.

Key Improvements

Other Noticeable Changes

The following features have been disabled in order to fix bugs and update them to work with the new version of CKAN:

  • CSW Spatial Data Harvester.
  • Map search widget.
  • Reporting extensions.
  • Automated Resource scanner.

Focusing on Next

Here are the items I will focus on next:

  • Fix a bug in the OAI-PMH harvest where it fails to process the MODS metadata format (this is why data from the University of Hull has not been harvested yet).
  • Harvest data from Oxford Brookes University (I’m currently waiting for confirmation of endpoint details).
  • Fix any metadata mapping issues based on the feedback I get for the Phase #2 HEIs.
  • Start adding Phase #2 Data Centres to the new system.
  • Re-enable spatial data harvesting.
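For context, an OAI-PMH harvest asks an endpoint for records in a named metadata format via the `metadataPrefix` argument, which is why support for a format such as MODS matters. The sketch below builds such a request URL in Python. The function name and the base URL are placeholders of my own, not details of the actual harvester or of any participant's endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Placeholder endpoint; a real harvest would use the participant's own URL.
dublin_core_url = list_records_url("https://repository.example.ac.uk/oai")
mods_url = list_records_url("https://repository.example.ac.uk/oai", "mods")
```

The harvester then has to parse whichever format comes back, so each supported `metadataPrefix` needs its own parsing and mapping logic.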

That’s the end of this update. If you would like any further information about the project, or contact details, check the project page on the Jisc website.

Phase 3 – first webinar report

Introduction

The following post is a report from the first webinar (held on 27 April 2017) for the third phase of the Research Data Discovery Service project. The aims of the webinar were to welcome new participants, provide an update on the project, introduce the new beta version (http://researchdiscoveryservice.jisc.ac.uk), highlight progress and review requirements from phases 2 and 3.

Note: slide numbers are shown in red to show how the text corresponds with the following presentation.


Welcome and introductions

All new participants and existing participants were welcomed to the webinar (Slide 3). Participants from phase 2 were thanked for agreeing to continue to be part of the project. Some of the content from previous webinars and workshops is repeated for the benefit of new participants. This is the first in a series of webinars. Future ones will provide project updates and encourage open discussion. There are still plans for face-to-face workshops during the project, but only when there is a need and they are beneficial to both the project and participants.

The project team for phase 3 were introduced and all contributed to the webinar. They are as follows:

  • Christopher Brown – Project Manager
  • Catherine Grout – Project Director
  • Dom Fripp – Metadata Developer
  • Ade Stevenson – Technical Innovations Coordinator
  • Mark Winterbottom – Technical Developer

In phase 2 there were 9 HEIs and 6 Data Centres funded to participate in the project and a further 5 HEIs who volunteered later in the project. Since the project was publicised in phase 3, with a request for further volunteers, there are a further 9 HEIs and two organisations (Slide 4/5). The aim is to include all HEIs in the UK with a research data collection; this will include all Shared Service pilots and IRUSdataUK pilots too.

Project update and overview

The project (Slide 6) is developing a platform that enables the discovery of research data from across UK higher education institutions and data centres, which will bring a number of benefits to these organisations (Slide 7). These benefits (Slide 8) include an increased visibility and transparency of research data. The project has been through a number of phases. Following the initial pilot (Slide 9), phase 2 funded a number of participants from HEIs and Data Centres to provide metadata for harvesting and work with the project to determine the requirements for a discovery service. There were a number of outputs from phase 2 (Slide 10), including the alpha test system with data harvested from participating HEIs and Data Centres. In phase 3 (Slide 11) the project will move from a test to a production-ready, tested service, include metadata harvested from more data sources and implement further requirements. A beta version of the service is now available (http://researchdiscoveryservice.jisc.ac.uk). This will be used as the basis for further development and include a complete re-harvest from all data sources.

So far, within phase 3, the focus has been on promoting the project to expand the number of participants and a lot of technical work has been going on behind the scenes (Slide 12). The following technical update summarises the work that’s been going on to produce this latest beta version:

  • Mark has been back on the project for 2 months and working on improving infrastructure.
  • Alpha site was running on a single server which worked fine for showing the concept but had a few issues:
    • More than 8 services squeezed onto one server.
    • Single point of failure.
    • Disk was filling up with logs and data.
    • Harvesting process was taking a long time.
    • Database continued to grow without regular backups.
    • Manual process for deploying changes (slow and painful to push new updates)
  • Needed a solution that was secure, scalable, reliable and backed-up.
  • Decided to split the service up into containers using Docker
    • Can spread the services across multiple servers.
    • Can expand services when doing heavy processing like harvesting and resource scanning.
    • Can shrink resources when not running process intensive services.
  • Implemented Continuous Integration
    • Automate the process of pushing new versions to live.
    • Automate unit testing.
    • Improve speed at which we can iterate through bugs and features.
  • Make use of AWS hosted services such as RDS:
    • More stable, optimised database with automated backups.
    • Offloads database maintenance to Amazon.
  • Since back, been working on configuring infrastructure and creating container apps.
  • Next steps:
    • Still need to add each organisation and harvest data.
    • Work through bug and feature tickets.
    • Add new HEIs.
    • Continue with dev process from phase 2 where we work in 2 week sprints and bi-weekly updates are sent to the email distribution list.

System status – Review of latest updates to the service

All organisations from phase 2 will have their metadata re-harvested to the new beta system (Slide 13). This includes the volunteer HEIs. Once this is complete we will start harvesting new participants from phase 3. The endpoints for all participants are listed in a Google Doc (http://bit.ly/RDDS3_harvest_status). This includes the current status for harvesting from each endpoint. The objective is to have all these working as soon as possible. When there is an issue, the JIRA ticket listed will provide the relevant details. All participants are included in this document. The new participants are currently in the “backlog” and will be added ASAP. The tickets will be set to “Done” (closed) when complete. Further issues will result in new tickets or tickets could be reopened.

Requirements (Slide 14) are listed and tracked using JIRA (https://jiscdev.atlassian.net/projects/RDD/). The categories of requirements were defined early in phase 2 after requirements gathering at the first workshop. User stories were collected and MoSCoW prioritisation (Slide 15) was used. Requirements were extracted from these user stories and from the HEI/Data Centres requirements reports (Slide 16). Following the latest re-harvesting, the focus will be on reviewing the prioritisation of existing and new requirements and implementing them in two week sprints. These will be implemented on the beta site with an email going out to the project mailing list showing what requirements have been implemented. The current issues are harvesting and metadata mapping (Slide 17) and we’ll look at other issues once these have been resolved.
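To illustrate how MoSCoW prioritisation can drive the order of work, here is a small Python sketch that sorts requirements into Must, Should, Could and Won't order. The requirement identifiers and summaries are invented for the example and do not correspond to real JIRA tickets, and the function name is an assumption.

```python
from typing import Dict, List

# Lower number means higher priority under MoSCoW.
MOSCOW_ORDER = {"Must": 0, "Should": 1, "Could": 2, "Won't": 3}


def prioritise(requirements: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort requirements by MoSCoW category, keeping the original order within a category."""
    return sorted(requirements, key=lambda req: MOSCOW_ORDER[req["priority"]])


# Invented example requirements.
example_requirements = [
    {"id": "REQ-1", "summary": "Export search results", "priority": "Could"},
    {"id": "REQ-2", "summary": "Harvest metadata from endpoints", "priority": "Must"},
    {"id": "REQ-3", "summary": "Filter by licence", "priority": "Should"},
    {"id": "REQ-4", "summary": "Map metadata fields", "priority": "Must"},
]

ordered_ids = [req["id"] for req in prioritise(example_requirements)]
```

Because Python's sort is stable, requirements in the same category keep the order in which they were listed, so a sprint backlog could simply be read from the top.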

Metadata

The two key metadata aims for phase 3 concern the quality and the representation of the harvested metadata within the CKAN client (Slide 18).

At the end of phase 2, we launched a vote on which metadata fields in the application profile would be of most benefit to a user of the service. The results of this vote are important for two reasons. Firstly, they give a broad consensus on which fields are considered most important for discovery and what the minimal metadata for a record should contain.

Secondly, the vote can be used to order the metadata on screen so that a user sees the most important metadata first. This can help simplify a record at the point of discovery (good UX), enable accurate citation and, hopefully, encourage users to click through to the original repository record, which is desirable when there is additional metadata at source that might be of use.
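As a simple illustration of using vote results to order a record's fields for display, consider the Python sketch below. The vote counts, field names and function name are made up for the example and are not the actual poll results.

```python
from typing import Dict, List, Tuple


def order_fields_by_votes(record: Dict[str, str], votes: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (field, value) pairs with the most-voted fields first.

    Fields with no votes keep their original relative order at the end.
    """
    return sorted(record.items(), key=lambda item: -votes.get(item[0], 0))


# Invented vote counts and record.
example_votes = {"title": 12, "creator": 9, "licence": 7}
example_record = {
    "licence": "CC BY",
    "title": "Sample dataset",
    "creator": "A. Researcher",
    "format": "CSV",
}

display_order = [name for name, _ in order_fields_by_votes(example_record, example_votes)]
```

Here `display_order` puts the title first and the unvoted `format` field last, which is the kind of ordering the vote is intended to inform.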

In addition to this, the University of Glasgow will be conducting a piece of work to develop clear information and guidance for service users about the complex area of dataset rights and licences. This work will broadly follow the work that has been done in the cultural heritage sector recently to solve a similar problem (see http://rightsstatements.org/en/).

There has also been discussion with CORE (https://core.ac.uk/) to compare the services and look at potential ways of working together, especially in connecting data to papers.

The poll (http://www.tricider.com/brainstorming/2mnbqfgcOJp), on which metadata fields participants think should appear at the top of the record, was reopened following the webinar to allow new participants to have the opportunity to vote. For further information, see the core metadata schema (https://goo.gl/vWCX0z) and the UKRDDS metadata profile mapping document (https://docs.google.com/spreadsheets/d/1mjatKZKdhp_tFm6xnYJFpBgPLMNDdAue9FGy-oKFBYk/edit?usp=sharing).

Phase 3 (next steps)

The next steps for phase 3 (Slide 19) include the implementation of requirements, listed in JIRA (Slide 20), via prioritisation and development sprints. The work still required (Slide 21) includes the following:

What are the aspirations for the future service (Slide 22)? The Discovery Service fits within the umbrella of the Research Data Shared Services project (Slide 23), which, under Research @ Risk, is developing a shared service (provided by Jisc) for effective Research Data Management. This offers a number of benefits:

  • Cost savings and efficiencies
  • Common approaches and practice
  • Research system standardisation and interoperability

The discovery service fits within this as a national aggregation service. We will be looking at integrating with the shared service further into phase 3. The “caterpillar” diagram (Slide 24) shows Jisc’s R&D process. Following the discovery and alpha stages, we’re now in the beta stage. The next step is to deliver this as a service. This is most likely to involve the Discovery Service being established (Slide 25) within the Jisc Digital Resources directorate’s set of services (https://www.jisc.ac.uk/content). This will involve consideration of a number of areas including:

  • Establishing a service team and how this fits with Research Data Discovery Service activities
  • CKAN production specific installs – e.g. sandbox or user acceptance test machines
  • Ongoing OAI and other endpoint harvesting configuration, management and documentation
  • Setup and ongoing management of admin and Discovery Service user accounts, updates, patches, and security (firewall, intrusion detection, DDOS etc.)
  • Various other system admin tasks such as backup, disaster recovery, log config and rotation, DNS, proxying, caching, mail routing, system performance testing, system monitoring
  • Set up of any required service supporting applications, e.g. wiki for documentation, blogs etc.
  • Dealing with ongoing developments including necessary developments in response to essential new requirements or ongoing service enhancements
  • Community building for use of the service
  • Training / Workshops
  • Promotional events and social media.

An essential part of the project is ensuring participants provide feedback on how the system is developing, confirm that the requirements are implemented and check their metadata (Slide 26). In phase 2 there were a number of advisory groups set up to support the project. Originally, there was going to be one advisory group in phase 3, but so far there hasn’t been a need as all communication is shared via the mailing list. However, we will set up groups as required, especially when we need a more focussed discussion on areas such as technical development or metadata. The JISC-UKRDDS mailing list will continue as the main communication outlet and there will be further webinars to update everyone on progress. Workshops will be held as required for feedback and face-to-face discussions.

Some useful links to support the project (Slide 27) include:

Questions

Comments were made during the webinar and these were followed up via email by participants. However, a number of questions were asked and these are collated here.

What are the plans for working with Pure/Elsevier and new Pure API (v5.9), due out in June 2017?

There have been ongoing discussions with Elsevier as part of this project and the Shared Services. The service did work with a previous version of Pure via OAI-PMH. We will endeavour to use the new functionality within v5.9 to harvest into the Discovery Service.

I note the service is still linking directly to individual files. As ever, still don’t think this appropriate! Are we retaining this model?

We will look into this functionality to see if it can be improved once the harvesting and mapping work is complete. We want the system to be as easy-to-use as possible and this includes accessing the underlying data. We’ve also been looking at how other data portals work, particularly those built using CKAN.

Do we know when next RD Shared Service pilots’ day is?

This is still to be determined but the Shared Service project will contact all the pilots to let them know.

Closing comments

Participants were thanked for joining the webinar and contributing to the project. They were reminded that progress updates will be sent to the mailing list and are encouraged to actively engage with the project. There will be a demo of the Discovery Service at the next Research Data Network event and participants were encouraged to attend. This event will be held at the Ron Cooke Hub, University of York on 27/28 June 2017 (Slide 28). You can register at https://www.jisc.ac.uk/events/research-data-network-27-jun-2017. The programme is available online at https://research-data-network.readme.io/docs/4th-research-data-network-york-university.

 

UKRDDS Phase 3 – enabling discovery of research data

In my previous post, I described the latest phase (three) of the Research Data Discovery Service. In this post I’d like to describe in more detail the plans for harvesting metadata from other UK HEIs and Data Centres with research data collections.

In phase 2 metadata was harvested from 9 HEIs (Hull, St Andrews, Glasgow, Oxford Brookes, Edinburgh, Oxford, Southampton, Leeds and Lincoln) and 6 Data Centres (Archaeology Data Service, Cambridge Crystallographic Data Centre, ISIS/ICAT – STFC, UK Data Service, Visual Arts Data Service and NERC), all funded to participate in the pilot. The participants also provided a set of requirements for a discovery service, provided harvestable endpoints and helped test the alpha system as it developed. As the project progressed, a further five HEIs (Sheffield, Bath, Nottingham, Lancaster and Bristol) volunteered to be involved and we started to incorporate their metadata into the system near the end of phase 2.

I’m glad to say that all of the participants from phase 2 are keen to continue to be involved in the project. In phase 3 we plan to add as many UK HEIs and Data Centres that have research data collections as possible into the Discovery Service. There are a number of institutions that are also part of the Research Data Shared Service and the Research Data Metrics for Usage projects and, if they haven’t already been involved in this project, we will be looking to include them as well. However, at this point we would like to hear from any other institutions that have a research data collection and an endpoint that we can use to harvest the metadata into the Discovery Service. It was clear in phase 2 that some institutions have well-established research data management policies and practices, while others are less well advanced. Whatever stage of this process you have reached, we would still like to hear from you.

We will be working to enhance the current test service, adding functionality to match requirements, and ensuring there is a fully functional and tested system ready to transfer to a production service (provided it meets the relevant criteria and the business case is agreed within Jisc). Incorporating other participants’ metadata (potentially for all HEIs with research data collections) is an important objective of the project.

If you are interested in being involved in this latest phase of the project, or would like to discuss this further, please contact Christopher Brown.

 

UKRDDS Phase 3

The latest phase of the UKRDDS will run from October 2016 to September 2017 and follows on from the second phase of the project. This post summarises work from the second phase and what’s planned for this third phase.

Phase 2

This Jisc-led second phase of the project ran from March 2015 to September 2016 and included support from the Digital Curation Centre and the UK Data Service, on HEI and Data Centre engagement respectively. It built on the pilot work with the aim of running a test UK Research Data Discovery Service. The main aim of the second phase was to lay the firm foundations for the service by harvesting metadata from 9 HEIs and 6 Data Centres, each funded to participate in the project. These pilot organisations provided metadata of their research data collections for harvesting, provided a set of user requirements and helped to test the alpha system. The alpha service was made publicly available during development to ensure the research community had the opportunity to test its functionality. This phase came to an end with a final workshop for all participants where the alpha system was tested, requirements were reviewed and the plans for the next phase were presented.

Phase 3

The third phase of the project has the following objectives:

  • moving the test service from alpha to beta;
  • enhancing the service by adding further requirements;
  • incorporating other participants’ metadata (potentially for all HEIs with research data collections);
  • running as an enhanced beta service to allow for further testing;
  • having, at the end of the project, a fully functional system ready to operate as a service (provided it meets the relevant criteria and the business case is agreed within Jisc).

It’s hoped that all participants from phase 2 will continue to be involved during phase 3 of the project. In phase 2 there were three Advisory Groups – User, Researcher and Technical & Metadata. However, in phase 3 there will be one Advisory Group with voluntary participation from all those HEIs and Data Centres having their metadata harvested into the service. It’s expected that sub-groups could form to discuss specific issues, for example metadata mapping. This structure will mean the project continues to get input and feedback from participants, so that the system satisfies the needs of its users.

At the final workshop (see previous post) valuable feedback was provided as to how to make sure the project is a success. This includes ensuring the project engages with researchers and other users as soon as possible to further test the system and make sure it is satisfying their needs and not just those of the participants. Also, requirements have mainly been coming from the data collection perspective, but these need to be gathered from the user perspective sooner rather than later. Phase 2 focussed on primary types of data but we should look at secondary types of data (see scope of datasets) in the context of researchers using the service. Other use cases need to be considered, such as those from a funder’s perspective. Other questions raised included: What about all the other data held internationally in subject-based data centres – do we want that or not? Is the distinction between UK and non-UK data important? For now, the focus remains on UK datasets.

This work will allow us to move from a test service to a production-ready one. We will be able to harvest from more data sources, do more formal and informal system testing, look at further requirements (refining and implementing them), and develop a business case for the service, with the ultimate aim of delivering a more mature and tested service to Digital Resources (the area of Jisc that runs and supports services, such as the Archives Hub).

In developing additional functionality we will review existing requirements that were set to “won’t” (in the MoSCoW prioritisation process performed early on in phase 2) or marked out of scope, gather further requirements from the final workshop and potentially elsewhere, and integrate more closely with the Research Data Shared Service work and the IRUSdataUK project.

How to get involved

In phase 2 metadata was harvested from 9 HEIs and 6 Data Centres funded to participate in the pilot. A further four HEIs volunteered to be involved in the project and their metadata was added to the system near the end of phase 2. In phase 3 the plan is to add more (if not all) HEIs and Data Centres that have research data collections. This will necessitate a set of requirements to join the service. These are currently being finalised, but the minimum requirements are:

  • Research data metadata can be provided
  • There is a harvestable endpoint
  • It’s a supported schema
  • A named contact is available for support to:
    • Check harvest and metadata
    • Report issues
    • Request manual harvest, if required
    • Liaise with the developers when adding metadata to the service
  • Jisc will provide a developer/admin to liaise with the support person
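
The minimum requirements above can be read as a simple checklist. The following is a minimal sketch in Python of how such a check might look; the `Participant` fields, the `unmet_requirements` function and the contents of `SUPPORTED_SCHEMAS` are all hypothetical and for illustration only, since the requirements are still being finalised and the list of supported schemas is not given here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: placeholder schema names. The actual list of supported
# schemas is not stated in this post.
SUPPORTED_SCHEMAS = {"datacite", "dublin core"}


@dataclass
class Participant:
    """Hypothetical record describing an organisation wishing to join."""
    name: str
    provides_research_data_metadata: bool
    harvest_endpoint: Optional[str]  # URL of a harvestable endpoint, if any
    schema: Optional[str]            # metadata schema used by the endpoint
    named_contact: Optional[str]     # person who checks harvests, reports issues


def unmet_requirements(p: Participant) -> List[str]:
    """Return the minimum requirements that the participant does not yet meet."""
    unmet = []
    if not p.provides_research_data_metadata:
        unmet.append("research data metadata")
    if not p.harvest_endpoint:
        unmet.append("harvestable endpoint")
    if p.schema is None or p.schema.lower() not in SUPPORTED_SCHEMAS:
        unmet.append("supported schema")
    if not p.named_contact:
        unmet.append("named contact")
    return unmet


if __name__ == "__main__":
    # Example values are invented for illustration.
    applicant = Participant(
        name="Example University",
        provides_research_data_metadata=True,
        harvest_endpoint=None,
        schema="DataCite",
        named_contact="data-team@example.org",
    )
    print(unmet_requirements(applicant))
```

An empty list would indicate that an organisation meets the minimum requirements as sketched; anything else gives the named contact and the Jisc developer/admin a starting point for discussion.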

If you are interested in being involved in this latest phase of the project, or would like further information, please email Christopher Brown.

UKRDDS Phase 2 Final Workshop

The project team and representatives from all participating pilot HEIs and Data Centres convened in London for the third workshop of the UK Research Data Discovery Service on 13 October 2016. This was the final workshop of what is now known as phase 2 of the project, which ran from March 2015 to September 2016 (extended from the original end date of July 2016).

The objectives of the workshop were to review the second phase of the project, discuss what still needs to be achieved in the next (third) phase of the project and how people can be involved and engaged.

Prior to the workshop, all the relevant sources of information were collated on the workshop’s padlet. This includes links to the shared notes, an online app for collecting sticky notes, all supporting documentation and slides.

To collect as much feedback as possible during the workshop, in addition to the exercises, posters titled Questions, Issues and Ideas were put up, along with a FLAP (Future considerations, Lessons learned, Accomplishment and Problem areas) board for phases 2 and 3. Any notes added to these posters have been transcribed into the shared spreadsheet mentioned in the group exercises.

Presentations

The day started with Catherine Grout describing what had changed in the landscape since phase 2. The research data discovery service sits within a suite of Jisc work called “Research at Risk”, which offers tools, services, advice and guidance to those involved with research data management in the UK. In particular the Research Data Shared Service will offer a simple solution that meets the needs of institutions and the requirements for funders.

The project is managed by Christopher Brown and he summarised the work of phase 2. This phase had brought the pilot into alpha status, laying the firm foundations for a potential service. A further year of work will make this a more hardened system with further testing and user feedback, making the service more valuable and useful. Further HEIs have been brought into the project, in addition to the original participating pilots.

User stories, supported by a “MoSCoW” prioritisation process, have driven the development of a range of outputs resulting in the alpha system and associated research and documentation (links to the latter, and a list of participants, are available via the padlet).
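
As an illustration of how MoSCoW prioritisation can order a backlog, here is a minimal sketch in Python. The `MoSCoW` enum, the two helper functions and the example user stories are hypothetical; they are not taken from the project’s actual requirements or JIRA tickets.

```python
from enum import IntEnum
from typing import List, Tuple


class MoSCoW(IntEnum):
    """MoSCoW categories, ordered so that lower values are more urgent."""
    MUST = 0
    SHOULD = 1
    COULD = 2
    WONT = 3


Story = Tuple[str, MoSCoW]


def prioritise(stories: List[Story]) -> List[Story]:
    """Order stories Must, Should, Could, Won't, keeping input order within each."""
    return sorted(stories, key=lambda story: story[1])


def for_review(stories: List[Story]) -> List[str]:
    """Pick out the "won't" stories, e.g. to revisit them in a later phase."""
    return [title for title, priority in stories if priority is MoSCoW.WONT]


if __name__ == "__main__":
    # Invented example stories, for illustration only.
    backlog = [
        ("Export search results", MoSCoW.COULD),
        ("Search datasets by keyword", MoSCoW.MUST),
        ("Recommend related datasets", MoSCoW.WONT),
        ("Filter results by institution", MoSCoW.SHOULD),
    ]
    for title, priority in prioritise(backlog):
        print(priority.name, title)
    print(for_review(backlog))
```

Using an ordered enum keeps the sort key simple, and a separate `for_review` step mirrors the idea of revisiting requirements that were previously set to “won’t”.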

Recent focus has been on system testing, with changes made on the staging server and using the live server as a benchmark for testing. Harvesting continues, alongside development on other requirements and specific issues (NERC, VADS) and the addition of other HEIs (Nottingham, Sheffield, Lancaster, Bath and Bristol). Feedback on the project and participating pilots’ involvement will be an important method of assessing phase 2 and directing phase 3.

Dom Fripp has worked on metadata mapping for the project. A new “metadata profile document” has been circulated and is open for comments and questions (currently on version 1.1), alongside a mapping document. These documents inform the work of our developer in building the metadata schema into CKAN. The mapping exercise is very important work and is of interest globally – comments are very welcome. This is still a live process, and issues that arise should be shared and reported so that they can be addressed in future development (issues related to migration between DataCite 3 and 4 were noted as an example). In future this documentation will be migrated to GitHub.

Group Exercises

The main focus of the workshop wasn’t to listen to presentations but for participants to engage in a number of group exercises.

The first exercise was to assess and test the current alpha system on the staging server. Delegates could work alone or in groups at their tables. There were four tables and reporting back was done one table at a time. The areas suggested for testing included: your organisation’s metadata; any missing fields; whether the harvested data is correct; search functionality; presentation of results; and usability. These were suggestions and other areas could be tested.

Notes were added to a poster under the categories of Bug, Error and Feedback. These have been transcribed into the following shared spreadsheet (along with notes from the Requirements exercise). These will be reviewed and checked against existing JIRA tickets. For any new issues a new ticket will be created.

The second exercise was a follow-on to the first and delegates were asked the following questions:

  • Does the service satisfy the requirements of your organisation?
  • What further requirements should be added?
  • What should be improved?

They were asked to write their answers down on sticky notes and put them on a poster under the following categories: Drop, Add, Keep or Improve.

As with the first exercise, the notes have been transcribed into the shared spreadsheet.

Both exercises provided valuable feedback on the current system and ideas for future requirements.

The Road Ahead

The day finished with Christopher Brown describing plans for the next phase of work and how participants could be involved.

In phase 2 we engaged with participants and gathered user stories, prioritised and implemented requirements based on these user stories, evaluated software and chose CKAN, developed an Alpha system, harvested metadata from participants into this system and are now moving to Beta.

Phase 3 will run from October 2016 to September 2017 and will allow us to move from a test service to a production-ready one. We will be able to harvest from more data sources, do more formal and informal system testing, look at further requirements (refining and implementing them), and develop a business case for the service, with the ultimate aim of delivering a more mature and tested service to Digital Resources (the area of Jisc that runs and supports services, such as the Archives Hub).

In developing additional functionality we will review existing requirements that were set to “won’t” or marked out of scope, gather further requirements from this workshop and potentially others, and integrate more closely with the Research Data Shared Service work and the IRUSdataUK project.

It’s hoped that all participants would continue to be involved during phase 3 of the project. This would be at a level expected from all new participants wishing to have their metadata harvested into the discovery service.

The day ended with a thank you to all the participants and the project team for the help and support in running an engaging and productive workshop, and to all the participants who have helped throughout phase 2 of the project.