Introduction

This document contains consolidated Wf4Ever user requirements from all sources.

Astrophysics user requirements

This section contains requirements articulated by astrophysics researchers. The information is sourced from project deliverables D4.1 and D6.1.

Id As a... I want... So that... Benefit (for self) Impact (for science) Source Comment
UA1 comparator to know details about the data used as input to a workflow execution I can know under which conditions the original data was retrieved, and the provenance of that data itself. It is extremely relevant to know the provenance/conditions of your data; in a sense these two factors are related to the quality of the science involved in the workflow. 4 5 D4.1 [1]
re-user 4
evaluator 5
UA2 comparator, re-user, evaluator to know the details about the data transformations that take place during the workflow execution I can check all the transitory data. For astronomers this is quite important since they are looking to see what happens when one upgrades properties of objects in catalogues, and how these updates propagate to other quantities. 4 5 D4.1 [1]
UA3 comparator to know the status of the decision points in the execution of a workflow I can see the branches taken in a workflow execution and the rationale behind them 3 2 D4.1 [1] OK as long as it does not affect the quality of the science
re-user, evaluator 4.5
UA4 comparator, re-user, evaluator to move (back and forth) between different versions of the same RO/workflow I can inspect their differences. This is very relevant, as the different versions of the same workflow may lead to better quality of science. 4.5 4 D4.1 [1]
UA5 comparator, re-user, evaluator to navigate relevant metadata information by following the links among them and to have an advanced visualization of this information I can have a more human-oriented presentation of and access to the metadata information. This is really important for users, since visual presentation influences every user regardless of role; having all the data available at a click is a fundamental idea. 5 2.5 D4.1 [1]
UA6 comparator, re-user, evaluator to visually see the evolution of ROs/workflows over time I can see how they change from version to version. A picture is worth a thousand words, and this is something that is valuable for Astronomy. 5 5 D4.1 [1]
UA7 comparator, re-user, evaluator to express a (subjective) opinion about the quality of external data sources I or my colleagues can use these external data sources with care. This is a very controversial matter, as assessing the quality of data/articles/software is quite subjective. (Users might have biased opinions about data/services that make this a very complicated matter.) However, overall the importance for science is huge, as good quality data / reliable services all improve science. 3 4.5 D4.1 [1]
UA8 comparator, re-user, evaluator to know the Quality of Service of external data sources I can choose the most reliable ones among services of similar functionality. 3 4.5 D4.1 [1] See UA7
UA9 comparator, re-user, evaluator to reproduce a scientific experiment consistently. I can obtain the same results if the inputs and transformations are the same. This is very important; it means that if a workflow is clear enough to be reproducible and consistent, it saves time, and it is trustworthy. It also means that if you fully understand it you can improve it. If methodologies become easily repeatable, the science results can be easily verified by independent teams leading to accuracy and quality. 5 5 D4.1 [1]
UA10 comparator, re-user, evaluator to debug a workflow I can fix it if the results of its execution seem to be inconsistent, incomplete, etc. This is very important, as this directly affects the quality of your final product 5 5 D4.1 [1]
UA11 General User to identify and choose the platform interface. D5.1 [2] The collaborative platform should differentiate the basic stages of the RO lifecycle (Live Objects, Publication Objects, Archived Objects). Since the potential actions and decisions to be taken by the user depend on these different stages of the RO lifecycle, the user should be able to choose between working in a living RO platform and consulting the digital library of ROs.
UA12 Collaborator, Re(User) to build, manage, re-use and enact scientific workflows. D5.1 [2] Given the current state of the art of astronomical workflows as characterized in this document, astronomers will need a tool for these tasks, as is already the case with other scientific communities using software such as Taverna. This workflow management tool should integrate access to both VO data repositories and VO-compliant web services, considering communication and interoperability with widely used VO software such as Topcat, Aladin and VOSpec. The re(user) would also like to be able to concatenate existing workflows and to extract useful components from ROs (data, processes, web services, scripts) in order to use them in his or her own research.
UA13 Collaborator to create and share research information in a research group D5.1 [2] Seamless integration of newcomers in the Wf4Ever platform
UA14 Reader, Comparator, Re(User), Evaluator to search and retrieve research information D5.1 [2] The way searching and retrieving of research objects is performed should not be very different from current procedures for searching bibliography, allowing search in fields like author, abstract, keywords, publication dates, etc. Since a primary goal is to bring astronomers into the platform, there is a better chance of succeeding if we propose well-known, friendly procedures. This is particularly relevant, and the concept of “familiarity” is a shared concern amongst astronomers and biologists (as mentioned in D6.1). In addition, providing the user with a good tagging system is also a common necessity amongst biologists and astronomers.
UA15 Collaborator to update and delete research information or part of its content D5.1 [2]
UA16 Publisher to publish research information with appropriate supporting details D5.1 [2] The way to define how the RO is "advertised" or indexed should be an efficient and easy task for the publisher.
UA17 Collaborator to control and manage access to research information D5.1 [2] In general Publication Objects may be freely accessible to all users for re-use, and Live Objects should be subject to more restricted access privileges. Functionalities for access management should be provided, modular and flexible enough to deal with users and groups, and at different levels of granularity in the components of the RO.
UA18 Collaborator, Re(User), Evaluator to store and identify different versions of the research information or its components D5.1 [2] Versioning may depend on whether we deal with Live Objects, Publication Objects or Archived Objects. When working with Live Objects, the platform should allow the users to choose not to back up full versions of the RO when a modification is made in the components (data or processes involved). Because of the large volumes of data involved it may be reasonable not to store multiple versions of the same RO. Instead, it may be useful in this case to allow versioning of individual components of an RO, namely those that are not excessively large. For example, automatic versioning of metadata (contributors, access privileges, modification date, etc.) may be implemented to trace the timeline provenance of these metadata. Publication Objects or Archived Objects may depend on external resources that may evolve or be removed over time. In this case versioning of these ROs may be useful to check their integrity and authenticity.
UA19 Collaborator, Publisher seamless integration of metadata D5.1 [2] As opposed to other users, astronomers are not familiar with Linked Data principles and RDF syntax; because of this all declarations of metadata information related to an RO should be transparent at the moment of creating a Live Object or publishing a Publication Object. The platform should provide a protocol or interface for easy publication of an RO with assistance on metadata, with the aim of not discouraging contribution. The ROBox developed in WP1 is a very good example of how research objects and users are seamlessly integrated in the Wf4Ever platform in an unobtrusive way.
UA20 Evaluator to rate, comment, recommend research information D5.1 [2] These actions apply not only to the whole RO but also to specific components of it, relating to quality criteria in reproducibility, repeatability of the results and re-usability and usefulness.
UA21 Re(User) to take advantage of the modular and decomposable nature of Research Objects. D5.1 [2] It should be possible to import other RO or their components into a Live Object. In this respect, RO should not be closed entities but fully decomposable with seamlessly accessible components.
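
The automatic metadata versioning suggested in the UA18 comment could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `MetadataVersion`, `MetadataHistory`, `record` and `changed_keys` are hypothetical and do not come from any Wf4Ever deliverable. The idea shown is that only the lightweight metadata is snapshotted on each change, while the bulky data components are left unversioned.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MetadataVersion:
    """One immutable snapshot of an RO's lightweight metadata (hypothetical)."""
    number: int
    recorded_at: datetime
    values: Dict[str, str]


@dataclass
class MetadataHistory:
    """Hypothetical append-only history; bulky data files are never copied."""
    versions: List[MetadataVersion] = field(default_factory=list)

    def record(self, values: Dict[str, str],
               when: Optional[datetime] = None) -> MetadataVersion:
        # Store a copy so later edits by the caller cannot rewrite history.
        version = MetadataVersion(
            number=len(self.versions) + 1,
            recorded_at=when or datetime.now(timezone.utc),
            values=dict(values),
        )
        self.versions.append(version)
        return version

    def changed_keys(self, older: int, newer: int) -> List[str]:
        """Return the sorted keys whose values differ between two versions."""
        a = self.versions[older - 1].values
        b = self.versions[newer - 1].values
        return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Calling `record` whenever contributors or access privileges change would give the timeline of metadata changes that the comment asks for, and `changed_keys` offers a simple way to compare two points on that timeline.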

Bioinformatics user requirements

This section contains requirements articulated by bioinformatics researchers. The information is sourced from project deliverables D4.1 and from researcher-described scenarios.

Id As a... I want... So that... Benefit (for self) Impact (for science) Source Comment
UB1 re-user to search => find (parts of) existing workflows that do something similar to what is required for my experiment I can get ideas for my own work 4 4 MR [1] Consider benefit to researcher for now, keeping wider impact in view, in expectation that will flow through later. Try to capture both aspects in “So that...” column.
I can compare with my work ('related work') 2 4
I can reuse in my own analysis 5 4
I can understand a method 3 3
I can reuse the results 5 4
UB2 re-user to reference the experiment that provides the input to the workflow (including paper reference, if any) I can acknowledge data providers 2 5 MR [1]
UB3 re-user to record my initial experimental hypothesis I can link the experimental design to its purpose (onset) 2 4 MR [1]
others can find my work based on its purpose 2 4
UB4 re-user to assemble conceptual workflow I can discuss my plans with supervisors/peers 3 MR [1] “conceptual workflow” is a concrete workflow that makes reference to unimplemented or unspecified elements; e.g. Taverna w/f with reference to dummy elements (e.g. can fake with beanshell)
I can start searching for components
UB5 re-user to adapt an existing workflow to my needs I can avoid having to learn (or remember?) everything about creating workflows (note: newbie scenario) 4 4 MR [1]
UB6 re-user to search for workflows by keywords about purpose, context, and outcome of experiment I can find workflows relevant to my experiment 5 5 MR [1]
UB7 re-user to run an existing workflow with different data I can understand the workflow 5 4 MR [1]
re-user I can get more biological results efficiently
UB8 re-user to compare results of workflow run with other results I can understand the results in biological terms, compare with competition 3 4 MR [1]
UB9 re-user to create new workflows with my own scripts I can 'get on with it' without trying to understand other people's work 5 4 MR [1]
UB10 re-user to run workflow scripts provided by colleagues I can perform experiments for which the methods are familiar to my direct colleagues 5 3 MR [1]
re-user I can shorten the start-up phase and lower the risk of unexpected bottlenecks
UB11 re-user to record notes relating to experiment (design and run-time log) I can document my considerations during design and execution, linked to design, run, and input/output data. 2 5 MR [1]
UB12 re-user to record revised experimental hypothesis I can link experimental design and execution to the purpose of the experiment 4 5 MR [1]
UB13 re-user to search for 'follow-up' workflows I can create a new workflow or revise my own to test the improved hypothesis or new questions derived from the previous results 4 3 MR [1]
UB14 re-user to record reasons for workflow revision I can link the new workflow to the previous workflow 2 4 MR [1] Expand benefit?
UB15 re-user to link test results and interpretation to initial hypothesis I can retrieve past information about, and interrelations between, hypotheses 4 5 MR [1]
UB16 re-user to see the rationale for existing workflows I can better understand an existing workflow, and find help to interpret results of my own workflows MR
UB17 re-user to annotate data with information from global databases I can relate results to work from other experiments MR (needs clarification/example)
UB18 re-user to record biological interpretation of results, linked to experiment and data used I can organize my information for me, my colleagues and later publication MR
other researchers interested in investigating similar biological phenomena can find and re-use my work.
UB19 re-user to record the origin of experimental data used for analysis ?? KH [1] In this scenario researcher Kristina will take the result of a genome wide association study experiment (GWAS) and a metabolomics experiment (SNPs related to metabolites) and map it to biological concepts using text mining technology. The GWAS and metabolomics experiments were done by others than Kristina. She makes a note of the origin ...
UB20 re-user to record information that must be kept confidential KH [1] … (including the still confidential research proposal for the study)
UB21 re-user to adapt existing workflows for a new study I can benefit from the knowledge and previous work of colleagues KH [1] Kristina is new to workflows, so she wishes to start with an existing workflow. When she started on the project she was introduced to people that perform related work. She knows that one of her colleagues is making a workflow for the mapping of SNPs to genes, and then onto pathways. She also knows that another colleague has created an abstract workflow for parts of the text mining technology that she wants to use.
UB22 re-user to understand existing workflows I can benefit from the knowledge and previous work of colleagues KH [1] She wishes to understand these workflows and asks her colleagues to explain their functionality to her.
UB23 re-user to build a new workflow from more than one existing workflow I can benefit from the knowledge and previous work of colleagues KH [1] She decides that she would like to build a workflow that would integrate the text mining technology with the one already implemented for the SNP-gene-pathway pipeline.
UB24 re-user to record an abstract (not-yet-implemented) workflow I can explain my needs to colleagues who will implement processes I shall need KH [1] However, the abstract workflow for the text mining technology will probably take a long time to implement since the underlying web services that are needed do not actually exist yet. She is dependent on another colleague to implement these for her. At this point she decides to make an abstract workflow of her own, that she will use as a reference when explaining her research and analysis steps.
UB25 re-user to run a workflow with an alternative processing tool ...OR... to partially execute a workflow, with unavailable sections replaced by manual processes ????? I can make progress in my research while waiting for processing tools or other workflows to be implemented KH [1] This tool is however in need of an update, which she will perform. In the meantime, the ordering customer, who is the scientist who provided the original data, is getting impatient and wants results. She therefore decides to run the analysis with the version of the tool that is available. She uses the output from the already available SNP-gene-pathway workflow to map SNPs to genes, ending up with a list of genes for a metabolite. This list she feeds into the tool.
UB26 re-user to record information that is discovered from other external sources I can keep track of reasons for decisions made in the conduct of my studies KH [1] Ultimately, the workflow (or at this point the tool that is actually available) suggests for example that one gene in the set of genes associated with a specific metabolite actually is associated with the metabolite via the Gluconeogenesis biological process. It is one of the two main mechanisms humans and many other animals use to keep blood glucose levels from dropping too low (hypoglycemia), information she finds on the Internet. She makes a note of that and goes back to the tool to see if Gluconeogenesis is a process that is known to be associated with MetS.
UB27 re-user to repeatedly re-run a workflow with different input parameters and/or data I can look for potential links between my results and other known biological mechanisms KH [1] She will perform the same type of analysis for the different genes associated with the metabolite, and also for the list of genes as a whole. These analyses will lead to a number of candidate biological processes hypothesized to explain the gene-metabolite associations.
UB28 re-user to record an interpretation of results obtained, linked to the experimental record and input data used KH [1] This is now her interpretation of her experiment that she wishes to link to her experiment and the processed data.
UB29 re-user to record notes for future work KH [1] Kristina notes that in a next cycle she will want to perform another workflow that compares her results using text mining with the results from the already available SNP-gene-pathway workflow.
UB30 re-user to link workflow results to previous results, procedures and hypotheses I can subsequently retrieve information about the relationship between hypotheses and results obtained KH [1] In this case she will link the result from this workflow to the previous experiment and in particular to the initial hypothesis. At some point, she wishes to be able to retrieve this past information and the interrelationships among her hypotheses.
UB31 re-user to extract information from a working RO for publication I can easily use my working notes as a basis for creating the distilled information needed for publication. KH [1] Assuming her findings and new hypotheses are valuable and new, she will publish her results. The publication has cleaned information, sufficient for evaluating her hypothesis and rerunning the one workflow and the one dataset that led to this result.
UB32 investigator the repository submission process to help me arrange data according to accepted standards subsequent use of preserved information is or can be enhanced MR [2] submission is the best moment to help users follow standards (for useful preservation; i.e. enhance usage later on)
UB33 investigator archive submission of genetic variants that are part of a paper to suggest a standard encoding of genetic variants the papers mentioning a genetic variant can be semantically linked to each other, and to database records MR [2] For comparison: when genetic variants are submitted to an archive and as part of a paper, we can suggest the proper encoding for genetic variants (this is non-trivial). This semantically links the papers mentioning the variant to one another and to any database using the same encoding for variants.
UB34 investigator archive submission of a workflow to be matched with the [prevailing? experiment?] Research Object model, requiring a user to correct mismatches and resolve ambiguities (common format elements facilitate subsequent processing?) MR [2] For workflows I imagine that submission would be the moment of matching a workflow with the RO model. When mismatches occur, we can ask the user to disambiguate or add missing information.
Assumes some kind of RO model specific to the experimental circumstance?
This may come with some 'pain', because it may require a mapping for existing resources. The process is facilitated by publishing the RO model.
UB35 investigator every submission of an artefact and/or notes to a “Living RO” to capture provenance (who, when, etc.) about that submission Information is properly prepared and annotated for subsequent archival. MR [2] Example for living ROs … I would like to see these in the context of helping users keep a proper notebook. In my vision, the notebook creates the archive (or prepares for archival). Every submission of an artefact+notes comes with storing provenance information, that a user can later use to look up information before publishing. So for instance, when a workflow produces an output that we wish to keep, we don't have to worry about its timestamp and metadata remaining linked to it.
Provenance information can be used to locate and keep track of saved information prior to publication In summary: I expect that the process of submission for living ROs would look like an aid in keeping track of your experiment for users
Useful workflow outputs acquire a provenance history that supports the result.
UB36 author publication ROs to be created from living ROs by a process of copying, pruning and editing (the publication RO is clearly and explicitly linked to day-to-day lab work??) MR [2] Example for publication ROs … I expect that the process of publishing is a matter of copying, pruning, and editing a living RO.
UB37 author assistance in generating a publication RO from a living RO by matching living and publication RO models the process of preparing a publication object is facilitated MR [2] This may be facilitated by matching to the potential difference that may exist in the RO model instance of the 'living' and 'published' ROs. It can guide in what to remove (e.g. intermediate datasets),
UB38 author minimum requirements for a publication RO to be enforced (adherence to community standards for publication is maintained) MR [2] and it may be more strict about which information should minimally be provided, similar to the strict requirements for publications
UB39 investigator, author RO submission to suggest/encourage provision of information that will later be needed for publication subsequent publication of the results of an investigation is made easier MR [2] (In practice this is the time where as a user you feel sorry that you didn't provide this information to the living object earlier.) Taking E's case as an example: she is producing a workflow from the compendium of R scripts that she used to explore the effect of so-called CpG islands (specific sequences often found in front of genes; their locations taken from an online data source) on the change of gene expression in brain regions affected by Huntington's Disease. She has done several analyses for different kinds of statistical plots (we assume these are in the living RO), and now wishes to fix the workflow (in Taverna) and select the results for publication.
UB40 investigator tools to use standard formats for representing underlying experimental data and models interoperability between systems and users is facilitated MR [2] Note: within our department we are developing a tool for generating interfaces for command line tools; the underlying model is in RDF. For each tool (e.g. Galaxy tools) a mapping is made in a template language. This is a bottleneck, but the RDF part facilitates interoperability.
UB41 investigator captured information to conform to standard semantic “study framework” models ?? MR [2] It may be worthwhile to align with Study Capture frameworks here. I suggest to try to use the same semantic models under the hood as much as possible.
UB42 investigator working ROs to be presented and managed as a personal workspace I can work privately with my notes and intermediate results until I am confident of having some result worth sharing MR [2] It may help if living ROs seem entirely personal, until a user is ready to share. Eleni is a typical PhD student/scientist, feeling uncomfortable sharing until she is entirely confident about the result. NB, following the wet-lab analogy this is not acceptable; notebooks are in principle property of the department and open to it (in practice this is not strictly adhered to though)
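
The provenance capture described in UB35 (who, when, etc. recorded with every submission to a living RO) might look something like the sketch below. It is an illustration only: `LivingRO`, `Submission`, `submit` and `history` are hypothetical names rather than part of any Wf4Ever API, and the SHA-256 content hash is an added assumption for tying each record to the exact artefact submitted.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Submission:
    """Provenance stored alongside one submitted artefact (hypothetical)."""
    artefact: str
    sha256: str      # assumption: content hash links the record to the bytes
    who: str
    when: datetime
    note: str


@dataclass
class LivingRO:
    """Hypothetical living RO that logs provenance on every submission."""
    log: List[Submission] = field(default_factory=list)

    def submit(self, artefact: str, content: bytes, who: str,
               note: str = "", when: Optional[datetime] = None) -> Submission:
        # Provenance is captured automatically; the user supplies only a note.
        entry = Submission(
            artefact=artefact,
            sha256=hashlib.sha256(content).hexdigest(),
            who=who,
            when=when or datetime.now(timezone.utc),
            note=note,
        )
        self.log.append(entry)
        return entry

    def history(self, artefact: str) -> List[Submission]:
        """Return all submissions of one artefact, oldest first."""
        return [s for s in self.log if s.artefact == artefact]
```

Before publishing, a researcher could call `history("plot.png")` to look up when each saved output was produced and which note accompanied it, which is the "notebook creates the archive" behaviour the comment describes.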

General user requirements

This section contains requirements of unrecorded origin, sourced from project deliverables D2.1 and D3.1.

Id As a... I want... So that... Benefit (for self) Impact (for science) Source Comment
UR1 Creator to create workflows I can automate and streamline aspects of my investigation D2.1 [1] D3.1 [2]
UR2 Creator to collect data I can conduct an investigation D2.1 [1] D3.1 [2]
UR3 Creator to aggregate existing resources (workflows, datasets, experiments, etc…) I can conveniently access related resources from a single place D2.1 [1] D3.1 [2]
I can be sure that I have a matching collection of resources
UR4 Creator to reference data stored elsewhere I can aggregate data that is larger/more complex/restricted D2.1 [1] D3.1 [2] @@CHECK: “restricted”?
UR5 Creator to describe the relationships between aggregated resources other researchers can see how the resources fit together D2.1 [1] D3.1 [2]
UR6 Creator to describe the relationships between aggregated resources I can facilitate the automation of processing of aggregated resources D2.1 [1]
UR7 Creator to be recognised as the creator of a given scientific output or result I get credit D2.1 [1] D3.1 [2]
UR8 Creator to assign a persistent URL to aggregated scientific content I can include the link in my book D2.1 [1] D3.1 [2]
UR9 Creator to record which web services were used by a workflow I can track web service changes D2.1 [1] D3.1 [2]
I can give citations to external resources used
UR10 Creator to embed (links to?) others’ publications in my working notes I can later find related reference material/citations D2.1 [1] D3.1 [2] @@TODO: reworded, check
I can get information when designing my experiment
UR11 Creator to record notes while designing a workflow I can later pick up my thoughts around a part of the workflow D2.1 [1] D3.1 [2]
I can disseminate reasoning behind my design decisions
UR12 Creator to annotate experimental results using semantic models I can find/show links to other, relevant research objects D2.1 [1] D3.1 [2] @@TODO: What is meant here by “semantic models”?
UR13 Contributor to provide a workflow it can be incorporated or used in an investigation D2.1 [1]
other researchers can review the processing performed
other researchers can repeat the processing performed
UR14 Contributor to provide new or updated scientific data/results investigations are up to date D2.1 [1] D3.1 [2]
UR15 Contributor to modify contents I can fix a known error with a workflow or investigation D2.1 [1] D3.1 [2]
UR16 Contributor to be credited for my contributions to a research publication I get credit D2.1 [1] D3.1 [2]
UR17 Contributor to have access to the work and scientific content carried out by another researcher I can contribute to shaping the work before it’s public D2.1 [1] D3.1 [2]
UR18 Collaborator to provide content it can be incorporated or used in an investigation D2.1 [1] D3.1 [2]
other researchers can review the processing performed
other researchers can repeat the processing performed @@TODO: how does providing content help here?
UR19 Reader to find relevant materials I can understand the field D2.1 [1] D3.1 [2]
UR20 Reader to browse an overview I can determine whether there is something useful for me D2.1 [1] D3.1 [2]
UR21 Reader to survey the field I can check whether something has been done before D2.1 [1] D3.1 [2]
UR22 Reader to examine the relationships between resources I can understand the relationships between resources D2.1 [1]
UR23 Reader to access data I can look at it and use it for my own purposes D2.1 [1] D3.1 [2]
UR24 Reader to access metadata I can see where data/methods came from D2.1 [1] D3.1 [2]
UR25 Reader to follow the steps taken in a certain research activity I can understand the investigative process or method D2.1 [1] D3.1 [2]
UR26 Reader to find workflows by purpose I can investigate different approaches to the same problem D2.1 [1] D3.1 [2]
UR27 Reader to find workflows according to their reputation I can investigate approaches that have been acknowledged as being correct for the same problem D3.1 [2] @@TODO: or reputation of their creator?
UR28 Reviewer to rerun an investigation I can validate that the results are as given D2.1 [1] D3.1 [2]
UR29 Reviewer to examine the relationships between resources I can validate those relationships D2.1 [1] D3.1 [2]
UR30 Reviewer to access data I can validate the data used D2.1 [1] D3.1 [2]
UR31 Reviewer to check if external data has changed I can determine if results are still valid D2.1 [1] D3.1 [2]
UR32 Reviewer to follow the steps taken I can validate the investigative process and identify any problems D2.1 [1]
UR33 Reviewer to examine the resources related to a given piece of research I can determine the source of those resources D2.1 [1] D3.1 [2]
UR34 Reviewer to rate research content I can recommend materials to colleagues D2.1 [1] D3.1 [2]
UR35 Comparator to compare some scientific data/results with others I can determine whether the investigation is novel D2.1 [1] D3.1 [2]
I can understand the differences between investigations
I can consider reusing it in the future
UR36 Re-user to build a new workflow based on an existing one I can do something new with less effort D2.1 [1] D3.1 [2]
I can use an existing, known, validated methodology
UR37 Re-user to build a workflow using components/parts of another workflow I don’t have to investigate how to use a service D2.1 [1] D3.1 [2]
UR38 Re-user to run an existing workflow with new data I can get new results by using existing procedures D2.1 [1] D3.1 [2]
UR39 Re-user to rerun parts of a workflow I can avoid re-running long-running parts of the workflow when only some of the data has changed D2.1 [1]
UR40 Re-user to use results from an existing investigation as input to a new one I can build on existing results D2.1 [1] D3.1 [2]
I can build on existing data
UR41 Re-user to see and navigate versions of a workflow I can use the latest working version D2.1 [1] D3.1 [2]
I can better understand a workflow by understanding how it has evolved
I can see how the latest version of a workflow differs from an earlier version I may have used
UR42 Re-user to extract content from a collection of research details I can reuse that content for other investigations D2.1 [1] D3.1 [2]
UR43 Re-user to be able to work in a collaborative fashion on the same workflow and associated scientific material I can work with my team without having to carry out a manual synchronization process based on exchanging emails with custom scripts, workflows and draft versions of papers and having to update D3.1 [2]
UR44 Re-user to find workflows and their associated scientific material according to their reputation I can reuse approaches that have been acknowledged by the community as being correct for the same problem D3.1 [2] @@CHECK: duplicate above?
UR45 Re-user the system to take into account my past workflow and dataset selections in subsequent interactions with the system it filters out scientific material that doesn’t match my past criteria D3.1 [2]
UR46 Publisher to publish a scientific result with appropriate supporting materials it is available for others to see or use D2.1 [1] D3.1 [2] @@CHECK: reworded
UR47 Publisher to provide references to existing scientific data/results and supporting materials they can be cited (leading to credit) D2.1 [1] D3.1 [2] @@CHECK: reworded
UR48 Publisher to be able to advertise certain scientific data/results they reach their target audience D2.1 [1] D3.1 [2]
UR49 Publisher to restrict access to subparts of a research work publication complies with license restrictions D2.1 [1] D3.1 [2]
data owners are happy