Software levels

Guidelines specific to different software levels

Alternative approaches to software levels

  URL to definition Description
0 https://oceanrep.geomar.de/id/eprint/59801/1/2024-01-16ResearchSoftwareCategories.pdf see also
    https://elib.dlr.de/200635/
    Druskat, Stephan (2023) Application Classes in the DLR Software Engineering Guidelines. AK-KFS: 1. Treffen Arbeitskreis Kategorien von Forschungssoftware der FG Research Software Engineering in der GI, 2023-11-20 - 2023-11-21, Braunschweig, Germany.
1   These levels are not formalised in our community, but all of them exist. On the other hand, they also overlap: for instance, we have RSI that actually comprises analysis sw for language analysis, such as natural language parsing services or machine translation services.
6 https://arxiv.org/pdf/1712.06982  
7   OpenAIRE does not typically use the specific terms “Analysis Code,” “Prototype Tools,” and “Research Software Infrastructure” as formal categories in its documentation or community guidelines. However, OpenAIRE does focus on various aspects of research software, including tools, platforms, and infrastructure that support open science and data management.
9   We know the software levels but have not (yet) formally integrated them in common practices, except in our external communication about software management plans.
10 https://doi.org/10.5281/zenodo.1344612 The DLR Software Engineering Guidelines are just an example.
    The community is diverse in its definition of software and of software quality maturity. Usually definitions are provided within the large research infrastructures (such as ESFRIs, international collaborations and consortia).
11 https://doi.org/10.14278/rodare.2748 We are following the DLR software classes, which have one more level, but in general follow the same approach
12   The definitions handled at the moment are the same ones as provided above, but many people in my group are still unaware of them
15   I feel like these levels exist but without any clear definition. People tend to make a difference between codes, especially if the code is developed/supported directly by a research infrastructure.
18 - https://www.embl-hamburg.de/biosaxs/manuals/eom.html  
  - https://www.mdtraj.org/1.9.8.dev0/index.html  
  - http://www.shiftx2.ca/  
  - https://biopython.org/wiki/Documentation  
  - https://www.ibs.fr/fr/communication/production-scientifique/logiciels/flexible-meccano-en?lang=fr  
  - https://github.com/KULL-Centre/CALVADOS  
19   I’m not aware of any explicit categorization, but the notion that all code is not the same (some is tested more, some is documented better, some is modular, some is spaghetti etc.) and that the purpose changes (and with that some attributes of the code might change as well), is definitely there.
27 https://rodare.hzdr.de/record/2748 See reference above.

Importance of the software level

Application of quality practices

Guidelines for analysis code

  Scope type of guideline Description
1 General guidelines sw engineering principles, FAIR guidelines 1) In language technology, the researchers are all sw engineers, so the general guidelines are not published, but assumed as a matter of course
      2) specifically in Language tech. / NLP open source and FAIR approach is extremely strong historically, so publishing both sw and data is practically required as part of submissions of research papers.
2 General guidelines    
3 General guidelines    
8 General guidelines; Research-specific guidelines Checklist for repository setup Internal KM3NeT checklist for the setup of an analysis repository, based on various checklists available online like https://medium.com/semantixbr/checklist-to-set-up-a-data-science-project-repository-b3ce1ea3bbe7
9 General guidelines; Research-specific guidelines coding, reproducibility, citation, etc. https://book.the-turing-way.org
10 General guidelines https://docs.github.com/en/repositories/creating-and-managing-repositories/best-practices-for-repositories Again, the community is diverse and, especially at the analysis code level, only general guidelines are provided from within the experiments and their user communities. As an example, the GitHub best practices are linked above.
      There are training events however, e.g. the ESCAPE Summer schools, see https://escape2020.github.io/
11 General guidelines HZDR Software Policy https://doi.org/10.14278/rodare.2748
12 General guidelines; Research-specific guidelines (not sure which types to select from). General guidelines While recent, I have been sharing https://zenodo.org/records/10047401 for FAIRness.
      https://citation-file-format.github.io/ for building CFF files, together with https://guides.github.com/activities/citable-code/
      https://codemeta.github.io/codemeta-generator/ for guiding codemeta creation
      https://inbo.github.io/tutorials/tutorials/git_zenodo/ to show how to create GitHub-Zenodo bridges for releases
      https://semver.org/ to show how semantic versioning works in releases
      http://www.creativecommons.org/licenses and https://spdx.org/licenses/ to explain which license to use in code
      I also point to https://scientificpaperofthefuture.org/materials.html for general training
      -Domain specific- We also have validators like https://oops.linkeddata.es/ and https://w3id.org/foops/ for ontologies, which review best practices and provide suggestions.
14 General guidelines General guidelines - Code readability
      - Documentation
      - Ensuring re-usability
15 General guidelines good code practices, making the code runnable for myself and others  
16 General guidelines    
18 Community-specific guidelines MIADE - https://pubmed.ncbi.nlm.nih.gov/37400558/
19 General guidelines; Research-specific guidelines; Community-specific guidelines   I can’t think of anything to point to directly. I guess you sort of absorb a set of guidelines/practices as you go. Be it from some intro courses, the team you’ve joined, the environment you need to interface with…
       
      I guess one research-specific guideline, especially where your research is of a computational nature, is to have some sort of experiment management system. You generally want to run multiple experiments in parallel and be somehow able to map the results to the changes between individual runs. This can vary in complexity from reasonable conventions to elaborate tools.
       
      I don’t know if ‘use relevant standards’ (e.g. for data input/output/interchange) is a general or community specific guideline. Views on what’s relevant might vary inside a community.
20 General guidelines; Research-specific guidelines Guidelines for the development and publication of a pipeline as a Python package for high-throughput data analysis. In general, analysis software involves the use of Jupyter notebooks (https://jupyter.org/) for a quick start to data analysis with Python. This is an easy way for the developer to explore the data, manipulate it in many ways and clean it if necessary. Data visualisation is also often involved, using one of the many available packages such as matplotlib and plotly.
      Then usually the notebook is transformed into a Python package, creating a pipeline for data analysis. This package can then be published in BioConda (https://bioconda.github.io/), where other researchers can access and use it.
      If the pipeline is meant for high throughput analysis, then it can be installed in a Python environment of a remote HPC cluster, where the whole data to be analysed resides.
21 General guidelines    
23 General guidelines reproducibility, documentation, and proper citation of tools, coding standards like PEP8, FAIR principles I follow a combination of general software development guidelines, such as coding standards (like PEP 8 for Python), version control practices with Git, and thorough testing and documentation. For research-specific needs, I prioritize reproducibility by using tools like Jupyter Notebooks, managing data carefully, and employing containerization. I also adhere to community-specific guidelines, such as best practices in data science for model tracking and bioinformatics standards for reproducibility. Additionally, I incorporate principles from the Research Software Engineering community and the FAIR guidelines to ensure that my research code is sustainable, collaborative, and accessible.
24 Other   No guidelines; learnt more from others' practices (informally).
25 General guidelines coding guidelines - document code
      - use sensible variable names
      - modular structure
      - etc
26 General guidelines; Research-specific guidelines    
27 Research-specific guidelines Institutional Software Policy https://rodare.hzdr.de/record/2748
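
Respondent 19 mentions that experiment management can range "from reasonable conventions to elaborate tools". As an illustration of the convention end of that range, the following is a minimal sketch and not a description of any respondent's setup. The function names, the `runs/<id>/run.json` layout and the parameter names are all hypothetical. It derives a stable run identifier from the parameters, stores parameters next to results, and reports which parameters changed between two runs.

```python
import hashlib
import json
from pathlib import Path


def run_id(params: dict) -> str:
    """Derive a short, stable identifier from a run's parameters (hypothetical convention)."""
    # Sorting keys makes the identifier independent of dictionary ordering.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def record_run(base_dir: Path, params: dict, results: dict) -> Path:
    """Write parameters and results together under <base_dir>/runs/<run_id>/run.json."""
    run_dir = base_dir / "runs" / run_id(params)
    run_dir.mkdir(parents=True, exist_ok=True)
    record_path = run_dir / "run.json"
    record = {"params": params, "results": results}
    record_path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record_path


def changed_params(a: dict, b: dict) -> dict:
    """Return {name: (value_in_a, value_in_b)} for every parameter that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

With such a convention, `changed_params(run_a["params"], run_b["params"])` answers the question "what differed between these two runs?" without any dedicated tooling. Dedicated experiment trackers add features such as code versions and environment capture on top of the same idea.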
  Are you aware of any guidelines or mechanisms that help to improve analysis code towards the prototype tool level? If yes, describe and/or provide links.
1 no
8 no
9 Practical guide to Software Management Plans: https://zenodo.org/records/7589725
10 No clear guidelines, but examples are given e.g. at the community event “Workshop on Open-Source Software Lifecycles” https://indico.in2p3.fr/event/21698.
11 Training via HIFIS
12 Materials like https://www.freecodecamp.org/news/how-to-create-and-upload-your-first-python-package-to-pypi/ show how to create packages and upload them to package managers. I have been using them to help turn tools in our lab into prototype tools.
15 - introducing unit tests
20 The conversion from the notebook to a package surely helps in the creation of a prototype.
24 No. We run with different compiler settings for error checking (Fortran).
27 there are many…
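
Two of the answers above name concrete steps towards the prototype level: introducing unit tests (respondent 15) and converting a notebook into a package (respondent 20). The sketch below shows what the first unit tests for a function lifted out of a notebook might look like, using only Python's standard `unittest` module. The `normalise` helper is a hypothetical example, not code from any respondent.

```python
import unittest


def normalise(values):
    """Scale a list of numbers linearly to the range [0, 1] (hypothetical notebook helper)."""
    lo, hi = min(values), max(values)  # min() raises ValueError on an empty list
    if hi == lo:
        # All values identical: avoid division by zero and map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class TestNormalise(unittest.TestCase):
    def test_range(self):
        self.assertEqual(normalise([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalise([3, 3]), [0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            normalise([])
```

Saved in a file such as `test_normalise.py` (a hypothetical name), the tests can be run with `python -m unittest`. Writing the edge cases down (constant and empty input) is often where undocumented assumptions in analysis code first become visible.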

Guidelines for prototype software

  Scope type of guideline Description
2 General guidelines; Community-specific guidelines    
4 General guidelines    
8 General guidelines general software packages best practices Collection of guidelines from the WOSSL workshop: https://gitlab.in2p3.fr/escape2020/wp3/wossl/-/wikis/Best-Practices-for-software-development
9 General guidelines; Community-specific guidelines coding, software engineering https://guide.esciencecenter.nl
10 Research-specific guidelines https://gammapy.org/contribute.html The prototype tools are typically handled within the experiments, but some are actually open-community driven. An example for that - the community software gammapy - is given in the URL above.
11 General guidelines; Community-specific guidelines HZDR Software Policies, Project agreements https://doi.org/10.14278/rodare.2748
12 General guidelines; Research-specific guidelines (not sure which types to select from). General guidelines Same as above
15 General guidelines Packaging the software to make it easy to use  
19     I guess most of the ‘Analysis answer’ applies as well. One addition would be to document better.
  Are you aware of any guidelines or mechanisms that help to improve prototype tools towards the research software infrastructure level? If yes, describe and/or provide links.
4 No
9 Practical guide to Software Management Plans: https://zenodo.org/records/7589725
10 This is an important step that usually also includes certification of the software; however, there are no clear mechanisms for the level change.
11 Training via HIFIS, mentoring from project partners

Guidelines for research software infrastructure

  Scope type of guideline Description
0 Community-specific guidelines https://rse.dlr.de/guidelines/00_dlr-se-guidelines_de.html  
1 Community-specific guidelines CLARIN B-centre requirements, LINDAT project requirements for web services CLARIN ERIC has requirements for data repositories to be certified. That includes a set of conformance tests. http://hdl.handle.net/11372/DOC-78
       
      LINDAT-CLARIAH-CZ RI has a simple set of requirements for sw that is a candidate for deployment as an official RI web service: https://github.com/ufal/lindat-common/wiki/Service-Development-Guide
2 General guidelines; Research-specific guidelines; Community-specific guidelines    
6 General guidelines; Research-specific guidelines; Community-specific guidelines; Other    
7 Community-specific guidelines Guidelines for Software Repository Managers The OpenAIRE Guidelines for Software Repository Managers 1.0 provide orientation for software repository managers to define and implement their local software management policies in exposing metadata for software products. These guidelines are intended to provide indications on how to make software products citable in order to make them first-level citizen of an Open Science, interlinked scholarly communication ecosystem.
      https://software-guidelines.readthedocs.io/en/latest/
9 General guidelines; Community-specific guidelines coding, software engineering, community management Practical guide to Software Management Plans: https://zenodo.org/records/7589725
10 Research-specific guidelines https://www.ivoa.net/documents/ Usually this aspect is handled in a software management plan, if available. E.g. see https://www.ctao.org/for-scientists/technical-specifications/.
       
      For the IVOA (International Virtual Observatory Alliance), there are even standards - even certification is necessary for specific infrastructure software.
11 General guidelines; Community-specific guidelines HZDR Software Policies, Community agreements https://doi.org/10.14278/rodare.2748
15 General guidelines; Community-specific guidelines Guidelines provided by the research infrastructure developing and supporting the software - authorized languages for applications and software
      - style and code guidelines: https://ctapipe.readthedocs.io/en/latest/developer-guide/code-guidelines.html
      - software maintenance guidelines
17 Community-specific guidelines   The software is easily readable and editable, uses open-source software, and pays attention to documentation
20 General guidelines Documentation for users and developers, reusability, use of standard terminology, interoperability, efficiency, long-term maintenance, mechanisms for feedback and bug reporting Documentation for users and developers should be comprehensive and accessible, including user manuals, API references, and example use cases to ensure all users can easily understand and work with the software.
      Adopting standard terminology enhances clarity and reduces the learning curve for users and developers alike. Interoperability is achieved by supporting common data formats, protocols, and APIs, enabling seamless integration with other tools and systems. Prioritising efficiency means optimising code to minimize resource usage and ensuring the software can handle large datasets or complex computations without performance degradation. Long-term maintenance requires planning for regular updates, securing funding, and providing detailed developer documentation to ease future enhancements and bug fixes. Finally, implementing mechanisms for feedback and bug reporting, such as issue trackers in GitHub, allows users to contribute to the software’s continuous improvement, ensuring it remains responsive to their evolving needs.
21 General guidelines    
22 Community-specific guidelines Community specific guidelines Community specific guidelines for development and contribution are prescribed and maintained for the ACTS project by the respective core development team. These are listed in the documentation for the project with hyperlinks that help people understand the guidelines and follow them where necessary.
      https://acts.readthedocs.io/en/latest/codeguide.html
27 General guidelines; Research-specific guidelines Software Policy, Programming Guidelines, Licensing Guidelines See above

Level of confidence for answers in this section

Guidelines for different lifecycle stages

  Description URL Stage Scope
1 Maintenance: Specifically for sw deployed as REST web services of the RI LINDAT-CLARIAH-CZ we have a short guideline. For other stages we do not have real guidelines. https://github.com/ufal/lindat-common/wiki/Service-Development-Guide Maintenance Community-specific guidelines
3 The code has to be well organised, simple and well commented. Detailed documentation is mandatory to help users, and it also leads to better maintenance.   Development General guidelines
7 Software entities should contain a PID in order to be traceable. https://docs.google.com/spreadsheets/d/1mKs-Pg_JuLcpqEkQqlSCs2gGC7nEEbhxdTbIoGcU6NI/edit?gid=0#gid=0 Planning; Maintenance; Archiving General guidelines
10 As a deliverable of the EOSC working group on “Infrastructure for Quality Research Software”, this document sets the scene for a software lifecycle and lists many different articles for further reading https://doi.org/10.5281/zenodo.8324828 Planning; Development; Maintenance; Archiving General guidelines
11 HZDR Software Policy https://doi.org/10.14278/rodare.2748 Development; Maintenance General guidelines
12 Planning/development/maintenance. In GitHub there are best practices for using projects to open issues, plan for milestones and https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/best-practices-for-projects Planning; Development; Maintenance; Archiving General guidelines
14 Adoption of software engineering best practices in research software development. Key elements include writing automated tests (unit, integration, and regression tests) to ensure code correctness, using version control systems, and following code review processes to improve code quality. These practices help ensure that research software is reliable, maintainable, and reproducible.   Planning; Development; Maintenance General guidelines
15 publication of software in the OSSR https://escape-ossr.gitlab.io/ossr-pages/page/contribute/onboarding/ Maintenance; Archiving Research-specific guidelines; Community-specific guidelines
16 Planning: Emphasizes iterative development, collaboration, and flexibility. Guidelines suggest how to break down projects into sprints, create user stories, and prioritize tasks.   Planning General guidelines
19 Similar to data, use a repository that assigns PIDs to its submissions to capture significant versions of the software (e.g. version producing results described in a research paper).   Archiving Research-specific guidelines
20 A general guideline for software development and deployment is to keep the software packages and dependencies as up to date as possible, as working with outdated software can pose risks to the security of the software, especially when developing a front-end or back-end web infrastructure. Moreover, updating the dependencies and the overall environment can in general improve the execution time and increase the efficiency of the software.   Development; Maintenance General guidelines
  For deployment of a web application, all security measures need to be employed, from encryption of the end-to-end communications using SSL to more advanced ones such as protection from Cross-Site Request Forgery (CSRF).
22 Planning is facilitated via regularly held workshops where all developers and contributors are encouraged to attend, brainstorm and come to a consensus on how to proceed with new ideas. There is also an annual workshop that summarizes the status quo of the entire project to give everyone a clear picture of the project in its entirety as well as discuss upcoming plans for each sub group.   Planning Community-specific guidelines
23 I follow containerization guidelines to ensure consistency, portability, and reproducibility across different computing environments. I use Docker to create lightweight containers that encapsulate all dependencies, tools, and configurations required for my experiments.   Development; Maintenance General guidelines
27 The SW Policy provides information and support for all phases of the lifecycle and links to more detailed guidelines. https://rodare.hzdr.de/record/2748 Planning; Development; Maintenance; Archiving Research-specific guidelines
  Description URL Stage Scope
3 It is essential to update the code often in order to keep up with the latest package versions and software needs.   Maintenance General guidelines
10 This document constitutes deliverable D3.7 of the ESCAPE project (H2020 Grant Agreement No. 824064), the license, provenance and metadata guidelines for the Open-source software and service repository (OSSR). https://zenodo.org/doi/10.5281/zenodo.7540575 Archiving Community-specific guidelines
12 Establishing a GitHub-Zenodo bridge for archiving code releases https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content Archiving General guidelines; Research-specific guidelines
14 Development of a Minimum Viable Product (MVP) to gather feedback early in the project lifecycle. By quickly developing a functional prototype with core features, researchers can solicit user input, iterate based on feedback, and adjust the development path to better meet the needs of the research community or stakeholders. This approach is used by newer businesses; it reduces wasted effort and ensures the tool evolves in the direction most useful to its users. It can be adapted by researchers where customer feedback is most important.   Development General guidelines
16 Implementation   Development General guidelines
22 For the development and contribution stage there is a prescribed set of software guidelines, listed by the core development team, that must be adhered to before a contributor's pull request can be merged into the main project. https://acts.readthedocs.io/en/latest/codeguide.html Development General guidelines
  Description URL Stage Scope
12 Software Heritage archiving URL for code repositories https://archive.softwareheritage.org/save/ Archiving General guidelines
16 Deployment: Best practices for automating the deployment process, including staging, production environments, and rollback strategies.   Development  
22 Project maintenance is done mainly by the core development team. This includes deprecating old code, resolving conflicts created by newly merged code, maintaining project builds, dealing with package dependency conflicts, file-format-related conventions, etc. https://acts.readthedocs.io/en/latest/versioning.html, https://acts.readthedocs.io/en/latest/formats/formats.html Maintenance General guidelines
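
Respondent 20 lists protection from Cross-Site Request Forgery (CSRF) among the security measures for deployed web applications. The following is a minimal sketch of one common pattern, an HMAC-signed token bound to the user's session, using only the Python standard library. It is an illustration under stated assumptions, not the respondent's implementation: the function names and the in-memory `SECRET_KEY` are hypothetical, and in practice most web frameworks provide CSRF protection that should be preferred over hand-written code.

```python
import hashlib
import hmac
import secrets

# Assumption: a per-deployment server-side secret. A real service would load it
# from configuration instead of regenerating it on every start.
SECRET_KEY = secrets.token_bytes(32)


def _sign(session_id: str, nonce: str) -> str:
    """Compute the HMAC-SHA256 signature that binds a nonce to one session."""
    message = f"{session_id}:{nonce}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_csrf_token(session_id: str) -> str:
    """Create a token to embed in a form served to the given session."""
    nonce = secrets.token_hex(16)
    return f"{nonce}.{_sign(session_id, nonce)}"


def verify_csrf_token(session_id: str, token: str) -> bool:
    """Check that a submitted token was issued by this server for this session."""
    nonce, separator, signature = token.partition(".")
    if not separator:
        return False
    expected = _sign(session_id, nonce)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

A state-changing request is accepted only if `verify_csrf_token` returns `True` for the token submitted with it. A page on another site cannot produce such a token because it knows neither the secret nor the session identifier.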

Auto-created summary

Extracted Practices

The following is a bullet-point list of practices mentioned in the text, ordered by relevance:

  • General Software Development Guidelines
    • Includes software engineering principles and FAIR guidelines, coding standards, documentation, version control, and reproducibility practices.
  • Community-Specific Guidelines
    • Guidelines provided for specific communities, such as CLARIN requirements, LINDAT project requirements, and OpenAIRE guidelines.
  • Research-Specific Guidelines
    • Guidelines catered to research software, focusing on data management and code citation practices.
  • Repository and Release Management
    • Practices include maintaining repositories, setting up GitHub-Zenodo bridges, and version control using semantic versioning.
  • Quality Assurance and Testing
    • Introducing unit tests, code reviews, and continuous integration for quality.
  • Documentation and Citation
    • Ensuring thorough documentation, proper citation formats, and using guidelines to create citation files.
  • Software Packaging and Deployment
    • Guidelines for creating Python packages and deployment practices, especially in HPC environments.
  • Lifecycle Management and Software Archiving
    • Guidelines cover the entire software lifecycle, emphasizing maintenance and archiving with emphasis on PID assignment.
  • Experiment and Project Management
    • Use of project management tools for planning and tracking, such as GitHub projects and Agile methodologies.
  • Security and Compliance
    • Recognizes the necessity for web application security and adherence to guidelines for authorized languages and code style.
  • Mentoring and Training
    • Providing training via platforms like HIFIS and workshops to enhance software development skills in the community.
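
The release-management practice above refers to semantic versioning (https://semver.org/). As a small illustration of what the scheme implies for tooling, the sketch below parses version strings and orders them by the precedence rules of Semantic Versioning 2.0.0. The function names are hypothetical, and the parser is deliberately simplified: for example, it does not reject numeric fields with leading zeros.

```python
import re

_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"        # MAJOR.MINOR.PATCH
    r"(?:-([0-9A-Za-z.-]+))?"      # optional pre-release, e.g. -rc.1
    r"(?:\+[0-9A-Za-z.-]+)?$"      # optional build metadata (ignored for ordering)
)


def sort_key(version: str):
    """Return a key that orders version strings by semantic-versioning precedence."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    core = tuple(int(match.group(i)) for i in (1, 2, 3))
    pre = match.group(4)
    if pre is None:
        # A normal release ranks above all of its pre-releases.
        return core, (1,)
    identifiers = [
        # Numeric identifiers compare as integers and rank below alphanumeric ones.
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in pre.split(".")
    ]
    return core, (0, identifiers)


def bump(version: str, part: str) -> str:
    """Increment 'major', 'minor' or 'patch' and reset the lower-order fields."""
    (major, minor, patch), _ = sort_key(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For instance, `sorted(tags, key=sort_key)` places `1.9.0` before `1.10.0`, which plain string sorting gets wrong, and places `1.0.0-rc.1` before `1.0.0`.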

URLs Table

Below is a table of all linked URLs with descriptions of each link:

URL Description
https://oceanrep.geomar.de/id/eprint/59801/1/2024-01-16ResearchSoftwareCategories.pdf Definition and categorization of research software
https://elib.dlr.de/200635/ Talk on application classes in the DLR Software Engineering Guidelines (Druskat, 2023)
https://arxiv.org/pdf/1712.06982 Research paper related to software levels
https://doi.org/10.5281/zenodo.1344612 DLR Software Engineering Guidelines (cited as an example)
https://doi.org/10.14278/rodare.2748 HZDR Software Policy guidelines
https://www.embl-hamburg.de/biosaxs/manuals/eom.html Documentation for a specific scientific computing framework
https://www.mdtraj.org/1.9.8.dev0/index.html Documentation for MDTraj, a molecular dynamics trajectory analysis library
http://www.shiftx2.ca/ Resource related to SHIFTX2 software
https://biopython.org/wiki/Documentation Wiki documentation for the Biopython project
https://www.ibs.fr/fr/communication/production-scientifique/logiciels/flexible-meccano-en?lang=fr Scientific software description
https://github.com/KULL-Centre/CALVADOS GitHub repository for the CALVADOS software
https://rodare.hzdr.de/record/2748 Reference to institutional software policy guidelines at HZDR
https://book.the-turing-way.org Best practices for reproducible research
https://medium.com/semantixbr/checklist-to-set-up-a-data-science-project-repository-b3ce1ea3bbe7 Checklist for setting up a data science project repository
https://docs.github.com/en/repositories/creating-and-managing-repositories/best-practices-for-repositories GitHub best practices for repositories
https://zenodo.org/records/10047401 Description for maintaining project FAIRness
https://citation-file-format.github.io/ Guidelines for creating Citation File Format documents
https://guides.github.com/activities/citable-code/ Guide on how to create citable code using GitHub
https://codemeta.github.io/codemeta-generator/ Codemeta generator for creating metadata for software projects
https://inbo.github.io/tutorials/tutorials/git_zenodo/ Tutorial for creating GitHub-Zenodo integration for code releases
https://semver.org/ Guidelines on semantic versioning for software releases
http://www.creativecommons.org/licenses Creative Commons licenses for software
https://spdx.org/licenses/ SPDX identifiers for licenses in code
https://scientificpaperofthefuture.org/materials.html Training materials for scientific research
https://oops.linkeddata.es/ OOPS! validator for ontologies, which reviews best practices and provides suggestions
https://w3id.org/foops/ FOOPS! validator for ontologies, which reviews best practices and provides suggestions
https://pubmed.ncbi.nlm.nih.gov/37400558/ Scientific paper on the MIADE community-specific guidelines
https://github.com/ Entry point for creating and managing GitHub repositories
https://zenodo.org/records/7589725 Practical guide for Software Management Plans
https://indico.in2p3.fr/event/21698/ Event workshop link for the lifecycle of open-source software
https://www.freecodecamp.org/news/how-to-create-and-upload-your-first-python-package-to-pypi/ Guide on creating and uploading a Python package to PyPI
https://escape2020.github.io/ Page for the ESCAPE project summer schools related to software development
https://gitlab.in2p3.fr/escape2020/wp3/wossl/-/wikis/Best-Practices-for-software-development Source of best practices for software development in the WOSSL workshop
https://guide.esciencecenter.nl eScience center’s guidelines for coding and software development
https://gammapy.org/contribute.html Contribution guide for the Gammapy community software project
https://archive.softwareheritage.org/save/ Instructions on archiving software in Software Heritage
https://jupyter.org/ Jupyter project homepage for developing notebooks
https://bioconda.github.io/ Bioconda, a channel for publishing and installing packages such as Python analysis pipelines
https://acts.readthedocs.io/en/latest/codeguide.html ACTS project code guidelines and contribution processes documentation
https://doi.org/10.5281/zenodo.8324828 Document setting the stage for a software lifecycle as a deliverable for EOSC working group
https://www.ivoa.net/documents/ International Virtual Observatory Alliance’s documents page covering specific standards
https://www.ctao.org/for-scientists/technical-specifications/ Technical specifications for the Cherenkov Telescope Array Observatory
https://software-guidelines.readthedocs.io/en/latest/ OpenAIRE’s guidelines for Software Repository Managers
https://ctapipe.readthedocs.io/en/latest/developer-guide/code-guidelines.html Code guidelines for the ctapipe project
https://escape-ossr.gitlab.io/ossr-pages/page/contribute/onboarding/ Onboarding page for contributing to OSSR
https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/best-practices-for-projects GitHub’s best practices for managing projects
https://doi.org/10.5281/zenodo.7540575 ESCAPE project deliverable on license, provenance, and metadata guidelines
https://github.com/ufal/lindat-common/wiki/Service-Development-Guide Service development guide by LINDAT-CLARIAH-CZ project
http://hdl.handle.net/11372/DOC-78 CLARIN B-center certification requirements document link
https://acts.readthedocs.io/en/latest/versioning.html Guidelines for versioning in the ACTS project
https://acts.readthedocs.io/en/latest/formats/formats.html Formats used in the ACTS project
https://rse.dlr.de/guidelines/00_dlr-se-guidelines_de.html DLR guidelines for software engineering
https://zenodo.org/doi/10.5281/zenodo.7540575 ESCAPE project deliverable D3.7: license, provenance and metadata guidelines for the OSSR