5 comments on the Statement of Principles on Data Management

The Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council of Canada (SSHRC) (“the Agencies”) recently released a Draft Tri-Agency Statement of Principles on Digital Data Management that is worth reading.

It is good to see movement on this in Canada, and to see the initiatives at UofT and other places to respond to it. The Canadian Association of Research Libraries (CARL), for example, has posted an interesting response here.

I fully support the statement. Below I am attaching five comments meant to strengthen the desired impact of the principles, which I also submitted directly. It would be interesting to hear others’ views on the statements and whether there are major concerns with the current draft principles!

  1. “Data are significant and legitimate products of research and must be recognized as such.” – Indeed. The importance of data as a key element in research is recognized in the statement, but at a place that many readers may not even reach. It should have a much more prominent place higher up in the document, since it provides the rationale for the principles and invites the reader to acknowledge the importance of what follows.
  2. Defining research data as “recorded material that validates research findings and results, and enables reuse or replication”, even if in accordance with RDC, presents a dangerously narrow concept that will result in valuable research data being missed: Many stakeholders, in particular outside the sciences, will assume that their resources do not constitute research data since they may not ‘validate’ findings or not yet truly ‘enable replication’. This seems misleading and risky. I recognize that adapting the definition may be perceived as a big step, but consider the lost opportunities. What if it isn’t clear if a data set by itself in fact validates findings? What if replication is known to fail due to computational difficulties or is simply not a concern at this point – does the resource in question cease to be research data? Of course not.
    Compare this, for example, to the inclusive definition of Borgman, “entities used as evidence for phenomena for the purposes of research or scholarship”, which requires neither ‘validation’ nor proof of replication to recognize data as such, and is broadly applicable across sciences and humanities. (Christine Borgman (2015): Big Data, Little Data, No Data. MIT Press)
    A definition like this makes the statement more inclusive and relevant; ensures that a broader set of stakeholders can recognize the rising importance of data management and sharing; and helps research communities evolve their understanding of what can constitute research data, so that they can seize emerging opportunities. I urge you to adopt a more inclusive definition in this foundational statement of principles.
  3. Current data management plans, generated from templates, are often mere shelfware designed to fulfill regulatory requirements for funding, not to provide value and meet research needs. This section of the statement has the opportunity to emphasize a proactive perspective. However, the statement “Data management plans are key elements of the data management planning process.” is circular and says little.
    Plans by their nature are prescriptive, not descriptive: They prescribe what shall be done. Ensuring that it is done is one of the key challenges that current practice in DMPs often fails to address. The interest group on Active Data Management Plans of the Research Data Alliance is currently debating the urgent need to make data management plans more active, actionable, living entities that evolve and are well-supported in their evolution. (https://rd-alliance.org/groups/active-data-management-plans.html)
  4. Not all data should be preserved “well beyond the duration of the research project” as suggested by the statement on Collection and Storage. Some should be discarded almost instantly, depending on cost, benefit and risk. In the current phrasing, this contradicts later statements on cost-efficiency that acknowledge the need to assess and select what to preserve.
  5. Finally, the responsibilities of research funders include the very funding of data management activities. It is reassuring to see that the statement clarifies that this also needs to be part of peer review guidelines, but the statement of principles would do well to explicitly acknowledge that the costs accruing with data management, curation and sharing are valuable, eligible, and even expected elements of research initiatives and grant proposals.

Christoph Becker