It has been a bit over a month since we submitted our essay to the Data Science Journal and made it
available for open review. We have been overwhelmed by the response. Through
comments on this blog, posts on other blogs, and direct contact, we now have
some 50 pages of review comments from about two dozen individuals. The reviews
range from a few casual comments to very thorough and detailed critiques. And
we have yet to receive the formal reviews from the DSJ editor. If we have
done nothing else, we have succeeded in sparking a conversation.
We are waiting for the formal reviews from DSJ before we
revise the paper, and we anticipate some major revisions. Meanwhile, though, we
can offer a few clarifications, observations, and mea culpas regarding the direction of the conversation.
First, we must clarify that this is an essay. It is an essay supported with evidence, but it is an opinion
piece, not a research paper. (Indeed, it seems many information science papers
could be better classified as well-evidenced essays rather than formal research
results.) We believe there is still much research to be done in this area, but
we hope we have helped frame some of that research. We also hope that we have
led some people to re-examine some of their assumptions about current practice.
Despite this being an essay, it is clear we need to be more specific
and precise in our language, and we need to better define our terms. We admit
to being rather cursory in our analysis. We note that the four paradigms we put
forward could be more deeply considered. Also, the paradigms are not all
describing exactly the same thing and are not directly parallel or mutually exclusive.
“Paradigm” or “data management approach” may not be the right term; something
more like “production pattern” or “communications space” may fit better. We will tighten up our
language and discussion. We also missed some important work that we will
reference next time. These include papers by Lawrence, de Waard, Baker, and
others in context.
We remain skeptical of the data publication metaphor, but it
is clear that not everyone shares this skepticism. We may have struck a few
nerves. Much seems to revolve around the definition of “publication” (e.g., big
P vs. little p), and therein lies the rub. It is clear that the community has
not converged on a solid definition (or definitions) of data publication. We argue that we
need to broaden our thinking before we start converging too much on any given
approach. We don’t believe we can assume that the current scholarly communication
process is durable or even fully relevant. We believe we need to critically
pick and choose desirable aspects from many frames of thinking. We also feel
that we did not adequately convey how metaphor can limit how people think. We
will beef that section up a bit, but we don’t expect to end the debate. Rather, we
hope the debate continues. That said, we did see some level of consensus.
It is clear that this was an important question to ask, and
that this was a good time to ask it. Further, cognitive science, social
science, and philosophy were recognized as important to consider in data
science practice. Many agreed on the need for more metaphors, even those very
fond of the publication metaphor. Several alternative metaphors were suggested.
Many are intrigued by the ecosystem metaphor, but it remains inchoate, and it
is unclear how it would work to encourage desired behavior by data providers. All agree that we need data to be
preserved, recognized, and more fully considered in the scholarly process. We
are all working in that direction, even if we are not all on the exact same
path.
Clearly, these sorts of discussions are very valuable. We
continue to welcome feedback and will consider everything we receive up until
the time we receive formal reviews and begin revising the paper.
Thanks again for the many thoughtful comments,
—Mark and Peter