Monday, September 17, 2012

Revised metaphor paper submitted

Peter and I have just submitted a revised version of our essay, "Is Data Publication the Right Metaphor", to the Data Science Journal. We hope that it will be out in late October as part of a special issue from the inaugural conference of the World Data System. Meanwhile, we provide a preprint.

The paper is substantially revised, rewritten really, and much longer with many more references. We tried to address, at least minimally, all the great feedback we got from the community. I think it has much more intellectual rigor as a result. Thank you all!

Our core message remains the same. Metaphors are necessary, restricting, misleading, and enlightening. We need more of them.

The other big message is that we really value the conversation. Please continue to share your thoughts on this blog and, better yet, write challenging papers, do relevant research, and build and operate better socio-technical systems. It's a great time to be a data scientist.

Speaking personally, it was a big challenge and a big privilege to write this essay. I hope it gets "published" :-)

Thanks again. Keep it going...

Tuesday, January 24, 2012

“Is data publication the right metaphor?”—An Update

It has been a bit over a month since we submitted our essay to the Data Science Journal and made it available for open review. We have been overwhelmed by the response. Through comments on this blog, posts on other blogs, and direct contact, we now have some 50 pages of review comments from about two dozen individuals. The reviews range from a few casual comments to very thorough and detailed critiques. And we have yet to receive the formal reviews from the DSJ editor. If we have done nothing else, we have succeeded in sparking a conversation.

We are waiting for the formal reviews from DSJ before we revise the paper, and we anticipate some major revisions. Meanwhile, though, we can offer a few clarifications, observations, and mea culpas regarding the direction of the conversation.

First, we must clarify that this is an essay. It is an essay supported with evidence, but it is an opinion piece, not a research paper. (Indeed, it seems many information science papers could be better classified as well-evidenced essays rather than formal research results.) We believe there is still much research to be done in this area, but we hope we have helped frame some of that research. We also hope that we have led some people to re-examine some of their assumptions about current practice.

Despite this being an essay, it is clear we need to be more specific and precise in our language, and we need to better define our terms. We admit to being rather cursory in our analysis. We note that the four paradigms we put forward could be more deeply considered. Also, the paradigms do not all describe exactly the same thing and are not directly parallel or mutually exclusive. “Paradigm” or “data management approach” may not be the right term; something more like “production pattern” or “communications space” may fit better. We will tighten up our language and discussion. We also missed some important work that we will reference in context next time. This includes papers by Lawrence, de Waard, Baker, and others.

We remain skeptical of the data publication metaphor, but it is clear that not everyone shares this skepticism. We may have struck a few nerves. Much seems to revolve around the definition of “publication” (e.g., big P vs. little p), and therein lies the rub. It is clear that the community has not converged on a solid definition(s) of data publication. We argue that we need to broaden our thinking before we start converging too much on any given approach. We don’t believe we can assume that the current scholarly communication process is durable or even fully relevant. We believe we need to critically pick and choose desirable aspects from many frames of thinking. We also feel that we did not adequately convey how metaphor can limit how people think. We will beef that section up a bit, but we don’t expect to end the debate. Rather, we hope the debate continues, though we did see some level of consensus.

It is clear that this was an important question to ask, and that this was a good time to ask it. Further, cognitive science, social science, and philosophy were recognized as important to consider in data science practice. Many agreed on the need for more metaphors, even those very fond of the publication metaphor. Several alternative metaphors were suggested. Many are intrigued by the ecosystem metaphor, but it remains inchoate, and it is unclear how it would work to encourage desired behavior by data providers. All agree that we need data to be preserved, recognized, and more fully considered in the scholarly process. We are all working in that direction, even if we are not all on the exact same path.

Clearly, these sorts of discussions are very valuable. We continue to welcome feedback and will consider everything we receive up until the time we receive formal reviews and begin revising the paper.

Thanks again for the many thoughtful comments,

—Mark and Peter