Last year, my dear friend and colleague Eva Jakobs spent a few weeks in Austin. Among other things, we decided to collaborate on a small study of 24 Quicklook reports: technology assessment and commercialization reports used to determine whether a given innovation might be valuable to specific stakeholders in a specific market. My colleagues at IC2 have an archive of such reports, and since Eva and her group are interested in revisions, versioning, and text mining, we agreed to provide them with an archive of drafts.
The results were interesting. These reports are written by contracted business analysts who interview stakeholders and write a heavily templated report over approximately 40 hours. Then each report is sent to the program director, who typically comments on the draft and sends it back for revision. Usually the revision cycle involves just a few rounds, but some involve far more. (See the paper for details.) Eva's team was able to characterize these revisions.
But we were also able to (a) use the textual analysis to identify the sections where the comments most frequently occurred and (b) code the comments with an emergent coding scheme to identify their purposes (co-creation, argumentation, the writing process, and text quality). Based on this work, we were able to characterize the kinds of comments and identify how the parties synchronized expectations.
Eva was lead author; my contributions were mostly supplemental.
For me, one of the most useful parts of this exercise was the question of scaling. My research has typically been in the qualitative case study mode: examining observations, interviews, and artifacts. That mode does not scale well because the sheer amount of work is difficult to sustain. Text mining, on the other hand, scales quite well, as long as one can ensure that the pattern matching means what one thinks it does. (Structure is not necessarily an indicator of meaning.) For this project, we were able to pair the two approaches, yielding an analysis that scaled up to a larger dataset while retaining an interpretive dimension.