Last summer I started thinking about translating historical texts into ‘data’ and wrote a short blog post about it when I worked on the Cultural Heritage Informatics project. At that point, I was pondering the limitations of my recently minted Master’s thesis, where I analyzed tourism advertisements and travel stories to understand how individuals ‘mapped’ places with cultural, colonial, and personal significance. Because it was a heavily theoretical cultural history project, I enjoyed the textual, literary, and tentative nature of the endeavor. However, I could not answer many structural questions about the nature of Vietnamese tourism and the genre of travel stories more broadly because of the qualitative nature of my project.
I wonder if others have had this experience in their own historical research. I myself am drawn to digital tools and more quantitative ways of thinking as a way of offering a broader perspective on my often very text-based questions. With a deep yearning for more ‘concrete,’ quantitative data, I hope to create a digital history project within this line of thinking: translating historical text to data.
Much of my research (like that of many current researchers on modern Vietnam) uses 20th-century Vietnamese newspapers as one of its main primary sources for historical analysis. However, there is still a gaping hole in our understanding of the print industry in Vietnam. Outside of a handful of famous editors, radical newspapers, and an awareness that Vietnamese intellectual life was interwoven with the print industry, the newspaper business remains a disorderly mystery. I seek to create a project that provides a more structural understanding of the Vietnamese print industry, based on the visualization of data (such as beginning and end dates, contributors, location, type, themes, language) on publishing houses roughly between the 1890s and the 1940s.
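To make the list of fields above a little more concrete, here is a minimal sketch of how one publishing house might be recorded and then counted by year. Everything in it is an assumption for illustration: the class name `PublishingHouse`, the field names, the helper `active_counts`, and the placeholder records are hypothetical and do not come from any real source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PublishingHouse:
    """One publishing house. All field names are illustrative assumptions."""
    name: str
    start_year: Optional[int] = None  # None when the source gives no date
    end_year: Optional[int] = None
    location: str = ""
    kind: str = ""                    # e.g. daily, weekly, journal
    languages: List[str] = field(default_factory=list)
    themes: List[str] = field(default_factory=list)
    contributors: List[str] = field(default_factory=list)

    def active_in(self, year: int) -> bool:
        """True if the house is known to have been operating in `year`."""
        if self.start_year is None or self.end_year is None:
            return False  # unknown dates are left out rather than guessed
        return self.start_year <= year <= self.end_year


def active_counts(houses: List[PublishingHouse],
                  first_year: int, last_year: int) -> Dict[int, int]:
    """Number of houses known to be active in each year of the range."""
    return {
        year: sum(h.active_in(year) for h in houses)
        for year in range(first_year, last_year + 1)
    }


# Placeholder records only; these are not real publishing houses.
houses = [
    PublishingHouse("Example House A", 1905, 1910, location="Hanoi"),
    PublishingHouse("Example House B", 1908, 1925, location="Saigon"),
    PublishingHouse("Example House C"),  # dates unknown
]
counts = active_counts(houses, 1904, 1911)
```

Leaving unknown dates as `None` instead of guessing keeps the gaps in the sources visible, which seems important when so much of this information is incomplete.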
Where is my data?
This potential project is quite labor-intensive and requires a good amount of wrangling of various sources for data. At the moment, to the best of my knowledge, there is no open source anthology or compilation of Vietnamese publishing houses during the colonial period. Vietnamese historians (who have been difficult to reach) made some attempts in the past to compile this information, and I have scanned and OCR’ed it. I have been reaching out to other historians who work on print culture more directly and hope to also draw inspiration from another print culture digital project on Indonesian elite networks. I would still have to input a good amount by hand: names of streets and editors, dates, and other pertinent information.
How should this look?
I am also debating how I would like to display this information. At first I considered a spatial representation of the print houses, but I was not sure if I could find a good basemap and geo-locate everything. I’m still familiarizing myself with the many geospatial tools. I also considered creating some type of network map and/or visualization of the data on key figures of print houses (their associations, relations to various newspapers, number of issues, location, etc.).
At the moment I have several ideas that are still quite broad, some more feasible than others (each would rely on a combination of basic tools to help me understand my data, such as Paper Machines, Tableau, and CartoDB):
1. Collect, compile, and geocode publishing houses. (limited number of sources, challenge of historic locations, temporal factor) [still debating on tools]
2. Collect and compile publishing houses, with a focus on key figures (editors & publishers), for network analysis/temporal dynamics [still debating on tools]
3. Create a central, standardized hub to input information on publishing houses (will still input my own data as a starting point, find collaborators) [Drupal, basic shared ?]
Back to square one: Data?
I’ve been reflecting on the possibility of collaborating on and/or crowdsourcing the intellectual workload of gathering information on print houses. I know that many scholars of Vietnam have already compiled their own little lists of metadata on print houses, but these are tailored specifically to their historical topics and recorded in a specialized manner. “Sharing data” seems to be a touchy subject for many scholars, especially those who are up and coming or not well-published. Nevertheless, print culture stretches across so many fields of historical inquiry, and a fuller understanding of print houses can help scholars immensely. I truly believe that a robust collection of this information can open the doors for ‘big data’ types of analysis.
While 1 and 2 are narrower since I would be compiling the data myself, it might be good to build this type of project as an example, so others can see the incredible usefulness of collecting extensive metadata about publishing houses and the strengths of digital tools for analyzing this information. 3 seems simple enough to build, but difficult when it comes to finding collaborators and contributors.
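As a rough illustration of what idea 2 could look like once some data is compiled, here is a minimal sketch that turns a list of person-to-newspaper ties into a weighted network, where two people are linked if they worked on the same newspaper. The function name `co_affiliation_edges` and the sample ties are hypothetical placeholders, not real editors or newspapers, and this is only one of several ways such a network could be defined.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (person, newspaper) ties; placeholders, not real data.
ties = [
    ("Editor A", "Newspaper X"),
    ("Editor B", "Newspaper X"),
    ("Editor B", "Newspaper Y"),
    ("Editor C", "Newspaper Y"),
    ("Editor A", "Newspaper Y"),
]


def co_affiliation_edges(ties):
    """Count, for each pair of people, how many newspapers they share."""
    # Group people by the newspaper they are tied to.
    staff = defaultdict(set)
    for person, paper in ties:
        staff[paper].add(person)

    # Every pair of people on the same newspaper gets one more shared tie.
    edges = Counter()
    for people in staff.values():
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)] += 1
    return edges


edges = co_affiliation_edges(ties)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight} shared newspaper(s)")
```

An edge list like this is a plain table of pairs and weights, so it should be easy to carry over to whichever network tool ends up being the best fit.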
I would really appreciate any and all feedback from others. I hope I made some sense here, as I am still working out lots of ideas in my head! Thank you!