Post-project work reflections

There are a few reasons why I chose data visualization as the theme to explore during my project work for the #UMEDH course. Firstly, I assumed that it could be a very rewarding tool, particularly for improving the visual aspects of my study. Secondly, I took it for granted that the historical data I am using would automatically convert into colorful graphs. This also fitted well with my lack of programming experience, which the other tools and methods seemed to require. Finally, it had always struck me how comprehensible figures can be compared to dozens of layers of analytical text. Having said that, the #UMEDH course helped me to verify some of these thoughts, if not to understand how wrong I was. The processes of selecting a tool, converting the data and working out the results test a scholar’s patience. Still, I would argue that there is a lot of fun involved in trying these tools, even if the results are not always what we were hoping for.

Lukasz

Intellectual property management

Dear all. I’m sorry that I won’t be able to come to Umeå this time around, but I’ll see you all over the Skype link.

I hope I’ll be able to present the work I’ve done so far on my course project. For my PhD I don’t have data that are easily quantifiable, so I can’t make the fancy-schmancy and beautiful visualizations many of you have come up with. Nor can I do the smart text-mining things, since most of my material is music. In addition, my material is under quite heavy copyright restrictions, so I can’t make an online exhibit of it either without getting into trouble. I therefore decided to make a project on intellectual property and rights management in the digital age. So far I’ve been going through a lot of laws and regulations, and I’ve contacted several management societies about interpretation of the laws. Mostly I’ve been concerned with Norwegian regulations.

Digital historians often write what can be called compound texts, that is, texts combining different modes of communication: words, images (pictures and figures), sound and film. For each of these modes different regulations exist, and when working with 20th-century material, copyright therefore often becomes an issue. The following table sums up some of the regulations.

Type of work | Protection time in Norway | Management in Norway
Text | 70 years after the death of the author | Regulated by the publisher or Kopinor
Images – “photographic works” | 70 years after the death of the photographer | Regulated by the photographer or a photo agency
Images – “photographic pictures” | 15 years after the death of the photographer, or at least 50 years after the work was published; 25 years after the picture was taken if this happened before the law change in 1995 |
Music composition, lyrics | 70 years after the death of the composer/lyricist | Regulated by TONO
Sound recording, performance rights | 50 years after the recording was first issued | Regulated by GRAMO
Mechanical rights to sound recording | 50 years after the recording was first issued | Regulated by NCB
Film | 70 years after the death of the longest-living member of the central production team (director, producer, writers etc.) | Regulated by Norwaco

I will present this in more detail at the meeting next week, give examples and discuss some problem cases. Are any of you working with recent material in your UMEDH projects where copyright is an issue?

 

Data visualizations – to be continued…

Hi!

In this post I thought I would share with you some of the experiences I have had while working with the visualizations. The first thing I learned is that a successful visualization requires simple data structures. I only got to know this catchphrase after dozens of attempts to create or transform numbers into individual cells, rows and columns, but also NumberLists, OrderedNumbers and other constellations. This step is necessary to make the numbers recognizable to the system, but the problem is that there is no easy way to understand how they should be transformed. In addition, the data converters designed to help either do not respond or point to unidentified errors.

The beginnings were very promising. I had chosen the Quadrigram tool, which promised not only to create custom data visualizations in “an intuitive way” but also to do it “painlessly”. No doubt this system was chosen also because of its superbly polished visualizations. So, as a diligent student, I went through the ‘Getting started’ course, analyzed the templates and started to import a local file with data. This went fine, but as soon as I reached the point of conversion, my data seemed to be too complicated. Transformation into a different structure did not work either. I understand that it is all about individual motivation, competence and experience, but come on, I did try. I am sure it would be easier for someone with a more advanced understanding of programming.

In the end, I gave up all hope of ever having super fancy, shining, blinking visualizations and turned towards a simple but well-tried solution. Thanks to Finn Arne, IBM Many Eyes was the new choice, and it hit me almost like a ton of bricks how easy it is to upload my data and choose a visualization there. This saved me a great deal of time. Here are some of the results which I would like to share with you:

– The share of nationalities selected by the Swedish delegation from the UNHCR camps between 1968 and 1972

– The occupational composition of migrants selected by the Swedish delegation from the UNHCR camps between 1968 and 1972

– Directions of departures from refugee camps for the period from October 1969 to May 1970
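
For what it’s worth, the “simple data structures” lesson can be put in a few lines. The sketch below (plain Python, with invented placeholder figures, not my actual archive numbers) flattens a nested structure into the kind of tidy one-row-per-observation CSV that an upload tool like Many Eyes can digest:

```python
# A toy sketch of the "simple data structures" lesson: most upload tools
# expect a flat, tidy table -- one observation per row, one variable per
# column. The figures below are invented placeholders.
import csv
import io

# Nested structure, roughly as it comes out of the archive notes
intakes = {
    "1968": {"Czechoslovak": 120, "Polish": 35},
    "1969": {"Czechoslovak": 80, "Polish": 60},
}

# Flatten into year / nationality / count rows
rows = [
    {"year": year, "nationality": nat, "count": n}
    for year, groups in sorted(intakes.items())
    for nat, n in sorted(groups.items())
]

# Write the tidy CSV that an upload form can digest
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["year", "nationality", "count"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Once the data look like this, choosing a chart is the easy part.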

Lukasz

Digital exhibition from an authentic site

I work on a daily basis in a house that was used as Gestapo headquarters during WW2. The authenticity of the place is therefore quite central to our permanent exhibition, and I find it very interesting to see how other historical sites use their authenticity when they publish digital exhibitions. Normally, digital exhibitions are mostly about showing artifacts, documents and photos. An authentic site has another dimension to display.

Against this background, I recommend taking a look at “The Secret Annex Online”, published by the Anne Frank House in Amsterdam: http://www.annefrank.org/en/Subsites/Home/ This is an interesting attempt to bring the authenticity of a place into a digital exhibition. Combined with the “digital walk” through the Anne Frank House, we also have the possibility to see historical documents and listen to a voice that explains the historical context.

I am not quite sure how successful I find this digital exhibition. But it is an interesting way to exploit the authenticity of a building in a digital way.

The Other Side of the Coin, Or: When It Just Won’t Work.

Hei Folks!

I’ve been busy the last few weeks finishing one DH project (the digital edition of an early modern German guidebook to complimenting) and starting the digital history PhD course–dissertation–project joint venture. My initial idea was to use computer-assisted text analysis (incl. topic modelling) on a corpus of scholarly critical reviews and review articles of scholarly editions of literary works by well-known German writers from the past. The editions are from the last two to three decades. I wanted to see whether or not a computer-assisted, ‘automatic’ distant reading of ca. 150 reviews could give me some ideas or starting points for further in-depth analysis. Basically, I wanted to exploit text mining and topic modelling while still in the ‘context of discovery’. My overall focus in The Thesis is on the normative framework of modern German textual scholarship (“Editionsphilologie“) and I was curious what might be ‘hidden’ in the corpus of critical reviews that could be used in the scope of my survey.
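
For the curious: real topic modelling needs a dedicated tool (MALLET, gensim and the like), but even a toy word-frequency pass shows what this kind of distant reading is after. The sketch below is plain Python over two invented snippets, not my actual reviews:

```python
# A toy stand-in for "distant reading": a plain word-frequency pass over a
# (tiny, invented) corpus already hints at what the reviews talk about.
import re
from collections import Counter

reviews = [
    "Die Edition der Briefe ist sorgfaeltig kommentiert. Edition und Apparat ...",
    "Der kritische Apparat der Edition verzeichnet alle Varianten ...",
]

# Minimal German stopword list, just for this toy example
stopwords = {"die", "der", "ist", "und", "alle"}

counts = Counter(
    word
    for text in reviews
    for word in re.findall(r"[a-zäöüß]+", text.lower())
    if word not in stopwords
)

# 'edition' and 'apparat' dominate this toy corpus
print(counts.most_common(3))
```

Scaled up to 150 real reviews (and with proper stopword lists and lemmatisation), this is the kind of signal topic modelling would then try to cluster. But, as you will see below, everything hinges on the texts being clean in the first place.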

To make a long story short: It did not (and it still does not) work the way I want. At all. I’ll give you a brief description of what went wrong and point out some solutions (in general, not for me because of time…) and thus hopefully provoke a discussion on the issue!

1. The Corpus

145 German scholarly critical reviews of scholarly editions of the works and writings of German-speaking authors. The reviews were published between 1990 and 2012; the scholarly editions were published in more or less the same period. The literary writers include, among others: Franz Kafka, Georg Büchner, Georg Trakl, Paul Celan, Conrad Ferdinand Meyer, Georg Heym, Achim von Arnim and Heinrich von Kleist. The authors of the critical reviews are often scholarly editors themselves or are quite familiar with textual scholarship and editorial theory; others are ‘experts’ on the writers mentioned, or on literary epochs or genres. The length of the reviews ranges from ‘short’ reviews (1–2 pages in print, ca. 500–1,000 words) to ‘long’ review articles (up to 23 pages in print, ca. 11,500 words), while the vast majority are ca. 2,500–3,000 words (5–6 pages in print) in length. They all include footnotes (a German scholarly convention) and also list all references in footnotes (no in-line references in parentheses).

2. The Data

All of the reviews first appeared in printed journals (editio. Internationales Jahrbuch für Editionswissenschaft (1987–) and Arbitrium. Zeitschrift für Rezensionen zur Germanistischen Literaturwissenschaft (1983–), both De Gruyter; Editionen in der Kritik. Editionswissenschaftliches Rezensionsorgan (2005–), Weidler Buchverlag; Text. Kritische Beiträge (1995–), Stroemfeld Verlag; as well as some author yearbooks). Two thirds of the reviews in my corpus have since been digitised and OCRed by the publisher (mainly De Gruyter) and are available as PDF files. The others were manually digitised (i.e. scanned, or first photocopied and then scanned) by me, then OCRed with either Adobe Acrobat Professional® or ABBYY FineReader Express® (for Mac), and are also PDFs.
The publisher-generated PDFs are partly retrodigitised (i.e. they did the same as I did) and partly generated from the .doc or .docx files the authors of the reviews submitted for publication. (Note: each PDF file from De Gruyter costs €39.95 if you’re unfortunate and your university library doesn’t provide full access!)
I assumed that the PDFs produced by the publishers had been proof-read after the OCR was done. I also assumed that the quality of the PDFs I produced was very good. I was wrong on both counts. These were the issues:

i) All retrodigitised PDFs from the publisher were a) erroneous (letters not recognised correctly, additional spacing within words, no distinction between the main text and the footnotes, letters and sometimes whole words not recognised at all, i.e. blank space), or b) incomplete (text had been cut at the beginning or the end; text had not been scanned, or had been scanned so badly that it was unrecognisable).

ii) All PDFs I OCRed with ABBYY FineReader were total crap: not ONE word was recognised correctly.

iii) All PDFs I OCRed with Adobe Acrobat Pro were more or less OK, but still contained too many errors.

iv) In ALL PDFs (mine and the publisher’s, new and old) hyphenated words were recognised not as one word but as two distinct ‘words’, and there was no distinction between the main text and the footnotes.

Before I learned of these problems with the PDFs, I tried out the Paper Machines add-on for Zotero. Nothing useful came out of it, not even real words! So, my conclusion and ‘solution’ for the moment would be: be careful what kind of digital text(s) you use, especially when it’s not ‘plain text’ but pre-formatted text (PDFs etc.), because it will most likely ruin your text data. If you decide to do the scanning and optical character recognition yourself, keep in mind that the results depend heavily on the quality of
a) the print (colour, paper, font),
b) the scan (low-res, high-res, grayscale, black/white, colour, TIFF, JPEG, PDF etc.),
c) the overall formatting of the text (are there footnotes? or marginalia? or pictures? or strange fonts?),
d) the language or languages of the text (English works OK, but try Polish or Russian, or even Danish!),
e) the performance of your software (the ‘professional’ ABBYY FineReader is crap; my old Adobe Pro works better but is still not good enough) and how well it works with your hardware and platform of choice.
And last but not least: Considering all of this, is the work and time and nerves you have to put into ‘cleaning’ your data before you can do anything cool with it worth the (potential) outcome of the whole endeavour?
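
At least problem iv) can be patched automatically. Here is a minimal sketch in Python; note that it simply joins a lowercase continuation after a line-break hyphen, so genuinely hyphenated German compounds would still need a dictionary check:

```python
# Patch for OCR problem iv): words split across line breaks by hyphenation
# end up as two distinct 'words'. This joins them back together.
import re

def dehyphenate(text: str) -> str:
    # Join words broken across a line break ("Wort-" + newline + "teil")
    # when the continuation starts with a lowercase letter.
    return re.sub(r"(\w+)-\n\s*([a-zäöüß]\w*)", r"\1\2", text)

ocr_sample = "Die Editions-\nphilologie untersucht den kritischen Ap-\nparat."
print(dehyphenate(ocr_sample))
# -> Die Editionsphilologie untersucht den kritischen Apparat.
```

It is a band-aid, not a cure, and it does nothing for the footnote problem, but it is the kind of cleaning step that has to happen before any text mining makes sense.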

For now I won’t continue the work on this specific set of texts for the reasons I mentioned above. Nevertheless, I am looking forward to your comments and suggestions!

P.S. In another blog post I’m going to tell you what it’s like to have your website hacked and infested with phishing content, about the legal stuff that comes with that, and above all about not having a well-groomed online presence for almost six weeks. Soon.

Digital visualization of study results

The graphical representation of study results is not the first thing that comes to mind when one thinks of historical research. It is, however, an interesting alternative for historians dealing with a mixture of qualitative and quantitative data.

My thoughts about visualizing study results evolved after a visit to the Labour Market Board archives. I knew that in order to stress the importance of the diverse material found in the archive, the figures and texts would have to be combined and presented in visual diagrams, tables or other digitally generated graphical structures. The inspiration to employ visualization methods came from Martyn Jessop’s article Digital visualization as a scholarly activity and Alan Liu’s When Was Linearity?: The Meaning of Graphics in the Digital Age, both listed in the PhD course on Digital History. Both authors stress that the graphic presentation of numeric data and of volumes of text have a lot in common, and they present several examples of how visualization in the humanities can be achieved. The TAPoR text-analysis portal and the IBM Many Eyes data-visualization tool are two examples used to visualize both the results of analysis and the texts themselves.

Inspired by these examples, I will try to transform my archive material into a dataset and then present the results through digitally generated designs. As easy as it may sound to some of you, I am entering a completely unknown field. I have therefore prepared a short overview of the data, together with a number of problems which I hope can be addressed with the help of graphical representation. Any advice, or reference to previous examples, will be highly appreciated.

– Reports after five intakes of migrants.

Problem: The diversification of the social composition across the various intakes (each intake had a different share of intellectuals, blue-collar workers etc.; list of professions and number of migrants)


– Reports after visits to eleven migration centers

Problem: Work placement problems (in each report the officials reflected on the problems with work placement; this can be related to the social composition of the place at the time of the visit; quantitative data on the social composition, descriptions of the problems with work placement)


– Weekly registers of the departures from the camps

Problem: Changes in destinations over time (the number of migrants who were placed in work or sent to other destinations; quantitative data)


– Accommodation after a certain time interval (the movements of migrants after departure from the reception centers; address and date of arrival at the new place)
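
To make this a bit more concrete, here is a rough sketch of how, for example, the weekly departure registers might be turned into a dataset for visualization; the dates and figures are invented placeholders, not the actual archive numbers:

```python
# A sketch of turning weekly departure registers into a dataset: one entry
# per week and destination, then totals aggregated for plotting. All values
# below are invented placeholders.
from collections import defaultdict
from datetime import date

# (week starting, destination, number of migrants), as read from a register
entries = [
    (date(1969, 10, 6), "work placement", 14),
    (date(1969, 10, 6), "other camp", 5),
    (date(1969, 10, 13), "work placement", 9),
]

# Aggregate departures by destination
totals = defaultdict(int)
for week, destination, n in entries:
    totals[destination] += n

for destination, n in sorted(totals.items()):
    print(f"{destination}: {n}")
```

The same shape (one row per week and destination) would also serve as direct input for a time-series chart of the changing destinations.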

Lukasz

Some thoughts on the use of an up-to-date net presence

Hi all! Finn Arne suggested I should share this experience with you, so here it goes…

One thing we talked about during our first meeting in Umeå was the importance of creating and maintaining a professional net presence, or net persona. Some of us were already there, more or less actively; others were pretty new to the digital communication and self-branding arena.

As for me, I set up an account on Academia.edu a couple of years ago and decided, when I started working on my dissertation a year ago, to keep it updated with papers, talks, articles etc. as they come along during my project. If nothing else, it is a good way for me to keep my doings sorted and to keep track of what’s up in my fields of interest. Recently, though, it turned out that the use of this net display can be even greater. A couple of months ago I was contacted by a scholar unknown to me who had seen my profile on Academia and invited me to write a chapter in an international anthology very relevant to my field. Of course I sent an abstract, which was discussed and eventually approved the normal way, but the thing is: this scholar would probably not have known about me or my research had it not been displayed on the web. Then, the other day, I was contacted through Academia by an international scholar in another field very central to my PhD project. He invited me to give a paper in his session at a big international conference, since he thought the aspects of my research shown in my profile were interesting for his theme.

This is not to put forward my own work, but to give an argument for spending a few minutes a week updating one’s net presence. I can’t help thinking of the tourist couple at the Spanish Steps in Rome (not far from where I am working right now); the husband climbed the stairs to the top, and once he was there, his hot and tired wife called to him in a loud voice: “Bob! Bob!! Is it worth it?”. Returning to my case above, I would say: yes, it’s worth it.

 

Project update

OK, finally a sign of life from this silent horizon… I have been busy moving myself, my family and loads of research material to Rome, but am now starting to get on top of things little by little. The wifi situation here is generally pretty bad, which is frustrating for a wifi addict, but we are working on improvements (hopefully).

Brief note about my project:
I have met other scholars here working on the Grand Tour in the premodern period, though none as early as my period (the 17th century), and they are also very much into historical maps etc. – however, they are not at all using digital tools (yet! I have a mission!).
Fiddling around with Omeka and Neatline, I am still on track with trying to visualize one day, or one stay, with a couple of travelers – I think I will pick the two most informative ones and create a comparison based on maps, routes, highlights described, and comments/reactions (if any). To this I will add the vademecums or guidebooks used by visitors to Rome at the time – also quite fun material visually – and perhaps some information on the local guides conducting the tours. Still quite premature, but I hope to develop this during the coming weeks!

Regarding the essay, I am quite tempted to try the publishing possibility offered by Finn Arne, but I want to make sure I have something interesting enough to say first, and that I have sufficient time during my stay here in Rome. Will get back on that, too!

I have blogged on various themes, often with a bearing on our course in one way or another – please comment if you like! holysmokephdblog.blogspot.se

This in a hurry from hot and humid Rome, now facing a thunderstorm (don’t even want to think about what that means for wifi…). A presto!
Helena

Better late than never…

I am sure this introduction comes all too late but, as the saying goes, better late than never.

So, my name is Lukasz Gorniok and I am a PhD candidate in history at Umeå University. My research deals with the emergence of Sweden’s active orientation in world politics at the end of the 1960s and in the early 1970s. The study aims to review and evaluate Swedish foreign and migration policies by examining the politics which shaped the Swedish response to the events in Czechoslovakia and Poland, and to the refugees fleeing these communist countries. It is based on multi-archival research in the National Archives of Sweden (the archives of the Swedish Ministry for Foreign Affairs and of the Swedish embassies in Warsaw and Prague) and comprises diplomatic correspondence between these institutions, public and confidential reports, memoranda and minutes of meetings. In principle, these records require purely qualitative methods, and until now my work has focused on various analyses of these data.

The presentation of these results is another story. I hope to use the Digital History course as a springboard for a digital presentation of my research; in other words, my aim is to improve the quality of the study. One of these days I will post a more detailed overview of how this will be done.

Lukasz