Every now and then since I started studying literature, the rumour has popped up: There are people out there applying STATISTICS to literature. This is usually followed by an alarmed or blank facial expression, involuntary shuddering and, frequently, someone giving the closest wooden table a discreet knock. If someone mutters “statistics” we answer “bless you” and continue ignoring the subject to death. The statistical scholar of literature is like some embarrassing cousin whom we have an unspoken agreement never to mention. He or she is our Asher Lev, a traitor overturning the True Spirit and Purpose of literary “research” (our euphemism for personal explorations of the eternal and existential mysteries of…well, existence), someone who has seemingly converted to the Other Side and joined the enemy. As my own closest acquaintance among the list of honorable dead white men (guess who) puts it:
…thou shalt not sit
With statisticians nor commit
A social science.
This last point understandably caused me some qualms the day I sat down to have a coffee with Digital Humanities enthusiast Annika Rockenberger, organizer of the PhD seminar “What Are Digital Humanities” which took place last month at the University of Oslo. The purpose of the seminar was to explore the role of digital methods within humanities research, to exchange information on interesting programs and developments within the field, and to initiate a network for scholars within (or interested in) digital humanities in Norway. I wanted to go because, despite my interest in the field, my grasp of the kind of technology, ideas and projects that even exist is superficial at best. Luckily, Rockenberger was both friendly and informative, and agreed to sign me up for the seminar. My main goals in going were simple: What kind of methods and programs exist? And where can I learn to use them? Otherwise, I hoped to soak up as many ideas and as much information as possible.
Here is some of the new info I thought I’d share:
At the University of Hamburg, a team led by Jan Christoph Meister has created a program for “Computer Aided Textual Markup and Analysis” – CATMA. You can log in with your Google username, and according to Meister “all you need is to be able to think and click.” Here’s how you use it: Paste in the text you want to analyze, then highlight and annotate as your research question requires. You can insert labels and different kinds of markup, and in addition there is some kind of “query builder” (I haven’t tried this yet). The only requirement for using the program is that you are willing to share the tags and annotations you make: The project is built on the principles of openness, collaboration, sharing and exchange. As the manual says, the idea behind CATMA is to “generate, exchange and exploit complementary and even competing descriptions of the textual phenomena which we observe and analyse.” You can find the start page here and the manual here. Knock yourselves out.
CATMA was developed within the framework of the project CLÉA – Collaborative Literature Éxploration and Annotation. (There’s a funny story behind the É, but I’m not sure I should put it on the blog…) The main idea behind CLÉA (copied from the website) was to “use the advantages of a collaborative web based approach not only for the storing of source texts, but also for the creation, collection, aggregation and analysis of meta data, above all Tag definitions and annotations.” Doesn’t that sound exciting? According to Meister, the team hopes to eventually create algorithms for “automatic heuristic functionality”, i.e. a kind of “Propprian function detection” that suggests tags and markups FOR us. The idea is that the program will become smarter the more we use it: When we use CATMA, we feed it our data, which helps the program learn to recognize tag sets and functions. As a result, the program can eventually feed suggested annotations back to the user. (Come to think of it, perhaps this is a feature already? Is there anyone out there already using the program who could tell me?)
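For the programmers among us, the “smarter the more we use it” idea can be illustrated with a toy sketch in Python. To be clear, this is NOT how CATMA works (I have no idea what the team's algorithms look like); it is a deliberately naive word-counting approach, and every name in it (train, suggest_tag, the stop-word list, the example sentences and tags) is invented for illustration. It simply remembers which words appeared in passages that users tagged, and lets those words “vote” on a tag for a new passage.

```python
from collections import Counter, defaultdict
import re

# Tiny, hand-picked stop-word list (an assumption for this toy example).
STOPWORDS = {"the", "a", "an", "his", "her", "he", "she", "at", "with", "s"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def train(annotations):
    """Count how often each word occurs in passages carrying each tag.

    `annotations` is a list of (passage, tag) pairs, standing in for the
    highlights and tags that users have shared.
    """
    counts = defaultdict(Counter)
    for passage, tag in annotations:
        for token in tokenize(passage):
            counts[token][tag] += 1
    return counts


def suggest_tag(counts, passage):
    """Return the tag that collects the most word votes for `passage`.

    Returns None when none of the passage's words has been seen before.
    """
    votes = Counter()
    for token in tokenize(passage):
        votes.update(counts.get(token, {}))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Invented example data, loosely inspired by Propp's narrative functions.
shared_annotations = [
    ("The prince left his father's castle", "departure"),
    ("She left home at dawn", "departure"),
    ("The witch deceived the young prince", "trickery"),
    ("The wolf deceived the girl with a soft voice", "trickery"),
]

model = train(shared_annotations)
print(suggest_tag(model, "The soldier left the village"))  # departure
print(suggest_tag(model, "The fox deceived the crow"))     # trickery
```

The more tagged passages go into train, the more words the program has an opinion about, which is the feedback loop described above in miniature. A real system would need far more than word counts, of course.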
Meister in fact refers to annotation, or “enrichment of data”, as the “core activity in the humanities.” That should be food for thought.
He ended his talk with a Kleist quotation: “One could distinguish two classes of men: those who are capable of metaphors, and those who are capable of formulae. Those who are capable of both are too few; they do not form a class.” (Auden would surely have seen such a class as united on the basis of schizophrenia, as he saw the two different mindsets as “Related by antithesis / A compromise between us is / Impossible.”)
I think I’ll stop there for today. Perhaps I’ll write more about the other projects and talks later. If you’re interested in the field, here are some more links that you can explore on your own:
NETDRAW (a program for visualizing social network data)
NODEXL (another visualization and social network analysis program)
Dokumentasjonsprosjektet (full texts of Norwegian classics online)
DARIAH (Digital Research Infrastructure for the Arts and Humanities)
Jill Walker Rettberg’s blog (Professor of digital culture at the University of Bergen)