[MUSIC PLAYING]
Welcome, Silke Schwand, to Leipzig.
Once again, not sure when you have been here the last time.
I think--
For the Historikertag last year?
Yeah.
Oh, the Historikertag, yeah.
And also for a digital humanities event,
I think two or three years ago.
Yeah, that's true.
Great to have you back.
I know you very well, but I think
this is not the case for everybody here in the room.
So I will do what, well, is typically
expected when important people like you are introduced.
I will tell people about the most important stages
in your academic career.
So I think it all started in Bielefeld.
So Silke Schwand is a trained historian.
And I think the next stage of your academic travels
was London, and from London to Frankfurt.
I think it was also Frankfurt, where you did your PhD.
And I think in Frankfurt, you already
started to get more digital, because you
started as a traditional historian back in the day.
And we will see you digitized yourself and your research--
not yourself, but your research--
through the years.
After finishing the PhD in Frankfurt,
I think there was another research project in Frankfurt.
And then you came back to Bielefeld.
I think first as a [SPEAKING GERMAN]
whatever that means in English, as a lecturer.
And then you were appointed a professorship
for digital humanities and medieval history.
It's digital and medieval history.
Now it's only digital history.
Yes, that was my point.
There is a difference.
Showing the incremental digitization
of the researcher Silke Schwand.
Because I saw in your CV, like two years ago,
you switched the denomination to just digital humanities.
So no more--
History.
It's digital history.
Digital history.
Sorry, I know.
Sorry, I'm very particular about this.
Still a historian, but a digital historian.
And no more medieval history.
Well, as you will see later, there
is some medieval stuff going on.
So I bet it is digital history.
So that's for the very brief, well, academic career.
As I said, I also know Silke from, well,
the digital humanities community very well.
Actually, there's a couple of projects I could mention.
One of my-- I will mention my favorite project, which is--
I think it's called eTaRDiS.
I never understood what the acronym stands for.
I will tell you later.
It's in the presentation.
OK.
So can I spoil it right away?
So it's about virtual reality for visualization purposes
in history, which is an amazing topic.
And yeah, you can expect some very nice demonstrations,
I guess.
But Silke Schwand is not only, well,
an applied digital historian.
She's also very much interested in the theory
of digital humanities and digital history.
And this is where we, I think, had our last encounter,
collaboration. Well, collaboration sounds weird
because it was a staged dispute between a historian
and a computer science person.
And we didn't really dispute about algorithms.
And I introduced algorithms from a computer science perspective.
And Silke argued that traces of algorithmicity
can also be found in humanities and especially in history
research because you have procedural, very systematic
workings.
And maybe you will also come back to this in your lecture,
I guess.
Yeah, so much more important stuff to tell you about Silke.
But I think she will present some interesting research
herself.
Maybe one last important thing, which
is very recent in her activities.
So Silke can be really happy that she's here
because she's really busy right now because she's planning
the digital humanities conference,
the German digital humanities conference, DHd.
Maybe some of you know it.
And it will be in Bielefeld next March, March 3rd to 7th,
I guess.
Yeah.
And yeah, whoever of you was involved
in organizing a conference or even just a workshop
knows it's a lot of pain.
She's like the head of the local organizing committee.
And it's really a pleasure to have you here,
despite these efforts of the organization.
And if you're ready for your talk, I will hand over to you.
Thank you very much.
Thank you very much, Manuel, for the kind introduction.
And I kind of second the warm invitation
to Bielefeld next year.
And the motto of the conference is under construction, one,
because the campus in Bielefeld is very much under construction.
And two, the whole discussion about digital humanities
or digital history, whatever it is in this regard,
it is our understanding it should be more
about being under construction.
So trying to bring together approaches
from different disciplines, a very formal approach from computer
science or computational linguistics, so a very formalized approach
when you work with computers, with something
that is very interpretive, very hermeneutic, very also proud
to be vague and not trying to kind of pinpoint
each and every single data point.
So that's where the motto under construction comes from.
And this could also have been the title of my talk today.
But I decided to do something similar content-wise,
but to start at a different point.
So today, I will talk about figuring out the past.
Do numbers tell stories?
What you can see on the slide is the only hint
at artificial intelligence that I'm going to do today.
So this is an AI-generated image that you get when you ask--
in this case, it's OpenAI's DALL-E--
to give you a picture about figuring out the past.
And I thought this was really interesting because it had
all by itself the idea that it's about books and figures.
But I'm not going to go and delve into discussions
about artificial intelligence.
We just had at the beginning, Manuel and I,
a little conversation about how much we despise
the term artificial intelligence in and of itself.
I'm not going to open that box, not today.
I will start today with telling you
something about where the first part of the title comes from.
It is a quote.
It's the title of the book by Peter Turchin and Daniel Hoyer.
It was published in 2020.
And it is called "Figuring Out the Past:
The 3,495 Vital Statistics That Explain World History."
And I have this book, and it sits on my desk,
and I always wanted to talk about this book.
So this is basically where my idea--
to make this lecture about the question,
what is quantifiable in history, or what kind of quantifiable
data can we use when we want to tell histories
or when we want to build historical narratives?
That's what my lecture is going to be about today.
I quote from the introductory remarks
by Peter Turchin, who himself is quite the accomplished
historian.
And he has a big project about collecting historical data
that I'm also going to introduce in a short while.
So back to the quote.
"We are used to thinking about history
in terms of stories, who did what to whom.
Yet we understand our own world through data,
vast arrays of statistics that reveal
the workings of our societies.
Why not the past as well?
Figuring out the past turns a quantitative eye--"
and figuring out the past is the book--
"turns a quantitative eye on our collective trajectory.
Behind the fleeting dramas of individual factions and rulers,
it looks for large-scale regularities.
It asks how key social and technological innovations
spread around the world.
And it pinpoints outliers from general trends."
And I think this quote really sort of opens up
the space between two poles, the very quantitative, very
statistics, very data-oriented interpretations that we maybe
associate with social sciences rather than with history.
And we tell stories about wars and people and interactions
in a very sort of unclean kind of way that we,
at least according to this quote,
have a way to tell stories about history.
You can see this in the phrase "behind the fleeting dramas
of individual factions and rulers."
I mean, I think this is an image of history
that at least I don't share.
And that has been maybe present in earlier centuries.
We don't do history like this anymore.
But I think it's a really interesting starting point
for discussing what I wanted to talk to you about today,
how data about history and historical work
as a historian-- so building historical narratives,
making historical arguments--
how that works together, if it works together,
and where it maybe also does not work very well together.
I was hinting at the project that is behind this book.
It's Seshat, the Global History Databank.
I don't know if anybody has had--
came across this before.
I think also maybe within this particular center
here at Leipzig University, this could be interesting to look at.
The idea of this project is to collect lots of data points
or information about history, about global history.
And it started with building a global historical sample,
as they call it.
They looked for 10 world regions distributed as widely
as possible across the Earth's surface.
And within each of those regions designated three so-called
natural geographical areas with discrete ecological boundaries
on average about 10,000 square kilometers in size,
thus creating an initial sampling
scheme of 30 such areas around the world later extended to 35.
This is taken from the home page in the description
of the project.
So this is the areas that they collect data about and data
from.
They do this in a very sort of a combined process
of working together social science data,
but also working on historical sources,
reading, extracting that kind of information
from what we traditionally would look at.
And I quote the home page again on the slide.
Most variables in Seshat require the data
to take the form of a number or a numerical range.
Or they specify a feature that can be coded
as absent, present, or unknown.
So this is sort of the way that they
described how the data bank, the data model actually works.
So they usually put in numbers, and they
can define categories as absent, present, or unknown.
And all data are linked to scholarly sources,
including peer-reviewed publications
and personal communications from established authorities.
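The coding rule quoted from the home page can be sketched in a few lines of Python. This is only an illustration of the idea, not Seshat's actual schema: the class names, the field names, and the placeholder citation string are all assumptions made for this sketch. The two example values (the duration 752 to 987 CE and the absent legal code) are the ones shown later in the talk for the Carolingian Kingdom.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Presence(Enum):
    """The three codes the home page names for feature variables."""
    ABSENT = "absent"
    PRESENT = "present"
    UNKNOWN = "unknown"


@dataclass
class CodedValue:
    """One coded variable for one polity (hypothetical structure, not Seshat's schema)."""
    variable: str
    # (low, high); low == high stands for a single number
    number_range: Optional[Tuple[float, float]] = None
    presence: Optional[Presence] = None
    # every value is supposed to be linked to scholarly sources
    sources: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A value is either numeric or a presence code, never both, never neither.
        if (self.number_range is None) == (self.presence is None):
            raise ValueError("give exactly one of number_range or presence")
        if self.number_range is not None and self.number_range[0] > self.number_range[1]:
            raise ValueError("range must be ordered as (low, high)")
        if not self.sources:
            raise ValueError("every coded value needs at least one source")


# Placeholder citation: the talk does not name the sources behind these entries.
PLACEHOLDER = "<citation not given in the talk>"

duration = CodedValue("duration_ce", number_range=(752, 987), sources=[PLACEHOLDER])
legal_code = CodedValue("legal_code", presence=Presence.ABSENT, sources=[PLACEHOLDER])
```

Even in this toy form, the constructor forces one decision per variable: a number, a range, or one of three codes. That is exactly the point at which the historian's "why this date, why not another?" has nowhere to go except into the source field or the documentation.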
So they also try to pick up on sort
of the known and established historical narratives.
This is what the home page looks like.
So everybody, please feel invited to have a look.
If you scroll down, these are some more of the categories
that they look for.
They have general variables.
And this all-- maybe this is important--
it all sort of revolves around those regions.
So they have data for a certain region.
And everything is sort of linked to a geographical area.
They identify 47 of such regions.
They started with 30.
They have now 863 polities that were present in these regions
at one point or another.
They look at general variables, warfare variables,
social complexity variables, human sacrifice,
lots of different sort of things that could be of interest.
They themselves in this database claim
that they don't offer a pre-constructed narrative.
I would always set a question mark
behind something like that, because the way that you present
your data, the way that you collect your data,
the way that you build the data model always already
refers to some kind of narrative or pre-constructed
understanding of the things that you are looking at,
especially if you take it from text that you're reading.
So it's not like a survey or just sort
of counting heads of people.
So this is the database for the book.
Now let's have a short look at the book.
This is the table of contents.
And I can share this later, because you don't really
have to read this now.
But as you can see, they have society profiles
starting with Egypt.
Or they have society profiles, and they
have sort of three categories or two categories,
ancient and medieval.
Because the book is trying to give us
information about historical periods
that we don't know so much about,
claiming that about the modern world, we know so much
and we have so much data that would be comparable to this.
So essentially, one could argue that with this book
and also with the Global History Database,
they try to make pre-modern societies comparable
to modern societies in a way that we
think of society as being able to be expressed in data.
Whether or not this is a good idea,
we can discuss maybe later on.
But this is sort of the general setting.
I will show you quickly something medieval,
because this is what I know most about.
And we'll have a look at the rankings,
and we'll have a very short look at also the regional adoption
and the maps in the end.
So this is the table of contents.
And this is what the book looked like.
Very authentic, because I just took pictures of the book.
So this is what they offer us for France,
Carolingian Kingdom, short introductory remarks.
And then there's several categories
where they just input data.
And although the data bank says we only
have categories or labels that we fill with the numbers,
there's some text in here.
So it starts with the duration, 752 to 987 CE.
Being a medieval historian myself,
this is debatable already.
This is like the first entry, and we would say, yeah, OK.
Why this, why not other?
I mean, I probably don't have to explain this
to the people sitting here, but it's debatable
whether this is the correct date.
Then they have the total area.
Again, at what point in time, what is the extension,
how did this change?
So as you can see, being a medieval historian myself,
I immediately start to question not the facticity,
but maybe sort of the point in time
that this data point represents.
So I want to know, why did you decide
to say this is the duration?
I want to decide why, or I want to know
why they decided this is the area that it covered.
Then we have institutions, legal code absent.
The Monumenta Germaniae Historica in Munich,
which is an institution that at least medieval
or pre-modern historians know very well,
is currently preparing a new edition of the Carolingian
legal codes, the Capitularia.
So again, I'm not so sure.
I don't want to bash this, but just to make a point
that they're trying to datafy something that might not
fit this general idea of how we deal with data points
that we might have today.
And if one of the arguments is to make a book like this
in order to make pre-modern societies
comparable to modern societies, is that the right way
to compare?
Is it even the right way to describe modern societies
if we look at modern societies through a data-fied lens
or a data perspective?
Just because it's so much fun, we're going to look at this.
10 largest field armies, ancient, medieval, early
modern, in the entire pre-industrial era.
This is actually interesting because I have no idea
about military history, so this is blank for me.
I cannot argue as much as I could with the previous slide.
But what is interesting is that this is actually not--
for my students, I would use this
to show them that they need to get
rid of their European focus.
Because medieval history in German universities
is mostly Latin Europe.
And you can see immediately that, at least with standing armies,
there are not so many European polities in the list.
Of course, here's the Romans.
And there's lots of East Asian areas on the top.
This changes.
So here's the Ottoman Empire.
And up there, like for the early modern period,
we have the Bourbons on top.
Interesting.
They have a larger standing army than some of the regions
that were bigger before.
So I forgot to show you the map.
Maybe I can do this over drinks afterwards.
Sorry, it's gone.
So Turchin argues that we understand our own world
through data, and this is an argument
that he makes for his contemporary world,
vast arrays of statistics that reveal
the workings of societies.
I would then like to ask him, what kind of data do you get?
How do you collect it?
I made this excursion into the data bank
to give you an idea of how this generally works.
The statement makes it sound as though dealing with data
and doing statistics is a very contemporary thing,
or at least a very modern thing.
Again, maybe this is a void argument within this audience,
but obviously-- or of course, it is not.
So there have been lots of historical data practices
before.
And even Armin Nassehi, whom some of you might have read,
who published a book called Muster in German,
now translated into English as Patterns:
Theory of the Digital Society,
argues that digitization, and with it
the understanding of data that is most common today,
is not something that kind of started when we--
at a point in time where the technology was far ahead
enough to represent what we understand when
we talk about digitization.
It's basically ones and zeros.
So it's trying to find something that is quantifiable,
that I can count.
Digitization comes from digitus, taking your fingers.
So it's basically counting, right?
And these kinds of practices were in the world
long before what we today understand as digitization,
and also long before that what we talk about--
what we think about when we talk about data or statistics.
I think this is important to understand
when we try to evaluate how or which role quantifiable facts
or data plays in constructing historical narratives, which
is where I will be going by the end of this talk,
that it's not something that is here just because we
have computers now.
And we've been talking about this earlier.
There's a long history in historiography
also with social history and economic history
of the '70s and '80s where people counted things,
historical demography.
There's lots of different areas that
are part of what we understand as history
as a discipline where quantification is very present.
We've had these discussions whether numbers tell stories
or not decades ago.
We have them now again.
And nobody talks about why don't we relate the two discussions
to each other.
So why do I still have to justify
that I work with quantification--
setting machine learning and all this fancy stuff aside?
Basically, it's quantification when
we had this discussion like 70 years ago
and again and again and again.
So also for history as a discipline,
this is an interesting observation.
I don't quite know what to do with it yet.
But I think there's also a narrative there
that we should have a look at.
So if we follow Armin Nassehi, and not only him,
he argues that digitalization is not only
a contemporary phenomenon but generally helps societies
to deal with and reduce complexity
by using coded numbers to process information.
Make something quantifiable.
Make it countable.
This makes it easier to get an overview, to get control,
or apparently get control over complex situations.
Which are the data practices that we know historically
and how do we use data practices in history?
I've started to kind of get into that question
with my earlier remarks.
Just a few very random examples.
This is a contemporary map from 1903.
I did a seminar, bachelor's seminar,
at Bielefeld University where we worked on election data
from previous German elections.
And the students were really surprised that in 1903, we
already had all these kinds of diagrams.
They have bar diagrams.
They have pie-- they have it all.
So why is this?
I mean, this is 120 years ago.
So there was some irritation, which
I think is good to start to get them
into questioning certain things.
So the way that we deal with data
and also visualizations that are so natural to us
have been there for quite some time.
Another example, all the way back at the beginning
of the 12th century, the Domesday Book.
I think most of you have already maybe heard of this.
This is the first survey of a register
of the British population, stating also the land
that belonged to certain people.
And this is a register that is still more or less valid today
because you can go back all the way to the time of William
the Conqueror just after 1066 when he started to record this.
I think the first survey of the Domesday Book is 1080,
so at the late 11th century.
So counting people, trying to pinpoint areas
in which people live, and thinking
about who is in charge of which area,
how does the politics kind of work
on the basis of data and statistics is nothing
that we haven't seen before.
It has a long tradition.
So what about the way that we deal with this today?
And how does that--
maybe sort of reopening the discussion
about quantification in history, or quantified sources,
or like tables and numbers as sources in history--
what does that have to do with digital history?
And why do I think that we still need to think about this?
Here's another example.
And this is important because I'm getting back to this
when I talk about the VR project.
This is one page within a big book.
It was first published in 1952 by Arno Peters and his wife.
Arno Peters was a historian and geographer,
and he collected lots of information
again about historical events in different world regions,
and he published a book about it.
And it's a big book, and when you open it,
you always have one of these big pages,
as you can see on the screen, in front of you.
And this always covers 100 years from here to here.
The middle section gives you people
who lived within the century according to their birth dates
and death dates.
And also you have certain events that are
categorized in different colors.
This down here is wars and revolutions.
Up there is inventions and economics, and also something
that we would call maybe history of ideas
or intellectual history.
This is a practice where he tried to also kind of open
this idea of collecting data about, in this case,
historical societies, and not only having a European view,
but also making this--
I'm always sort of hesitant to use the term global,
because it's much more complex, I think,
than what I usually say with it.
But it's trying to sort of open a perspective that
goes beyond sort of the European perspective
and give you information about historical events, whatever
that is, and people that were present in a certain century.
This is a picture of the data collected
that is in the background.
The [SPEAKING GERMAN] contains 12,162 events.
Some of them are geo-referenced, but not all of them.
There's information about persons.
There is a keyword register that is something like a dictionary,
but there's no real understanding
of why these words are the important ones.
There are also connections within the data set,
so which of the entries is sort of related to another one.
And these are the entry cards that the people
made in the '50s and '60s when the book was sort of put
together.
And you can see this is all typewriter.
This is all German, unfortunately.
And then you have different categories.
You have some handwriting for the datings
and all that kind of thing.
So you can think of a big [SPEAKING GERMAN]
that they had in order to make this work.
And of course, this has been digitized now.
And although I say this myself, it basically looks the same.
Same effort, same data set, same representation.
So when I first opened this, I was a bit disappointed,
because I thought, OK, you have the big data set,
so what do you do with it?
First of all, and this is something
that happens in digital humanities and digital history
most of the time, it can be read as something that
is called a proof of concept.
So I show you that my digital version can do the same thing
that the paper version can.
And once I've established that, I can take the next step
and show you what the digital version can do
that the paper version cannot, which is basically,
in this case, rearranging information
and digging deep into the connections that
are present in the database.
So again, I explained this before.
So here's the people.
Then there's events in these different categories.
You have some pictures.
You can now-- this is interactive,
so you can click on it, and then you
can see the information that was entered.
So this is the first upgrade that the digital version
has over the paper version.
And this is-- oh, sorry, I have this in bigger,
so you can have a closer look.
This is the 14th century.
Again, there's the medieval training
coming through at some points.
This is more interesting because this is a visualization
that the digital version of the Synchronoptische Weltgeschichte
offers us that gives us a different view on the data
sets that they have.
This, going back one, is still a representation
of history in a straight line.
This is a very traditional, very linear, very kind
of [SPEAKING GERMAN] history of events kind of approach.
Also, important people, most of them are male.
I'm just mentioning this because it's part of the truth.
But now, you can kind of rearrange the data points
and play around with the perspective also on history
as something that happens in time.
What this visualization does is you
have one of the events in the middle.
In this case, this is the same event
that I had the information card about.
It's 1331, the invention of cannons,
or the first use of cannons in Friuli in Italy.
And then you can see other events or entries
in the database that are related to the one in the middle.
It's not linear anymore.
It's more layered, right?
So I have sort of time layers that surround
the event in the middle.
So the further away on the layers,
the further away in time.
And this can be either before or after.
So it's not a linear--
so this is not early, and this is late.
It's just sort of the absolute distance.
And the size of the nodes represents
the semantic relation or the closeness of the events
to each other in a more hermeneutic way.
Interestingly, we tried to understand
how this works computationally, and it's just random.
It's really random.
But the idea is that the bigger the node, the closer
the events are to each other, like semantically
or hermeneutically speaking.
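The intended logic of this view (layers by absolute distance in time, node size by semantic closeness) can be sketched as follows. This is not the code behind the digitized book, which, as just said, did not turn out to compute the sizes this way. The function name, the 25-year layer width, the size limits, and the similarity scores are all assumptions, and the three comparison events are invented placeholders rather than entries from the data set.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    label: str
    year: int


@dataclass
class PlacedNode:
    label: str
    layer: int    # 1 = innermost ring around the central event
    size: float   # drawn radius; larger means semantically closer


def layered_layout(center: Event, others: List[Event],
                   similarity: Dict[str, float],
                   years_per_layer: int = 25,
                   min_size: float = 4.0,
                   max_size: float = 20.0) -> List[PlacedNode]:
    """Place events on rings by absolute temporal distance from the center."""
    placed = []
    for event in others:
        # Before or after does not matter, only how far away in time.
        distance = abs(event.year - center.year)
        layer = distance // years_per_layer + 1
        # Similarity is an assumed score in [0, 1]; missing scores count as 0.
        score = min(max(similarity.get(event.label, 0.0), 0.0), 1.0)
        size = min_size + score * (max_size - min_size)
        placed.append(PlacedNode(event.label, layer, size))
    return sorted(placed, key=lambda node: node.layer)


center = Event("first use of cannons", 1331)
others = [Event("placeholder A", 1320),   # 11 years earlier
          Event("placeholder B", 1342),   # 11 years later
          Event("placeholder C", 1431)]   # 100 years later
scores = {"placeholder A": 0.2, "placeholder C": 0.9}  # invented values
nodes = {node.label: node for node in layered_layout(center, others, scores)}
```

Placeholders A and B land on the same ring although one is earlier and one is later, which is the non-linear reading of time described above. Placeholder C sits far out but is drawn largest, because its assumed similarity score is highest.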
Just one example down here is the Hussites
who use the howitzer for the first time.
So this, I get, is quite similar as an event.
So they developed a weapon.
They're using it for the first time.
OK, right, so those are more or less related.
I get that.
So I think this shows that data sets and the way--
or rather, the way that we use, in this case, visualizations
and representations of data sets that already exist
or that we kind of also generate changes our perspective
on the kinds of stories that we tell.
If I use this, I'm not going to start linear.
I'm going to start in a relational way.
So this is a representation of a relational way
to tell historical or build historical narratives,
rather than just having it in a linear way.
And this is, I think, one takeaway for me.
Using these kinds of visualizations
and playing around and exploring different kinds
of visualizations for the same data sets
helps me to sort of open up different perspectives
and maybe gain more than just one perspective
on a certain historical topic.
So I already started with my last example
to go into the complex of questions
that I want to talk about now.
Is it really data versus stories?
Or how do stories maybe also consist of data?
Or how do certain data arrays or arrangements, data sets,
visualizations, guide the way that I tell stories?
I could now also give you a lecture
on what historians usually do, but I'm not going to do that.
Because I see some faces here that
have done this for generations of students already.
So we're not going to go down there.
But I think it's important to reflect--
and this is why I made so clear that I'm a digital historian.
Not only a humanist, I'm a historian first.
Because I have a certain approach and a certain way
in which I ask my questions when I look at data sets.
And I think this is important, because this
is different from my colleagues in computational literary studies.
They work with data sets as well.
They work with the same methods, but they
have different sets of questions.
And they want to sort of tell other stories
if we phrase it like that.
OK, so what we do as historians is--
and I'm going to do this in just one sentence--
we try to build plausible narratives about the past.
It's important that they are plausible.
They are not facts in and of themselves.
They consist of facts.
They consist of data.
But there's not just one story about something.
So it's important to me that we try--
that we sort of focus on the plausible.
Because I think that for historians
and for other hermeneutic disciplines,
some kind of vagueness and some kind
of sort of interpretation and leeway
is really important to the way that we work.
We always try to differentiate things.
It's just not one way.
There's the most plausible way, but it's not the only way.
And this is something that is really complicated
to match to data sets, because data sets usually are one way.
It's one data point.
It is one category.
They might have different properties.
I can try to kind of build a complex data model where
I have lots of different layers in my annotation scheme,
and it goes like a tree in sort of different directions.
But basically, I always have to make a decision
if it is yes or no in the end.
And this is something that can be very painful for historians
if they have to, say, make that particular decision
in a historical case.
And because it is so difficult and sometimes painful,
it is important to dwell on this maybe just for three seconds
longer.
Historical narratives might be based on data
or are basically always based on data
if you think of all kinds of information
can be data that represent a certain perspective.
It's always made.
Johanna Drucker, one of, I think, the most interesting
and also sort of already long-term digital humanists
in the United States, in one of her publications,
she claims that we should stop calling it data
and start calling it capta, from the Latin for "taken,"
because data literally means "the given."
That's sort of the literal translation.
And she says we should rather call it capta because it
is something that we take and that we form
and that we sort of decide what it should look like.
And I think this is important.
Every database has a perspective.
It's not just there because it is there.
Even the decision that the people of the Global History
Data Bank made--
do I calculate heads?
How big is the area that I define as the area?
This is all part of that particular perspective.
And this is true for narratives as well as for data sets.
There's no difference.
Maybe narratives can be more elaborate about this.
Data sets need a really good documentation
that people can follow in order to understand this.
If this is true, that they are all
dependent on a certain perspective,
they are oriented in knowledge orders and models.
So there's a certain understanding.
I think I started this talk also with making the point
that it's always also about sort of a hidden understanding
of something.
If I make a database, I have presumptions.
And I think that what historians can learn
in the process of creating databases that
need a good documentation is to be
more precise about what we think that these presumptions are.
Because we usually work in footnotes,
which are also very differentiating.
Maybe there's still interpretation and arguments.
And I think we can learn from documentation to process
the research decisions that we make,
document them more closely.
And I think this is like another takeaway if you work with data.
Not to say digitally, because we all work digitally,
but to work with data.
I'm coming back to that perspective maybe in a bit.
The last point on this slide is something
that I will come to later when I give you
some examples of visualizations.
I think that one point or one element of also
the digital version of the Peters book helps to understand this--
let's call it perspectiveness or being
bound to a certain perspective.
This is interactive.
I can change the perspective, and I see the image changing.
This is really important to give users this understanding of,
if I stand on another point in the data set,
I see other things around me.
And this is basically the starting point
for the virtual reality project, which basically works like this.
I can stand within a knowledge graph on a certain node,
and then I can move.
And then the whole thing changes around me
so that I understand I can only see in this cone in front
of me.
Maybe I have to turn around to see something
that is almost as interesting.
But I've never looked at this part of the knowledge graph
before.
So narratives are one perspective.
They might be multilayered and have
integrated more than one perspective.
But in the end, if you tell the story,
it comes to one certain set of perspectives.
Same is true for data sets.
Can we then use data sets maybe also in the same way
and make them tell stories?
Or how do we-- and this is one of the questions
that I started off with.
How do we combine data and storytelling?
And one format that we use also in teaching is data stories.
This is nothing that we invented.
Data stories is a thing, just not in history.
But in lots of different areas, there's
data journalists that basically produce data stories every day.
In businesses that sell data, they
sell data by telling stories with that particular data
to make a selling argument.
So data stories is a format that is
something that is very present ever since we
have lots of data sets that need to be told or told about.
And we tried to use data sets in a seminar
again for a historical data story.
And this data story is based on Blumenbach Online.
Blumenbach is a-- let's call him ethnographer maybe,
early or late 18th century.
And he collected human remains, skulls in particular,
in order to sort humans into categories A, B, C, D, and so on.
This is an event or a situation that is often also retold
in stories of racism and the idea
that you can look at people and put them
into certain categories.
And he collected these skulls all over the world.
And the Blumenbach archive in Göttingen,
they have his correspondence.
They have the routes that the skulls took.
They have lots of metadata information on the skulls.
And we used these data sets, or this particular data set,
all surrounding the collection of Blumenbach
to create a data story that basically looks like this.
And this is German again.
I apologize, but my students usually work in German.
So we haven't yet had the chance to translate this.
It's about collecting, describing, categorizing,
about the--
I think I gave it an English title--
about the treatment of human remains
at the beginning of modern science,
because what the students were most interested in is,
how does this collection of skulls
relate to what we today think about how
we deal with human remains?
So this is the kind of story that they wanted to tell.
And we had a data set.
And then we tried to make a story out of it.
And this is what the students did.
So it has this opening teaser, basically.
And then they decided that they wanted to track
the movements of the skulls.
And I think this is something that we in digital history,
but maybe also, Manuel, in your group in computational history,
what we often do is listen to someone who has an interest
and has some material, and then try
to design bits and pieces of computational methods
or visual representations that can help in telling the story.
Most of the time, the digital history project
does not just consist of one method,
but it's a set of methods that represent different steps
in the research process.
So what we did is we wanted to follow
the journey of the skull.
So we decided to do this with GIS methods, representations.
So we designed interactive maps along three different points
of the journeys.
The first map is about where the skulls come from.
And because I told you that, for me, it
is important to make this interactive
or to give the users possibilities to find
their own interests.
In the data sets, we have marked down all the locations
where the skulls came from and linked them
to the metadata information about the skull
that we had in the database.
So this is basically information retrieval, right?
It's just pretty.
You could do this also with a search field in the database,
but this is nicer.
It doesn't do much more than that.
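The point that a clickable map is "basically information retrieval" can be illustrated with a minimal Python sketch. The records, field names (`id`, `origin`, `category`), and function names below are invented placeholders, not the actual Blumenbach metadata or the students' implementation. The sketch only shows that clicking a map marker and typing a place into a search field return the same records.

```python
from collections import defaultdict

# Hypothetical placeholder records; not real data from the collection.
skulls = [
    {"id": "S1", "origin": "Place A", "category": "A"},
    {"id": "S2", "origin": "Place B", "category": "B"},
    {"id": "S3", "origin": "Place A", "category": "C"},
]

def index_by_origin(records):
    """Group records by place of origin, one group per map marker."""
    index = defaultdict(list)
    for record in records:
        index[record["origin"]].append(record)
    return dict(index)

def click_marker(index, place):
    """What a click on a marker would show: all records for that place."""
    return index.get(place, [])

def search_field(records, place):
    """The same retrieval done as a plain database-style search."""
    return [record for record in records if record["origin"] == place]

markers = index_by_origin(skulls)
from_map = click_marker(markers, "Place A")
from_search = search_field(skulls, "Place A")
print(from_map == from_search)
```

Both routes retrieve the same two records for "Place A"; the map adds a spatial view of the index rather than new information.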
The second thing that students wanted to look at--
and in between these visualizations,
they tell the story.
They have the historical information,
and they give the narrative, and they give you more information
about certain skulls.
And then they wanted to show the places
where the skulls went to and the routes that they took.
This is not well done because the routes all
seem to be very straight.
Obviously, that's not what happened.
This is historically very incorrect, right?
But this represents the points of data that we have.
So the available data points often
are where does the skull come from,
and then we maybe have one or two brokers in between.
So they were shipped to St. Petersburg,
which is basically where, if you zoom into this,
you can see this here, St. Petersburg, this person here.
Just lots of connections.
There's also lots of correspondence
between him and Blumenbach.
Niko Karpinski, he was one of the brokers
that Blumenbach used to bring the skulls back to Göttingen.
So we have some stations, but this is not
a historically correct route that the skulls took, right?
Part of the story that the students told
was that one of the skulls, a very old one,
was on the last bit of road just before Göttingen.
The coach broke down.
And then the box where the skull was in fell off the coach,
and then it broke.
So it had a very long story, but then it
never arrived in Göttingen.
This is what the students tell you in the story.
And then the last step was that we were thinking about, OK,
so we have the point where they start.
We have the route that they take.
Then they arrive in Göttingen, but then we would have just one
bubble in Göttingen.
So again, the logic of the visualization
also helps you tell the story in a way
that people can understand what you're trying to say.
It's not an obvious sort of visual representation.
It's always also a decision, part of the storytelling.
So what we did was we went back to the information
of where the skulls came from, but the colors and the letters
represent the categories that Blumenbach later invented
or that he built from the skulls.
And then you can have your own thoughts about whether or not
this works at all.
But again, you can click on them,
and then you get sort of the database information.
So there is a way to still tell stories with data
and integrate data in your stories
with interactive visualizations.
It's just a different format, right?
It's not telling stories.
It's not using data.
It's just trying to explore ways to combine the two.
Yeah, OK.
I'm going to do this quickly because this is boring.
This is what I do every day.
So for me, it's boring, at least.
I work on court records, medieval court records,
from the so-called justices in eyre.
That's basically the king's judges in the 11th century.
They started to not only stay at Westminster Court
and receive all the litigants at Westminster,
but they started to go to the people
and bring them the king's justice.
That's more or less the story.
And they have this very long way that they covered.
This took about two years to go from one end to the other.
And this is-- I'm just going to show you this quickly.
This is now an example for not using existing data,
but me being a medieval historian interested in court
records, trying to turn this into data that I can then
use with quantitative methods.
So what we do is we train machine learning algorithms
in reading the scripts or transcribing
the handwritten scripts.
We turn this into machine readable text.
And then we use lots of different natural language
processing tools to treat the texts with.
So named entity recognition, automatic recognition
of person names, of locations, for example,
trying to then build networks.
We try to look at or to understand
what are the legal terms that are
present in the earlier records versus the later records.
So trying to find how do these different legal categories
stabilize over time.
And because I have a big corpus, they
started to collect these kinds of court rolls from about 1180
until the 19th century.
I'm going to stop somewhere way before that.
But you could do this in a really large scale study
if you had the tools, the time, the resources
to digitize it all in a way that would make this possible.
And we deal with metadata and textual data when we do this.
We also deal with person names.
And what you usually do if you identify people,
you build networks, right?
Because this is a go-to method if you
want to think about the relation again.
I think I started off with--
also, I made this point earlier that I'm
about relations and relational history.
So I did this little plot.
It's not one network.
As you can see, it's lots of really tiny networks
that just connect three or two or four nodes with each other.
What you can see here, every gray node is a court case.
And the other nodes are the people
involved in the court case.
So it's usually two people, one case.
Sometimes it's more, but that's sort of the usual thing.
And then the color of the nodes tells you
which social strata or category these people come from.
Because I'm interested to see how the social strata that I
am expecting in a medieval society
is represented in these court records.
Is it villagers against villagers?
Is it villagers against regional landholders?
Is it knights against clerics?
What are the social connections that
are built in the situation in court?
Because these are people that usually maybe do not talk
to each other on the street.
But if they have a legal battle, they need to.
So what does the court, as an area
of social interaction, do to this?
And as you can see, these cases connect different people.
But sometimes, people also connect cases.
And sometimes, only people connect to each other.
And these are different observations
that I can make on a pattern level, pattern recognition
mechanisms that I can use that help
me ask questions that will then bring me back to the text.
So this is not a result of any kind.
This is just one further step in the research process.
If I'm interested in the kinds of networks
that are represented, I would now go back and see, OK,
this one here, is this a mistake?
Is there just no case mentioned?
I did go back.
And it's actually a retrial of a previous case.
So the case is not mentioned anymore.
It's just about the people.
They come back to court like a week later and just continue.
But because it's further down in the document,
it appears as a single entry.
So I learned something about the structure of the document
also from the personal network.
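The pattern described here, many tiny networks of cases and people with the occasional cluster that has no case node, can be sketched in a few lines of Python. All case identifiers and person names below are invented placeholders, and the function names are hypothetical; this is not the project's actual pipeline, only an illustration of how such case-less clusters could be flagged for a second look at the text.

```python
from collections import defaultdict

# Hypothetical entries: (case, person) pairs extracted from a record.
entries = [
    ("case_1", "Person A"), ("case_1", "Person B"),
    ("case_2", "Person B"), ("case_2", "Person C"),
    ("case_3", "Person D"), ("case_3", "Person E"),
]
# Hypothetical entry in which two people are linked but no case is named.
person_links = [("Person F", "Person G")]

def build_graph(entries, person_links):
    """Undirected graph whose nodes are ('case', id) or ('person', name)."""
    graph = defaultdict(set)
    for case, person in entries:
        graph[("case", case)].add(("person", person))
        graph[("person", person)].add(("case", case))
    for a, b in person_links:
        graph[("person", a)].add(("person", b))
        graph[("person", b)].add(("person", a))
    return graph

def connected_components(graph):
    """Collect the separate small networks with a depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        result.append(component)
    return result

def components_without_case(components):
    """Clusters that contain only people: candidates for checking the source."""
    return [c for c in components if not any(kind == "case" for kind, _ in c)]

graph = build_graph(entries, person_links)
networks = connected_components(graph)
to_check = components_without_case(networks)
print(len(networks), to_check)
```

In this toy input, "Person B" connects two cases into one network, and the pair without a case node is the one that would send the researcher back to the document.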
And I think this is very important that you learn
how to read the visualizations and learn
how to read the kinds of data points
that you have in order to get all the factors into your story
again.
Because I think that-- or I observe
that one of the main arguments against working with data
often is, what do you do with context?
If you isolate something as a data point,
how do you get the context information back in,
which is one of the most important things
for historical interpretations?
And this is one way that I use these visualizations that
bring me back to the context and that I understand the way
that the documents are structured, for example.
So for me, digital history and also the way
that data is used in historical storytelling
is quantitative methods.
I would rather call myself a data historian
than a digital historian.
Because I think the differentiation is much clearer
if I do that than just say I work digitally,
because who doesn't?
But just a few points on this slide.
So when I use these kinds of methods for interpretations
of data that I created, it is important to think about this
as methods of pattern recognition.
They allow you to make observations that you cannot
make when you just read a text.
Because texts are organized sequentially.
It's linear.
It has a certain order.
It represents-- and it's not--
I'm not saying that you don't read anymore.
It's just a different perspective
that you then have to bring together again.
And it's about modeling of research questions
and thinking about how can I operationalize
certain sets of my-- or certain questions within my research
focus in a way that I can use formalized and quantitative
methods.
These networks don't make sense for all the questions
that you could ask for the court records, right?
It's just one particular set.
Unfortunately, this means that research questions
need to be formalized in order to be able to do this.
And this is true for any kind of data set.
If you want to match a data set, even social strata data
or social surveys that are already existing,
you kind of have to match your research interest
with something that is very formalized, that
is very strictly organized.
And sometimes our questions are not.
Maybe, again, there's a chance to bring these two kinds
of perspectives together.
And I say it again because you always
have to make people aware.
Formalization needs documentation.
You have to really make sure that everybody understands
the decisions that you made so that they can use
the data that you produce.
It's an iterative process.
It's just part of the research process.
Visualizations and these kinds of things that I'm showing you
is never, for me, the end of the line.
I produce a new set of information
that I then use in order to combine quantitative modeling
and qualitative interpretation.
This is the virtual reality application now.
Maybe we can do this over drinks.
I thought I would show you this.
I will show you later.
And I will solve the riddle of the acronym.
What we try to do, in order to link quantitative observations,
data visualization, and the information
I get from looking at things to the understanding that I am always
situated, is basically exactly this.
So you are in a space that looks like space.
And you can move around these knowledge graphs.
This is taken-- so we started to model this
according to the [SPEAKING GERMAN]
that I showed you, or [SPEAKING GERMAN]
The problem is that we understood
that the way that these networks are built is totally random.
Because there's no--
I mean, it's sort of computational, but they calculated
the size of the nodes, which represented
the semantic closeness of two events to each other.
And they didn't use the same keyword every time.
They had a very complex set of calculations,
what is an important keyword, and it
was just not understandable, which I translate as random.
It's obviously not random at all,
but it doesn't have a good documentation,
so you can't really understand what the relations are.
Wikipedia, Wikidata, and DBpedia have much better
documentation of the kinds of relations that they make.
So this is basically what you see here.
This is a medieval Hundred Years' War knowledge graph
taken from the DBpedia database.
And if we have some time later in the discussion,
maybe I can also show you this.
But to solve the riddle of eTaRDiS, I just point to this.
And actually, to give you a little anecdote,
so the professor that I started this project with,
he said he was from computer vision and computer
visualization.
He did the whole 3D calculations.
And he said, OK, Silke, we can do this,
but it has to be Doctor Who.
So we knew the thing had to be called TARDIS
in some kind of combination.
And it is now--
it was the Exploration--
yes.
Sorry, I'm laughing about myself.
It's the Exploration Temporaler und Räumlicher Daten
in Immersiven Szenarien
[exploration of temporal and spatial data in immersive scenarios].
So it actually has a title that later
then came to this acronym, but the acronym was first.
And I do believe that lots of projects work like that.
You have an acronym, and you make the title fit it.
So maybe we can have a look at this later.
To come to an end, because I think
I've been kind of approaching the end of this,
do numbers tell stories?
I don't know whether this is a yes or no answer,
as most historical questions are not a yes or no answer.
I can maybe leave it like this.
What I wanted to point out, and what is important to me,
is that stories and narrations have many forms.
I showed you a format of data stories, for example.
But a data set also is basically a representation of a story.
And so far, as it is linked to certain presumptions
representing certain perspectives,
it's just not neutral, because it's in the world, right?
Important for me also, because I think a lot
and write about the role that visualizations
have in this whole process of interpreting
data sets for historians.
Visualizations also tell stories.
You always decide what you can see and what you need to hide.
They represent data models and make certain information
visible and others invisible.
And this is what narratives do, right?
So you tell a story.
You pick out something that is really important,
maybe because it's exemplary or representative,
or maybe it's the outlier, and that's
what's interesting for you.
But you will never give us the whole picture.
And this is what visualizations do.
They give you patterns.
And then you can zoom in.
You can zoom out.
You can include something, exclude something.
You can say, just give me the unknown data points,
because I'm interested in all the facts that we don't have,
and try to think about why we don't have this.
So I think this also helps us to give a new perspective
to our own stories.
Visualizations help with the analysis of data,
because it just helps you to see.
But it's also really important to think about visual literacy
to understand what happens when you see, right?
And to make sure, again, that people
know that this is a game of hide and seek, basically.
So figuring out the past--
and with this, I don't mean the book,
because I think I said lots about the book.
It's really interesting.
But figuring out the past maybe as a metaphorical phrase
is a multilayered process involving
different steps and methods.
And some of them can be computational.
Some of them can relate to data.
And you use the data, the stories
that are sort of encompassed in data sets
to give you a new insight, maybe.
Numbers can be part of the stories we tell,
and they can help with plausibilization.
I think in the little text that I sent you to announce the talk,
I said that this is something that we sort of expect
from data sets nowadays.
If you can give it-- if you can assign a number to something,
say 80% of people say this and that,
this is how people make the story plausible, right?
It's more plausible if more people say this.
Or a YouTube video is especially true,
like if you look at the history channels, when
lots of people like it.
I don't know.
It's a weird way to validate this,
but this is what happens, right?
If you Google something, just because something
is sort of the top answer doesn't make it true or false
or anything.
So these mechanisms are also numbers
that sort of influence the way that we understand the world,
the way that people treat facts, the way that you
build plausible narratives without ever documenting
the way that led you to that particular plausible story.
And authorship and perspective are inscribed
in all forms of storytelling.
And this is the point that I made
with the interactive design of visualizations
and also of our virtual reality application.
I think it's important to let people
know that they can direct their own perspective,
that they can find their own view on things,
and that it's necessary that if they can do this,
then probably the people designing the data set
have already done this also.
So it's not just a neutral relationship.
So yes, numbers tell stories sometimes.
But sometimes also stories produce numbers.
And I hope that you enjoyed this little ride through what
I think data and stories and digital history
have to tell each other.
Thank you very much.
[APPLAUSE]
All right, I think it's now time for discussion.
And I forgot an important hint in the introductory part.
Please don't run away after the discussion
because there will be food and drinks afterwards nearby,
I guess.
So yeah, and we can also continue the discussion,
as you mentioned, over food and drinks.
But first, discussion.
Thank you very much for this very interesting talk.
I was wondering also, as a historian
and as a practitioner of digital history,
if the problem of ambiguity isn't, in part,
one that is self-created by speaking of data.
Because if you go back to history,
and one will talk about visualizing sources
which come from different standpoints,
different institutions, and so on,
that would always be in our mind.
If one talks about data, it has this unambiguous,
almost positivist connotation,
which one then needs to question in a second step.
So why do you speak about data and not
sources in this connection?
Good point.
Again, it's all about perspective.
So I stress data because I think that the way that I work
with the historical material that I call my sources
is different to what other people do when
they say I read a source because I change the object.
If you think about the medieval manuscript that I showed you,
the court protocol, taking a picture
is already a representation and a move into another medium
because I don't have the parchment.
I don't see the nicks and nacks and all the things
on the margins and so on.
But if you turn this into a TXT file
and then use natural language processing methods where
you don't really see a sequential text anymore,
but you have this bag of words approach
where all links and contexts are deleted
and you use tokens as data points,
I think that your research object has really changed.
It's not a charter or a court record anymore.
It was derived from that originally.
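The bag-of-words point can be made concrete with a minimal Python sketch. The two sentences are invented examples, not taken from the court rolls, and the function name is hypothetical. The sketch shows that once a text is reduced to token counts, word order and with it who did what to whom is no longer recoverable from the data.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them.
    The sequence of the tokens is discarded."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

# Invented example sentences with opposite meanings.
first = "Robert sues William for the land"
second = "William sues Robert for the land"

bag_first = bag_of_words(first)
bag_second = bag_of_words(second)

# The texts differ, but their bags of words are identical.
print(first == second, bag_first == bag_second)
```

The two sentences describe opposite legal situations, yet as token counts they are the same object, which is one way of seeing why the data is no longer the court record it was derived from.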
So that's why I always say it is important that we clarify
for us as historians that there's a difference
between source and data.
And I think this is also a discussion that I learned
from research data management, that it's really
important to think about there's a broad understanding
of research data where you say the source is
part of the research data.
And there's a narrow one where you
say it becomes research data once I do something with it.
So those would be the two main reasons.
I think it's a different object.
And it is a different object because I changed it.
So maybe capta would be a better phrase in general,
but that's why I would always say this is data
and it's not just a source.
But you could go back, or maybe you
need to go back and say it's not the original source,
but I can treat it as if it were some kind of source
because I have to be critical.
So I have to know who did it, what is the author,
all the questions that we usually ask
when we use the material from archives or archival research.
More questions?
I might take the chance to ask a question myself.
So great talk.
Thank you.
I was really intrigued by the--
I mean, it's kind of trivial, I guess.
But I mean, at one point, you said--
so it was about using data to tell stories, right?
And when historians use data, they
tell a story from a certain perspective.
And then you switch back to the interactive visualizations
and made the point that these visualizations
can change the perspective in an interactive way.
And I mean, it's not just in digital humanities
interactive visualizations.
It's basically the whole toolbox we
have in digital humanities, right?
We have many ways to change parameters
in topic models or stylometry.
And I mean, in fact, I think we've
seen a lot of abuse of these methods
because very often people change these parameters in a black box
until they get a perspective they like.
And it's very often very random because they
change the perspective, and they manipulate parameters.
But they don't really understand what they changed
and how a specific distance measure influences
the visualization.
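How much a single parameter such as the distance measure can change the picture can be illustrated with a small Python sketch. The term-count vectors and document names below are invented for illustration; the two measures themselves (Euclidean distance and cosine distance) are standard definitions. With the same data, each measure picks a different "nearest" document.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to document length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine of the angle; compares proportions, not length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, documents, distance):
    """Name of the document with the smallest distance to the query."""
    return min(documents, key=lambda name: distance(query, documents[name]))

# Invented counts of two terms in a query text and two other texts.
query = [1, 1]
documents = {
    "long_text_same_mix": [10, 10],  # same proportions, much longer
    "short_text_other_mix": [2, 0],  # similar length, different proportions
}

by_euclidean = nearest(query, documents, euclidean_distance)
by_cosine = nearest(query, documents, cosine_distance)
print(by_euclidean, by_cosine)
```

Euclidean distance favors the short text with a different vocabulary mix, while cosine distance favors the long text with the same mix, so a clustering or network layout built on one measure can look quite different from one built on the other.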
And I was just wondering--
so you mentioned how important in history studies
source criticism is.
And I just wonder what you think about the importance of also
being critical about tools and knowing
how switching certain parameters and changing certain, well,
variables influences the actual perspective on the data.
Yeah.
I think it's really eminently critical
that you also learn about tool criticism and think about--
because again-- and this is so weird.
We've been having a conversation earlier today
that we read articles from 10 years ago.
And we have the impression that we talk about the same problems
even though all this time has passed.
And I think that--
I have lots of conversations with students
but also with other researchers about--
and they have this expectation that data sets--
anything that has to do with a computer
is more neutral than if they would read it and write
it themselves or if they could see the person that
is behind the tool, which I think
is one layer why tool criticism is so important.
Because of course, even creating the tool
and manipulating the tool in the way that you just said
is always intentional.
And it has a certain perspective.
So this is really, really important,
which is why in teaching and also in other contexts,
I really like to start exploring with something
like Voyant Tools, which is a web application
that everybody can use.
It's an out-of-the-box tool that gives you
some understanding of how these quantitative methods
and visualizations work.
But it doesn't give you enough--
like if you're interested, if you're hooked,
when you all see these beautiful visualizations,
doesn't give you the chance to then go back into and do
all the proper manipulation that you would actually
need to get what you're interested in.
But I think the process and understanding
that these tools just give you something that is a very
particular something and that you
have to understand how this comes into being
and then go back and learn about, OK, how does this work?
And again, for example, in research papers--
and I think we should talk a lot about also data publications,
not just sort of the usual research paper
that historians write, but make it more about the data set,
make it more about the tools, and be more transparent
about the process and document the several steps that we did.
Like, for example, my little collection of network graphs.
I could have just hidden the one that didn't have a case,
but then I wouldn't have seen something
that really helped me understand the material that these network
graphs were coming from.
So again, we have to have an understanding of the tools,
and we have to be critical about the way
that they were designed because some or most of the methods
were--
the trick is none of them--
I mean, maybe this is polemical, but almost none of them
were designed for historical research.
So we take something that was designed
for a very different purpose, and we
think that we can now use this and apply it
to historical circumstances, contexts, documents, corpora,
and text, and so on.
And that's just too easy because it doesn't work.
So yeah, I think that having a clear,
critical understanding of tools, data sets, and sources
is eminently important.
Thank you.
More questions.
Daniel?
Yeah, thank you very much for your inspiring talk.
I've got a half question, half comment.
I would like to ask you to comment on my comment.
So you rightly stressed
the importance of data literacy and data criticism, which,
no offense, is to some degree
trivial, because everyone who works with data
at some point discovers that.
But my idea and my thought is data modeling
allows contradictory statements, allows multi-layered analysis.
We've got the tools and the ways of how to model data,
they are there already.
But I don't have an overview of all the digital history studies.
I'm not a historian myself, so I don't
have an idea, but what I know is that it's possible to--
if you annotate a historical text, for instance, as you do,
of course there is always a step of interpretation.
But I would argue there is not an infinite number of ways
to interpret that, but more than one at least.
You could do that, it's much more effort,
but one could do that and compare
the different interpretations and the different annotations
and to even model contradictory narratives,
contradictory statements, analysis.
And I'm wondering why a part of--
that's much more effort to do, but that's the chance
we have to write a new history.
And this kind of new history will not
be like a completely new history,
telling completely new things, but comparing
different perspectives, being more multi-dimensional.
And I'm just wondering why are we still on the step
before of that level?
I mean, people do.
And as Manuel Burghardt mentioned before,
people tend to use, let's say, the default parameters.
For example, you say, OK, people just bend things as they wish.
And I would argue other people just
use the default parameters, not being aware of what
would change the output, how the output would change.
So yeah, it was more like a comment than a question,
but maybe you could comment on this.
I can.
You can ask a question.
I can ask a question, no.
I totally agree with what you described as what is possible,
including the statement that it takes a long time
and it's a group effort, really, most of the time.
And I think this is where--
so I totally agree with you.
And in a perfect world, I would always
try to create annotation schemes or sort of data
sets the way that you said, not only matching and making it
all look the same, like, for example,
the Figuring Out the Past book, in my view, kind of
tries to do: how do we describe societies today?
Let's take the same thing, describe past societies.
And then it gets shaky, right?
So you could integrate this, and you should integrate this.
But I'm a digital history professor
at a very normal German history department.
I do have a large group, because I've got lots of projects.
But I think that I still make this argument
towards other historians and also towards institutional
contexts to make sure that we get the resources to use
those possibilities.
And if you'd like, for example, the--
I'm going to just show you the video, and now it doesn't work.
But basically, this project kind of came from this idea
to try to design not only another network graph,
but try to design a multi-perspective,
multilayered environment in which I
can interact with data points to paint
a picture of possibilities.
This took forever.
It was very expensive.
It is running.
So visit our web page.
You can actually use this on the screen.
Visit Bielefeld, and you can actually use this with VR
headsets and so on.
So long story short, I totally agree with you.
And I think that the reason why I still start at the point
where I'm starting is because I direct this
towards my own discipline to be more explorative and take
the opportunities that are there more openly
and give us the resources to do so.
All right, thank you.
I think we might have time for a very short last question.
OK, in a room full of historians, no short question.
Oh.
[LAUGHTER]
Read this short question, I'll be back to you.
OK, yeah, getting back to what Julius asked about the--
yeah, what is data?
What are the sources?
And I found it interesting what you said.
So data is what you create yourself, more or less.
And the other things are there also.
And I don't know, because the background is that also in our
SFB, we discussed at length what are the data, actually,
that we're supposed to manage and so on,
and how is it different from the sources and material.
And I think what the perspective or the definition
you provided now gives then the impression.
But of course, also the sources are not just there.
Someone did something to them before, selected
them, and so on.
And at the same time,
if you work with the data from others, it's also--
then we call it data, but also someone else created it.
So I'm not sure if I--
yeah, maybe I also find your distinction a bit difficult.
I understand it, but--
and of course, it's not just a question about denomination
and how do you define things.
But then also to get to the point,
what is different now with the digital is--
I don't know, talking also in NFDI4Memory,
how is it different with the digital source criticism now?
How is it different from the traditional source criticism?
So I don't know.
It's also not really a question, maybe a comment.
Yeah, maybe you want to comment.
Like to your last point, what are the differences?
I think this is really not trivial.
Because I think that in the world that we live in
and with the tools that we have at our disposal,
lots of things can be called digital humanities
or digital history already.
If you read something printed out, or like a book,
and you use your pencil to make notes,
or you use this on a tablet and you use a pencil to make notes,
that's not really a real difference, right?
So you shift the medium, OK, check.
And there are studies that reading digitally
is different to reading books, all that taken into account.
But it doesn't really change our practices so much,
because we still read and annotate.
And I think it is a really--
and I'm not answering the question,
but I think it's really important
to stick with that question and see which kinds of practices
do we just transfer into the digital,
and where does the digital actually
transform the practice?
And I think this has not been sufficiently answered.
And I think this is also why it's not only the digital,
but it's the specific computational methods.
It's sort of the opening of possibility spaces like VR
or whatever you have, where you can really
explore the transformations and see, OK,
what is it that we could not have done before?
And I also don't like that the argument often
is that I can now have 4,000 sources instead of 100.
Yeah, OK, so that's scaling.
Fine, but is that a transformation?
Does that really change the way that we think about stories?
And that was sort of part of the argument
that I wanted to make, that I think
that we have the potential to actually think
about stories in a different way,
and also maybe use this along the lines of digital history
to also teach people more about perspectives,
use these kinds of methods not only for our own research,
but use it to teach people about history and the perspectivity
of a historical argument.
And I think the source and the data thing, yeah,
maybe data sets should then be treated as sources.
And we need sort of rules for source criticism for data sets.
In this regard, data is also a source.
But the main argument that I wanted
to make towards also Julius's question
is that, for me, it's too easy.
If I ask people with what kinds of data do they work,
and they sort of show me a picture of a medieval
manuscript, then I need something
to start a conversation, right?
So I think it's important to make
the differentiation between a manuscript and the data set,
but both can be sources.
Maybe that's more correct to sort of have a look at this.
All right.
I think that's the end of the official discussion part.
Thank you so much, Silke, for this talk.
Thank you for your question.
[APPLAUSE]
[MUSIC PLAYING]