Digitization and Data

The readings this week covered issues in digital history data, such as the use of digital citations, quantity vs. quality of data, and some of the benefits and drawbacks of online research compared with analog research.

In “A Culture of non-citation: Assessing the digital impact of British History Online and the Early English Books Online Text Creation Partnership,” Jonathan Blaney and Judith Siefring talk about issues with digital citations. I had never really thought about this beyond knowing that I should cite my sources. I don’t have a problem citing digital sources, although I have had to rewrite bibliographies when I later checked links and found they no longer worked. I didn’t realize the other implications of not citing digital references, such as citing a related print source instead of the digital one actually used. Citing a print reference, even if the print reference was not used, could send people to an outdated source. Blaney and Siefring give the example of the word ‘hubris’ and how the online definition in Wikipedia is much better than the version in the last printed Oxford English Dictionary. Citing the printed edition of the dictionary just because it is in print would send someone to inferior or outdated information. It also makes sense that the lack of digital citations hides how much digital sources are actually being used in scholarly work. In my own work, I have cited digital sources of printed works of

Christof Schöch wrote “Big? Smart? Clean? Messy? Data in the Humanities,” in which he describes the differences between big data and smart data and how each can be used. They are both pretty much as they sound. Big data is a very large amount of data in an unstructured format, meaning your searches will likely run against the content itself, not metadata or other structured fields. Without any prior categorization or structure, you have to make sense of the raw data in other ways. Smart data is a smaller amount of data that is more structured, with fields, categories, and metadata that tell you about the data before you even know what the content is. This kind of data is more time-consuming to make because it takes manual labor. Schöch naturally writes that we should have more smart data to bridge the gap between the two. This would be great if there were not such massive amounts of labor involved. Tools like databases, schemas, and controlled vocabularies can help do some of this work in larger batches. Language processing tools and better optical character recognition would help as well.
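To make the contrast concrete for myself, here is a minimal Python sketch. Everything in it is an assumption for illustration: the two sample texts, the field names (title, place, year), and the helper functions search_content and search_field are made up and do not come from Schöch's article. It only shows the difference between searching raw content and querying a structured field.

```python
# Hypothetical sample data, invented for illustration only.
raw_texts = [
    "Letter from Paris, 1848. The streets are full of barricades.",
    "Notes on a voyage to Lisbon. Paris is never mentioned by the crew.",
]

# The same two items with hand-made metadata (assumed field names).
records = [
    {"title": "Letter on the barricades", "place": "Paris", "year": 1848,
     "text": raw_texts[0]},
    {"title": "Voyage notes", "place": "Lisbon", "year": 1851,
     "text": raw_texts[1]},
]


def search_content(texts, keyword):
    """'Big data' style: match the keyword anywhere in the raw text."""
    return [t for t in texts if keyword.lower() in t.lower()]


def search_field(items, field, value):
    """'Smart data' style: match against a structured metadata field."""
    return [r for r in items if r.get(field) == value]


content_hits = search_content(raw_texts, "Paris")
field_hits = search_field(records, "place", "Paris")

print(len(content_hits))  # counts every text that contains the word "Paris"
print(len(field_hits))    # counts only records whose place field is "Paris"
```

The keyword search returns both texts, including the voyage notes that only mention Paris in passing. The field query returns just the letter written in Paris, but only because someone filled in the place field by hand, which is exactly the labor Schöch describes.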

Lara Putnam wrote an interesting article called “The Transnational and the Text Searchable: Digitized Sources and the Shadows They Cast.” She had a few points that caught my attention. One was that historians can now find data without knowing where to look. This relates to her experience of research being tied to geography, which required at least knowing where to start geographically and then deciding whether it was worth following up on other leads. She also cautions against doing too much micro-analysis in digital history without stepping back to look at the big picture. With keyword searching, there is the temptation to cherry-pick facts that can be assembled into any theory you want but that may not be accurate without a broader view and context. That broader context may not be as evident digitally as it is when looking at analog documents in a certain place. However, she points out that removing geography from research can bring up previously unknown connections that you might not find when you are restricted by geography. I think there is no way right now to replace in-person research with digital-only research, for the simple reason that not everything is digitized, and even if it were, you would probably lose some contextual information as well as the side-glancing leads you might find. I think the two can work hand in hand extremely well, though. In cases where only digital research is possible, or when it is known that all available parts of a collection have been digitized and are accessible, researchers should always ask themselves what else they could be missing.

The technical exercises this week gave us a chance to see how metadata can describe records in a tool like Tropy and to experiment with a data-cleaning tool like OpenRefine, which can be used to clean up data and find information using its data tools.

Image credit: Dvortygirl, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

5 Comments

James Beveridge

September 5, 2021 at 6:57 pm

I recall doing the same with citing online sources in previous courses. There was always an emphasis on making sure to cite when the source was retrieved and when it was originally published. For print sources, I recall it was always important to cite which edition it was if there was more than one. I guess the question is how often we would need to refer back to our works to see if our sources are still good or not. If we use digital sources, would there possibly be a way to automate updating them?

May

September 5, 2021 at 9:56 pm

I feel like I have no gauge on your opinions of these articles though. Like, Putnam advanced a lot of different opinions and claims for where the historical field should go. What do you think about any of them?

admin

September 5, 2021 at 11:58 pm

Hi May,

Yes, you are right. I edited my post a little bit but I will try to do better with opinions. Thanks!

Timmia King

September 7, 2021 at 9:59 pm

I think you summarized the topics we covered very well. I agree with you in regard to online citations; I never really thought of the implications of using online citations vs. print materials. I personally do not remember avoiding citing online materials because of notions of credibility or the other reasons mentioned in the article, but perhaps because of the ease of citing in one format vs. the other. I think that might have been due to my perception of the documentation available explaining how to cite these objects in one format over the other. But you live and you learn. I do wonder how and if this topic will be addressed in secondary education and beyond, and if that is the way forward to begin to address the silence on this topic, or if it is not as much of an issue with younger generations, where the norm might be digital more so than print.

Margaret Bisch-Markowitz

September 7, 2021 at 10:21 pm

Julie, I enjoyed reading your post. I have also been repeatedly confused about when to cite the digital source and when to cite the original artifact. In the past, the bias has been in favor of citing the document itself as a more “serious” approach. However, I think that as time goes by and more and more information is born-digital, and as the next generation of scholars enters the field, there will be greater acceptance within academia of online scholarship. This will lead to more citation of internet sources.
