Topic Modeling

Two questions stand out to me as I consider our readings and discussion of topic modeling. First, how effective is it for historians? And second, in what contexts is it useful? Our in-class experiment with topic modeling sparked these questions for me. When I ran the topic modeling tool on the presidential inaugural addresses, one topic stood out to me: congress business labor legislation tariff south question policy race law secure proper trade american work consideration laws make executive currency. While I can sense a general theme of finance, business, and labor, what unifies all of these terms remains somewhat elusive. This topic is certainly not as clear and useful as the topics that Cameron Blevins identified by running MALLET on Martha Ballard’s Diary.1 In fact, none of the topics I generated were as useful as those identified by Blevins. Perhaps this can be attributed to my lack of expertise in using topic modeling. Or perhaps this is indicative of a larger issue within topic modeling in general: scale.

Robert K. Nelson’s Mining the Dispatch project celebrated topic modeling’s ability to “detect patterns within not a sampling but in the entirety of an archive.”2 Similarly, Ted Underwood suggests that topic modeling “becomes more useful as we move toward a scale that is too large to fit into human memory.”3 After our in-class experiment, I suspect Nelson and Underwood are correct. While there are over fifty inaugural addresses, this is still a relatively small sample of documents. If we were able to expand the base of presidential addresses, perhaps we would see more unified and potentially useful topics emerge.

This raises the question: in which contexts can topic modeling be most useful for historians? Nelson’s Mining the Dispatch is an excellent example of how historians can put it to good use. In this project, Nelson models the Richmond Daily Dispatch from 1860 to 1865 in hopes of better understanding daily life in the city during the Civil War. To represent his findings, Nelson used two types of graphs: one “of the relative space that a topic occupied in the paper over time” and another which “count[s] the number of articles or advertisements where the proportion for a specified topic is above a threshold you specify.”4 This methodology centers on identifiable, unified topics displayed on graphs, much like Blevins’ project. In addition, both celebrate the usefulness of topic modeling for its ability to “reveal patterns that we can’t.”5 Despite the usefulness of this methodology, I do wonder if historians could learn from literary scholars as well. Underwood suggests that historians have tended to look for clarity and unity in topic modeling while literary scholars find “ambiguous” topics more useful.6 Perhaps such methodologies could also be used by historians to get at some of the less tangible facets of our sources.
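To make the two kinds of graphs Nelson describes more concrete, here is a minimal sketch of the calculations that might sit behind them. It is not Nelson's code: the data, the function names, and the choice to average topic shares per month (rather than, say, weighting by article length) are all assumptions made for illustration.

```python
# Hypothetical data: (month, share of one document assigned to a single topic).
# These values are invented for illustration, not drawn from the Dispatch.
docs = [
    ("1861-01", 0.50),
    ("1861-01", 0.10),
    ("1861-02", 0.40),
    ("1861-02", 0.30),
    ("1861-02", 0.05),
]


def mean_share_by_month(docs):
    """Average topic share per month (a rough 'relative space' measure)."""
    totals = {}
    counts = {}
    for month, share in docs:
        totals[month] = totals.get(month, 0.0) + share
        counts[month] = counts.get(month, 0) + 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


def count_above_threshold(docs, threshold):
    """Count documents per month whose topic share exceeds the threshold."""
    counts = {month: 0 for month, _ in docs}  # keep months with zero hits
    for month, share in docs:
        if share > threshold:
            counts[month] += 1
    return {month: counts[month] for month in sorted(counts)}


if __name__ == "__main__":
    print(mean_share_by_month(docs))
    print(count_above_threshold(docs, threshold=0.2))
```

The two measures can tell different stories: a month with a few articles saturated by a topic and a month with many articles that only touch on it may have similar averages but very different counts, which may be why a project would want both views.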

A challenge to topic modeling is the issue of critique and peer review. Underwood points out that because of the necessity of entering stop words into the program, “The resulting model ends up being tailored in difficult-to-explain ways by a researcher’s preferences.”7 This means that it can be hard for fellow historians to critique projects relying on topic modeling. In his article on this issue, digital historian Frederick W. Gibbs calls this “methodological opaqueness.”8 Gibbs argues that data and its representations should be analyzed and criticized in the same way that traditional scholarship is criticized. For much of the digital humanities scholarship out there, this should be possible without too much trouble. However, as Underwood has pointed out, it is challenging to clarify the multitude of decisions that went into training the software for topic modeling. While this issue does not negate the usefulness of topic modeling, it does raise further questions for those who use it.
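A small sketch can illustrate why the stop word list matters so much. The example below is not a topic model; it only counts terms, on the assumption that word counts like these are the kind of input a model works from. The sample sentence, the two stop lists, and the function name are all hypothetical.

```python
from collections import Counter


def top_terms(text, stop_words, n=3):
    """Return the n most frequent terms once stop words are removed."""
    tokens = [word for word in text.lower().split() if word not in stop_words]
    return [term for term, _ in Counter(tokens).most_common(n)]


# Invented sample text and stop lists, for illustration only.
text = "the congress and the tariff and the law of the congress"
minimal_stops = {"the", "and", "of"}
extended_stops = minimal_stops | {"congress"}  # one researcher's extra choice

if __name__ == "__main__":
    print(top_terms(text, minimal_stops, n=2))
    print(top_terms(text, extended_stops, n=2))
```

Adding a single word to the stop list changes which terms rise to the top. Unless the researcher publishes the list alongside the results, a reviewer has no way of knowing that such a choice was made.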

  1. Blevins, Cameron. “Topic Modeling Martha Ballard’s Diary.” Cameron Blevins (blog), April 1, 2010.

  2. Nelson, Robert K. “Mining the Dispatch.” Mining the Dispatch, November 2020.

  3. Underwood, Ted. “Topic Modeling Made Just Simple Enough.” The Stone and the Shell (blog), April 7, 2012.

  4. Nelson, Robert K. “Mining the Dispatch.”

  5. Nelson, Robert K. “Mining the Dispatch.”

  6. Underwood, Ted. “Topic Modeling Made Just Simple Enough.”

  7. Underwood, Ted. “Topic Modeling Made Just Simple Enough.”

  8. Gibbs, Frederick W. “New Forms of History: Critiquing Data and Its Representations.” The American Historian, 2016.
