The Journal of Law and Society in Context: Text-Linguistic Analysis

Christian Boulanger, Naomi Creutzfeldt and Jen Hendry

This is the second in a series of three blog posts accompanying our paper “The Journal of Law and Society in Context: A Bibliometric Analysis”, published in the Journal of Law and Society (vol. 51, issue 1, 2024). In these posts, we expand on the methodological aspects of our analysis and share visualisations that could not be included in the published article.

Across the three posts, we explore three different types of analysis. This post is concerned with the second:

Text-linguistic analyses of journal content

In contrast to the analyses shown in the first blog post, which use discrete metadata items (such as author, year, or journal name), text-linguistic analysis deals with unstructured textual data. This allows us, for example, to search for particular terms and observe their frequency. We can compare the occurrence of the stem shared by “feminist” and “feminism”, that is, ‘feminis’, with that shared by “Marxism” and “Marxist”, i.e. ‘marxis’.
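A minimal Python sketch of such stem-based counting is shown below. The toy corpus is invented, and the rule that a stem matches any word beginning with it is an assumption of the sketch, not a description of how Fig. 1 was produced.

```python
import re
from collections import Counter

# Invented toy corpus of (publication year, text) pairs; real input would be the JLS articles.
corpus = [
    (1975, "A Marxist reading of property law and Marxism in legal theory."),
    (1975, "Courts, tribunals and the state."),
    (2020, "Feminist legal theory and feminism beyond family law."),
    (2020, "A feminist critique of Marxist accounts of contract."),
]

def stem_counts(corpus, stems):
    """Count case-insensitive occurrences of each stem per year.

    Assumption: a stem such as 'feminis' matches any word that begins with it
    (feminism, feminist, feminists, ...).
    """
    patterns = {s: re.compile(r"\b" + re.escape(s) + r"\w*", re.IGNORECASE) for s in stems}
    counts = {s: Counter() for s in stems}
    for year, text in corpus:
        for stem, pattern in patterns.items():
            # Adding zero still records the year, so years without hits appear as 0.
            counts[stem][year] += len(pattern.findall(text))
    return counts

counts = stem_counts(corpus, ["feminis", "marxis"])
for stem, by_year in counts.items():
    print(stem, dict(sorted(by_year.items())))
```

With this toy input, ‘feminis’ is counted 0 times in 1975 and 3 times in 2020, and ‘marxis’ 2 and 1 times respectively.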

Fig. 1: Comparison of term frequency, ‘feminis’ vs. ‘marxis’ (Source: JLS corpus, absolute numbers). The blue line represents a smoothed trend over time; the grey shaded area is the confidence interval of that trend.
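The caption does not specify which smoothing method produced the blue line. Purely to illustrate the idea of smoothing yearly counts, the sketch below swaps in a simple centred moving average, a different and cruder technique. It does not compute the confidence interval shown in grey, and the counts are invented.

```python
def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the edges of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

# Invented yearly counts for a single stem.
yearly_counts = [2, 4, 3, 8, 6, 10]
print(moving_average(yearly_counts))
```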

As Fig. 1 shows, this analysis indicates a decline in mentions of Marxist approaches to socio-legal theory alongside an increase in mentions of feminist ones. We should be wary of overemphasising this trend, especially given the relatively small sample size. Moreover, term frequency cannot tell us about the actual significance of the intellectual traditions from which these terms originate. “Marxism” is a good example in this regard: hardly any informed observer would dispute the importance of Marxist theory in the canon of socio-legal theory at any point in the observed period. Nevertheless, this way of empirically probing word usage indicates an intriguing direction of travel for a periodical that has historically been consistently left-leaning in its political standpoint, while also speaking to the persistent appeal of historical materialism as a theoretical framework. More generally, this comparison can be taken as evidence not only of the proliferation of feminist approaches within socio-legal studies, but also of the rise of feminist scholarship beyond single issues or legal topics traditionally considered of ‘female interest’, notably family law and domestic violence. Future, more detailed analyses will be able to supplement these findings.

While this approach is very simple, more complex text-linguistic analyses are also possible. For example, unsupervised machine-learning algorithms can identify so-called ‘topics’ within a corpus of texts through a process known as ‘topic modelling’, which groups together words that tend to occur in the same documents more often than chance would predict. ‘Unsupervised’ in this context means that the algorithm is not given any domain-specific knowledge or instructions (such as a glossary of search terms). Küsters (2023), for example, recently used this method to identify schools of economic thought in the German economics yearbook ORDO.
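Topic models such as LDA or BERTopic are considerably more elaborate, but the co-occurrence intuition behind them can be illustrated with pointwise mutual information (PMI). PMI is a swapped-in measure, not a topic model. The sketch below uses an invented four-document corpus, with each document reduced to a set of words.

```python
import math
from collections import Counter
from itertools import combinations

# Invented mini-corpus: each document reduced to its set of content words.
docs = [
    {"feminist", "gender", "law"},
    {"feminist", "gender", "law"},
    {"marxist", "class", "law"},
    {"marxist", "class", "law"},
]

def document_pmi(docs):
    """PMI of word pairs based on document co-occurrence.

    PMI(a, b) = log(P(a, b) / (P(a) * P(b))), where P(x) is the share of
    documents containing x. Positive values mean the two words share
    documents more often than their individual frequencies would predict.
    """
    n = len(docs)
    word_df = Counter(w for d in docs for w in d)
    pair_df = Counter(pair for d in docs for pair in combinations(sorted(d), 2))
    return {
        (a, b): math.log((c / n) / ((word_df[a] / n) * (word_df[b] / n)))
        for (a, b), c in pair_df.items()
    }

scores = document_pmi(docs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Here ‘feminist’/‘gender’ and ‘class’/‘marxist’ each score log 2 (about 0.69), whereas every pair involving ‘law’, which appears in all documents, scores 0 and so carries no topical signal.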

We used this technique to produce the topics in Table 1 of our published article. They were computed with BERTopic, an advanced algorithm based on so-called “word embeddings”. In contrast to earlier approaches, which treated words simply as character strings without regard to their meaning, this technology is an early example of algorithms that can distinguish different uses of a word based on the context in which it is used, which produces much more plausible results. Table 1 is the result of running the algorithm over the complete JLS corpus; a more visual representation of the same data can be found in the interactive version of Fig. 2.
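BERTopic broadly works in stages: documents are embedded, the embeddings are reduced in dimensionality and clustered, and each cluster is then described by its most characteristic words using a class-based variant of TF-IDF (c-TF-IDF). The sketch below illustrates only that last step and assumes the clusters already exist. The documents are invented, and the weighting is a simplified version of c-TF-IDF; the library's own implementation may differ in details such as normalisation.

```python
import math
from collections import Counter

# Invented, already-clustered documents: cluster id -> tokenised documents.
# The embedding, dimensionality-reduction and clustering steps are omitted.
clusters = {
    0: [["prison", "sentencing", "law"], ["prison", "police", "law"]],
    1: [["divorce", "family", "law"], ["family", "children", "law"]],
}

def class_tfidf(clusters, top_n=3):
    """Rank words per cluster with a simplified class-based TF-IDF.

    Each cluster is treated as one long document. A word's weight in a
    cluster is tf(word, cluster) * log(1 + A / tf(word)), where tf(word) is
    its frequency across all clusters and A is the average number of words
    per cluster.
    """
    tf_class = {c: Counter(w for doc in docs for w in doc) for c, docs in clusters.items()}
    tf_total = Counter()
    for counts in tf_class.values():
        tf_total.update(counts)
    avg_words = sum(tf_total.values()) / len(tf_class)
    topics = {}
    for c, counts in tf_class.items():
        weights = {w: n * math.log(1 + avg_words / tf_total[w]) for w, n in counts.items()}
        topics[c] = sorted(weights, key=weights.get, reverse=True)[:top_n]
    return topics

topics = class_tfidf(clusters)
print(topics)
```

Because ‘law’ occurs in every cluster, it is pushed to the bottom of both rankings, and the top words come out as prison/sentencing/police and family/divorce/children.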

Fig. 2: Topics computed with the BERTopic algorithm (Source: JLS corpus)

Most of the generated topics here are unsurprising, i.e. they match what knowledgeable observers would expect to find. This lack of novelty is in fact a positive finding: the algorithm appears able to reproduce existing knowledge about the dataset. It can therefore be applied, with some degree of confidence, to other datasets, allowing comparative analyses. For example, we use this method to compare the JLS with the German Zeitschrift für Rechtssoziologie in terms of the themes that have been discussed over the course of the journals’ lifetimes.
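Once each article has been assigned a topic, a comparison of this kind can start from the share of articles per topic in each journal. The sketch below assumes that both journals' articles have already been assigned to a common set of topics; the topic labels and assignments are invented and are not results from our study.

```python
from collections import Counter

def topic_shares(assignments):
    """Share of articles assigned to each topic."""
    counts = Counter(assignments)
    total = len(assignments)
    return {topic: n / total for topic, n in counts.items()}

def compare_journals(a, b):
    """Difference in topic share (journal a minus journal b) for every topic seen in either."""
    sa, sb = topic_shares(a), topic_shares(b)
    return {t: sa.get(t, 0.0) - sb.get(t, 0.0) for t in sorted(set(sa) | set(sb))}

# Invented topic assignments, one label per article.
jls = ["criminal justice", "family law", "family law", "human rights"]
zfrsoz = ["criminal justice", "criminal justice", "legal profession", "family law"]
print(compare_journals(jls, zfrsoz))
```

Positive values mark topics that take up a larger share of the first journal; topics found in only one journal are compared against a share of zero.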

This brings us to the question of diachronic comparison, i.e. the development of themes over time. BERTopic can also compute and plot topics over time, but we have not yet found a way to visualise the changes meaningfully; we hope that other researchers can pick up from here. Existing software solutions, such as the R package Bibliometrix, can generate trend topics using, among other options, bigrams (two-word combinations) from article titles (see Fig. 3). However, such solutions typically do not make it easy to understand or adjust the code that produces the visualisation, and are therefore not well suited to our purposes. Nevertheless, if we were to take Fig. 3 at face value, it would suggest a trend. The figure plots the 31 most frequent bigrams, together with the time span in which each occurs and a visual representation of its total frequency.
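The sketch below loosely mimics the kind of data behind such a trend-topics plot: it extracts title bigrams and, for the most frequent ones, records frequency and first, median and last year. The titles, the stopword list and the use of first and last year as the time span are all assumptions; Bibliometrix's own computation may differ.

```python
import re
import statistics
from collections import defaultdict

# Invented (year, title) records; real input would be the JLS titles.
records = [
    (1985, "Mental health and the law"),
    (1990, "Family law reform and mental health"),
    (2005, "Human rights in transitional justice"),
    (2012, "Legal pluralism and human rights"),
    (2018, "Transitional justice after conflict"),
]

STOPWORDS = {"a", "after", "and", "in", "of", "the"}  # illustrative, far from complete

def title_bigrams(title):
    """Adjacent word pairs in a title, skipping any pair that contains a stopword."""
    words = re.findall(r"[a-z]+", title.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:]) if a not in STOPWORDS and b not in STOPWORDS]

def trend_topics(records, top_n=31):
    """For the top_n most frequent bigrams, report frequency and first, median and last year."""
    years = defaultdict(list)
    for year, title in records:
        for bigram in title_bigrams(title):
            years[bigram].append(year)
    ranked = sorted(years.items(), key=lambda kv: len(kv[1]), reverse=True)[:top_n]
    return {
        bigram: {"freq": len(ys), "first": min(ys), "median": statistics.median(ys), "last": max(ys)}
        for bigram, ys in ranked
    }

for bigram, info in trend_topics(records, top_n=3).items():
    print(bigram, info)
```

With this toy input, ‘mental health’ has a median year of 1987.5 and ‘transitional justice’ one of 2011.5, the kind of ordering that a plot like Fig. 3 makes visible.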

Fig. 3: Trend topics based on title bigrams, generated with the Bibliometrix package (Source:

On the basis of this visualisation, one could describe a trend from “local” topics (mental health, criminal law, social justice, social policy, family law, etc.) to more “international” topics (international law, human rights, transitional justice, legal pluralism, transnational private law) over time. Of course, and as can be expected, this leaves out many terms that do not fit into this simple progression. However, the observation would be the start rather than the result of an inquiry into topical development: it could be a hypothesis to be tested with other methods and data.

Concluding thoughts

The figures, code, and data from our study are publicly available in an open-access GitHub repository. This type of analysis is still in its infancy, and we do not claim completeness; these contributions should be understood as preliminary. Our aim is to start a conversation rather than to present results set in stone. We expect and welcome critical feedback from both traditions: scholars of the qualitative-hermeneutical history of ideas as well as those who use quantitative methods from bibliometrics to pursue questions in the history of science.

We encourage readers to engage with the findings that emerge across our three analyses: the next post in this series focuses on network analyses of citation graphs computed from existing and self-generated data.