On Measuring Psalm Similarity: A Case for Word-Level n-Grams


The article compares Tesserae (a text-reuse detection tool) with cosine similarity (used here as a measure of similarity between texts) and assesses how well each tracks textual affinities between different versions of historical texts. The test material consists of Early Modern English versions of Psalm 6 found in publications printed between 1530 and 1557. The article shows that cosine similarity is the better tool for identifying and measuring the level of similarity between texts. It also argues that cosine similarity measurements should be performed on texts represented as feature vectors consisting of n-grams.
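As a rough illustration of the measurement the abstract describes, the sketch below computes cosine similarity between two texts represented as word-level n-gram count vectors. It is not the article's implementation: the function names, the choice of bigrams (n = 2), the lowercasing and whitespace tokenization, and the use of raw counts as weights are all assumptions made for the example, and the two sample strings are invented phrases rather than quotations from any of the Psalm 6 versions studied.

```python
from collections import Counter
from math import sqrt


def word_ngrams(text, n=2):
    """Return a Counter of word-level n-grams (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, which is an
    assumption; real Early Modern English texts would likely need
    spelling normalization first.
    """
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing keys, so unshared n-grams add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text shorter than n words has no n-grams
    return dot / (norm_a * norm_b)


# Invented sample phrases, for illustration only.
version_a = "o lord rebuke me not"
version_b = "o lord chasten me not"

similarity = cosine_similarity(word_ngrams(version_a), word_ngrams(version_b))
print(similarity)
```

The two sample phrases share two of their four bigrams, so the score falls halfway between 0 (no shared n-grams) and 1 (identical n-gram profiles). Larger values of n reward longer runs of identical wording, which is one reason the choice of n matters for comparing closely related versions.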


Keywords: digital humanities, cosine similarity, n-grams, Tesserae, Psalm translations






Roczniki Humanistyczne · ISSN 0035-7707 | eISSN 2544-5200 | DOI: 10.18290/rh
© The Learned Society of the John Paul II Catholic University of Lublin & The John Paul II Catholic University of Lublin, Faculty of Humanities

Articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.