On Measuring Psalm Similarity: A Case for Word-Level n-Grams

Abstract

This article compares Tesserae, a text-reuse detection tool, with cosine similarity, used here as a measure of similarity between texts, and assesses how well each tracks textual affinities between different versions of historical texts. The test material consists of Early Modern English versions of Psalm 6 found in publications printed between 1530 and 1557. Cosine similarity is shown to be the better tool for identifying and measuring the level of similarity between texts. The article further argues that cosine similarity should be computed on texts represented as feature vectors of n-grams.
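To make the measure concrete, the following is a minimal Python sketch of cosine similarity computed over word-level n-gram count vectors. It is an illustration under stated assumptions, not the procedure used in the article: the tokenisation rule, the choice of bigrams (n = 2), the function names `word_ngrams` and `cosine_similarity`, and the two sample strings are all hypothetical and are not quotations from the editions studied.

```python
import math
import re
from collections import Counter


def word_ngrams(text, n=2):
    """Return a Counter of word-level n-grams (as tuples) for the text.

    Assumption: tokens are lowercased runs of letters and apostrophes;
    real Early Modern English texts may need spelling normalisation first.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    # Only n-grams present in both vectors contribute to the dot product.
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector shares nothing with any text
    return dot / (norm_a * norm_b)


# Hypothetical sample strings, used only to exercise the functions.
version_a = "lord rebuke me not in thy anger"
version_b = "o lord rebuke me not in thine anger"

score = cosine_similarity(word_ngrams(version_a), word_ngrams(version_b))
print(f"bigram cosine similarity: {score:.3f}")
```

With counts as vector components the score falls between 0 (no shared n-grams) and 1 (identical n-gram profiles). Raising `n` makes the measure more sensitive to word order, since a single changed word breaks every n-gram that contains it.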

Keywords: digital humanities, cosine similarity, n-grams, Tesserae, Psalm translations




Roczniki Humanistyczne · ISSN 0035-7707 | eISSN 2544-5200 | DOI: 10.18290/rh
© The Learned Society of the John Paul II Catholic University of Lublin & The John Paul II Catholic University of Lublin, Faculty of Humanities

Articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)