The article presents the concept and outcomes of the initial phase of the grant project entitled “Third-level digitisation of large 17th- and 18th-century dictionaries: The development of the Database of Historical Polish Lexicons”, carried out at the Department of the History of the 17th and 18th Century Polish, Institute of the Polish Language, PAS (NPRH 2024–2029). The first stage of the project focused on the digital edition of three dictionaries: Knapiusz’s Thesaurus (1643, 2nd edition), Troc’s Nowy dykcjonarz (1764, vol. III), and Ernesti’s Forytarz (1674), which will form the foundation of the Database. These lexicons are notable for their original scholarly methods and constitute a rich source of linguistic data. Their preparation required the integration of philological and computational methods: converting scans into machine-readable text, analysing the microstructure of lexical entries, selecting appropriate TEI tags, applying structural markup, training OCR models, and developing rules for automatic TEI-based annotation to enable advanced search and comparative analysis within the Database. The project aims to establish standards for incorporating further historical dictionaries into the Database and to develop a robust search engine. The article discusses the individual tasks and the difficulties encountered in carrying them out, and outlines the planned stages of work.
Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Roczniki Humanistyczne · ISSN 0035-7707 | eISSN 2544-5200 | DOI: 10.18290/rh
© The Learned Society of the John Paul II Catholic University of Lublin & The John Paul II Catholic University of Lublin, Faculty of Humanities