DOI: 10.1145/3078081
DATeCH2017: Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage
ACM 2017 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
DATeCH2017: 2nd International Conference on Digital Access to Textual Cultural Heritage, Göttingen, Germany, June 1-2, 2017
ISBN:
978-1-4503-5265-9
Published:
01 June 2017

SESSION: Transcription
research-article
Enabling Annotation of Historical Corpora in an Asynchronous Collaborative Environment

Current research in Corpus Linguistics and related disciplines within the multi-disciplinary field of Digital Humanities involves computer-aided manual processing of large text corpora. Typically, corpus instances are retrieved with the help of ...

research-article
Allegro: User-centered Design of a Tool for the Crowdsourced Transcription of Handwritten Music Scores

In this paper, we describe the challenge of transcribing a large corpus of handwritten music scores. We conducted an evaluation study of three existing optical music recognition (OMR) tools. The evaluation results indicate that OMR approaches do not ...

SESSION: Natural Language Processing
research-article
The RetroC challenge: how to guess the publication year of a text?

This article describes research in automatic content-based temporal classification of texts. Experiments are carried out on a set of texts coming from Polish digital libraries, dating between the years 1814 and 2013. Following successful research in the ...

research-article
Parsing Romanian Specialized Dictionaries Structured in Nests

This paper presents a tool for processing dictionaries in Word format and for obtaining the XML format which can be used in various applications. DEPAR (Dictionary Entry Parser) permits the introduction of a specific set of rules to describe the ...

research-article
Analysis of Part-Of-Speech Tagging of Historical German Texts

The amount of data in contemporary digital corpora is too large to be processed manually, which increases the necessity for computer linguistic tools in humanities. However, the processing of natural languages is a challenge for automatic tools, because ...

research-article
Dependency Parsing on Late-18th-Century German Aesthetic Writings: A Preliminary Inquiry into Schiller and F. Schlegel

Data-driven syntactic parsers are usually trained, tested and developed on web-news data. Little has been done to evaluate them on literary genres of different ages, which are still low-resource varieties in terms of syntactic annotation. In this paper, ...

research-article
In search of Poetic Rhythm: Poetry retrieval through text and metre

In this paper a search service developed for the exploitation of a TEI-based Spanish poetry corpus is presented. Besides a textual retrieval, the search service takes advantage of the metrical annotation to retrieve verses and poems with specific ...

SESSION: OCR & Postprocessing
research-article
Profiling of OCR'ed Historical Texts Revisited

In the absence of ground truth it is not possible to automatically determine the exact spectrum and occurrences of OCR errors in an OCR'ed text. Yet, for interactive postcorrection of OCR'ed historical printings it is extremely useful to have a ...

research-article
Clear-cut methodology for Arabic OCR and post-correction with low technical skilled annotators

This paper describes an efficient and straightforward methodology for OCR-ing and post-correcting Arabic text material on Islamic embryology collected for the COBHUNI project. As the target texts of the project include diverse diachronic stages of the ...

research-article
Poor Man's OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection

The accuracy of Optical Character Recognition (OCR) sets the limit for the success of subsequent applications used in a text analysis pipeline. Recent models of OCR postprocessing significantly improve the quality of OCR-generated text but require ...

research-article
OCR of a Mixed Corpus: Early Printings and Manuscripts of Martianus Capella

This paper deals with the application of the digitization methods designed by LMU CIS team and with the encoding of the data obtained with the aim of building an edition of a Latin author based on the first printed editions and on two manuscripts of the ...

SESSION: Natural Language Processing on Latin and Greek
research-article
The Impact of Unassimilated Loanwords on the Latin Lexicon. A Qualitative and Quantitative Analysis

The recent enhancement of the morphological analyser for Latin Lemlat with a large Onomasticon enables us to analyse both the morphology and the distribution of loanwords in the Latin lexicon. In this paper, first we describe the categories of proper ...

research-article
Open Access
A Memory-Based Lemmatizer for Ancient Greek

In this paper we present the lemmatizer that we developed for Ancient Greek: GLEM. As far as we know, GLEM is the first publicly available lemmatizer for Ancient Greek that uses POS information to disambiguate and that also assigns output to unseen ...

research-article
Implementation of a Latin Grammar in Grammatical Framework

In this paper we present work in developing a computerized grammar for the Latin language. It demonstrates the principles and challenges in developing a grammar for a natural language in a modern grammar formalism. The grammar presented here provides a ...

research-article
Node Formation: Using Networks to Inspect Productivity in Affixal Derivation in Classical Latin

This paper investigates the distribution of word formation data through network visualisation, as an entry point for the exploration / analysis of productivity in affixal derivation in Classical Latin. This study uses data from the Word Formation Latin ...

SESSION: Infrastructure & Linked Open Data
research-article
Towards an extensible measurement of metadata quality

This paper describes the structure of an extensible metadata quality assessment framework, which supports multiple metadata schemas, and is flexible enough to work with new schemas. The software has to be scalable to be able to process huge amount of ...

research-article
Converting Latin Treebank Data into an SQL Database for Query Purposes

This paper describes how to turn a Latin dependency treebank into queryable information so that it can be browsed online using a tree query engine and its web interface. The annotation layers of the treebank are first introduced, then the query system ...

research-article
Porting past Classification Schemes for Narratives to a Linked Data Framework

In this paper we give an overview on a number of achieved and on-going efforts dealing with porting to the Linked Data framework electronic versions of past classification schemes in the field of folktale narratives. Three of those schemes are in the ...

research-article
A Software Pipeline for the Reception of Italian Literature in Nineteenth-Century England: Preliminary Testing

This paper presents and discusses a project design aimed at producing synthetic and intuitive visualizations of the reception of Italian literature in nineteenth-century England. In the first part, a processing pipeline is described which combines ...

SESSION: Digitisation & Layout Analysis
research-article
LAREX: A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books

A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule based connected components approach which is very fast, easily comprehensible for the user and allows an intuitive manual correction if ...

research-article
Digitization of Old Romanian Texts Printed in the Cyrillic Script

The paper discusses recognition of Romanian texts of the 17th-20th centuries printed in the Cyrillic script, and their conversion to the modern Latin script. The elaborated technology and a tool pack include historical alphabets, sets of recognition ...

research-article
Unearthing the Recent Past: Digitising and Understanding Statistical Information from Census Tables

Censuses comprise a wealth of information at a large (national) scale that allows governments (who commission them) and the public to have a detailed snapshot of how people live (geographical distribution and characteristics). In addition to underpinning ...

research-article
Case Study of a highly automated Layout Analysis and OCR of an incunabulum: 'Der Heiligen Leben' (1488)

This paper provides the first thorough documentation of a high quality digitization process applied to an early printed book from the incunabulum period (1450-1500). The entire OCR related workflow including preprocessing, layout analysis and text ...

SESSION: Spatial Analysis
research-article
Open Access
The Ancient Graffiti Project: Geo-Spatial Visualization and Search Tools for Ancient Handwritten Inscriptions

This paper discusses how the Ancient Graffiti Project publishes the digital content of ancient epigraphic material and makes handwritten inscriptions from the first century AD more accessible through the use of geo-referenced, spatial interfaces, ...

research-article
Semantic Enrichment on Cultural Heritage collections: A case study using geographic information

Cultural heritage institutions have recently started to explore the added value of sharing their data, using Linked Open Data to integrate and enrich metadata of their collections. The catalogue of the Biblioteca Virtual Miguel de Cervantes contains ...

research-article
Toponym disambiguation in historical documents using semantic and geographic features

Historians are often interested in the locations mentioned in digitized collections. However, place names are highly ambiguous and may change over time, which makes it especially hard to automatically ground mentions of places in historical texts to ...

research-article
Names, Right or Wrong: Named Entities in an OCRed Historical Finnish Newspaper Collection

Named Entity Recognition (NER), the search, classification and tagging of names and name-like frequent informational elements in texts, has become a standard information extraction procedure for textual data. NER has been applied to many types of texts and ...

Acceptance Rates

DATeCH2017 Paper Acceptance Rate: 29 of 37 submissions, 78%
Overall Acceptance Rate: 60 of 86 submissions, 70%

Year          Submitted  Accepted  Rate
DATeCH2017    37         29        78%
DATeCH '14    49         31        63%
Overall       86         60        70%