Heml

Heml.org, originally the Historical Event Markup and Linking Project, is based in the Department of Classics at Mount Allison University under the direction of Professor Bruce Robertson. It hosts his projects, including OCR technologies for Classical languages, generalized visualization and markup tools for history, and web-based technologies for language learning and study.

EPUB Conversion of Perseus Texts

Students in my "Digital Methods in the Study of Antiquity" course in 2011 worked on converting Perseus TEI texts into EPUB documents. Taking on a more difficult case, they concentrated on ancient Greek documents and got them working at an alpha level on an Android phone and a Kobo e-reader.
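The students' conversion code is not reproduced here. As a rough illustration of what such a conversion involves, the Python sketch below assumes simplified TEI input (text in namespaced `<p>` elements, with any nested notes simply flattened into the paragraph text) and packages the Greek paragraphs as a one-chapter EPUB 3 file: an uncompressed `mimetype` entry first, `META-INF/container.xml`, an OPF package file, a navigation document, and one XHTML chapter tagged with the language code `grc`. The function names, archive paths and placeholder identifier are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from xml.sax.saxutils import escape

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def tei_paragraphs(tei_xml: str) -> list[str]:
    """Plain text of every TEI <p> element; nested markup such as notes is flattened."""
    root = ET.fromstring(tei_xml)
    return ["".join(p.itertext()).strip() for p in root.iter(TEI_NS + "p")]


def xhtml(title: str, body: str, lang: str) -> str:
    """Wrap body markup in a minimal XHTML page that declares the text language."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:epub="http://www.idpf.org/2007/ops" '
        f'xml:lang="{lang}" lang="{lang}">\n'
        f"<head><title>{escape(title)}</title></head>\n<body>{body}</body>\n</html>"
    )


def tei_to_epub(tei_xml, out, title, lang="grc",
                book_id="urn:uuid:00000000-0000-0000-0000-000000000000"):
    """Write a one-chapter EPUB 3 to `out` (a path or a writable binary file object)."""
    paras = "".join(f"<p>{escape(t)}</p>" for t in tei_paragraphs(tei_xml))
    chapter = xhtml(title, f"<h1>{escape(title)}</h1>{paras}", lang)
    nav = xhtml("Contents", '<nav epub:type="toc"><ol><li>'
                f'<a href="text.xhtml">{escape(title)}</a></li></ol></nav>', lang)
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="id">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{lang}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine><itemref idref="text"/></spine>
</package>"""
    deflate = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile(out, "w") as z:
        # The mimetype entry must be the first file and must be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflate)
        z.writestr("OEBPS/content.opf", opf, compress_type=deflate)
        z.writestr("OEBPS/nav.xhtml", nav, compress_type=deflate)
        z.writestr("OEBPS/text.xhtml", chapter, compress_type=deflate)
```

Declaring the language does not by itself guarantee that polytonic accents display correctly; that still depends on the reading device's fonts, and embedding a font with Greek Extended coverage is one option.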

Polytonic Greek OCR

Mount Allison's contribution to our Digging into Data Challenge project is a new approach to Greek OCR using Gamera and over twenty different classifiers. Quantitative results from running over 34,000 pages from 158 19th-century editions can be found at the Dynamic Variorum Editions website.

Summary pages are also available for our best and worst results.

Classifiers and modified OCR code are available here at heml.org, which also hosts early drafts of the documentation; once stable, these drafts will be moved to the permanent site. Additional observations have been posted on Google Plus.
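The sketch below is not Gamera's API and is not the project's OCR code. It only illustrates, in plain Python, a k-nearest-neighbour approach of the kind Gamera's classifiers use: each glyph image is reduced to a feature vector and given the majority label of its closest training examples. The three features and the function names are hypothetical stand-ins for Gamera's much richer feature set.

```python
import math
from collections import Counter

Bitmap = list[list[int]]  # one list per pixel row; 1 = ink, 0 = background


def features(glyph: Bitmap) -> tuple[float, float, float]:
    """Hypothetical features: aspect ratio, ink density, share of ink in the top half."""
    height, width = len(glyph), len(glyph[0])
    ink = sum(map(sum, glyph))
    top = sum(map(sum, glyph[: height // 2]))
    return (width / height, ink / (width * height), top / ink if ink else 0.0)


def knn_classify(sample, training, k=3):
    """Majority label among the k training vectors closest (Euclidean) to sample.

    training is a list of (feature_vector, label) pairs; ties go to the label
    that occurs first among the nearest neighbours.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A segmented glyph would then be labelled with `knn_classify(features(glyph), training)`, where `training` holds feature vectors computed from hand-labelled glyphs.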

Quercei: Easy Treebanking

Quercei is a Ruby on Rails application that provides a browser-based visual treebank composer.

An introductory video is available.
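Quercei's own Rails code is not shown here. As a language-neutral illustration of the data a treebank composer produces, this Python sketch stores each word with the id of its head word and a relation label, rejects trees with missing heads or cycles, and serialises a sentence as XML loosely modelled on the `word`/`head`/`relation` attributes of the Perseus treebank files. The class, the function names and the relation labels are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Word:
    id: int
    form: str
    head: int      # id of the governing word; 0 means attached to the sentence root
    relation: str  # hypothetical labels such as "PRED", "SBJ", "ATR"


def check_tree(words: list[Word]) -> None:
    """Raise ValueError unless every head exists and no word is its own ancestor."""
    heads = {w.id: w.head for w in words}
    for w in words:
        if w.head != 0 and w.head not in heads:
            raise ValueError(f"word {w.id} has unknown head {w.head}")
    for w in words:
        seen, node = set(), w.id
        while node != 0:  # walk up towards the root
            if node in seen:
                raise ValueError(f"cycle reached from word {w.id}")
            seen.add(node)
            node = heads[node]


def to_xml(sentence_id: int, words: list[Word]) -> str:
    """Validate a sentence's tree and serialise it as <sentence><word .../></sentence>."""
    check_tree(words)
    sent = ET.Element("sentence", id=str(sentence_id))
    for w in words:
        ET.SubElement(sent, "word", id=str(w.id), form=w.form,
                      head=str(w.head), relation=w.relation)
    return ET.tostring(sent, encoding="unicode")
```

Keeping heads as plain ids makes each drag-and-drop edit in a visual composer a single field change, with validation run before saving.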

Text-Mining

Heml has become a member of the Atlantic Computational Excellence Network, whose computing facilities we are using in historical text-mining experiments. We are starting with the full text of Wikipedia, from which we hope to extract over 10,000 geolocated and temporally defined events to test the tools developed in the Fawcett sub-project.

A wiki page currently tracks the code snippets and techniques we are using to divide and parse Wikipedia.
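The snippets on that wiki page are not reproduced here. One common way to divide a Wikipedia dump, sketched below under the assumption of a standard MediaWiki XML export such as a `pages-articles` file, is to stream it with `xml.etree.ElementTree.iterparse`, yielding each page as soon as its closing tag is read and clearing it to keep memory use low. The sketch ignores namespaces other than the XML one, revision history and redirects; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that MediaWiki export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source) -> Iterator[Tuple[Optional[str], str]]:
    """Stream (title, wikitext) pairs from a dump given as a path or binary file object."""
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # release the finished page's children
            title, text = None, ""
```

A compressed dump can be streamed without unpacking it by passing `bz2.open(path)` as the source; each page's wikitext can then be handed to whatever event-extraction step follows.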

Heml-cocoon

heml-cocoon is a Java web application built on the Cocoon 2.1 framework. It transforms events marked up in an XML schema into timelines, maps, and animated maps rendered in SVG. Although it is no longer under active development, its schema is the basis of the extensions to CIDOC-CRM, expressed in RDF Schema, that define the data for Fawcett. Its SVN repository can still be browsed.
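heml-cocoon's Cocoon pipelines are not reproduced here. As a minimal sketch of the underlying idea, mapping dated events to an SVG timeline, this Python function scales each event's start and end years to a horizontal bar, one row per event. The `Event` class, its year convention and the layout parameters are assumptions for illustration, not heml's schema.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Event:
    label: str
    start: int  # year as an integer; negative for BCE, no correction for the missing year 0
    end: int


def timeline_svg(events, width=600, row_height=20, margin=10):
    """Return SVG markup drawing each event as a bar scaled to its start/end years."""
    lo = min(e.start for e in events)
    hi = max(e.end for e in events)
    scale = (width - 2 * margin) / max(hi - lo, 1)  # pixels per year
    height = 2 * margin + row_height * len(events)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, e in enumerate(events):
        x = margin + (e.start - lo) * scale
        w = max((e.end - e.start) * scale, 1)  # one-year events still get a visible bar
        y = margin + i * row_height
        parts.append(f'<rect x="{x:.1f}" y="{y}" width="{w:.1f}" '
                     f'height="{row_height - 4}" fill="steelblue"/>')
        parts.append(f'<text x="{x:.1f}" y="{y + row_height - 6}" '
                     f'font-size="10">{escape(e.label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

The same scaling function could drive a map view by swapping the year axis for projected coordinates, which is roughly why a single event schema can feed both timelines and maps.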