Awesome Lorentz list of resources

A curated list of resources mentioned during the Lorentz workshop. Resources are organised by type (datasets, models, tools) and, where useful, by the processing task they support. (If you are not familiar with the format, do check out existing awesome lists on GitHub.)

Contents

Resources for evaluation

Models

Tools

Applications

Authority Lists

Resources for evaluation

This includes datasets (e.g. for benchmarking), annotation guidelines, shared tasks, etc.

Models

Document processing (OCR, page layout analysis, etc.)

Model overviews

Tools

Annotation

  • INCEpTION – a semantic annotation platform suitable for various types of textual annotation, e.g. named entity recognition (NER) and entity linking (EL).

  • Recogito Studio - an extensible platform for collaborative, standards-based annotation of TEI text, IIIF images and PDFs, including geotagging and reconciliation against gazetteers such as WHG, Pleiades and Wikidata.

  • Immarkus - open-source tool for semantic image annotation

  • Image Positions – image annotation platform within the Wikidata environment

  • FairCopy - tool for reading, transcribing, and encoding text with custom annotations

  • CATMA - mark-up and analysis tool

  • Prodi.gy - annotation tool for spaCy (not open-source)

  • Liiive - Real-time collaborative viewing & annotation for IIIF image collections

See also:
ATRIUM T4.5.2 Annotation tools overview.xlsx

Named Entity Recognition

  • GATE geotagger – This service identifies geographical named entities and disambiguates them against GeoNames. It currently uses the Mordecai3 geoparser; more details on Mordecai3 can be found in this paper.

  • GATE Pleiades NER – This service identifies geographical named entities and disambiguates them against the Pleiades dataset. It builds a simple gazetteer from all the names of every Pleiades entry that has a representative point. Ambiguous locations (i.e. those where multiple lookups overlap) are disambiguated geometrically: in a similar way to word sense disambiguation, the service assumes that a document is likely to be discussing a single area, and so it chooses the set of locations that minimises the area covered by the selected points. For efficiency, this area is currently calculated using axis-aligned bounding boxes.
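The gazetteer lookup and geometric disambiguation described for GATE Pleiades NER can be illustrated with a small sketch. This is not GATE's code: the gazetteer entries, place IDs and (approximate) coordinates below are hypothetical toy data, matching is a naive single-token lookup, and the search tries every combination of candidates, which only scales to a handful of ambiguous mentions.

```python
import re
from itertools import product

# Hypothetical toy gazetteer: name -> list of (place_id, lon, lat).
# IDs are made up (not real Pleiades URIs); coordinates are approximate.
GAZETTEER = {
    "Alexandria": [("alex-egypt", 29.9, 31.2), ("alex-troas", 26.15, 39.75)],
    "Memphis": [("memphis-egypt", 31.25, 29.85)],
    "Troy": [("troy", 26.24, 39.96)],
}


def find_mentions(text, gazetteer):
    """Return gazetteer names found in the text (single tokens only), without duplicates."""
    tokens = re.findall(r"\w+", text)
    return list(dict.fromkeys(t for t in tokens if t in gazetteer))


def bbox_area(points):
    """Area of the axis-aligned bounding box around (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (max(lons) - min(lons)) * (max(lats) - min(lats))


def disambiguate(mentions, gazetteer):
    """Pick one candidate per mention so that the selected points' bounding box is smallest.

    Exhaustive search over all candidate combinations (exponential in the
    number of ambiguous mentions), so this is only an illustration.
    """
    if not mentions:
        return {}
    candidate_lists = [gazetteer[m] for m in mentions]
    best = min(
        product(*candidate_lists),
        key=lambda combo: bbox_area([(lon, lat) for _, lon, lat in combo]),
    )
    return {m: place_id for m, (place_id, _, _) in zip(mentions, best)}


if __name__ == "__main__":
    text = "From Alexandria the army marched north towards Troy."
    print(disambiguate(find_mentions(text, GAZETTEER), GAZETTEER))
    # {'Alexandria': 'alex-troas', 'Troy': 'troy'}
```

Here "Alexandria" resolves to Alexandria Troas because it lies next to Troy, mirroring the single-area assumption. Note that a box around points sharing a latitude or longitude has zero area, so a real system may need additional tie-breaking.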


Entity Linking & Reconciliation

  • Spacyfishing – a spaCy pipeline component that wraps the entity-fishing tool for entity linking against Wikidata.

  • OpenRefine – open-source tool for manipulating datasets, including semi-automatic entity linking and variant clustering.

  • TagMe – a tool to identify short phrases or entities and match them against Wikipedia pages.

  • Ariadne Services for Entity Linking and Disambiguation – …
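Variant clustering in OpenRefine includes a "key collision" method whose default keying function is the fingerprint: values that reduce to the same normalised key are grouped as likely variants. The Python sketch below approximates that idea for illustration only. It is not OpenRefine's implementation, and the exact normalisation rules used here (which characters count as punctuation, how non-Latin text is handled) are simplifying assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine-style fingerprint key for a string value."""
    s = value.strip().lower()
    # Drop punctuation (anything that is neither a word character nor whitespace).
    s = re.sub(r"[^\w\s]", "", s)
    # Fold accented Latin letters to ASCII, e.g. "gödel" -> "godel".
    # Characters without an ASCII decomposition (e.g. Greek) are dropped entirely,
    # so this sketch is only suitable for Latin-script values.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Sort unique tokens so that word order and repetition no longer matter.
    return " ".join(sorted(set(s.split())))


def key_collision_clusters(values):
    """Group values whose fingerprints collide; return only groups of two or more."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    names = ["Gödel, Kurt", "Kurt Gödel", "kurt godel", "Turing, Alan"]
    print(key_collision_clusters(names))
    # [['Gödel, Kurt', 'Kurt Gödel', 'kurt godel']]
```

Because the key ignores case, punctuation, accents and word order, inverted name forms ("Gödel, Kurt") collide with direct ones ("Kurt Gödel"); spelling variants such as "Roma" and "Rome" would not, which is where fuzzier clustering methods come in.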

Applications

  • britishlibrary/peripleo - a browser-based tool for mapping things related to place.

  • Vistorian - online environment to visualize spatial and networked data.

Formats

  • LinkedPasts/linked-places-format - Linked Places format is used to describe attestations of places in a standard way, primarily for linking gazetteer datasets.

  • LinkedPasts/linked-traces-format - Patterns based on the W3C Web Annotation Model, primarily for use in linking resources describing historical phenomena with the places relevant to them.
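To make the two formats more concrete, the sketch below builds a minimal place record in the spirit of Linked Places Format (a GeoJSON Feature extended with keys such as `names` and `links`) and a generic W3C Web Annotation that links a quoted passage to that place. It is a hedged illustration, not a validated instance of either specification: the field selection reflects only a subset of LPF, the annotation follows the general W3C model rather than the specific Linked Traces patterns, and all `example.org` identifiers and the helper function names are hypothetical.

```python
import json

# Hypothetical identifiers on example.org, used only for illustration.
PLACE_ID = "https://example.org/places/abingdon"
DOC_ID = "https://example.org/texts/charter-42"


def lpf_feature(place_id, title, toponyms, lon, lat, close_matches):
    """Minimal place record in the spirit of Linked Places Format (subset of keys only)."""
    return {
        "@id": place_id,
        "type": "Feature",
        "properties": {"title": title},
        # Each attested name form, with a language tag.
        "names": [{"toponym": name, "lang": lang} for name, lang in toponyms],
        # GeoJSON coordinate order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # Links to the same place in other gazetteers or authority files.
        "links": [{"type": "closeMatch", "identifier": uri} for uri in close_matches],
    }


def lpf_collection(features):
    # A real LPF file also declares the JSON-LD "@context" published with the
    # format; it is omitted here rather than guessing its URL.
    return {"type": "FeatureCollection", "features": features}


def linking_annotation(annotation_id, place_id, document_id, quote):
    """Generic W3C Web Annotation linking a quoted passage to a place record."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "linking",
        "body": place_id,
        "target": {
            "source": document_id,
            "selector": {"type": "TextQuoteSelector", "exact": quote},
        },
    }


if __name__ == "__main__":
    # Approximate coordinates for Abingdon-on-Thames; the authority URI is a placeholder.
    place = lpf_feature(PLACE_ID, "Abingdon", [("Abingdon", "en")], -1.28, 51.67,
                        ["https://example.org/authority/abingdon"])
    anno = linking_annotation("https://example.org/annotations/1", PLACE_ID, DOC_ID, "Abingdon")
    print(json.dumps(lpf_collection([place]), indent=2))
    print(json.dumps(anno, indent=2))
```

Keeping the place record and the annotation separate reflects the division of labour between the two formats: the gazetteer entry describes the place once, while any number of annotations can point at it from different sources.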


Authority Lists (for Reconciliation)

https://docs.google.com/spreadsheets/d/1AoGTRArfx7EodTrU8GPdCWJDqqt0fNMg3gcSui2LCxk/edit?gid=0#gid=0