|
Named Entity Recognition Task
In the Named Entity Recognition task, systems are required to recognize
the Named Entities occurring in a text.
In particular, the task will focus on the following types of Named Entities:
Person (PER), Organization (ORG), Location (LOC) and Geo-Political Entity (GPE);
see the annotation report for more details.
The evaluation will be based on the Italian Content Annotation Bank (I-CAB)
Version 4.1, an annotated corpus developed in the context of the Ontotext Project.
I-CAB 4.1 is annotated with Named Entities in the IOB format, where "B" (begin)
marks the first token of a Named Entity, "I" (inside) marks each following token
of the same entity, and "O" (outside) is used for all other tokens.
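As an illustration of the IOB scheme, the following minimal Python sketch groups a sequence of IOB tags into entity spans. The function name `iob_to_spans`, the tag spelling (`B-PER`, `I-PER`, ...) and the example sentence are assumptions made for illustration; the actual column layout of the I-CAB files is described in the guidelines and is not reproduced here. The sketch is lenient: an "I" tag with no matching open entity is treated as the start of a new one.

```python
from typing import List, Tuple

# (entity type, index of first token, index one past the last token)
Span = Tuple[str, int, int]


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Group IOB tags such as 'B-PER', 'I-PER', 'O' into entity spans."""
    spans: List[Span] = []
    start = None   # index where the currently open entity begins
    etype = None   # type of the currently open entity
    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, label = tag.partition("-")
        # The open entity continues only on an I- tag of the same type.
        continues = prefix == "I" and start is not None and label == etype
        if start is not None and not continues:
            spans.append((etype, start, i))
            start, etype = None, None
        # B- always opens an entity; a stray I- is leniently treated the same way.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


# Hypothetical example sentence, not taken from I-CAB.
tokens = ["Manuela", "Speranza", "lavora", "a", "Trento"]
tags = ["B-PER", "I-PER", "O", "O", "B-GPE"]
for etype, begin, end in iob_to_spans(tags):
    print(etype, " ".join(tokens[begin:end]))
```

Here the two-token name is recovered as a single PER entity and the last token as a GPE entity.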
Upon accepting the agreement terms for a free licence, participants will be provided
with development data (the development part of I-CAB 4.1, i.e. 335 news stories,
for a total of 113,000 words).
The test data on which the official evaluation will be performed consist of the test
part of I-CAB 4.1 (i.e. 190 news stories, for a total of 69,000 words).
All the data we provide will also be annotated with Part of Speech information
using the Elsnet tagset for Italian.
Organizer:
Manuela Speranza (FBK-irst, Trento, Italy - manspera itc.it)
Data Distribution
Detailed guidelines
Trial examples (input sample and output sample)
Download the CoNLL 2002 scorer from the CoNLL website
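For a quick sanity check before running the official scorer, the sketch below computes entity-level precision, recall and F1 in the exact-match style used by CoNLL-type evaluation: a predicted entity counts as correct only if both its type and its token boundaries match the gold annotation. This is an illustrative approximation, not a replacement for the CoNLL 2002 scorer; the function names and the example tags are assumptions, and official results should come from the scorer itself.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def entity_set(tags: List[str]) -> Set[Span]:
    """Collect the entities encoded by a sequence of IOB tags."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and start is not None and label == etype
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level scores for two aligned tag sequences."""
    gold_spans, pred_spans = entity_set(gold), entity_set(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical tags: the system finds the right boundaries for the second
# entity but assigns LOC instead of ORG, so that entity counts as an error.
gold = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-GPE"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-GPE"]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this example two of the three predicted entities match the gold annotation, so precision, recall and F1 all equal 2/3.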
|