"""
A Czech-specific block to fix lemmas, UPOS and morphological features in UD.
It should increase consistency across the Czech treebanks. It targets either
individual closed-class verbs (such as the auxiliary "být") or entire classes
of words (e.g. whether or not nouns should have the Polarity feature). It was
created as part of the Hičkok project (while importing nineteenth-century Czech
data), but it should be applicable to any other Czech treebank.
"""
import udapi.core.block
import re

class FixMorpho(udapi.core.block.Block):

    def process_node(self, node):
        # In Czech UD, "být" is always tagged AUX and never VERB, even in
        # purely existential constructions where it does not act as a copula.
        # Czech tagsets typically do not distinguish AUX from VERB, so data
        # converted from them may need to be fixed.
        if node.upos == 'VERB' and node.lemma == 'být':
            node.upos = 'AUX'
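The retagging rule can be tried without a udapi installation. The sketch below is a minimal, hypothetical stand-in: `Node` and `fix_byt` are illustrative names, not part of udapi, and `Node` carries only the attributes the rule reads. It applies the same condition as `process_node` to a hand-tagged sentence, "Byla zima a sněžilo" ("It was cold and it was snowing"), assuming the converted data tagged "být" as VERB.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a udapi node (not the real udapi class)."""
    form: str
    lemma: str
    upos: str


def fix_byt(node: Node) -> None:
    """Retag a VERB whose lemma is 'být' as AUX (same condition as FixMorpho)."""
    if node.upos == 'VERB' and node.lemma == 'být':
        node.upos = 'AUX'


# Hand-tagged sentence where a source tagset left "být" as VERB.
sentence = [
    Node('Byla', 'být', 'VERB'),
    Node('zima', 'zima', 'NOUN'),
    Node('a', 'a', 'CCONJ'),
    Node('sněžilo', 'sněžit', 'VERB'),
]
for n in sentence:
    fix_byt(n)

print([(n.form, n.upos) for n in sentence])
# → [('Byla', 'AUX'), ('zima', 'NOUN'), ('a', 'CCONJ'), ('sněžilo', 'VERB')]
```

Only the token with lemma "být" changes; "sněžilo" (lemma "sněžit") stays VERB, because the rule checks the lemma as well as the UPOS.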