Consider hyphenation during tokenization
We currently do not find "Hochwasserereignisse" when searching for "Hochwasser". To address this, we could consider hyphenation to produce additional tokens from longer words, with some limits to avoid useless break points, e.g. a minimum length for the source word and for the generated tokens. The implementation could be based on either hypher or hyphenation.
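A rough sketch of what this could look like, assuming the hypher crate with its `hyphenate(word, Lang::German)` API: split a sufficiently long word at each hyphenation boundary into a prefix and a suffix and keep only the parts above a minimum length. The thresholds, the two-way split strategy and the `sub_tokens` helper are illustrative assumptions, not a decided design.

```rust
use hypher::{hyphenate, Lang};

/// Minimum length of the original word before we try to split it at all.
const MIN_SOURCE_LEN: usize = 12;
/// Minimum length of each generated sub-token.
const MIN_TOKEN_LEN: usize = 6;

/// Produces additional tokens for `word` by splitting it at each hyphenation
/// boundary into a prefix and a suffix, keeping only sufficiently long parts.
/// (Case folding and other normalization are left out of this sketch.)
fn sub_tokens(word: &str) -> Vec<String> {
    if word.chars().count() < MIN_SOURCE_LEN {
        return Vec::new();
    }

    let mut tokens = Vec::new();
    let mut offset = 0;

    // hypher yields the syllables of the word as string slices; the running
    // byte offset after each syllable is a possible split point.
    for syllable in hyphenate(word, Lang::German) {
        offset += syllable.len();
        if offset == word.len() {
            break; // no split after the last syllable
        }
        let (prefix, suffix) = word.split_at(offset);
        if prefix.chars().count() >= MIN_TOKEN_LEN {
            tokens.push(prefix.to_string());
        }
        if suffix.chars().count() >= MIN_TOKEN_LEN {
            tokens.push(suffix.to_string());
        }
    }

    tokens
}

fn main() {
    // Should include "Hochwasser" among the sub-tokens if the German
    // patterns place a break point after "Hochwasser".
    println!("{:?}", sub_tokens("Hochwasserereignisse"));
}
```

Whether both halves should be indexed, and how this interacts with the query side, would still need to be decided.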