Regular expressions

Text fields in dlexDB

There are five categories of text fields in dlexDB:

All of these text fields can be queried via regular expressions. To do so, enter your query into the input field of the respective filter and mark it as a regular expression by enclosing it in slashes.

Examples:

/gen/
Word must contain gen at any position; e.g., genug, irgendwo, morgen, gen
/^gen/
Word must start with gen; e.g., genug, gen. The special character ^ marks the beginning of a word.
/gen$/
Word must end with gen; e.g., morgen, gen. The special character $ marks the end of a word.
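
The three patterns above can also be tried outside dlexDB. The following is a minimal sketch using Python's re module, which handles these particular patterns in the same way but may differ from the engine behind dlexDB in other respects. The word list and the helper name matching are made up for illustration, and the enclosing slashes are left out because they only mark the query as a regular expression in the dlexDB input field.

```python
import re

# Hypothetical sample words; dlexDB itself is queried through its filters.
words = ["genug", "irgendwo", "morgen", "gen", "Tag"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    rx = re.compile(pattern)
    return [w for w in candidates if rx.search(w)]

contains_gen = matching(r"gen", words)      # gen at any position
starts_with_gen = matching(r"^gen", words)  # ^ anchors at the beginning
ends_with_gen = matching(r"gen$", words)    # $ anchors at the end

print(contains_gen)
print(starts_with_gen)
print(ends_with_gen)
```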

When you query a text field that is stored case-sensitively and the Ignore case checkbox is checked, the characters in your regular expression are matched without regard to case as well.
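
As a rough illustration of this behaviour, the sketch below assumes that the Ignore case checkbox acts like the re.IGNORECASE flag in Python. That equivalence is an assumption, and the sample words are hypothetical.

```python
import re

# Hypothetical sample words that differ only in the case of the first letter.
words = ["Genug", "genug", "Morgen"]

# Without the flag, the pattern only matches the lower-case spelling.
case_sensitive = [w for w in words if re.search(r"^gen", w)]

# With the flag (assumed to correspond to a checked Ignore case box),
# upper- and lower-case letters in the pattern are treated alike.
ignoring_case = [w for w in words if re.search(r"^gen", w, re.IGNORECASE)]

print(case_sensitive)
print(ignoring_case)
```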

Full regular expression syntax

dlexDB supports most of the so-called extended regular expression syntax as described in Spencer, 2007. The most common operators are:

/gen/
Word must contain gen at any position; e.g., genug, irgendwo, morgen, gen
/^gen/
Word must start with gen; e.g., genug, gen. The special character ^ marks the beginning of a word.
/gen$/
Word must end with gen; e.g., morgen, gen. The special character $ marks the end of a word.
/Üb.*ung/
Word contains Üb, followed by any number of arbitrary characters (including none), followed by ung; e.g., Überlegung, Übung
/Üb.+ung/
Word contains Üb, followed by one or more arbitrary characters, followed by ung; e.g., Überlegung (but not Übung)
/R.ck/
Word contains R, followed by exactly one arbitrary character, followed by ck; e.g., Reck, Rock, Ruck
/R[eo]ck/
Word contains R, followed by either e or o, followed by ck; e.g., Reck, Rock
/(Ober|Unter)ammergau/
Word contains either Ober or Unter, followed by ammergau; e.g., Oberammergau, Unterammergau
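
The operators listed above can be checked against the example words with a short Python sketch. Python's re module is used here only as a stand-in that agrees with the extended syntax for these operators; the helper name matching is hypothetical.

```python
import re

# Sample words taken from the examples in the list above.
words = ["Überlegung", "Übung", "Reck", "Rock", "Ruck",
         "Oberammergau", "Unterammergau"]

def matching(pattern):
    """Return the sample words in which the pattern is found."""
    return [w for w in words if re.search(pattern, w)]

zero_or_more = matching(r"Üb.*ung")               # .* allows nothing in between
one_or_more = matching(r"Üb.+ung")                # .+ needs at least one character
any_character = matching(r"R.ck")                 # . is exactly one character
character_class = matching(r"R[eo]ck")            # [eo] is either e or o
alternation = matching(r"(Ober|Unter)ammergau")   # | separates alternatives

for result in (zero_or_more, one_or_more, any_character,
               character_class, alternation):
    print(result)
```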

Notes

Please note that dlexDB contains a few multi-word types in which spaces have been replaced by underscores (e.g., New_York). In type bigrams and type trigrams, by contrast, spaces separate the constituent types from each other.
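
What this means for a pattern can be sketched in Python as follows. The entries below are invented for illustration and are not taken from dlexDB; the sketch only assumes that underscores and spaces are stored literally, as described in the note.

```python
import re

# Hypothetical entries: a multi-word type is written with an underscore,
# while a bigram uses a space between its two constituent types.
types = ["New_York", "York", "New"]
bigrams = ["in New_York", "New_York ist", "nach Berlin"]

# A multi-word type is matched with the underscore, not with a space.
with_underscore = [t for t in types if re.search(r"^New_York$", t)]
with_space = [t for t in types if re.search(r"^New York$", t)]

# In a bigram, a literal space in the pattern matches the separator,
# so this finds bigrams whose second type is New_York.
second_is_new_york = [b for b in bigrams if re.search(r" New_York$", b)]

print(with_underscore)
print(with_space)
print(second_is_new_york)
```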