ladner wrote: "a" and "an" are ignored as expected but "at" is not ignored. This is what I want but I am curious how it works. Can you please translate the Regex dictionary item "an?" into English for me?
First try reading this: http://regex101.com/r/iP3lU8
I suspect it does a better job of explaining than I can.
Plus it's colorful!
^ means the match must start at the beginning.
(...) is a group - usually used for capturing the contents but in this case it aids the OR.
the means match 'the'.
| means match either the left side or the right side (the group limits it to 'the' OR 'an?' instead of everything to the left/right of the |).
an? means match 'a' followed by zero or one instances of 'n'.
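\b means the previous match must end at a word boundary - not really needed because of the next part.
\s+ means all of the previous must be followed by 1 or more whitespace characters. That is why "at" is not ignored: after the 'a' comes a 't', which is neither a word boundary nor whitespace, so the pattern fails to match.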
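If you want to poke at it yourself, here is a quick Python sketch. The pattern is my assumption of what the pieces above add up to, ^(the|an?)\b\s+, and strip_leading_article is just a made-up helper name for illustration. It is case-sensitive as written.

```python
import re

# Assumed pattern, assembled from the pieces explained above:
# ^ start, (the|an?) the article, \b word boundary, \s+ trailing whitespace.
ARTICLE = re.compile(r"^(the|an?)\b\s+")

def strip_leading_article(title: str) -> str:
    """Remove a leading 'the', 'a' or 'an' plus the whitespace after it."""
    return ARTICLE.sub("", title, count=1)

if __name__ == "__main__":
    for title in ["a tale", "an apple", "at home", "the end", "another day"]:
        print(repr(title), "->", repr(strip_leading_article(title)))
```

"at home" and "another day" come back unchanged because the 'a' or 'an' is followed directly by another letter, so neither \b nor \s+ can match there.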
http://www.regular-expressions.info/ is another great source.
EDIT: Curiously enough, the popular ebook manager Calibre took a stab at listing the articles for other languages: https://github.com/kovidgoyal/calibre/b ... ks.py#L219