regex negate

negate something in regex using negative lookahead

If you wish to negate something in a regex, one of the easiest and most reliable ways to do it is to use a negative lookahead assertion. One of the most common use cases for this approach is splitting a set of objects into categories with an extra “other” / “default” / “the rest” option.

In an “if-else” statement, specific conditions are covered by “if” and “elif”, leaving “else” for “the rest”. In a “case” statement, it is good practice to make “*” the final pattern, so that it matches everything not matched by the previous patterns. But what if you have a similar need in the world of regexps?

What is negative lookahead?

A negative lookahead assertion, written (?!pattern), is a special regular expression syntax that succeeds only if pattern does not match at the current position. It is zero-width: it only checks the text ahead and does not consume any characters. This lets you match something only if it is not followed by another specific pattern. So if you already have a pattern for one category and also want to catch anything else that does not match it, negative lookahead is the tool for the job.
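To make the zero-width behaviour concrete, here is a minimal sketch using Python's standard re module (Python and the sample string are my own choices for illustration, not part of the original setup). The pattern q(?!u) finds every “q” that is not followed by “u”, and each match contains only the “q” itself, because the lookahead checks the next character without consuming it.

```python
import re

text = "Iraq quit qatar"

# (?!u) checks the next character without consuming it,
# so each match is just the single "q".
matches = list(re.finditer(r"q(?!u)", text))

print([m.group() for m in matches])  # ['q', 'q']
print([m.start() for m in matches])  # [3, 10]
```

The “q” in “quit” is skipped because it is followed by “u”; the other two are matched.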

Here is an example. You want to catch lines that contain “status: OK” as one category and all “the rest” as another. The first one is a no-brainer, but the second one has to negate “OK” and catch anything else. Negative lookahead is used like this:

status: (?!\bOK\b).+

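\b stands for a word boundary. It makes the lookahead reject only the exact word “OK”. Without it, the lookahead would reject any value that merely starts with “OK”, so “OKOK” would incorrectly not be caught. (Strictly speaking, it is the trailing \b that does the work here; the leading one is redundant right after the space, but harmless.)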
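A quick way to see the difference is to run both variants against a few sample lines. This is a minimal sketch in Python's re module; the sample lines are made up for illustration.

```python
import re

with_boundary = re.compile(r"status: (?!\bOK\b).+")
without_boundary = re.compile(r"status: (?!OK).+")

lines = ["status: OK", "status: OKOK", "status: FAIL"]

# Lines that fall into the "the rest" category with each pattern
rest_with = [line for line in lines if with_boundary.search(line)]
rest_without = [line for line in lines if without_boundary.search(line)]

print(rest_with)     # ['status: OKOK', 'status: FAIL']
print(rest_without)  # ['status: FAIL']
```

Only the version with \b puts “status: OKOK” into “the rest”; without it, the line falls into neither category.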

The .+ is mandatory here. A lookahead is zero-width, so without .+ the regular expression would only check that “OK” does not immediately follow “status: ”; it would not require any characters after “status: ” at all, and the matched text would not include the status value. Including .+ ensures that the expression matches a non-empty sequence of characters following “status: ”, as long as that sequence does not start with the word “OK”. Depending on your use case, .+ might be replaced with \w+ or anything else suitable for the particular situation.
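The effect of dropping .+ can be checked directly. In this minimal Python sketch (sample lines are hypothetical), the pattern without .+ still matches an empty status and never includes the value in the match, while the full pattern requires at least one character and captures it.

```python
import re

line_empty = "status: "
line_warn = "status: WARN"

without_dot = re.compile(r"status: (?!\bOK\b)")
with_dot = re.compile(r"status: (?!\bOK\b).+")

# Without .+, an empty status still matches and the value is not captured
print(repr(without_dot.search(line_empty).group()))  # 'status: '
print(repr(without_dot.search(line_warn).group()))   # 'status: '

# With .+, at least one character is required and the value is included
print(with_dot.search(line_empty))                   # None
print(repr(with_dot.search(line_warn).group()))      # 'status: WARN'
```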

Negate a more complex regex

The use case might be more complex than one static word. Your initial category might consist of several words, or even of patterns rather than static words. Say that instead of just OK, you want OK, WARN and ERR collected separately, with “everything else” as a fourth category. The approach is not much more complex: just use the alternation operator | inside the negative lookahead, like this:

status: (?!\bOK\b|\bWARN\b|\bERR\b).+
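To check that the four categories really split the input cleanly, you can run every line through all four regexes and confirm that exactly one matches. This is a minimal Python sketch; the CATEGORIES table, the categories_for helper and the sample lines are hypothetical, made up for this illustration.

```python
import re

# Hypothetical category table: one regex per category, the last one
# built with negative lookahead to catch "everything else".
CATEGORIES = {
    "OK": re.compile(r"status: \bOK\b"),
    "WARN": re.compile(r"status: \bWARN\b"),
    "ERR": re.compile(r"status: \bERR\b"),
    "other": re.compile(r"status: (?!\bOK\b|\bWARN\b|\bERR\b).+"),
}


def categories_for(line):
    """Return the names of all categories whose regex matches the line."""
    return [name for name, rx in CATEGORIES.items() if rx.search(line)]


lines = [
    "status: OK",
    "status: WARN",
    "status: ERR",
    "status: PENDING",
    "status: OKAY",
    "status: ERROR",
]

for line in lines:
    print(f"{line!r:20} -> {categories_for(line)}")
```

Every line lands in exactly one category. Thanks to the word boundaries, “status: OKAY” and “status: ERROR” go to “other” rather than to OK or ERR.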

You can also have some more complex patterns. Say you would like to catch HTTP 2xx status codes and “the rest”:

status: (?!2\d{2})\d{3}
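The same split can be sanity-checked on a handful of status codes. This is a minimal Python sketch with made-up sample lines; the 2xx pattern is the straightforward positive counterpart of the negated one.

```python
import re

two_xx = re.compile(r"status: 2\d{2}")
rest = re.compile(r"status: (?!2\d{2})\d{3}")

codes = ["status: 200", "status: 204", "status: 301", "status: 404", "status: 500"]

print([c for c in codes if two_xx.search(c)])  # ['status: 200', 'status: 204']
print([c for c in codes if rest.search(c)])    # ['status: 301', 'status: 404', 'status: 500']
```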

This construct works nicely in places where regular expressions are the only tool you have to implement your logic. For instance, here is the same example in the form of Zabbix data collection items.

Item for 2xx:

log.count[/var/log/httpd/access_log,"HTTP\/\d\.\d.\s2\d{2}",,10000,skip]

Item for “the rest”:

log.count[/var/log/httpd/access_log,"HTTP\/\d\.\d.\s(?!2\d{2})\d{3}",,10000,skip]

And the result: