Make grep output only the capturing group you need

Say you have some text and a pattern that you want to pass to grep. Everything is easy until you want to extract only the part you are interested in, not the whole line that contains the match. What to do? You can pipe the grep output into sed or awk for further processing. But did you know that grep alone can print just the part a capturing group would match, without using anything else? We will be using “grep -P” (Perl Compatible Regular Expressions, PCRE, an option of GNU grep) together with “-o”, which prints only the matched part of each line.

Say you have this data file:

[root@linux ~]# cat data.txt
foo 1 bar
bar 2 foo
foo 3 bar
bar 4 foo
[root@linux ~]#

And you want to grep only the number from those lines that begin with “foo” and end with “bar”:

[root@linux ~]# grep -Po "^foo\s(\d+)\sbar$" data.txt
foo 1 bar
foo 3 bar
[root@linux ~]#

You almost did it! But you wanted only the number, with nothing before and nothing after it. With -o, grep prints the whole match, and here the whole match is the entire line. In other words, you want the output to be just your first capturing group. Use zero-length lookahead and lookbehind assertions to achieve that.

Lookahead (?=bar) asserts that what immediately follows the current position in the string is “bar”. Lookbehind (?<=foo) asserts that what immediately precedes the current position in the string is “foo”. Neither assertion consumes any characters, so the asserted text does not become part of the match. You can also use \K, which resets the starting point of the match: it tells the regex engine to drop everything matched before \K and return only the part of the match that comes after it.

[root@linux ~]# grep -Po "^foo\s\K(\d+)(?=\sbar$)" data.txt
1
3
[root@linux ~]#

Or, with a lookbehind instead of \K:

[root@linux ~]# grep -Po "(?<=^foo\s)(\d+)(?=\sbar$)" data.txt
1
3
[root@linux ~]#
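
The two forms are not fully interchangeable. PCRE restricts what a lookbehind may contain (classically, it has to match a fixed number of characters), so a prefix such as \s+ is generally not accepted there, while anything may come before \K. Here is a minimal sketch of that difference, assuming GNU grep with PCRE support; the sample lines with irregular spacing and the temporary file are made up for illustration and are not part of data.txt:

```shell
# Hypothetical sample data with a varying number of spaces.
tmp=$(mktemp)
printf 'foo   10 bar\nbar 2 foo\nfoo 30   bar\n' > "$tmp"

# \s+ before \K may match any amount of whitespace; everything matched
# before \K is dropped from the output. The lookahead after the digits
# keeps the trailing whitespace and "bar" out of the match as well.
result=$(grep -Po '^foo\s+\K\d+(?=\s+bar$)' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

The pattern is in single quotes here so that the shell leaves the backslashes and the $ untouched.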

The need to extract only a capturing group comes up often when writing more complex scripts. Doing it with grep alone, without any additional tools, is a neat way to make parts of your scripts simpler and more efficient.
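
As a rough sketch of how this might look inside a script, the snippet below stores the extracted numbers in a variable and sums them. It assumes GNU grep with PCRE support; the variable names and the temporary copy of the sample data are only illustrative:

```shell
# Recreate the sample data from the article in a temporary file.
tmp=$(mktemp)
printf 'foo 1 bar\nbar 2 foo\nfoo 3 bar\nbar 4 foo\n' > "$tmp"

# Extract only the numbers from lines of the form "foo <number> bar".
numbers=$(grep -Po '^foo\s\K\d+(?=\sbar$)' "$tmp")

# Sum the extracted values; word splitting is safe here because
# the pattern only lets digits through.
total=0
for n in $numbers; do
    total=$((total + n))
done

echo "sum of matched numbers: $total"
rm -f "$tmp"
```

Because grep already returns clean values, the loop needs no extra trimming or parsing with sed or awk.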